re and pandas: reshaping a list

zsad512

I have a list in this format:

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

I used re to reformat the list like this:

[in] test_set1 = [ re.split(r',', line, maxsplit=5) for line in testing_set]

[out] ["001","P01"]

How can I create a DataFrame whose index (transaction_id) is "001, 002, 003, 004", with each row's P values listed in a column (product_id)?

DJK

You can do it like this:

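import re
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

# Change maxsplit to 1 so each line is split only at the first comma:
# the transaction ID on the left, the remaining products on the right.
test_set1 = [re.split(r',', line, maxsplit=1) for line in testing_set]

df = pd.DataFrame(test_set1, columns=['transaction_id', 'product_id'])
df.set_index(['transaction_id'], inplace=True)
# Split the remaining comma-separated string into a list of product IDs.
df['product_id'] = df['product_id'].apply(lambda row: row.split(','))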

This gives you a DataFrame like this:

                     product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
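
As a possible variation on the answer above, here is a minimal sketch that skips re entirely, since the separator is a plain comma, and then optionally reshapes the result so each product gets its own row. The name long_df is only illustrative and does not come from the original answer, and the explode step assumes pandas 0.25 or later.

```python
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

# Split each line once: transaction ID on the left, remaining products on the right.
rows = [line.split(",", 1) for line in testing_set]

df = pd.DataFrame(rows, columns=["transaction_id", "product_id"])
# Turn the comma-separated remainder into a list of product IDs.
df["product_id"] = df["product_id"].str.split(",")
df = df.set_index("transaction_id")

# Optional long format: one row per (transaction_id, product_id) pair.
# long_df is a hypothetical name; DataFrame.explode needs pandas >= 0.25.
long_df = df.explode("product_id")
print(long_df)
```

The list-per-row form matches what the question asked for; the long form tends to be easier to group or count by product, so which one to keep depends on what you do next.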
