I have a list in the following format:
testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]
I used `re` to reformat the list like this:
[in] test_set1 = [ re.split(r',', line, maxsplit=5) for line in testing_set]
[out] ["001","P01"]
How can I create a DataFrame whose index is the transaction_id ("001", "002", "003", "004"), with each row's P values listed in a product_id column?
You can do it like this:
import re
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]
# maxsplit changed to 1: only the transaction id is split off, the products stay in one string
test_set1 = [re.split(r',', line, maxsplit=1) for line in testing_set]
df = pd.DataFrame(test_set1, columns=['transaction_id', 'product_id'])
df.set_index(['transaction_id'], inplace=True)
# turn the comma-separated product string into a list of product ids
df['product_id'] = df['product_id'].apply(lambda row: row.split(','))
This gives you a DataFrame like this:
                     product_id
transaction_id
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
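If you would rather have one product per row instead of a list in each cell, the same data can be reshaped with `DataFrame.explode`. The following is only a sketch under a few assumptions: it uses plain `str.split` in place of `re.split` (the separator is a literal comma, so a regex is not required), it assumes every line contains at least one product after the transaction id, and the name `long_df` is made up for illustration. `explode` needs pandas 0.25 or newer.

```python
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

# Split each line once: the transaction id first, the remaining products second.
rows = [line.split(",", 1) for line in testing_set]

df = pd.DataFrame(rows, columns=["transaction_id", "product_id"])
# Convert the comma-separated product string into a list per row.
df["product_id"] = df["product_id"].str.split(",")
df = df.set_index("transaction_id")

# Hypothetical follow-up: one row per (transaction, product) pair.
# The transaction id is repeated in the index for every product.
long_df = df.explode("product_id")

print(df)
print(long_df)
```

The list-per-cell form matches what the question asks for; the exploded form is usually easier to group, count, or join on later.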