Here is a data frame with sample data:
import pandas as pd
df = pd.DataFrame({'KEY': ['1','2','3'], 'RECORD': ['1','1','1'], 'SERIAL': ['1470','2321','300'], 'REMARKS': ['FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU','I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON\'T LIKE FRUIT[CANTALOPE,HONEYDEW]', 'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234']})
I need to extract the fruit into a new data frame that stays linked to KEY, RECORD, and SERIAL. When finished, it should look like this:
df = pd.DataFrame({'KEY': ['1','1','1','2','2','2','2','2','3','3','3'], 'RECORD': ['1','1','1','1','1','1','1','1','1','1','1'], 'SERIAL': ['1470','1470','1470','2321','2321','2321','2321','2321','300','300','300'], 'FRUIT': ['APPLES','ORANGES','PEARS','BANANAS','CHERRIES','GRAPES','CANTALOPE','HONEYDEW','LEMONS','ORANGES','GRAPEFRUIT'], 'CODE': ['null','null','null','null','null','null','null','null','1234','1234','1234']})
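From the research I have done, it looks like str.split and/or str.extract could work, but I am not sure how to match each fruit with its KEY, RECORD, and SERIAL. On top of that, the last record ends with " @ 1234". That value also needs to be extracted and matched with the three fruit listed before it.
I am guessing the first step is to extract the fruit, which should be easy since they are all inside the string.
Any suggestions on how to approach this?
Thanks!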
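Try this:
# Capture the contents of the first [...] group in each row, then split on commas into a list
df['FruitList'] = df['REMARKS'].str.extract(r'\[(.+?)\]', expand=False).str.split(',')
# Capture the digits that follow "@ " (NaN when there is no code)
df['CODE'] = df['REMARKS'].str.extract(r'@\s(\d+)', expand=False)
# One output row per fruit; KEY, RECORD, SERIAL and CODE are repeated for each
df.explode('FruitList')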
Output:
   KEY  RECORD  SERIAL  REMARKS                                            FruitList   CODE
0  1    1       1470    FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU     APPLES      NaN
0  1    1       1470    FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU     ORANGES     NaN
0  1    1       1470    FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU     PEARS       NaN
1  2    1       2321    I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...  BANANAS     NaN
1  2    1       2321    I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...  CHERRIES    NaN
1  2    1       2321    I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...  GRAPES      NaN
2  3    1       300     THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234   LEMONS      1234
2  3    1       300     THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234   ORANGES     1234
2  3    1       300     THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234   GRAPEFRUIT  1234
If you like, you can drop the REMARKS column:
df.explode('FruitList').drop('REMARKS', axis=1)
Output:
   KEY  RECORD  SERIAL  FruitList   CODE
0  1    1       1470    APPLES      NaN
0  1    1       1470    ORANGES     NaN
0  1    1       1470    PEARS       NaN
1  2    1       2321    BANANAS     NaN
1  2    1       2321    CHERRIES    NaN
1  2    1       2321    GRAPES      NaN
2  3    1       300     LEMONS      1234
2  3    1       300     ORANGES     1234
2  3    1       300     GRAPEFRUIT  1234
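Note that str.extract keeps only the first bracketed group in each row, which is why CANTALOPE and HONEYDEW (the second group for KEY 2) are missing from the output above. One possible way to pick up every group is str.findall. The following is only a sketch under a few assumptions: pandas 0.25 or later (for explode), a FRUIT column name and a result variable chosen here to match the layout asked for in the question, and missing codes left as NaN rather than the string 'null'.

```python
import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '2', '3'],
    'RECORD': ['1', '1', '1'],
    'SERIAL': ['1470', '2321', '300'],
    'REMARKS': [
        'FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU',
        "I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON'T LIKE FRUIT[CANTALOPE,HONEYDEW]",
        'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234',
    ],
})

# findall returns, per row, a list with the contents of every [...] group
groups = df['REMARKS'].str.findall(r'\[(.+?)\]')

# Join the groups back with commas, then split into one flat list of fruit per row
df['FRUIT'] = groups.str.join(',').str.split(',')

# Digits after "@ " become the code; rows without one get NaN
df['CODE'] = df['REMARKS'].str.extract(r'@\s(\d+)', expand=False)

# One row per fruit, with KEY, RECORD, SERIAL and CODE repeated alongside it
result = (
    df.drop(columns='REMARKS')
      .explode('FRUIT')
      .reset_index(drop=True)
)

print(result)
```

With the sample data this gives 11 rows, including the two fruit from the second bracket group. A row whose REMARKS has no brackets at all would come through with an empty string in FRUIT, so it may need filtering if that case can occur.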