I have a Pandas DataFrame with a column whose values are lists of strings.
>>> df.head()
genre
0 [Comedy, Supernatural, Romance]
1 [Comedy, Parody, Romance]
2 [Comedy]
3 [Comedy, Drama, Romance, Fantasy]
4 [Comedy, Drama, Romance]
How can I assign a unique ID to each value in the lists, so that the same string gets the same ID across the whole column?
>>> df.head()
genre
0 [1, 2, 3]
1 [1, 4, 3]
2 [1]
3 [1, 5, 3, 6]
4 [1, 5, 3]
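The complication is that we are dealing with a column of lists. We can gain a little performance by exploding the rows first, then applying factorize to the flattened values and grouping the result back into the original list-per-row format: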
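import pandas as pd
# One row per list element; the original row label is repeated in the index
v = df['genre'].explode()
# Integer code per distinct string, shifted so IDs start at 1
v[:] = pd.factorize(v)[0] + 1
# Regroup on the original row label to rebuild the lists
df['genre2'] = v.groupby(level=0).agg(list)
df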
genre genre2
0 [Comedy, Supernatural, Romance] [1, 2, 3]
1 [Comedy, Parody, Romance] [1, 4, 3]
2 [Comedy] [1]
3 [Comedy, Drama, Romance, Fantasy] [1, 5, 3, 6]
4 [Comedy, Drama, Romance] [1, 5, 3]
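For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same explode / factorize / regroup idea. The helper name `encode_list_column` and the returned `genre_ids` lookup are my own additions for illustration, not part of the answer above, and the sample frame simply rebuilds the five rows from the question. It assumes every list is non-empty: an empty list explodes to NaN, which factorize codes as -1, so it would come out as ID 0 here.

```python
import pandas as pd


def encode_list_column(series):
    """Hypothetical helper: map each string in a column of lists to a column-wide ID.

    Returns the encoded column and a {string: id} lookup. IDs start at 1 and
    follow order of first appearance.
    """
    # One row per list element; the original row label is repeated in the index.
    exploded = series.explode()
    # codes: integer position of each value in uniques; uniques: distinct values.
    codes, uniques = pd.factorize(exploded)
    # Build a fresh Series instead of assigning in place, keeping the repeated index.
    ids = pd.Series(codes + 1, index=exploded.index)
    # Group on the original row label to rebuild one list per row.
    encoded = ids.groupby(level=0).agg(list)
    mapping = {label: i + 1 for i, label in enumerate(uniques)}
    return encoded, mapping


# Sample data copied from the question.
df = pd.DataFrame({"genre": [
    ["Comedy", "Supernatural", "Romance"],
    ["Comedy", "Parody", "Romance"],
    ["Comedy"],
    ["Comedy", "Drama", "Romance", "Fantasy"],
    ["Comedy", "Drama", "Romance"],
]})

df["genre2"], genre_ids = encode_list_column(df["genre"])
print(df)
print(genre_ids)
```

Keeping the lookup around is optional, but it makes it possible to translate the IDs back to genre names later, which the in-place version does not offer directly.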