Python: Create a pandas DataFrame with columns based on the unique values in a nested list

流努文

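I have a nested list containing the regions for each sample. I would like to build a DataFrame in which each row (sample) records the presence or absence of each region (column). For example, the data might look like this: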

region_list = [['North America'], ['North America', 'South America'], ['Asia'], ['North America', 'Asia', 'Australia']]

The final DataFrame would look like this:

North America    South America     Asia     Australia
1                0                 0        0
1                1                 0        0
0                0                 1        0
1                0                 1        1

I could probably work out a way to do this with nested loops and appends, but is there a more Pythonic way? Maybe with numpy.where?

海盗

pandas
str.get_dummies

pd.Series(region_list).str.join('|').str.get_dummies()

   Asia  Australia  North America  South America
0     0          0              1              0
1     0          0              1              1
2     1          0              0              0
3     1          1              1              0
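To see why this one-liner works, it may help to split the chain into its two steps. The following is a minimal sketch that assumes the `region_list` from the question; the names `joined`, `dummies`, and `dummies_alt` are mine, and the `';'` variant is only an illustration for the case where a label might itself contain `|`.

```python
import pandas as pd

region_list = [['North America'], ['North America', 'South America'],
               ['Asia'], ['North America', 'Asia', 'Australia']]

# Step 1: join each inner list into a single delimited string,
# e.g. the second sample becomes 'North America|South America'.
joined = pd.Series(region_list).str.join('|')

# Step 2: split on the delimiter and build one 0/1 column per distinct label.
# The default separator of str.get_dummies is '|'.
dummies = joined.str.get_dummies()

# If a label could contain '|', use another delimiter in both steps.
dummies_alt = pd.Series(region_list).str.join(';').str.get_dummies(sep=';')

print(dummies)
```

The columns come out in sorted order rather than in order of first appearance, which is why this result is arranged differently from the table in the question.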

numpy
np.bincount with pd.factorize

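import numpy as np
import pandas as pd

n = len(region_list)
# row index of every label once the nested list is flattened
i = np.arange(n).repeat([len(x) for x in region_list])
# f: integer code per label, u: the unique labels
f, u = pd.factorize(np.concatenate(region_list))
m = u.size

pd.DataFrame(
    # count each (row, column) pair on a flat n * m grid, then reshape
    np.bincount(i * m + f, minlength=n * m).reshape(n, m),
    columns=u
)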

   North America  South America  Asia  Australia
0              1              0     0          0
1              1              1     0          0
2              0              0     1          0
3              1              0     1          1
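The index arithmetic is easier to follow with the intermediate arrays written out. This is a worked sketch on the same `region_list`; the names `flat` and `counts` are mine and are not part of the answer's code.

```python
import numpy as np
import pandas as pd

region_list = [['North America'], ['North America', 'South America'],
               ['Asia'], ['North America', 'Asia', 'Australia']]

n = len(region_list)

# Row index of every label in the flattened list.
i = np.arange(n).repeat([len(x) for x in region_list])
# i -> [0, 1, 1, 2, 3, 3, 3]

# f holds an integer code per label; u holds the unique labels
# in order of first appearance.
f, u = pd.factorize(np.concatenate(region_list))
# f -> [0, 0, 1, 2, 0, 2, 3]
m = u.size

# Each (row, column) pair maps to one slot of a flattened n x m grid.
flat = i * m + f
# flat -> [0, 4, 5, 10, 12, 14, 15]

# Count the hits per slot, then fold the flat vector back into n rows.
counts = np.bincount(flat, minlength=n * m).reshape(n, m)
print(pd.DataFrame(counts, columns=u))
```

Because `np.bincount` counts occurrences, a label that appears twice within one sample would produce a 2 rather than a 1; if strict 0/1 indicators are needed in that case, clipping the result with `counts.clip(max=1)` is one option.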

Timing

%timeit pd.Series(region_list).str.join('|').str.get_dummies()
1000 loops, best of 3: 1.42 ms per loop

%%timeit
n = len(region_list)
i = np.arange(n).repeat([len(x) for x in region_list])
f, u = pd.factorize(np.concatenate(region_list))
m = u.size

pd.DataFrame(
    np.bincount(i * m + f, minlength=n * m).reshape(n, m),
    columns=u
)
1000 loops, best of 3: 204 µs per loop
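The `%timeit` magics above only work in IPython or Jupyter. Outside of those, a comparable measurement could be sketched with the standard-library `timeit` module. The function names `with_get_dummies` and `with_bincount` and the repeat counts are my own choices, and the absolute numbers will vary with the machine and the library versions.

```python
import timeit

import numpy as np
import pandas as pd

region_list = [['North America'], ['North America', 'South America'],
               ['Asia'], ['North America', 'Asia', 'Australia']]


def with_get_dummies(data):
    """pandas approach: join each list, then expand into indicator columns."""
    return pd.Series(data).str.join('|').str.get_dummies()


def with_bincount(data):
    """numpy approach: factorize the labels, then count on a flat grid."""
    n = len(data)
    i = np.arange(n).repeat([len(x) for x in data])
    f, u = pd.factorize(np.concatenate(data))
    m = u.size
    return pd.DataFrame(
        np.bincount(i * m + f, minlength=n * m).reshape(n, m),
        columns=u,
    )


runs = 20  # calls per measurement; an arbitrary choice for this sketch

# Take the best of three measurements and convert to seconds per call.
t_dummies = min(timeit.repeat(lambda: with_get_dummies(region_list),
                              number=runs, repeat=3)) / runs
t_bincount = min(timeit.repeat(lambda: with_bincount(region_list),
                               number=runs, repeat=3)) / runs

print(f"get_dummies: {t_dummies * 1e6:.0f} µs per call")
print(f"bincount:    {t_bincount * 1e6:.0f} µs per call")
```

With only four samples the measurement is dominated by fixed overhead, so timing on a list closer to the real data size would give a more useful comparison.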

Edited on 2020-11-6
