Creating pandas DataFrames from the elements of a list of dictionaries

Posted by shibby in Dev

shibby

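I have to iterate over the rows of a column (in an existing dataframe) that contains lists of dictionaries, and then create two new dataframes from the data in them. The general shape of one of these lists looks like this: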

[
 {"a": 10, "type": "square"}, {"type": "square", "b":11}, 
 {"type": "square", "c": 12}, {"d": 13, "type": "square"},
 {"type": "square", "e": 14}, {"a": 15, "type": "circle"}, 
 {"type": "circle", "b": 16}, {"type": "circle", "c": 17}, 
 {"d": 18, "type": "circle"}, {"type": "circle", "e": 19}
]

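I have thousands of these lists and want to create two new dataframes, one for circles and one for squares, resulting in dataframes whose first row looks roughly like this: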

      type    a  b  c  d  e
0    square   10 11 12 13 14

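So far I have tried converting the whole thing to JSON. That runs, but it seems to change the nature of the dataframe so that I can no longer manipulate it. Going through JSON also creates a dataframe with multiple rows (one per element), and I have not been able to "flatten" the dataframe on a single key (in this case "type").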

I have also tried DataFrame.from_records, DataFrame.from_dict, and various other similar ways of reading data into pandas, with no luck.

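Edit: Sorry for being unclear. The example list of dictionaries above sits in a single "cell" of the existing dataframe, and I think the first step I am looking for involves extracting it from that "cell". So far I have tried various ways of converting the object into something usable (such as the list above), without success. I would need to create a variable along the lines of my_list = df.column[0], for example, and then I could iterate over the rows.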

Pirate

Let l be your list of dictionaries

l = [
 {"a": 10, "type": "square"}, {"type": "square", "b":11}, 
 {"type": "square", "c": 12}, {"d": 13, "type": "square"},
 {"type": "square", "e": 14}, {"a": 15, "type": "circle"}, 
 {"type": "circle", "b": 16}, {"type": "circle", "c": 17}, 
 {"d": 18, "type": "circle"}, {"type": "circle", "e": 19}
]

Then let's define a series s as 10 rows of that list

import pandas as pd

s = pd.Series([l] * 10)
print(s)

0    [{'type': 'square', 'a': 10}, {'type': 'square...
1    [{'type': 'square', 'a': 10}, {'type': 'square...
2    [{'type': 'square', 'a': 10}, {'type': 'square...
3    [{'type': 'square', 'a': 10}, {'type': 'square...
4    [{'type': 'square', 'a': 10}, {'type': 'square...
5    [{'type': 'square', 'a': 10}, {'type': 'square...
6    [{'type': 'square', 'a': 10}, {'type': 'square...
7    [{'type': 'square', 'a': 10}, {'type': 'square...
8    [{'type': 'square', 'a': 10}, {'type': 'square...
9    [{'type': 'square', 'a': 10}, {'type': 'square...
dtype: object

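Now I'll define a function that uses a dictionary comprehension to rearrange the list into something more palatable: a pd.Series. The keys of the dictionary are tuples, which makes the index of the resulting series a pd.MultiIndex. This will make it easier to split into two separate dataframes later.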

def proc(l):
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})
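To make the shape of the result concrete, here is a minimal sketch that calls proc on a single list and inspects what comes back. The variable name row is only illustrative, and the sample list is copied from the question.

```python
import pandas as pd

# One cell's worth of data, copied from the question.
l = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
]

def proc(l):
    # Each key is a (type, letter) tuple, so pandas builds a MultiIndex.
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})

row = proc(l)                    # 'row' is an illustrative name
print(type(row.index).__name__)  # MultiIndex
print(row[('square', 'a')])      # 10
print(row)
```

Because the first index level holds the type, every value is already labelled with the shape it belongs to, which is what makes the later split a simple column selection.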

Now I use apply

df = s.apply(proc)
df

  circle                 square                
       a   b   c   d   e      a   b   c   d   e
0     15  16  17  18  19     10  11  12  13  14
1     15  16  17  18  19     10  11  12  13  14
2     15  16  17  18  19     10  11  12  13  14
3     15  16  17  18  19     10  11  12  13  14
4     15  16  17  18  19     10  11  12  13  14
5     15  16  17  18  19     10  11  12  13  14
6     15  16  17  18  19     10  11  12  13  14
7     15  16  17  18  19     10  11  12  13  14
8     15  16  17  18  19     10  11  12  13  14
9     15  16  17  18  19     10  11  12  13  14

From this point, I can easily assign my two dataframes

circle = df.circle
square = df.square
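The question mentions that the lists live in a column of an existing dataframe and that the desired result has a type column. The following is a sketch of how the same steps might look in that setting. The frame existing, its column names "id" and "shapes", and the variable wide are assumptions made for illustration; they do not come from the question.

```python
import pandas as pd

# One cell's worth of data, copied from the question.
cell = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
]

# Hypothetical existing dataframe; "id" and "shapes" are assumed names.
existing = pd.DataFrame({"id": [101, 102, 103], "shapes": [cell] * 3})

def proc(dicts):
    # Same idea as the answer's proc: (type, letter) tuples become a MultiIndex.
    return pd.Series(
        {(d["type"], k): v for d in dicts for k, v in d.items() if k != "type"})

# One row per original row, with (type, letter) MultiIndex columns.
wide = existing["shapes"].apply(proc)

# Selecting on the first column level yields one frame per shape.
square = wide["square"].copy()
circle = wide["circle"].copy()

# Add the "type" column asked for in the question.
square.insert(0, "type", "square")
circle.insert(0, "type", "circle")

print(square)
print(circle)
```

Both frames keep the index of the original dataframe, so they can be joined back to it if needed.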

Alternative approach
Instead of using apply, we could also use a comprehension over s

df = pd.DataFrame(
    {i: {(li['type'], k): v
         for li in row
         for k, v in li.items() if k != 'type'}
     # Series.items() replaces iteritems(), which was removed in pandas 2.0
     for i, row in s.items()}
).T

Timing
The nested-comprehension approach appears to be faster than apply
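No measurements are preserved here, so the sketch below shows one way to compare the two approaches yourself with timeit. The row count (200), the repeat count (3), and the helper names with_apply and with_comprehension are arbitrary choices for illustration; results will vary with data size and pandas version.

```python
import timeit

import pandas as pd

# One cell's worth of data, copied from the question.
l = [
    {"a": 10, "type": "square"}, {"type": "square", "b": 11},
    {"type": "square", "c": 12}, {"d": 13, "type": "square"},
    {"type": "square", "e": 14}, {"a": 15, "type": "circle"},
    {"type": "circle", "b": 16}, {"type": "circle", "c": 17},
    {"d": 18, "type": "circle"}, {"type": "circle", "e": 19},
]
s = pd.Series([l] * 200)  # 200 rows is an arbitrary size for the comparison

def proc(l):
    return pd.Series(
        {(li['type'], k): v for li in l for k, v in li.items() if k != 'type'})

def with_apply():
    # Approach 1: build one Series per row via apply.
    return s.apply(proc)

def with_comprehension():
    # Approach 2: build a dict of dicts, then transpose.
    return pd.DataFrame(
        {i: {(li['type'], k): v
             for li in row
             for k, v in li.items() if k != 'type'}
         for i, row in s.items()}
    ).T

# Check that both approaches produce the same values before timing them.
same = bool((with_apply().sort_index(axis=1).values
             == with_comprehension().sort_index(axis=1).values).all())

t_apply = timeit.timeit(with_apply, number=3)
t_comp = timeit.timeit(with_comprehension, number=3)
print(f"same result: {same}")
print(f"apply:         {t_apply:.4f} s")
print(f"comprehension: {t_comp:.4f} s")
```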


Edited on 2021-05-23
