Appending a DataFrame to an index in pandas

Posted by Kartik on Dev

Kartik

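There is a lot of nesting in my data. I have 6 time periods (no need to worry about those), each time period has 19 quantiles, and each quantile has a 51x51 covariance matrix (covering all 50 US states plus the District of Columbia). If I represented it as a dictionary, I would have: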

my_data = {'time_pd_1' : {0.05 : pd.DataFrame(data=cov_var(data_for_0.05), columns=states, index=states),
                          0.10 : pd.DataFrame(data=cov_var(data_for_0.10), columns=states, index=states),
                          ...
                          0.90 : pd.DataFrame(data=cov_var(data_for_0.90), columns=states, index=states),
                          0.95 : pd.DataFrame(data=cov_var(data_for_0.95), columns=states, index=states)},
           'time_pd_2' : {0.05 : pd.DataFrame(data=cov_var(data_for_0.05), columns=states, index=states),
                          0.10 : pd.DataFrame(data=cov_var(data_for_0.10), columns=states, index=states),
                          ...
                          0.90 : pd.DataFrame(data=cov_var(data_for_0.90), columns=states, index=states),
                          0.95 : pd.DataFrame(data=cov_var(data_for_0.95), columns=states, index=states)},
            ...
           'time_pd_6' : {0.05 : pd.DataFrame(data=cov_var(data_for_0.05), columns=states, index=states),
                          0.10 : pd.DataFrame(data=cov_var(data_for_0.10), columns=states, index=states),
                          ...
                          0.90 : pd.DataFrame(data=cov_var(data_for_0.90), columns=states, index=states),
                          0.95 : pd.DataFrame(data=cov_var(data_for_0.95), columns=states, index=states)}}
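For what it is worth, a nested dict shaped like my_data can already be flattened into a single frame with pd.concat, which turns dict keys into outer index levels. The sketch below is an illustration under assumptions, not the author's code: three states, two quantiles and two periods are placeholders, random data replaces the real inputs, and np.cov stands in for the author's cov_var().

```python
import numpy as np
import pandas as pd

# Placeholder labels; the real data has 51 states, 19 quantiles and 6 periods
states = ['Alabama', 'Alaska', 'Arizona']
quantiles = [0.05, 0.10]
time_periods = ['time_pd_1', 'time_pd_2']

rng = np.random.default_rng(0)

# Build the nested {period: {quantile: DataFrame}} structure from the question.
# np.cov on random data is only a stand-in for the author's cov_var().
my_data = {
    tpd: {
        q: pd.DataFrame(np.cov(rng.normal(size=(len(states), 20))),
                        index=states, columns=states)
        for q in quantiles
    }
    for tpd in time_periods
}

# Inner concat: dict keys become the 'q' level, the states index becomes 'ST'.
# Outer concat: the period keys are added as a third, outermost 'TPD' level.
flat = pd.concat(
    {tpd: pd.concat(inner, names=['q', 'ST']) for tpd, inner in my_data.items()},
    names=['TPD'],
)
```

Concatenating one period's inner dict on its own already gives the (quantile, state) layout asked for further down; the outer concat only adds the time period as an extra level.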

That structure is simple enough, but the data is not actually created that way. I have two for loops that do the work:

for tpd in time_periods:
    for q in quantiles:
        tdf = pd.DataFrame(data=cov_var(data_for_q), index=states, columns=states)

If I print tdf, it looks like this:

ST              Alabama         Alaska          Arizona         ...     West Virginia   Wisconsin   Wyoming
ST                                                                                                             
Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
...             ...             ...             ...             ...     ...             ...         ...
West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619

Now, what I want is this:

cov = {}
for tpd in time_periods:
    cov[tpd] = pd.DataFrame(index=[str(round(q,2)) for q in quantiles])
    for q in quantiles:
        tdf = pd.DataFrame(data=cov_var(data_for_q), index=states, columns=states)
        cov[tpd].loc[str(round(q,2)), :] = tdf

So if I print cov[tpd], it should look like this:

        ST              Alabama         Alaska          Arizona         ...     West Virginia   Wisconsin   Wyoming
q       ST                                                                                                             
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.05    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.10    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
...     ...             ...             ...             ...             ...     ...             ...         ...
...     ...             ...             ...             ...             ...     ...             ...         ...
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.90    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
        Alabama         288.867628      50.000000       -100.062576     ...     37.719317       0           -75.000000
        Alaska          50.000000       280.929272      -229.365427     ...     57.514555       0           -136.365512
        Arizona         -100.062576     -229.365427     946.563177      ...     -113.805612     0           291.897723
0.95    ...             ...             ...             ...             ...     ...             ...         ...
        West Virginia   37.719317       57.514555       -113.805612     ...     342.195976      0           -214.243277
        Wisconsin       0.000000        0.000000        0.000000        ...     0.000000        0           0.000000
        Wyoming         -75.000000      -136.365512     291.897723      ...     -214.243277     0           684.146619
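With the (quantile, state) x state layout above, a single quantile's matrix, or one state's row across all quantiles, comes out of ordinary MultiIndex selection. A minimal sketch, assuming a three-state toy frame whose values (a scaled identity per quantile) are made up purely so the selections are easy to check:

```python
import numpy as np
import pandas as pd

states = ['Alabama', 'Alaska', 'Arizona']   # placeholder subset of the 51
q_labels = ['0.05', '0.1']                  # placeholder quantile labels

index = pd.MultiIndex.from_product([q_labels, states], names=['q', 'ST'])
columns = pd.Index(states, name='ST')

# Toy values: quantile i gets (i + 1) * identity, standing in for real covariances
values = np.vstack([(i + 1) * np.eye(len(states)) for i in range(len(q_labels))])
cov_tpd = pd.DataFrame(values, index=index, columns=columns)

# The full states x states matrix for one quantile (the 'q' level is dropped)
block = cov_tpd.loc['0.05']

# The Alabama row of every quantile's matrix, indexed by quantile
alabama_rows = cov_tpd.xs('Alabama', level='ST')
```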

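Having this final structure would make my life so much easier that I am willing to buy a beer for whoever helps. So far, I have tried various approaches: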

cov[tpd].loc[str(round(q,2)), :] = tdf # Raises ValueError: Incompatible indexer with DataFrame
cov[tpd].loc[str(round(q,2)), :].append(tdf) # Almost gives me the frame I need, but removes the index level q, and inserts a column 0 with NaNs
cov[tpd].loc[str(round(q,2)), :].join(tdf, how='outer') # Raises AttributeError: 'Series' object has no attribute 'join'
pd.merge(cov[tpd].loc[str(round(q,2)), :], tdf, how='outer') # Raises AttributeError: 'Series' object has no attribute 'columns'
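The two AttributeErrors follow from the shape of the target frame: cov[tpd] is created with only a flat quantile index and no columns, so cov[tpd].loc[label, :] selects a single row and returns a Series, which has neither a join method nor a columns attribute. A minimal reproduction, with placeholder quantile labels:

```python
import pandas as pd

q_labels = ['0.05', '0.1']              # placeholder quantile labels
frame = pd.DataFrame(index=q_labels)    # flat index, no columns, like cov[tpd]

row = frame.loc['0.05', :]              # selects ONE row, so the result is a (here empty) Series

print(type(row).__name__)                              # Series
print(hasattr(row, 'join'), hasattr(row, 'columns'))   # False False
```

Separately, DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the second attempt will not run at all on current pandas; pd.concat is the documented replacement.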

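I understand all the error messages, and I do have a possible workaround: pre-create cov[tpd] as a DataFrame with the required multi-indexed layout, then use indexing to insert the output of cov_var() into it. But that takes several lines of code just to build the multi-indexed cov[tpd] before any data can be inserted. Does anyone know a better way?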
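One common alternative to pre-allocating (not what the author ends up doing in the answer below) is to collect each quantile's tdf in a dict keyed by its label and let pd.concat build the outer index level. A minimal sketch under assumptions: the states, quantiles and time periods are placeholder subsets, and np.cov on random data stands in for cov_var(data_for_q).

```python
import numpy as np
import pandas as pd

states = ['Alabama', 'Alaska', 'Arizona']    # placeholder subset of the 51
quantiles = [0.05, 0.10, 0.95]               # placeholder subset of the 19
time_periods = ['time_pd_1', 'time_pd_2']    # placeholder subset of the 6
rng = np.random.default_rng(42)

cov = {}
for tpd in time_periods:
    blocks = {}
    for q in quantiles:
        # Stand-in for cov_var(data_for_q): any square states x states array works here
        tdf = pd.DataFrame(np.cov(rng.normal(size=(len(states), 30))),
                           index=states, columns=states)
        blocks[str(round(q, 2))] = tdf
    # Dict keys become the outer 'q' level; the states index becomes the inner 'ST' level
    cov[tpd] = pd.concat(blocks, names=['q', 'ST']).rename_axis(columns='ST')
```

Because concat receives every block at once, there is no empty frame to pre-shape and no dtype to fix afterwards; the trade-off is holding one period's blocks in memory before concatenating, which at 19 blocks of 51x51 is small.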

Note: cov_var() is a simple covariance function I wrote myself, because my case is a bit special and I cannot use built-in functions such as np.cov().
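The question does not show cov_var(). Purely as an illustration, and assuming from the means= argument in the answer below that it computes covariances about externally supplied (population) means rather than the sample means np.cov() uses, a hypothetical version could look like this. The signature, the observations-in-rows layout and the divisor n are all assumptions.

```python
import numpy as np

def cov_var(arr, means):
    """Hypothetical: covariance of the columns of arr about fixed means.

    arr   -- (n_obs, n_vars) array, one column per state (assumed layout)
    means -- length-n_vars array of known/population means (assumed)
    """
    arr = np.asarray(arr, dtype=float)
    # Centre on the supplied means, not on arr.mean(axis=0)
    centered = arr - np.asarray(means, dtype=float)
    # Divide by n (population form), giving an (n_vars, n_vars) matrix
    return centered.T @ centered / arr.shape[0]
```

When the supplied means happen to equal the sample means, this reduces to np.cov(arr, rowvar=False, bias=True), which makes a convenient sanity check.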

Kartik

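So I finally gave in and used the approach I hinted at in the question above. It actually seems to be faster than the approaches I had been stubbornly trying. All good. Here is what I ended up doing: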

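import pandas as pd

cov = {}
ind_lev_1 = [str(round(q,2)) for q in quantiles]   # outer level: quantile labels
ind_lev_2 = states                                  # inner level: state names
index = pd.MultiIndex.from_product([ind_lev_1, ind_lev_2], names=['QUANTILE', 'STATE'])
columns = pd.Index(ind_lev_2, name='STATE')

for tpd in time_periods:
    # Pre-allocate an all-NaN float frame with the full (quantile, state) x state shape;
    # dtype=float avoids ending up with object-dtype columns
    cov[tpd] = pd.DataFrame(index=index, columns=columns, dtype=float)
    for q in quantiles:
        q = str(round(q,2))
        # Fill the states x states block that belongs to this quantile
        cov[tpd].loc[(q,), :] = cov_var(arr=data_for_q, means=pop_means_for_q)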
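A self-contained toy run of the same pre-allocate-then-fill pattern, useful for checking that each .loc[(q,), :] assignment lands in its own quantile block. The labels are placeholder subsets, and fake_cov is a hypothetical stand-in for cov_var(arr=data_for_q, means=pop_means_for_q).

```python
import numpy as np
import pandas as pd

states = ['Alabama', 'Alaska', 'Arizona']    # placeholder subset of the 51
quantiles = [0.05, 0.10, 0.95]               # placeholder subset of the 19
time_periods = ['time_pd_1', 'time_pd_2']    # placeholder subset of the 6

def fake_cov(q):
    # Hypothetical stand-in for cov_var(): a recognisable matrix per quantile
    return q * np.eye(len(states))

cov = {}
ind_lev_1 = [str(round(q, 2)) for q in quantiles]
index = pd.MultiIndex.from_product([ind_lev_1, states], names=['QUANTILE', 'STATE'])
columns = pd.Index(states, name='STATE')

for tpd in time_periods:
    cov[tpd] = pd.DataFrame(index=index, columns=columns, dtype=float)
    for q in quantiles:
        label = str(round(q, 2))
        # Same assignment as in the answer: fills this quantile's states x states block
        cov[tpd].loc[(label,), :] = fake_cov(q)

print(cov['time_pd_1'].loc['0.95'])
```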

Edited 2021-04-08