Python Pandas: merging and updating DataFrames

Posted by KaiWei in Dev

KaiWei

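I am currently building a stock price "database" with Python and Pandas. I managed to find some code that downloads the stock prices.

df1 is my existing database. Every time I download prices, the result looks like df2 and df3. I then need to merge df1, df2 and df3 so that the result looks like df4.

Each stock has its own column. Each date has its own row.

df1: existing database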

+----------+-------+----------+--------+
|   Date   | Apple | Facebook | Google |
+----------+-------+----------+--------+
| 1/1/2018 |   161 |       58 |   1000 |
| 2/1/2018 |   170 |       80 |        |
| 3/1/2018 |   190 |       84 |    100 |
+----------+-------+----------+--------+

df2: new data (2/1/2018 and 4/1/2018) and updated data (3/1/2018) for Google.

+----------+--------+
|   Date   | Google |
+----------+--------+
| 2/1/2018 |    500 |
| 3/1/2018 |    300 |
| 4/1/2018 |    200 |
+----------+--------+

df3: new data for Amazon

+----------+--------+
|   Date   | Amazon |
+----------+--------+
| 1/1/2018 |   1000 |
| 2/1/2018 |   1500 |
| 3/1/2018 |   2000 |
| 4/1/2018 |   3000 |
+----------+--------+

df4, the final output: essentially, all the data is merged and updated into the database (df1 + df2 + df3). This becomes the updated df1 database.

+----------+-------+----------+--------+--------+
|   Date   | Apple | Facebook | Google | Amazon |
+----------+-------+----------+--------+--------+
| 1/1/2018 |   161 |       58 |   1000 |   1000 |
| 2/1/2018 |   170 |       80 |    500 |   1500 |
| 3/1/2018 |   190 |       84 |    300 |   2000 |
| 4/1/2018 |       |          |    200 |   3000 |
+----------+-------+----------+--------+--------+

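I don't know how to combine df1 and df3.

I also don't know how to merge df1 and df2 so that the new row is added (4/1/2018), the data is updated (2/1/2018 -> original: NaN, updated: 500 | 3/1/2018 -> original: 100, updated: 300) and the existing data that has no update is kept (1/1/2018).

Can anyone help me get df4? =)

Thanks.

Edit: based on Sociopath's suggestion, I changed my code to: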

dataframes = [df2, df3]
df4 = df1

for i in dataframes:
    # Merge the dataframe
    df4 = df4.merge(i, how='outer', on='date')

    # Get the stock name
    stock_name = i.columns[1]

    # If the stock already existed, merge() created "_x"/"_y" columns; combine them
    if stock_name+"_x" in df4.columns:
        x = stock_name+"_x"
        y = stock_name+"_y"
        # Prefer the new value (_y) and fall back to the old one (_x)
        df4[stock_name] = df4[y].fillna(df4[x])
        df4 = df4.drop(columns=[x, y])

Sociopath

You need merge:

import pandas as pd

# Note: in this answer df3 is the existing database; df1 and df2 are the new downloads
df1 = pd.DataFrame({'date':['2/1/2018','3/1/2018','4/1/2018'], 'Google':[500,300,200]})
df2 = pd.DataFrame({'date':['1/1/2018','2/1/2018','3/1/2018','4/1/2018'], 'Amazon':[1000,1500,2000,3000]})
df3 = pd.DataFrame({'date':['1/1/2018','2/1/2018','3/1/2018'], 'Apple':[161,171,181], 'Google':[1000,None,100], 'Facebook':[58,75,65]})

If the column does not exist in the current database yet, simply use merge as follows:

df_new = df3.merge(df2, how='outer',on=['date'])
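To make the effect of the outer merge concrete, here is a minimal sketch with a cut-down database. The names `db`, `amazon` and `merged` are made up for illustration and the values are sample data. Dates that exist in only one frame still get a row, and the cells that have no data are filled with NaN.

```python
import pandas as pd

# Hypothetical cut-down database with a single stock
db = pd.DataFrame({
    'date': ['1/1/2018', '2/1/2018', '3/1/2018'],
    'Apple': [161, 171, 181],
})

# Newly downloaded prices for a stock that is not in the database yet
amazon = pd.DataFrame({
    'date': ['1/1/2018', '2/1/2018', '3/1/2018', '4/1/2018'],
    'Amazon': [1000, 1500, 2000, 3000],
})

# how='outer' keeps every date found in either frame
merged = db.merge(amazon, how='outer', on='date')
print(merged)
```

Because 4/1/2018 has no Apple price, the Apple column is converted to float so that it can hold NaN.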

If the column already exists in the database, use fillna to update the values as follows:

df_new = df_new.merge(df1, how='outer', on='date')
#print(df_new)
df_new['Google'] = df_new['Google_y'].fillna(df_new['Google_x'])
df_new = df_new.drop(columns=['Google_x', 'Google_y'])

Output:

    date       Apple    Facebook    Amazon  Google
0   1/1/2018    161.0   58.0        1000    1000.0
1   2/1/2018    171.0   75.0        1500    500.0
2   3/1/2018    181.0   65.0        2000    300.0
3   4/1/2018    NaN     NaN         3000    200.0
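The `_x` and `_y` names come from merge's default suffixes for columns that exist in both frames: `_x` is the left (old) frame and `_y` the right (new) one. As a small sketch, assuming you prefer more descriptive names, the `suffixes` parameter of merge can make the direction of the update easier to read. The frame names `db` and `new` are made up for illustration.

```python
import pandas as pd

# Old Google prices, with one missing value and one stale value
db = pd.DataFrame({'date': ['1/1/2018', '2/1/2018', '3/1/2018'],
                   'Google': [1000, None, 100]})
# Newly downloaded Google prices
new = pd.DataFrame({'date': ['2/1/2018', '3/1/2018', '4/1/2018'],
                    'Google': [500, 300, 200]})

# Both frames have a 'Google' column, so merge appends the suffixes
merged = db.merge(new, how='outer', on='date', suffixes=('_old', '_new'))

# Take the new price where there is one, otherwise keep the old price
merged['Google'] = merged['Google_new'].fillna(merged['Google_old'])
merged = merged.drop(columns=['Google_old', 'Google_new'])
print(merged)
```

Note that a missing value in the new data never overwrites an existing price with this approach; the old value is kept instead.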

Edit

A more general solution:

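dataframes = [df2, df3, df4]  # newly downloaded frames: 'date' plus one stock column each

for i in dataframes:
    stock_name = list(i.columns.difference(['date']))[0]
    df_new = df_new.merge(i, how='outer', on='date')
    x = stock_name+"_x"
    y = stock_name+"_y"

    # The suffixed columns only exist when the stock was already in df_new
    if x in df_new.columns:
        df_new[stock_name] = df_new[y].fillna(df_new[x])
        df_new = df_new.drop(columns=[x, y])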
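As an alternative to merge followed by fillna, pandas also offers `DataFrame.combine_first`, which aligns two frames on their index and fills missing values in the caller from the other frame. The sketch below is not part of the answer above. It assumes the date column is named `date`, and `update_prices` is a hypothetical helper name. The sample values are taken from the tables in the question.

```python
import pandas as pd

def update_prices(db, updates, key='date'):
    """Hypothetical helper: non-missing values in each update win; db fills the rest."""
    result = db.set_index(key)
    for new in updates:
        # combine_first keeps new's values and falls back to result where new has none
        result = new.set_index(key).combine_first(result)
    return result.reset_index()

db = pd.DataFrame({'date': ['1/1/2018', '2/1/2018', '3/1/2018'],
                   'Apple': [161, 170, 190],
                   'Facebook': [58, 80, 84],
                   'Google': [1000, None, 100]})
google = pd.DataFrame({'date': ['2/1/2018', '3/1/2018', '4/1/2018'],
                       'Google': [500, 300, 200]})
amazon = pd.DataFrame({'date': ['1/1/2018', '2/1/2018', '3/1/2018', '4/1/2018'],
                       'Amazon': [1000, 1500, 2000, 3000]})

updated = update_prices(db, [google, amazon])
print(updated)
```

This handles new columns, new rows and updated values in one call per frame, so no suffix bookkeeping is needed. One thing to be aware of is that combine_first returns the union of the columns in sorted order, so the column order may differ from the original database.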
