Merge two DataFrames when a specific condition is met

Winter

My code looks like this:

import pandas as pd

df = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '5.5.5.5'],
    'ID': ['101', '202', '303', '404', '505'],
    'Name': ['aqua', 'noctua', 'ytube', 'tech', 'logi'],
    'Price': [100, 200, 300, 400, 500]
})

df1 = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '6.6.6.6'],
    'ID': ['101', '202', '303', '404', '606'],
    'Name': ['atlas', 'noctua', 'ytube', 'tech', 'smash'],
    'Price': [600, 700, 800, 900, 990]
})
print(df)
        IP   ID    Name  Price
0  1.1.1.1  101    aqua    100
1  2.2.2.2  202  noctua    200
2  3.3.3.3  303   ytube    300
3  4.4.4.4  404    tech    400
4  5.5.5.5  505    logi    500

print(df1)
        IP   ID    Name  Price
0  1.1.1.1  101   atlas    600
1  2.2.2.2  202  noctua    700
2  3.3.3.3  303   ytube    800
3  4.4.4.4  404    tech    900
4  6.6.6.6  606   smash    990

new=df1.merge(df,indicator=True,how='left').loc[lambda x : x['_merge']=='left_only']
print(new)
        IP   ID    Name  Price     _merge
0  1.1.1.1  101   atlas    600  left_only
1  2.2.2.2  202  noctua    700  left_only
2  3.3.3.3  303   ytube    800  left_only
3  4.4.4.4  404    tech    900  left_only
4  6.6.6.6  606   smash    990  left_only

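The new DataFrame, named new, should contain only the rows from df1 whose combination of IP and ID is unique across both DataFrames, that is, does not also appear in df (I don't care about the other columns). So the correct output is: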

        IP   ID    Name  Price     _merge
0  6.6.6.6  606   smash    990  left_only

What do I need to change in my code to get this output? Thank you.

Bruno Mello

You can use the on parameter of pandas merge, which specifies the columns to merge on:

new=df1.merge(df,indicator=True,how='left', on=['IP', 'ID']).loc[lambda x : x['_merge']=='left_only']

print(new)
        IP   ID Name_x  Price_x Name_y  Price_y     _merge
4  6.6.6.6  606  smash      990    NaN      NaN  left_only

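If you don't pass it, pandas tries to infer the join columns from the DataFrames: it joins on every column name the two frames share, here IP, ID, Name and Price. Since Price differs in every row, nothing matched and every row came back as left_only. This inference has caused me problems like this before, so I always prefer to pass on explicitly to prevent errors.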
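In case it helps, here is a minimal sketch of the same idea that also avoids the Name_x / Price_y suffixes. It is a variation, not something from the code above: it assumes you only want df1's own columns back, so it merges against just the key columns of df and drops _merge at the end. The names keys and shared are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '5.5.5.5'],
    'ID': ['101', '202', '303', '404', '505'],
    'Name': ['aqua', 'noctua', 'ytube', 'tech', 'logi'],
    'Price': [100, 200, 300, 400, 500],
})
df1 = pd.DataFrame({
    'IP': ['1.1.1.1', '2.2.2.2', '3.3.3.3', '4.4.4.4', '6.6.6.6'],
    'ID': ['101', '202', '303', '404', '606'],
    'Name': ['atlas', 'noctua', 'ytube', 'tech', 'smash'],
    'Price': [600, 700, 800, 900, 990],
})

# Columns that merge() joins on when `on` is omitted: every shared column name.
shared = [c for c in df1.columns if c in df.columns]
print(shared)  # ['IP', 'ID', 'Name', 'Price']

# Illustrative name: the columns that define a "match" for this question.
keys = ['IP', 'ID']

# Anti-join: bring in only the key columns of df so no other column collides,
# keep the rows found only in df1, then drop the helper _merge column.
new = (
    df1.merge(df[keys].drop_duplicates(), on=keys, how='left', indicator=True)
       .loc[lambda x: x['_merge'] == 'left_only']
       .drop(columns='_merge')
)
print(new)
```

With the sample data this should leave only the 6.6.6.6 / 606 row, with the columns IP, ID, Name and Price. The drop_duplicates() call is just a precaution so that repeated keys in df cannot duplicate rows of df1.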
