Merge two DataFrames on two columns while accounting for NaN values

Syed Md Ismail

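I want to merge two DataFrames, Lifetime_df and Input_DataFrame2. The final Lifetime_df should keep everything it already has, but with count replaced by the value from Input_DataFrame2 wherever the columns ['Identifier_column', 'lifetime'] match.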

Lifetime_df

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     1
2      138122               3   NaN
3      138122               4   NaN
4      138122               5     0
5      138122               6     1
6      138122               7   NaN
7      138122               8     0
8      138122               9     1

Input_DataFrame2

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     4
2      138122               6     1
3      138122               9     1

Expected output:

Lifetime_df

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     4
2      138122               3   NaN
3      138122               4   NaN
4      138122               5     0
5      138122               6     1
6      138122               7   NaN
7      138122               8     0
8      138122               9     1

The output of the following command does not meet this requirement:

Input_DataFrame3 = pd.merge(Lifetime_df,
                            Input_DataFrame2,
                            how='left',
                            on=['Identifier_column', 'lifetime'])

Lifetime_df['count'] = Input_DataFrame3['count_y']

Result:

Lifetime_df

    Identifier_column  lifetime count
0      138122               1     1
1      138122               2     4
2      138122               3   NaN
3      138122               4   NaN
4      138122               5   NaN
5      138122               6     1
6      138122               7   NaN
7      138122               8   NaN
8      138122               9     1
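The rows that go wrong are lifetime 5 and 8: in a left merge, count_y is NaN for every key that is missing from Input_DataFrame2, so assigning count_y back overwrites the existing 0 values. A minimal sketch that reproduces this, assuming the sample data shown above (the variable names `merged` and `lost` are illustrative, not part of the question):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables in the question.
Lifetime_df = pd.DataFrame({
    "Identifier_column": [138122] * 9,
    "lifetime": list(range(1, 10)),
    "count": [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    "Identifier_column": [138122] * 4,
    "lifetime": [1, 2, 6, 9],
    "count": [1, 4, 1, 1],
})

merged = Lifetime_df.merge(Input_DataFrame2, how="left",
                           on=["Identifier_column", "lifetime"])

# count_x holds the Lifetime_df values, count_y the Input_DataFrame2 values.
# Rows where count_y is NaN but count_x is not are the values that get lost
# when count_y alone is written back.
lost = merged[merged["count_y"].isna() & merged["count_x"].notna()]
print(lost[["lifetime", "count_x"]])
```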

Erfan

With good old merge and fillna:

Input_DataFrame3  = Lifetime_df.merge(Input_DataFrame2, 
                                      on=['Identifier_column', 'lifetime'], 
                                      how='left', 
                                      suffixes=['_x', ''])

Input_DataFrame3['count'] = Input_DataFrame3['count'].fillna(Input_DataFrame3['count_x'])
Input_DataFrame3 = Input_DataFrame3.drop(columns='count_x')

   Identifier_column  lifetime  count
0             138122         1    1.0
1             138122         2    4.0
2             138122         3    NaN
3             138122         4    NaN
4             138122         5    0.0
5             138122         6    1.0
6             138122         7    NaN
7             138122         8    0.0
8             138122         9    1.0
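For reuse, the merge and fillna steps can be wrapped in a small helper. This is only a sketch under the assumption that both frames share the key columns and a single value column; the function name `overlay_counts` and its parameters are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def overlay_counts(base, updates, keys, value="count"):
    """Left-merge `updates` onto `base`; prefer the value from `updates`
    and fall back to the value from `base` where no key matched."""
    merged = base.merge(updates, on=keys, how="left", suffixes=("_x", ""))
    # After the merge, `value` comes from `updates` and `value_x` from `base`.
    merged[value] = merged[value].fillna(merged[value + "_x"])
    return merged.drop(columns=value + "_x")

# Sample data rebuilt from the tables in the question.
Lifetime_df = pd.DataFrame({
    "Identifier_column": [138122] * 9,
    "lifetime": list(range(1, 10)),
    "count": [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    "Identifier_column": [138122] * 4,
    "lifetime": [1, 2, 6, 9],
    "count": [1, 4, 1, 1],
})

result = overlay_counts(Lifetime_df, Input_DataFrame2,
                        keys=["Identifier_column", "lifetime"])
print(result)
```

Because the merge is a left join, keys that appear only in Input_DataFrame2 are not added to the result, and a NaN in Input_DataFrame2 never overwrites an existing value in Lifetime_df.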

Or, inspired by YOBEN's answer, with pd.concat and drop_duplicates:

key_cols = ['Identifier_column', 'lifetime']
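# Input_DataFrame2 goes first so its rows win; reset_index restores a clean 0..n-1 index
pd.concat([Input_DataFrame2, Lifetime_df]).drop_duplicates(key_cols).sort_values(key_cols).reset_index(drop=True)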

   Identifier_column  lifetime  count
0             138122         1    1.0
1             138122         2    4.0
2             138122         3    NaN
3             138122         4    NaN
4             138122         5    0.0
5             138122         6    1.0
6             138122         7    NaN
7             138122         8    0.0
8             138122         9    1.0
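A self-contained sketch of the concat approach on the question's sample data follows. It relies on drop_duplicates keeping the first occurrence of each key by default, which is why Input_DataFrame2 has to come first in the list. The name `result` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the tables in the question.
Lifetime_df = pd.DataFrame({
    "Identifier_column": [138122] * 9,
    "lifetime": list(range(1, 10)),
    "count": [1, 1, np.nan, np.nan, 0, 1, np.nan, 0, 1],
})
Input_DataFrame2 = pd.DataFrame({
    "Identifier_column": [138122] * 4,
    "lifetime": [1, 2, 6, 9],
    "count": [1, 4, 1, 1],
})

key_cols = ["Identifier_column", "lifetime"]
result = (
    pd.concat([Input_DataFrame2, Lifetime_df])  # updates first, base second
      .drop_duplicates(subset=key_cols)         # keeps the first row per key
      .sort_values(key_cols)
      .reset_index(drop=True)                   # discard the mixed original indexes
)
print(result)
```

Two differences from the merge version are worth keeping in mind: keys that exist only in Input_DataFrame2 would be kept here, and a NaN count in Input_DataFrame2 would replace a non-NaN count in Lifetime_df for the same key. Neither case occurs in the sample data.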
