Pandas: intersection across multiple columns

Python Spark

``````
import pandas as pd

data={'NAME':['JOHN','MARY','CHARLIE'],
'A':[[1,2,3],[2,3,4],[3,4,5]],
'B':[[2,3,4],[3,4,5],[4,5,6]],
'C':[[2,4],[3,4],[6,7]]  }
df=pd.DataFrame(data)
df=df[['NAME','A','B','C']]

# Resulting DataFrame:
#       NAME          A          B       C
# 0     JOHN  [1, 2, 3]  [2, 3, 4]  [2, 4]
# 1     MARY  [2, 3, 4]  [3, 4, 5]  [3, 4]
# 2  CHARLIE  [3, 4, 5]  [4, 5, 6]  [6, 7]
``````

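``````
# Attempted column-wise intersection. set(df['A']) tries to hash each cell,
# and the cells are lists, so this raises TypeError: unhashable type: 'list'.
df['D']=list(set(df['A'])&set(df['B'])&set(df['C']))
``````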

``````
      NAME          A          B       C       D
0     JOHN  [1, 2, 3]  [2, 3, 4]  [2, 4]     [2]
1     MARY  [2, 3, 4]  [3, 4, 5]  [3, 4]  [3, 4]
2  CHARLIE  [3, 4, 5]  [4, 5, 6]  [6, 7]      []
``````
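The attempt above operates on whole columns, so `set(df['A'])` tries to hash every cell, and the cells are lists. Below is a minimal sketch that reproduces the failure and shows the per-row operation that is actually wanted. It rebuilds the same `df`; the variable names `error_message` and `first_row` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
    'C': [[2, 4], [3, 4], [6, 7]],
})

# Building a set from the column means hashing each cell; lists are unhashable.
try:
    set(df['A'])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# The intersection has to be taken per row, on the lists inside the cells.
first_row = set(df.loc[0, 'A']) & set(df.loc[0, 'B']) & set(df.loc[0, 'C'])
print(first_row)
```

Each of the options below applies this per-row intersection to every row of the frame.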

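Option 1:

``````
# Row-wise set intersection with the & operator.
# assign returns a new DataFrame with the extra column D; df itself is unchanged.
# If transform raises "Function did not transform" on your pandas version,
# call df.apply(..., axis=1) with the same lambda instead.
df.assign(D=df.transform(
    lambda x: list(set(x.A)&set(x.B)&set(x.C)),
    axis=1))
``````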

Option 2:

``````
# Same idea, written with chained set.intersection calls.
df.assign(D=df.transform(
    lambda x: list(set(x.A).intersection(set(x.B)).intersection(set(x.C))),
    axis=1))
``````

``````
# Equivalent form using apply instead of transform.
df.assign(D=df.apply(
    lambda x: list(set(x.A).intersection(set(x.B)).intersection(set(x.C))),
    axis=1))
``````

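Option 3:

``````
from functools import reduce  # reduce is not a builtin in Python 3

# x.tolist()[1:] drops NAME and keeps the list-valued cells of the row;
# reduce folds set.intersection over them.
df.assign(D=df.transform(
    lambda x: list(reduce(set.intersection, map(set,x.tolist()[1:]))),
    axis=1))
``````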
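Option 3 extends naturally to any number of list columns. The following is a sketch under that assumption: `intersect_columns` is a hypothetical helper name, not part of pandas, and it uses `apply` with `axis=1`. It also sorts each result so the order is deterministic, since a plain `list(set(...))` does not guarantee element order.

```python
from functools import reduce

import pandas as pd


def intersect_columns(frame, columns):
    """Return a Series holding, per row, the sorted intersection of the given list columns."""
    def row_intersection(row):
        # Turn every cell of the row into a set, then fold them together.
        return sorted(reduce(set.intersection, map(set, row)))

    # axis=1 passes one row at a time; list results come back as a Series of lists.
    return frame[columns].apply(row_intersection, axis=1)


df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
    'C': [[2, 4], [3, 4], [6, 7]],
})

result = intersect_columns(df, ['A', 'B', 'C'])
print(df.assign(D=result))
```

Selecting the columns by name avoids relying on `NAME` being the first column, which the `[1:]` slice in Option 3 depends on.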

• For each row, chain `set(x.A).intersection(set(x.B))..` to get the intersection
• Convert the result to a list
• Do this for every row of the DataFrame
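The three steps above can also be written without `apply`, as a plain Python loop over the zipped columns. This is an alternative sketch, not one of the options listed. It assumes the same columns `A`, `B` and `C`, and sorts each result for a stable order.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['JOHN', 'MARY', 'CHARLIE'],
    'A': [[1, 2, 3], [2, 3, 4], [3, 4, 5]],
    'B': [[2, 3, 4], [3, 4, 5], [4, 5, 6]],
    'C': [[2, 4], [3, 4], [6, 7]],
})

# Steps 1 and 2: intersect the three lists of a row and convert back to a list.
# Step 3: the comprehension repeats this for every row.
intersections = [
    sorted(set(a) & set(b) & set(c))
    for a, b, c in zip(df['A'], df['B'], df['C'])
]

# Wrapping in a Series with df's index keeps the rows aligned.
out = df.assign(D=pd.Series(intersections, index=df.index))
print(out)
```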

``````
In [76]: df.assign(D=df.transform(
    ...:     lambda x: list(set(x.A).intersection(set(x.B)).intersection(set(x.C))),
    ...:     axis=1))
Out[76]:
      NAME          A          B       C       D
0     JOHN  [1, 2, 3]  [2, 3, 4]  [2, 4]     [2]
1     MARY  [2, 3, 4]  [3, 4, 5]  [3, 4]  [3, 4]
2  CHARLIE  [3, 4, 5]  [4, 5, 6]  [6, 7]      []
``````
