I have the following DataFrames:
scala> val df1 = Seq(("1","1_10"), ("1","1_11"), ("2","2_20"), ("3","3_30"), ("3","3_31")).toDF("c1","c2")
+---+----+
| c1|  c2|
+---+----+
|  1|1_10|
|  1|1_11|
|  2|2_20|
|  3|3_30|
|  3|3_31|
+---+----+
val df2 = Seq(("2","200"), ("3","300")).toDF("c1","val")
+---+---+
| c1|val|
+---+---+
|  2|200|
|  3|300|
+---+---+
If I do a left join, I get the following result:
scala> df1.join(df2,Seq("c1"),"left").select(df1("c1").alias("df1_c1"),df1("c2"),df2("val")).show
+------+----+----+
|df1_c1|  c2| val|
+------+----+----+
|     1|1_10|null|
|     1|1_11|null|
|     2|2_20| 200|
|     3|3_30| 300|
|     3|3_31| 300|
+------+----+----+
But how can I also get the right table's join key (df2's c1) in the result?
Expected output:
+------+----+----+------+
|df1_c1|  c2| val|df2_c1|
+------+----+----+------+
|     1|1_10|null|  null|
|     1|1_11|null|  null|
|     2|2_20| 200|     2|
|     3|3_30| 300|     3|
|     3|3_31| 300|     3|
+------+----+----+------+
If I try df1.join(df2,Seq("c1"),"left").select(df1("c1").alias("df1_c1"),df1("c2"),df2("val"),df2("c1")).show,
I get the following error:
org.apache.spark.sql.AnalysisException: Resolved attribute(s) c1#19639 missing from c1#19630,c2#19631,val#19640 in operator !Project [c1#19630 AS df1_c1#19667, c2#19631, val#19640, c1#19639]. Attribute(s) with the same name appear in the operation: c1. Please check if the right attribute(s) are used.;
When you join with a column sequence like Seq("c1"), Spark deduplicates the join key and keeps only a single c1 column in the output, so df2("c1") can no longer be resolved. You can use an explicit join expression on aliased DataFrames instead:
import org.apache.spark.sql.functions.expr

df1.as("df1").join(df2.as("df2"), expr("df1.c1 == df2.c1"), "left")
  .select($"df1.c1".alias("df1_c1"), $"df1.c2", $"df2.val", $"df2.c1".as("df2_c1"))
  .show(false)
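For completeness, here is a minimal alternative sketch (assuming the same df1 and df2 defined above): pass the join condition as a column expression built from the original DataFrame references. Since this is no longer a USING join, both c1 columns survive, and the failing select works once df2("c1") is aliased.

// the column-expression condition keeps df2's c1 in the join output
df1.join(df2, df1("c1") === df2("c1"), "left")
  .select(df1("c1").alias("df1_c1"), df1("c2"), df2("val"), df2("c1").alias("df2_c1"))
  .show

Both versions produce the expected output shown above.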