How to merge two columns into a new DataFrame?

Posted by Markus in Dev

Markus

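I have two DataFrames (Spark 2.2.0 and Scala 2.11.8). The first DataFrame, df1, has a column called col1, and the second DataFrame, df2, has a column called col2. Both DataFrames have the same number of rows.

How can I merge these two columns into a new DataFrame?

I tried join, but I think there should be another way to do it.

I also tried withColumn, but it does not compile: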

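// Does not compile: withColumn takes (colName: String, col: Column), and DataFrame has no member col1
val result = df1.withColumn(col("col2"), df2.col1)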

Update:

For example:

df1 = 
col1
1
2
3

df2 = 
col2
4
5
6

result = 
col1  col2
1     4
2     5
3     6

滴滴

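If there is no actual relationship between the two columns, it sounds like you need the union operator, which simply returns the union of the two DataFrames. Note that union appends the rows of the second DataFrame below those of the first, so the result has one column, not two: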

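import spark.implicits._  // for toDF; assumes a SparkSession named spark, as in spark-shell

val df1 = Seq("a", "b", "c").toDF("one")
val df2 = Seq("d", "e", "f").toDF("two")

// union matches columns by position and keeps df1's column name
df1.union(df2).show()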

+---+
|one|
+---+
|  a|
|  b|
|  c|
|  d|
|  e|
|  f|
+---+
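To contrast union with what the question asks for, here is a plain-Scala sketch on local sequences (no Spark involved; the object and value names are hypothetical). Union behaves like concatenation, stacking the rows, whereas the desired result pairs values by position, which for local collections is zip.

```scala
object UnionVsZip {
  // Local stand-ins for the single columns "one" and "two" from the answer's example
  val one: Seq[String] = Seq("a", "b", "c")
  val two: Seq[String] = Seq("d", "e", "f")

  // Analogue of df1.union(df2): rows are stacked, giving 6 rows in a single column
  val stacked: Seq[String] = one ++ two

  // Analogue of the requested result: 3 rows, each holding one value from each column
  val paired: Seq[(String, String)] = one.zip(two)

  def main(args: Array[String]): Unit = {
    println(stacked.mkString(", "))
    paired.foreach { case (a, b) => println(s"$a $b") }
  }
}
```

zip works here only because local sequences have a fixed order. A Spark DataFrame has no inherent row order, so pairing "by position" requires manufacturing an explicit row index first.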

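[edit] Now that you have made it clear that you want the two columns side by side, you can use a trick with DataFrames: add a row index with the function monotonically_increasing_id() and join on that index. Be aware that these IDs are unique and increasing but not consecutive, and they depend on how rows are distributed across partitions, so the join only pairs rows correctly when both DataFrames have the same number of partitions holding the same number of rows each: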

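import org.apache.spark.sql.functions.monotonically_increasing_id
import spark.implicits._  // for toDF; assumes a SparkSession named spark

val df1 = Seq("a", "b", "c").toDF("one")
val df2 = Seq("d", "e", "f").toDF("two")

// In spark-shell, enter this multi-line chain with :paste
df1.withColumn("id", monotonically_increasing_id())
    .join(df2.withColumn("id", monotonically_increasing_id()), Seq("id"))
    .orderBy("id")  // join output order is not guaranteed; sort by the index before dropping it
    .drop("id")
    .show()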

+---+---+
|one|two|
+---+---+
|  a|  d|
|  b|  e|
|  c|  f|
+---+---+
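Why the trick is fragile: the Spark documentation for monotonically_increasing_id says the IDs are guaranteed to be monotonically increasing and unique but not consecutive, and that the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. This plain-Scala sketch (no Spark needed) models that layout, assuming it holds for your Spark version; the partition layouts and the names idFor and assignIds are hypothetical. It shows that if df1 and df2 split the same three rows differently across partitions, only one ID coincides, so the inner join would silently drop rows.

```scala
object MonotonicIdModel {
  // Assumed model of the documented layout: partition ID in the upper bits,
  // record number within the partition in the lower 33 bits
  def idFor(partitionId: Int, recordInPartition: Long): Long =
    (partitionId.toLong << 33) + recordInPartition

  // Assign IDs to rows given how many rows land in each partition
  def assignIds(rowsPerPartition: Seq[Int]): Seq[Long] =
    rowsPerPartition.zipWithIndex.flatMap { case (n, pid) =>
      (0 until n).map(i => idFor(pid, i.toLong))
    }

  def main(args: Array[String]): Unit = {
    val ids1 = assignIds(Seq(3))    // hypothetical df1: 3 rows in one partition
    val ids2 = assignIds(Seq(1, 2)) // hypothetical df2: 1 + 2 rows in two partitions
    println(ids1)
    println(ids2)
    // Only the IDs present in both would survive an inner join on "id"
    println(ids1.intersect(ids2))
  }
}
```

When both DataFrames share the same partition layout, the generated IDs are identical and the join pairs every row, which is the situation the answer's small example relies on.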
