Use collect_list when grouping, then use the concat_ws function to build a single string from the collected list.
df.show(false)
+--------------------------------------+------+---------------+---------------+----------------+-------+
|Errors                                |userid|associationtype|associationrank|associationvalue|sparkId|
+--------------------------------------+------+---------------+---------------+----------------+-------+
|Primary Key Constraint Violated       |3     |Brand5         |error          |Lee             |4      |
|Incorrect datatype in associationrank |3     |Brand5         |error          |Lee             |4      |
+--------------------------------------+------+---------------+---------------+----------------+-------+
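import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

// Group by every column except Errors, collect the messages, then join them into one string
df.groupBy("userid", "associationtype", "associationrank", "associationvalue", "sparkId")
  .agg(collect_list("Errors").as("Errors"))
  .withColumn("Errors", concat_ws(", ", col("Errors")))
  .show(false)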
+------+---------------+---------------+----------------+-------+-----------------------------------------------------------------------+
|userid|associationtype|associationrank|associationvalue|sparkId|Errors                                                                 |
+------+---------------+---------------+----------------+-------+-----------------------------------------------------------------------+
|3     |Brand5         |error          |Lee             |4      |Primary Key Constraint Violated, Incorrect datatype in associationrank |
+------+---------------+---------------+----------------+-------+-----------------------------------------------------------------------+
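One caveat worth keeping in mind: collect_list does not guarantee the order of the collected values after a shuffle, so the joined string may come out in a different order from run to run. If a stable result matters, one option is to wrap the list in sort_array before joining. The following is a minimal, self-contained sketch of that variation, not part of the original answer. The object name, the local[*] master, and the app name are assumptions made for illustration only; the sample rows mirror the input shown above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, concat_ws, sort_array}

// Hypothetical wrapper object, used only to make the sketch runnable.
object CollectErrorsExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-errors-example")
      .getOrCreate()
    import spark.implicits._

    // The same two rows as the sample input above.
    val df = Seq(
      ("Primary Key Constraint Violated", 3, "Brand5", "error", "Lee", 4),
      ("Incorrect datatype in associationrank", 3, "Brand5", "error", "Lee", 4)
    ).toDF("Errors", "userid", "associationtype", "associationrank", "associationvalue", "sparkId")

    val merged = df
      .groupBy("userid", "associationtype", "associationrank", "associationvalue", "sparkId")
      // sort_array sorts the collected messages in ascending order,
      // so the joined string no longer depends on row arrival order.
      .agg(sort_array(collect_list("Errors")).as("Errors"))
      .withColumn("Errors", concat_ws(", ", col("Errors")))

    merged.show(false)
    spark.stop()
  }
}
```

Because the messages are sorted alphabetically here, they will not necessarily appear in the same order as in the output above. If duplicate messages should be dropped as well, collect_set can be used in place of collect_list.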