我正在尝试在数据框中添加一列,其中包含同一数据框中一列的所有值的总和。
例如:在图片中有列 - UserID、MovieID、Rating、Unixtimestamp。现在我想添加一个名为 Sum 的列,它将包含 Rating Column 的所有值的总和。
我有一个评级数据框
Ratings DataFrame 列名称:UserID、MovieID、Ratings、UnixTimeStamp。
+------+-------+------+-------------+
|UserID|MovieID|Rating|UnixTimeStamp|
+------+-------+------+-------------+
| 196| 242| 3| 881250949|
| 186| 302| 3| 891717742|
| 22| 377| 1| 878887116|
| 244| 51| 2| 880606923|
| 166| 346| 1| 886397596|
+------+-------+------+-------------+
只显示前 5 行
我必须计算 wa 等级并将其存储到数据框中。
wa_rating=(评分>3)/总评分
请帮我找到 wa_rating 数据框,其中包含一个使用 scala spark 的新列
看一下这个:
scala> val df = Seq((196,242,3,881250949),(186,302,3,891717742),(22,377,1,878887116),(244,51,2,880606923),(166,346,1,886397596)).toDF("userid","movieid","rating","unixtimestamp")
df: org.apache.spark.sql.DataFrame = [userid: int, movieid: int ... 2 more fields]
scala> df.show(false)
+------+-------+------+-------------+
|userid|movieid|rating|unixtimestamp|
+------+-------+------+-------------+
|196 |242 |3 |881250949 |
|186 |302 |3 |891717742 |
|22 |377 |1 |878887116 |
|244 |51 |2 |880606923 |
|166 |346 |1 |886397596 |
+------+-------+------+-------------+
scala> import org.apache.spark.sql.expressions._
import org.apache.spark.sql.expressions._
scala> val df2 = df.withColumn("total_rating",sum('rating).over())
df2: org.apache.spark.sql.DataFrame = [userid: int, movieid: int ... 3 more fields]
scala> df2.show(false)
19/01/23 08:38:46 WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
+------+-------+------+-------------+------------+
|userid|movieid|rating|unixtimestamp|total_rating|
+------+-------+------+-------------+------------+
|22 |377 |1 |878887116 |10 |
|244 |51 |2 |880606923 |10 |
|166 |346 |1 |886397596 |10 |
|196 |242 |3 |881250949 |10 |
|186 |302 |3 |891717742 |10 |
+------+-------+------+-------------+------------+
scala> df2.withColumn("wa_rating",coalesce( when('rating >= 3,'rating),lit(0))/'total_rating).show(false)
19/01/23 08:47:49 WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
+------+-------+------+-------------+------------+---------+
|userid|movieid|rating|unixtimestamp|total_rating|wa_rating|
+------+-------+------+-------------+------------+---------+
|22 |377 |1 |878887116 |10 |0.0 |
|244 |51 |2 |880606923 |10 |0.0 |
|166 |346 |1 |886397596 |10 |0.0 |
|196 |242 |3 |881250949 |10 |0.3 |
|186 |302 |3 |891717742 |10 |0.3 |
+------+-------+------+-------------+------------+---------+
scala>
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句