I have a DataFrame:
number time td
1 1 1
1 1 11
1 2 2
1 10 9
1 14 10
2 1 1
2 11 10
2 15 20
2 15 21
2 16 21
2 17 21
2 18 21
If the current row (row n) has td > 10, I need to delete row n+1, then move on to row n+2 and repeat this step.
Expected result:
number time td
1 1 1
1 1 11
1 10 9
1 14 10
2 1 1
2 11 10
2 15 20
2 16 21
2 18 21
How can I do this?
Check the code below.
scala> df.show(false)
+------+----+---+
|number|time|td |
+------+----+---+
|1     |1   |1  |
|1     |1   |11 |
|1     |2   |2  |
|1     |10  |9  |
|1     |14  |10 |
|2     |1   |1  |
|2     |11  |10 |
|2     |15  |20 |
|2     |15  |21 |
|2     |16  |21 |
|2     |17  |21 |
|2     |18  |21 |
+------+----+---+
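scala> :paste
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lag, monotonically_increasing_id, when}
// $"col" comes from spark.implicits._, which spark-shell already imports.

df
  // Row index; consecutive from 0 only while df sits in a single partition.
  .withColumn("rno", monotonically_increasing_id())
  .withColumn("isdeleted",
    when(
      // Delete candidate: previous row has td > 10 and this row has an even index.
      // A missing previous row (null lag) is treated as "not > 10".
      // Ordering by rno keeps the original row order; number alone has ties.
      !(when(lag($"td", 1).over(Window.orderBy($"rno")) > 10, true).otherwise(false) === true && $"rno" % 2 === 0),
      false
    ).otherwise(true)
  )
  .filter($"isdeleted" === false)
  .drop("isdeleted", "rno")
  .show(false)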
Output
+------+----+---+
|number|time|td |
+------+----+---+
|1     |1   |1  |
|1     |1   |11 |
|1     |10  |9  |
|1     |14  |10 |
|2     |1   |1  |
|2     |11  |10 |
|2     |15  |20 |
|2     |16  |21 |
|2     |18  |21 |
+------+----+---+
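The code above reproduces the expected result for this sample, but its `$"rno" % 2 === 0` test depends on where the rows happen to sit, and `monotonically_increasing_id` is not guaranteed to be consecutive across partitions. The rule itself is a sequential scan: whether a row is deleted depends on whether the row before it was kept. Below is a minimal sketch of that scan in plain Scala, without Spark. The names `Rec` and `dropAfterHigh` are hypothetical, and the sketch assumes the rule runs over all rows in their given order rather than separately per `number`.

```scala
// Sketch only: Rec and dropAfterHigh are illustrative names, not Spark APIs.
final case class Rec(number: Int, time: Int, td: Int)

object DropAfterHigh {
  /** Walks rows in order; after keeping a row with td > threshold, drops the next row. */
  def dropAfterHigh(rows: Seq[Rec], threshold: Int = 10): Vector[Rec] = {
    // State: (rows kept so far, whether the next row must be dropped).
    val (kept, _) = rows.foldLeft((Vector.empty[Rec], false)) {
      case ((acc, true), _)  => (acc, false)                  // row n+1: drop it, resume at n+2
      case ((acc, false), r) => (acc :+ r, r.td > threshold)  // keep row; flag the next if td is high
    }
    kept
  }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      Rec(1, 1, 1), Rec(1, 1, 11), Rec(1, 2, 2), Rec(1, 10, 9), Rec(1, 14, 10),
      Rec(2, 1, 1), Rec(2, 11, 10), Rec(2, 15, 20), Rec(2, 15, 21),
      Rec(2, 16, 21), Rec(2, 17, 21), Rec(2, 18, 21)
    )
    dropAfterHigh(input).foreach(r => println(s"${r.number} ${r.time} ${r.td}"))
  }
}
```

On the sample data this keeps the same nine rows as the expected result. For a DataFrame small enough to collect to the driver, the same function could be applied to the collected rows. For larger data the scan would need to run inside each ordered group, which is an assumption about the intended grouping that the question does not settle.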