我有一个包含键,值对数组的RDD。我想获取一个带有键的元素(例如4)。
scala> val a = sc.parallelize(List("dog","tiger","lion","cat","spider","eagle"),2)
a: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:27
scala> val b = a.keyBy(_.length)
b: org.apache.spark.rdd.RDD[(Int, String)] = MapPartitionsRDD[1] at keyBy at <console>:29
我试图对其应用过滤器,但出现错误。
scala> val c = b.filter(p => p(0) = 4);
<console>:31: error: value update is not a member of (Int, String)
val c = b.filter(p => p(0) = 4);
我想将具有特定键(例如4)的键,值对打印为 Array((4,lion))
数据总是以键,值对数组的形式出现
使用p._1
而不是p(0)
。
val rdd = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 1)
val kvRdd: RDD[(Int, String)] = rdd.keyBy(_.length)
val filterRdd: RDD[(Int, String)] = kvRdd.filter(p => p._1 == 4)
//display rdd
println(filterRdd.collect().toList)
List((4,lion))
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句