value toDF is not a member of org.apache.spark.rdd.RDD

elelias

I have read about this issue in other SO posts, but I still don't know what I'm doing wrong. In principle, adding these two lines:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

should do the trick, but the error persists.

This is my build.sbt:

name := "PickACustomer"

version := "1.0"

scalaVersion := "2.11.7"


libraryDependencies ++= Seq("com.databricks" %% "spark-avro" % "2.0.1",
"org.apache.spark" %% "spark-sql" % "1.6.0",
"org.apache.spark" %% "spark-core" % "1.6.0")

And my Scala code is:

import scala.collection.mutable.Map
import scala.collection.immutable.Vector

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._


    object Foo{

    def reshuffle_rdd(rawText: RDD[String]): RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]]  = {...}

    def do_prediction(shuffled:RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]], prediction:(Vector[(Double, Double, String)] => Map[String, Double]) ) : RDD[Map[String, Double]] = {...}

    def get_match_rate_from_results(results : RDD[Map[String, Double]]) : Map[String, Double]  = {...}


    def retrieve_duid(element: Map[String,(Vector[(Double, Double, String)], Map[String,Double])]): Double = {...}




    def main(args: Array[String]){
        val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
        if (!conf.getOption("spark.master").isDefined) conf.setMaster("local")

        val sc = new SparkContext(conf)

        //This should do the trick
        val sqlContext = new org.apache.spark.sql.SQLContext(sc)
        import sqlContext.implicits._

        val PATH_FILE = "/mnt/fast_export_file_clean.csv"
        val rawText = sc.textFile(PATH_FILE)
        val shuffled = reshuffle_rdd(rawText)

        // PREDICT AS A FUNCTION OF THE LAST SEEN UID
        val results = do_prediction(shuffled.filter(x => retrieve_duid(x) > 1) , predict_as_last_uid)
        results.cache()

        case class Summary(ismatch: Double, t_to_last:Double, nflips:Double,d_uid: Double, truth:Double, guess:Double)

        val summary = results.map(x => Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))


        //PROBLEMATIC LINE
        val sum_df = summary.toDF()

    }
    }

I always get:

value toDF is not a member of org.apache.spark.rdd.RDD[Summary]

A bit lost now. Any ideas?

mattinbits

Move the case class out of main:

object Foo {

  case class Summary(ismatch: Double, t_to_last:Double, nflips:Double,d_uid: Double, truth:Double, guess:Double)

  def main(args: Array[String]){
    ...
  }

}

Something about its scope prevents Spark from being able to handle the automatic derivation of the schema for Summary. FYI, I actually got a different error than yours when building with sbt:

No TypeTag available for Summary
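For reference, here is a minimal sketch of the fixed layout. The input data below is made up, since the original CSV and helper functions are not shown; the point is only that with Summary defined at the object level (not inside main), the implicit toDF conversion can derive the schema:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object Foo {

  // Defined outside main so Spark can obtain a TypeTag for it
  // and derive the DataFrame schema automatically.
  case class Summary(ismatch: Double, t_to_last: Double, nflips: Double,
                     d_uid: Double, truth: Double, guess: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    if (conf.getOption("spark.master").isEmpty) conf.setMaster("local")
    val sc = new SparkContext(conf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up stand-in for the real results RDD of Map[String, Double].
    val results = sc.parallelize(Seq(
      Map("match" -> 1.0, "t_to_last" -> 2.0, "nflips" -> 3.0,
          "d_uid" -> 4.0, "truth" -> 1.0, "guess" -> 1.0)))

    val summary = results.map(x =>
      Summary(x("match"), x("t_to_last"), x("nflips"),
              x("d_uid"), x("truth"), x("guess")))

    // Compiles now: Summary is no longer local to main.
    val sum_df = summary.toDF()
    sum_df.show()
  }
}

It does not have to live inside object Foo specifically; any location where the case class is not defined inside the method that calls toDF (e.g. top level of the file) works.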
