I have read about this issue in other SO posts, but I still don't know what I am doing wrong. In principle, adding these two lines:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
should do the trick, but the error persists.
This is my build.sbt:
name := "PickACustomer"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq("com.databricks" %% "spark-avro" % "2.0.1",
"org.apache.spark" %% "spark-sql" % "1.6.0",
"org.apache.spark" %% "spark-core" % "1.6.0")
My Scala code is:
import scala.collection.mutable.Map
import scala.collection.immutable.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql._
object Foo {
  def reshuffle_rdd(rawText: RDD[String]): RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]] = {...}
  def do_prediction(shuffled: RDD[Map[String, (Vector[(Double, Double, String)], Map[String, Double])]], prediction: (Vector[(Double, Double, String)] => Map[String, Double])): RDD[Map[String, Double]] = {...}
  def get_match_rate_from_results(results: RDD[Map[String, Double]]): Map[String, Double] = {...}
  def retrieve_duid(element: Map[String, (Vector[(Double, Double, String)], Map[String, Double])]): Double = {...}

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(this.getClass.getSimpleName)
    if (!conf.getOption("spark.master").isDefined) conf.setMaster("local")
    val sc = new SparkContext(conf)

    // This should do the trick
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val PATH_FILE = "/mnt/fast_export_file_clean.csv"
    val rawText = sc.textFile(PATH_FILE)
    val shuffled = reshuffle_rdd(rawText)

    // PREDICT AS A FUNCTION OF THE LAST SEEN UID
    val results = do_prediction(shuffled.filter(x => retrieve_duid(x) > 1), predict_as_last_uid)
    results.cache()

    case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)
    val summary = results.map(x => Summary(x("match"), x("t_to_last"), x("nflips"), x("d_uid"), x("truth"), x("guess")))

    // PROBLEMATIC LINE
    val sum_df = summary.toDF()
  }
}
I always get:
value toDF is not a member of org.apache.spark.rdd.RDD[Summary]
I'm a bit lost now. Any ideas?
Move your case class outside of main:
object Foo {
  case class Summary(ismatch: Double, t_to_last: Double, nflips: Double, d_uid: Double, truth: Double, guess: Double)

  def main(args: Array[String]) {
    ...
  }
}
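Something about its scoping prevents Spark from automatically deriving the schema for Summary. When the case class is declared inside main, the compiler cannot supply the TypeTag that the implicit conversion behind toDF requires, so the conversion is never applied. FYI, I actually got a different error from sbt:
No TypeTag available for Summary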
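For reference, here is a minimal sketch of the working layout, assuming the Spark 1.6 dependencies from the build.sbt in the question. The object name ToDFExample, the shortened three-field Summary, and the in-memory sample data are all made up for illustration; they stand in for the real pipeline, which is elided in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name, used only for this illustration.
object ToDFExample {
  // Declared at object level (not inside main), so the compiler can
  // provide the TypeTag that the toDF implicit conversion needs.
  // Shortened to three fields for brevity; the original has six.
  case class Summary(ismatch: Double, truth: Double, guess: Double)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ToDFExample").setMaster("local")
    val sc = new SparkContext(conf)

    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample data standing in for the RDD[Map[String, Double]]
    // that do_prediction returns in the question.
    val results = sc.parallelize(Seq(
      Map("match" -> 1.0, "truth" -> 2.0, "guess" -> 2.0),
      Map("match" -> 0.0, "truth" -> 3.0, "guess" -> 1.0)
    ))

    val summary = results.map(x => Summary(x("match"), x("truth"), x("guess")))

    // Compiles now that Summary is visible outside the method body.
    val sumDf = summary.toDF()
    sumDf.printSchema()

    sc.stop()
  }
}
```

Moving the Summary declaration back inside main should bring back the compile error described above, which is a quick way to confirm that scoping is the cause rather than a missing import.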