Reading JSON object data as a MapType in Spark

马欣达

I wrote a sample Spark application in which I create a DataFrame with a MapType column and write it to disk. Then I read the same file back and print its schema. However, the output file's schema differs from the input schema, and I don't see the MapType in the output. How can I read the output file with the MapType?

import org.apache.spark.sql.{SaveMode, SparkSession}

case class Department(Id:String,Description:String)
case class Person(name:String,department:Map[String,Department])

object sample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local").appName("Custom Poc").getOrCreate
    import spark.implicits._

    val schemaData = Seq(
      Person("Persion1", Map("It" -> Department("1", "It Department"), "HR" -> Department("2", "HR Department"))),
      Person("Persion2", Map("It" -> Department("1", "It Department")))
    )
    val df = spark.sparkContext.parallelize(schemaData).toDF()
    println("Input schema")
    df.printSchema()
    // write the DataFrame to disk as JSON
    df.write.mode(SaveMode.Overwrite).json("D:\\save\\output")

    println("Output schema")
    // read the files back and print the schema inferred from the JSON data
    spark.read.json("D:\\save\\output\\*.json").printSchema()
  }
}

Output

Input schema
root
 |-- name: string (nullable = true)
 |-- department: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Description: string (nullable = true)
Output schema
root
 |-- department: struct (nullable = true)
 |    |-- HR: struct (nullable = true)
 |    |    |-- Description: string (nullable = true)
 |    |    |-- Id: string (nullable = true)
 |    |-- It: struct (nullable = true)
 |    |    |-- Description: string (nullable = true)
 |    |    |-- Id: string (nullable = true)
 |-- name: string (nullable = true)

JSON file

{"name":"Persion1","department":{"It":{"Id":"1","Description":"It Department"},"HR":{"Id":"2","Description":"HR Department"}}}
{"name":"Persion2","department":{"It":{"Id":"1","Description":"It Department"}}}

Edit: I added the file-saving part above only to explain my requirement. In the real scenario I will only be reading the JSON data given above and working on that DataFrame.

You can pass the schema from the previous DataFrame while reading the JSON data:

println("Input schema")
df.printSchema()
df.write.mode(SaveMode.Overwrite).json("D:\\save\\output")

println("Output schema")
spark.read.schema(df.schema).json("D:\\save\\output").printSchema()

Input schema

root
 |-- name: string (nullable = true)
 |-- department: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Description: string (nullable = true)

Output schema

root
 |-- name: string (nullable = true)
 |-- department: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- Id: string (nullable = true)
 |    |    |-- Description: string (nullable = true)
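
If the original DataFrame is not available (as noted in the edit above, where only the JSON data is read), the schema can be supplied up front instead: either derived from the case classes via an encoder or written out explicitly with MapType. A minimal sketch, assuming the Person and Department case classes from the question (the variable names personSchema, explicitSchema and restored are only illustrative):

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Option 1: derive the schema from the case classes
val personSchema = Encoders.product[Person].schema

// Option 2: build the same schema by hand
val departmentStruct = StructType(Seq(
  StructField("Id", StringType),
  StructField("Description", StringType)))
val explicitSchema = StructType(Seq(
  StructField("name", StringType),
  StructField("department", MapType(StringType, departmentStruct))))

// Either schema restores department as a map when reading the JSON
val restored = spark.read.schema(personSchema).json("D:\\save\\output")
restored.printSchema()

// Quick check that the column behaves as a map
restored
  .select(col("name"), col("department").getItem("It").getField("Description"))
  .show(false)

Either way the map column comes back without relying on the DataFrame that originally wrote the file.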

Hope this helps!
