How to make the first line of a text file the header and skip the second line in Spark Scala

user3391694

I am trying to figure out how to use the first line of a text file as the header and skip the second line. So far I have tried this:

scala> val file = spark.sparkContext.textFile("/home/webwerks/Desktop/UseCase-03-March/Temp/temp.out")  
file: org.apache.spark.rdd.RDD[String] = /home/webwerks/Desktop/UseCase-03-March/Temp/temp.out MapPartitionsRDD[40] at textFile at <console>:23

scala> val clean = file.flatMap(x=>x.split("\t")).filter(x=> !(x.contains("-")))
clean: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[42] at filter at <console>:25

scala> val df=clean.toDF()
df: org.apache.spark.sql.DataFrame = [value: string]

scala> df.show
+--------------------+
|               value|
+--------------------+
|time         task...|
|03:27:51.199 FCPH...|
|03:27:51.199 PORT...|
|03:27:51.200 PORT...|
|03:27:51.200 PORT...|
|03:27:59.377 PORT...|
|03:27:59.377 PORT...|
|03:27:59.377 FCPH...|
|03:27:59.377 FCPH...|
|03:28:00.468 PORT...|
|03:28:00.468 PORT...|
|03:28:00.469 FCPH...|
|03:28:00.469 FCPH...|
|03:28:01.197 FCPH...|
|03:28:01.197 FCPH...|
|03:28:01.197 PORT...|
|03:28:01.198 PORT...|
|03:28:09.380 PORT...|
|03:28:09.380 PORT...|
|03:28:09.380 FCPH...|

Here I want the first line as the header, and the data should be separated by tabs.

The data looks like this:

time         task       event   port cmd  args
--------------------------------------------------------------------------------------
03:27:51.199 FCPH       seq      13   28  00300000,00000000,00000591,00020182,00000000
03:27:51.199 PORT       Rx       11    0  c0fffffd,00fffffd,0ed10335,00000001
03:27:51.200 PORT       Tx       13   40  02fffffd,00fffffd,0ed3ffff,14000000
03:27:51.200 PORT       Rx       13    0  c0fffffd,00fffffd,0ed329ae,00000001
03:27:59.377 PORT       Rx       15   40  02fffffd,00fffffd,0336ffff,14000000
03:27:59.377 PORT       Tx       15    0  c0fffffd,00fffffd,03360ed2,00000001
03:27:59.377 FCPH       read     15   40  02fffffd,00fffffd,d0000000,00000000,03360ed2
03:27:59.377 FCPH       seq      15   28  22380000,03360ed2,0000052b,0000001c,00000000
03:28:00.468 PORT       Rx       13   40  02fffffd,00fffffd,29afffff,14000000
03:28:00.468 PORT       Tx       13    0  c0fffffd,00fffffd,29af0ed5,00000001
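Stripped of Spark, the requirement is: take the first line as the column names, drop the second (dashed) line, and split every remaining line on tabs. A minimal plain-Scala sketch of that rule follows. It assumes the file really is tab-separated as stated above (the sample is displayed space-aligned, so this is an assumption), and `HeaderSkipSketch`/`parseTable` are hypothetical names used only for illustration.

```scala
object HeaderSkipSketch {
  // Hypothetical helper: line 0 gives the column names, line 1 (the dashed
  // separator) is skipped, and every later non-blank line is split on tabs.
  // Assumes tab-separated input, as stated in the question.
  def parseTable(lines: Seq[String]): (Seq[String], Seq[Seq[String]]) = {
    require(lines.nonEmpty, "input must contain at least a header line")
    val header = lines.head.split("\t").map(_.trim).toSeq
    val rows = lines
      .drop(2)                                  // skip header + separator
      .filter(_.trim.nonEmpty)                  // ignore blank lines
      .map(_.split("\t", -1).map(_.trim).toSeq) // -1 keeps empty trailing fields
    (header, rows)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "time\ttask\tevent\tport\tcmd\targs",
      "-" * 40,
      "03:27:51.199\tFCPH\tseq\t13\t28\t00300000,00000000,00000591,00020182,00000000",
      "03:27:51.199\tPORT\tRx\t11\t0\tc0fffffd,00fffffd,0ed10335,00000001"
    )
    val (header, rows) = parseTable(sample)
    println(header.mkString(" | "))
    rows.foreach(r => println(r.mkString(" | ")))
  }
}
```

The Spark versions below apply the same rule, only distributed over an RDD or Dataset instead of a local `Seq`.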
Goutam Pradhan
        scala> val ds = spark.read.textFile("data.txt")   // Spark 2.0+, returns Dataset[String]
        // or, returning an RDD[String]: val ds = spark.sparkContext.textFile("data.txt")

        scala> val schemaArr = ds.filter(x => x.contains("time")).collect.mkString.split("\t").map(_.trim).toList

        scala> val df = ds.filter(x => !x.contains("time") && !x.startsWith("-")).map { x =>
                 // drop the header line and the dashed separator line, then split on tabs
                 val cols = x.split("\t")
                 (cols(0), cols(1), cols(2), cols(3), cols(4), cols(5))
               }.toDF(schemaArr: _*)

        scala> df.show(false)
        +------------+----+-----+----+---+--------------------------------------------+
        |time        |task|event|port|cmd|args                                        |
        +------------+----+-----+----+---+--------------------------------------------+
        |03:27:51.199|FCPH|seq  |13  |28 |00300000,00000000,00000591,00020182,00000000|
        |03:27:51.199|PORT|Rx   |11  | 0 |c0fffffd,00fffffd,0ed10335,00000001         |
        |03:27:51.200|PORT|Tx   |13  |40 |02fffffd,00fffffd,0ed3ffff,14000000         |
        |03:27:51.200|PORT|Rx   |13  | 0 |c0fffffd,00fffffd,0ed329ae,00000001         |
        |03:27:59.377|PORT|Rx   |15  |40 |02fffffd,00fffffd,0336ffff,14000000         |
        |03:27:59.377|PORT|Tx   |15  | 0 |c0fffffd,00fffffd,03360ed2,00000001         |
        |03:27:59.377|FCPH|read |15  |40 |02fffffd,00fffffd,d0000000,00000000,03360ed2|
        |03:27:59.377|FCPH|seq  |15  |28 |22380000,03360ed2,0000052b,0000001c,00000000|
        |03:28:00.468|PORT|Rx   |13  |40 |02fffffd,00fffffd,29afffff,14000000         |
        |03:28:00.468|PORT|Tx   |13  | 0 |c0fffffd,00fffffd,29af0ed5,00000001         |
        +------------+----+-----+----+---+--------------------------------------------+
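Dropping lines by their content (for example with `contains("time")`) also discards any data row that happens to match. If the header is always the first line and the dashed separator always the second, a position-based alternative numbers the lines with `RDD.zipWithIndex` and keeps only indices 2 and up. A minimal sketch, assuming tab-separated input with exactly six columns; `PositionalSkipSketch`, `toFrame`, and the default path `data.txt` are illustrative names, and the body of `toFrame` can be pasted into `spark-shell`, where `spark` already exists.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object PositionalSkipSketch {
  // Line 0 is the header, line 1 the dashed separator; everything after is data.
  // Assumes tab-separated lines with exactly six fields.
  def toFrame(spark: SparkSession, lines: RDD[String]): DataFrame = {
    import spark.implicits._
    val header = lines.first().split("\t").map(_.trim)   // column names from line 0
    lines
      .zipWithIndex()                                    // (line, 0-based position in file)
      .filter { case (_, idx) => idx > 1 }               // drop header (0) and separator (1)
      .map { case (line, _) =>
        val c = line.split("\t", -1).map(_.trim)
        (c(0), c(1), c(2), c(3), c(4), c(5))
      }
      .toDF(header: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("positional-skip").getOrCreate()
    val path = args.headOption.getOrElse("data.txt")    // illustrative default path
    toFrame(spark, spark.sparkContext.textFile(path)).show(false)
    spark.stop()
  }
}
```

`zipWithIndex` preserves the file order, but it runs an extra Spark job when the RDD has more than one partition, so the content-based filter is cheaper when the header text is reliably unique.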

Please try something like the above, and if you want typed columns, apply a custom schema to it.
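For the custom-schema route, one option is to build each data line into a `Row` and pass an explicit `StructType` to `spark.createDataFrame`, so that `port` and `cmd` come out as integers instead of strings. A minimal sketch: the column types are assumptions read off the sample data, the input is assumed tab-separated, and `CustomSchemaSketch`/`withSchema` are illustrative names. The header is dropped by comparing against the first line and the separator by its leading `-`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object CustomSchemaSketch {
  // Assumed types: port and cmd are integers, the other columns are strings.
  val schema: StructType = StructType(Seq(
    StructField("time", StringType, nullable = false),
    StructField("task", StringType, nullable = false),
    StructField("event", StringType, nullable = false),
    StructField("port", IntegerType, nullable = false),
    StructField("cmd", IntegerType, nullable = false),
    StructField("args", StringType, nullable = true)
  ))

  def withSchema(spark: SparkSession, lines: RDD[String]): DataFrame = {
    val headerLine = lines.first()
    val rows = lines
      // drop the header, the dashed separator, and blank lines
      .filter(l => l != headerLine && !l.startsWith("-") && l.trim.nonEmpty)
      .map { l =>
        val c = l.split("\t", -1).map(_.trim)
        // toInt throws NumberFormatException if port/cmd is not numeric
        Row(c(0), c(1), c(2), c(3).toInt, c(4).toInt, c(5))
      }
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("custom-schema").getOrCreate()
    val path = args.headOption.getOrElse("data.txt")   // illustrative default path
    val df = withSchema(spark, spark.sparkContext.textFile(path))
    df.printSchema()
    df.show(false)
    spark.stop()
  }
}
```

Building `Row`s against an explicit `StructType` keeps the types in one place; the alternative is to keep the all-string DataFrame and `cast` individual columns afterwards.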
