Convert string list to array type

merkle

I have a DataFrame with a column whose datatype is string, but whose values are actually string representations of arrays.

import pyspark
from pyspark.sql import Row
# geography is stored as a plain string that only looks like an array
item = spark.createDataFrame([Row(item='fish',geography='[london, a, b, hyd]'),
                              Row(item='chicken',geography='[a, hyd, c]'),
                              Row(item='rice',geography='[a, b, c, blr]'),
                              Row(item='soup',geography='[a, kol, simla]'),
                              Row(item='pav',geography='[a, del]'),
                              Row(item='kachori',geography='[a, guj]'),
                              Row(item='fries',geography='[a, chen]'),
                              Row(item='noodles',geography='[a, mum]')])
item.show()
# +-------+-------------------+
# |   item|          geography|
# +-------+-------------------+
# |   fish|[london, a, b, hyd]|
# |chicken|        [a, hyd, c]|
# |   rice|     [a, b, c, blr]|
# |   soup|    [a, kol, simla]|
# |    pav|           [a, del]|
# |kachori|           [a, guj]|
# |  fries|          [a, chen]|
# |noodles|           [a, mum]|
# +-------+-------------------+

item.printSchema()
#  root
#  |-- item: string (nullable = true)
#  |-- geography: string (nullable = true)

How can I convert the geography column in the above dataset to array type?

ZygD
F.expr("regexp_extract_all(geography, '(\\\\w+)', 1)")

regexp_extract_all is available as a Spark SQL function from Spark 3.1 onward.

regexp_extract_all(str, regexp[, idx]) - Extract all strings in the str that match the regexp expression and corresponding to the regex group index.
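
To get a feel for what the pattern extracts without starting a Spark session, the same regex can be tried with Python's re module. This is only an approximation: Spark uses Java regular expressions, and re.findall is standing in for regexp_extract_all here, but for a simple pattern like (\w+) on ASCII input the two should agree. The sample strings are illustrative.

```python
import re

# One capturing group; in Spark, idx=1 selects it. re.findall returns that
# group for every match when the pattern has exactly one group.
pattern = r"(\w+)"

quoted = "['london','a','b','hyd']"
print(re.findall(pattern, quoted))

# Caveat: \w+ matches runs of letters, digits and underscores only, so a
# value containing a space is split into several elements.
with_space = "['new york','a']"
print(re.findall(pattern, with_space))

# A pattern keyed on the surrounding quotes keeps such values whole.
print(re.findall(r"'([^']+)'", with_space))
```

If the real values can contain spaces or punctuation, the pattern passed to regexp_extract_all would need a similar adjustment.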

from pyspark.sql import Row, functions as F

item = spark.createDataFrame([Row(item='fish',geography="['london','a','b','hyd']"),
                              Row(item='noodles',geography="['a','mum']")])
item.printSchema()
# root
#  |-- item: string (nullable = true)
#  |-- geography: string (nullable = true)


item = item.withColumn('geography', F.expr("regexp_extract_all(geography, '(\\\\w+)', 1)"))

item.printSchema()
# root
#  |-- item: string (nullable = true)
#  |-- geography: array (nullable = true)
#  |    |-- element: string (containsNull = true)
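
The four backslashes in the F.expr call are easy to get wrong, so here is a small sketch of the escaping levels in plain Python. Python's string literal reduces them to two; assuming the default parser setting (spark.sql.parser.escapedStringLiterals left at false), the Spark SQL parser then reduces those to one, so the regex engine receives (\w+). The variable names are made up for illustration.

```python
# The SQL text exactly as written in the answer (ordinary, non-raw string).
sql_text = "regexp_extract_all(geography, '(\\\\w+)', 1)"

# After Python's own unescaping, two backslashes remain in the SQL text.
print(sql_text)  # regexp_extract_all(geography, '(\\w+)', 1)

# A raw string spells the same SQL text with less doubling.
raw_text = r"regexp_extract_all(geography, '(\\w+)', 1)"
print(sql_text == raw_text)  # True
```

Either spelling can be passed to F.expr; the raw string just makes it easier to see what Spark will parse.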
