In Spark MLlib, how do I convert a string to an integer in Scala?

DaehyunPark

As far as I know, MLlib only supports integers.
So I want to convert strings to integers in Scala. For example, I have many reviewerID and productID values in a text file:

reviewerID    productID
03905X0912    ZXASQWZXAS
0325935ODD    PDLFMBKGMS
...

sourabh

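StringIndexer is the solution. It is an estimator that fits into an ML pipeline: fitting it on your data produces a transformer that does the actual mapping. Once you set the input column, it computes the frequency of each distinct string and assigns indices starting at 0, with the most frequent value getting index 0 by default. Note that the output column holds doubles (0.0, 1.0, ...), so cast it if you need an integer type. You can add IndexToString at the end of the pipeline to map the indices back to the original strings if required.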
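To make the frequency-based numbering concrete, here is a plain-Scala sketch of the idea that runs without Spark. It is not Spark's implementation; the sample values and the alphabetical tie-breaking are assumptions made for illustration.

```scala
// Illustrative sample: one product ID appears twice, the other once
val productIDs = Seq("ZXASQWZXAS", "PDLFMBKGMS", "ZXASQWZXAS")

// Count occurrences, then order by descending count.
// Ties are broken alphabetically here (an assumption of this sketch).
val labels: Seq[String] = productIDs
  .groupBy(identity)
  .map { case (id, occurrences) => (id, occurrences.size) }
  .toSeq
  .sortBy { case (id, count) => (-count, id) }
  .map(_._1)

// The position in the ordered label list becomes the integer index
val index: Map[String, Int] = labels.zipWithIndex.toMap

// Replace every string with its index
val encoded: Seq[Int] = productIDs.map(index)
```

The most frequent ID ends up with index 0, which mirrors what StringIndexer does by default on a DataFrame column.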

See the section "Extracting, transforming and selecting features" in the ML documentation for further details.

In your case it would look like this:

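import org.apache.spark.ml.feature.StringIndexer

// df is the DataFrame that holds the reviewerID and productID columns
val indexer = new StringIndexer().setInputCol("productID").setOutputCol("productIndex")
// fit() learns the string-to-index mapping; transform() appends the productIndex column
val indexed = indexer.fit(df).transform(df)
indexed.show()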
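Since the question has two ID columns, the same approach can be applied to both in one Pipeline. The following is a sketch under assumptions: the local SparkSession, the inline sample rows, and the column names reviewerIndex and productOriginal are made up for illustration, and in practice df would be read from your text file.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
import org.apache.spark.sql.SparkSession

// Local session for experimentation only (assumed setup)
val spark = SparkSession.builder()
  .appName("StringIndexerSketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample rows taken from the question
val df = Seq(
  ("03905X0912", "ZXASQWZXAS"),
  ("0325935ODD", "PDLFMBKGMS")
).toDF("reviewerID", "productID")

// One StringIndexer per string column
val reviewerIndexer = new StringIndexer()
  .setInputCol("reviewerID")
  .setOutputCol("reviewerIndex")
val productIndexer = new StringIndexer()
  .setInputCol("productID")
  .setOutputCol("productIndex")

// Fit both indexers in a single pipeline and add the index columns
val model = new Pipeline()
  .setStages(Array(reviewerIndexer, productIndexer))
  .fit(df)
val indexed = model.transform(df)

// IndexToString reads the labels from the index column's metadata
// and maps the indices back to the original strings
val restored = new IndexToString()
  .setInputCol("productIndex")
  .setOutputCol("productOriginal")
  .transform(indexed)

// StringIndexer outputs doubles; cast when an integer type is required
val withInts = indexed
  .withColumn("reviewerIndex", $"reviewerIndex".cast("int"))
  .withColumn("productIndex", $"productIndex".cast("int"))

withInts.show()
```

IndexToString is applied to the un-cast DataFrame here because it relies on the metadata StringIndexer attaches to its output column, and that metadata may not survive the cast.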

