As far as I know, MLlib supports only integers.
So I want to convert strings to integers in Scala. For example, I have many reviewerID and productID values in a text file:
reviewerID productID
03905X0912 ZXASQWZXAS
0325935ODD PDLFMBKGMS
...
StringIndexer is the solution. It is an estimator that fits into an ML pipeline and produces a transformer. Once you set the input column, it computes the frequency of each category and numbers the categories starting from 0, with the most frequent category getting index 0. You can add IndexToString at the end of the pipeline to map the indices back to the original strings if required.
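To make the frequency-based numbering concrete, here is a minimal plain-Scala sketch of the idea. It is not Spark's implementation: the object name FrequencyIndexSketch and the function fitIndex are hypothetical, and breaking ties alphabetically is an assumption of this sketch.

```scala
// Plain-Scala illustration of frequency-ordered indexing (hypothetical helper, not Spark code).
object FrequencyIndexSketch {

  // Builds a label -> index map in which the most frequent label gets 0.0.
  // Assumption: labels with equal counts are ordered alphabetically.
  def fitIndex(labels: Seq[String]): Map[String, Double] =
    labels
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), idx) => label -> idx.toDouble }
      .toMap

  def main(args: Array[String]): Unit = {
    // The two product IDs from the question, with one repeated so the counts differ.
    val productIds = Seq("ZXASQWZXAS", "PDLFMBKGMS", "ZXASQWZXAS")
    val index = fitIndex(productIds)
    productIds.foreach(id => println(s"$id -> ${index(id)}"))
  }
}
```

The indices are doubles here because StringIndexer also writes its output column as doubles.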
See the "Extracting, transforming and selecting features" section of the ML documentation for further details.
In your case, assuming df is a DataFrame with a productID column, it would look like this:
import org.apache.spark.ml.feature.StringIndexer
val indexer = new StringIndexer().setInputCol("productID").setOutputCol("productIndex")
val indexed = indexer.fit(df).transform(df)
indexed.show()
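Since the question has two string columns, a possible extension is to index both in one Pipeline and then map one of them back with IndexToString. This is only a sketch: the object name IndexBothColumns, the helper indexColumns, the output column names, and the local SparkSession are assumptions made for illustration, and the sample rows are the two shown in the question.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}
import org.apache.spark.sql.{DataFrame, SparkSession}

object IndexBothColumns {

  // Adds reviewerIndex and productIndex columns (hypothetical names) to df.
  def indexColumns(df: DataFrame): DataFrame = {
    val reviewerIndexer = new StringIndexer()
      .setInputCol("reviewerID")
      .setOutputCol("reviewerIndex")
    val productIndexer = new StringIndexer()
      .setInputCol("productID")
      .setOutputCol("productIndex")

    // Fit both indexers as stages of one pipeline, then apply them.
    new Pipeline()
      .setStages(Array(reviewerIndexer, productIndexer))
      .fit(df)
      .transform(df)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation only.
    val spark = SparkSession.builder()
      .appName("IndexBothColumns")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample rows from the question; a real job would read the text file instead.
    val df = Seq(
      ("03905X0912", "ZXASQWZXAS"),
      ("0325935ODD", "PDLFMBKGMS")
    ).toDF("reviewerID", "productID")

    val indexed = indexColumns(df)
    indexed.show()

    // IndexToString takes the labels from the metadata StringIndexer attached to productIndex.
    val restorer = new IndexToString()
      .setInputCol("productIndex")
      .setOutputCol("productIDRestored")
    restorer.transform(indexed).show()

    spark.stop()
  }
}
```

The index columns are of type double. If integer IDs are needed, casting the column, for example with col("productIndex").cast("int"), is one option.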