I have a dataframe as:
df =
      A     B     C     D     E
0  J969  I279   D65    -1    -1
1   C56  A419  I279  C221    -1
2  R068   D65  N009    -1    -1
3   C56  T107  J969  R068  N009
I need to encode the labels in all of the columns. If a label matches another label anywhere in the DataFrame (e.g. column A row 0 and column C row 3), both must be encoded to the same number, like this:
    A   B   C   D   E
0   0   3   7  -1  -1
1   1   2   6  15  -1
2   4   7  10  -1  -1
3   1   8   0   4  10
I have tried pandas.factorize, pandas.Categorical and scikit-learn's LabelEncoder, following examples on Stack Overflow, but nothing seems to work.
Thanks.
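pd.factorize can in fact do this. The catch is that it has to be applied to every cell at once rather than to one column at a time, because per-column calls give each column its own codes. A minimal sketch, assuming -1 marks empty cells that should stay -1 (as in the expected output), and rebuilding the sample DataFrame from the question:

```python
import pandas as pd

# Sample data from the question; -1 is assumed to mean "empty cell".
df = pd.DataFrame(
    {
        "A": ["J969", "C56", "R068", "C56"],
        "B": ["I279", "A419", "D65", "T107"],
        "C": ["D65", "I279", "N009", "J969"],
        "D": [-1, "C221", -1, "R068"],
        "E": [-1, -1, -1, "N009"],
    }
)

# Flatten all cells row by row so every label shares a single encoding.
s = pd.Series(df.to_numpy().ravel())

# Turn the -1 sentinel into NaN: factorize gives missing values the code -1
# by default, so the sentinel comes back out unchanged.
codes, uniques = pd.factorize(s.where(s != -1))

# Reshape the flat codes back to the original layout.
encoded = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
print(encoded)
```

Codes are assigned in order of first appearance, so the numbers differ from the example output above, but equal labels always get equal codes.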
You can use:
import pandas as pd
# Flatten every cell into one array so all columns share one mapping.
m = {d: i for i, d in enumerate(pd.unique(df.to_numpy().flatten()))}
new_df = pd.DataFrame({c: df[c].map(m) for c in df.columns})
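m is a dict mapping each unique element in the DataFrame to an integer index. pd.unique returns values in order of first appearance, so with the row-by-row flatten the indices follow reading order. Note that -1 is treated like any other value here and also receives an index.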
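The same dict-based idea can keep -1 untouched, as the expected output requires. A minimal sketch, assuming -1 is a sentinel for empty cells; `SENTINEL` is a hypothetical name introduced here, and the sample DataFrame is rebuilt from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["J969", "C56", "R068", "C56"],
        "B": ["I279", "A419", "D65", "T107"],
        "C": ["D65", "I279", "N009", "J969"],
        "D": [-1, "C221", -1, "R068"],
        "E": [-1, -1, -1, "N009"],
    }
)

SENTINEL = -1  # assumption: -1 marks an empty cell and must stay -1

# Unique labels in order of first appearance (row by row), minus the sentinel.
labels = [v for v in pd.unique(df.to_numpy().ravel()) if v != SENTINEL]
mapping = {label: code for code, label in enumerate(labels)}
mapping[SENTINEL] = SENTINEL  # map the sentinel to itself

# Series.map looks up every cell of each column in the shared dict.
new_df = df.apply(lambda col: col.map(mapping))
print(new_df)
```

Because the dict is built once from all cells, a label gets the same code in every column; only the sentinel is excluded from numbering.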