Convert same labels over multiple columns in a data frame to numbers

gedman

I have a dataframe as:

df =
    A     B     C     D     E
    ---   ---   ---   ---   ---
0   J969  I279  D65   -1    -1
1   C56   A419  I279  C221  -1
2   R068  D65   N009  -1    -1
3   C56   T107  J969  R068  N009

I need to encode the labels in all of the columns. If a label matches another label anywhere in the dataframe (e.g., column A, row 0 and column C, row 3 both contain J969), both must be encoded to the same number, like this:

    A     B     C     D     E
    ---   ---   ---   ---   ---
0   0     3     7     -1    -1
1   1     2     3     15    -1
2   4     7     10    -1    -1
3   1     8     0     4     10
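
Put differently, the encoding has to be one-to-one across the whole frame: every occurrence of a label gets the same code, and two different labels never share a code. A minimal sketch of a check for that property (the helper name is_consistent is hypothetical, and it assumes labels and codes are two DataFrames of the same shape):

```python
import pandas as pd


def is_consistent(labels: pd.DataFrame, codes: pd.DataFrame) -> bool:
    """Hypothetical helper: True if equal labels always share a code and
    distinct labels never share one (a one-to-one encoding)."""
    # Pair every cell's label with its code, then keep each distinct pair once
    pairs = pd.DataFrame({
        "label": labels.to_numpy().ravel(),
        "code": codes.to_numpy().ravel(),
    }).drop_duplicates()
    # A label seen with two codes, or a code used for two labels, shows up
    # as a repeated value in the corresponding column
    return pairs["label"].is_unique and pairs["code"].is_unique
```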

I have tried pandas.factorize, pandas.Categorical and scikit-learn's LabelEncoder, following examples on Stack Overflow, but none of them gives consistent numbers across columns.
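
Calling pd.factorize column by column restarts the numbering in every column, which is why the same label ends up with different codes. The pandas.Categorical route can work if every column shares one category list. A minimal sketch on the sample data, assuming -1 is stored as an integer placeholder for "no label" that should stay -1 as in the expected output (the codes themselves differ from that table, whose numbers are arbitrary):

```python
import pandas as pd

# Sample data from the question; -1 is assumed to be an integer placeholder
df = pd.DataFrame({
    "A": ["J969", "C56", "R068", "C56"],
    "B": ["I279", "A419", "D65", "T107"],
    "C": ["D65", "I279", "N009", "J969"],
    "D": [-1, "C221", -1, "R068"],
    "E": [-1, -1, -1, "N009"],
})

# Per-column factorize: each column is numbered independently,
# so "J969" gets one code in column A and another in column C
per_column = df.apply(lambda s: pd.factorize(s)[0])

# One shared vocabulary: every distinct label except the -1 placeholder,
# sorted so the order does not depend on where a label first appears
labels = sorted({v for v in df.to_numpy().ravel() if v != -1})

# With a common category list, equal labels get equal codes in every column;
# values that are not in the list (here -1) get the code -1
shared = df.apply(lambda s: pd.Categorical(s, categories=labels).codes)
```

Leaving -1 out of the categories is what makes it come back as -1; if -1 should be encoded like any other label, include it in labels instead.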

Thanks.

Ami Tavory

You can use:

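import pandas as pd

# df.as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
m = {d: i for i, d in enumerate(pd.unique(df.to_numpy().ravel()))}
# Translate every column through the same map
new_df = pd.DataFrame({c: df[c].map(m) for c in df.columns})
  • m is a map from each unique element in the DataFrame to an integer index, in order of first appearance when the values are read row by row.
  • The dictionary comprehension goes over the columns and translates each one through the same map, so equal labels get equal numbers in every column.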
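
A sketch of the map-based code above run on the question's sample frame (assuming -1 is stored as an integer), next to an equivalent variant that factorizes the flattened values once with pd.factorize and reshapes the codes back to the frame's shape:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["J969", "C56", "R068", "C56"],
    "B": ["I279", "A419", "D65", "T107"],
    "C": ["D65", "I279", "N009", "J969"],
    "D": [-1, "C221", -1, "R068"],
    "E": [-1, -1, -1, "N009"],
})

# Map-based version: one dictionary built from all values, applied per column
m = {d: i for i, d in enumerate(pd.unique(df.to_numpy().ravel()))}
by_map = pd.DataFrame({c: df[c].map(m) for c in df.columns})

# factorize version: ravel() reads the cells row by row and reshape() refills
# them in the same order, so each code lands back in its label's cell
codes, uniques = pd.factorize(df.to_numpy().ravel())
by_factorize = pd.DataFrame(codes.reshape(df.shape), index=df.index, columns=df.columns)
```

Both number labels in order of first appearance and treat -1 as an ordinary label, so here -1 receives a code of its own (3) instead of staying -1.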
