How to convert strings in a Pandas Dataframe to a list or an array of characters?

Nik

I have a dataframe called data, one column of which contains strings. I want to extract the characters from the strings because my goal is to one-hot encode them and make them usable for classification. The column containing the strings is stored in predictors as follows:

predictors = pd.DataFrame(data, columns = ['Sequence']).to_numpy()

The result upon printing is:

[['DKWL']
 ['FCHN']
 ['KDQP']
 ...
 ['SGHC']
 ['KIGT']
 ['PGPT']]

while my goal is to get something like:

[['D', 'K', 'W', 'L']
 ...
 ['P', 'G', 'P', 'T']]

which from my understanding is a more appropriate form for one-hot encoding.

I have already tried the answers to How do I convert string characters into a list? and How to create a list with the characters of a string?, without success.

Specifically, I also tried this:

for row in predictors:
    row = list(row)

but the result is in the same form as predictors, i.e.

[['DKWL']
 ['FCHN']
 ['KDQP']
 ...
 ['SGHC']
 ['KIGT']
 ['PGPT']]
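Two things go wrong in that loop. Assigning to row only rebinds the loop variable, so nothing is written back into predictors. And because .to_numpy() on a one-column DataFrame returns a 2-D array of shape (n, 1), each row is a length-1 array holding the whole string, so list(row) gives ['DKWL'] rather than the individual letters. A minimal sketch, using a small hypothetical data frame built from the sample values above, that takes the single element of each row and collects the results into a new list:

```python
import pandas as pd

# Hypothetical stand-in for the question's `data`, using sample values shown above
data = pd.DataFrame({"Sequence": ["DKWL", "FCHN", "KDQP"]})
predictors = pd.DataFrame(data, columns=["Sequence"]).to_numpy()

print(predictors.shape)  # (3, 1): one column, so each row wraps a single string

# row[0] is the string itself; list() on a string splits it into characters.
# Build a new list instead of reassigning the loop variable.
chars = [list(row[0]) for row in predictors]
print(chars)  # [['D', 'K', 'W', 'L'], ['F', 'C', 'H', 'N'], ['K', 'D', 'Q', 'P']]
```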
jezrael

You can split each string into its characters with a list comprehension that calls list on every value, and then convert the result to a NumPy array if necessary:

predictors = np.array([list(x) for x in data['Sequence']])
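Iterating over a DataFrame directly yields its column labels rather than its values, which is why the comprehension should run over the Sequence column. A self-contained sketch of this approach, with a hypothetical three-row data frame standing in for the real one:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the real `data` comes from the question
data = pd.DataFrame({"Sequence": ["DKWL", "FCHN", "PGPT"]})

# list("DKWL") -> ['D', 'K', 'W', 'L']; np.array stacks the equal-length lists into 2-D
predictors = np.array([list(x) for x in data["Sequence"]])
print(predictors.shape)  # (3, 4)
print(predictors)
```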

Or, if predictors is kept as a DataFrame (i.e. without calling .to_numpy()), convert its column predictors['Sequence']:

a = np.array([list(x) for x in predictors['Sequence']])
print(a)
[['D' 'K' 'W' 'L']
 ['F' 'C' 'H' 'N']
 ['K' 'D' 'Q' 'P']
 ['S' 'G' 'H' 'C']
 ['K' 'I' 'G' 'T']
 ['P' 'G' 'P' 'T']]
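Stacking the lists into a 2-D array like this works because every sequence has the same length. With strings of different lengths the lists are ragged: recent NumPy versions raise a ValueError, while older ones produce a 1-D object array instead. One possible workaround is to pad every string to the longest length first; the '-' padding symbol and the sample strings below are assumptions for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical ragged input; the question's sequences all have length 4
seqs = pd.Series(["DKWL", "FCH", "KD"], name="Sequence")

width = int(seqs.str.len().max())    # longest sequence length (4 here)
padded = seqs.str.ljust(width, "-")  # "-" is an assumed padding character
a = np.array([list(x) for x in padded])
print(a)
```

The padding symbol then simply becomes one more category when the characters are one-hot encoded.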

To get a Series of lists instead, use:

s = predictors['Sequence'].apply(list)
print(s)
0    [D, K, W, L]
1    [F, C, H, N]
2    [K, D, Q, P]
3    [S, G, H, C]
4    [K, I, G, T]
5    [P, G, P, T]
Name: Sequence, dtype: object
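Since the stated goal is one-hot encoding, one possible next step (not part of the original answer) is to spread the lists into one column per character position and pass that frame to pd.get_dummies. A minimal sketch, assuming all sequences have the same length and using the six sample sequences shown above:

```python
import pandas as pd

# Sample sequences from the question
predictors = pd.DataFrame(
    {"Sequence": ["DKWL", "FCHN", "KDQP", "SGHC", "KIGT", "PGPT"]}
)
s = predictors["Sequence"].apply(list)

# One column per character position: columns 0, 1, 2, 3
positions = pd.DataFrame(s.tolist(), index=s.index)

# One indicator column per (position, letter) pair seen in the data, e.g. "0_D"
encoded = pd.get_dummies(positions, dtype=int)
print(encoded.shape)  # (6, 20)
```

get_dummies only creates columns for letters that actually occur at each position, so separate training and test frames can end up with different columns; reindexing to a fixed list of columns, or using an encoder fitted on the training data, avoids that mismatch.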
