pandas: read a text file and split the names into columns based on the first character

krock1516

Hi, I'm looking to read a text file with pandas and place the names into separate columns based on their first character.

Below is the text file:

$ cat file.txt
AAAAAA
AAAAAA
AAAAAA
AAAAAA
AAAAAA
BBBBBB
BBBBBB
BBBBBB
BBBBBB
BBBBBB
CCCCCC
CCCCCC
CCCCCC
CCCCCC
CCCCCC
DDDDDD
DDDDDD
DDDDDD
DDDDDD
DDDDDD
EEEEEE
EEEEEE
EEEEEE
EEEEEE
EEEEEE
FFFFFF
FFFFFF
FFFFFF
FFFFFF
FFFFFF

Desired:

COL_1   COL_2   COL_3   COL_4   COL_5   COL_6
AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
Quang Hoang

Probably not the best way:

import pandas as pd

# header=None: the file has no header row, so the single column is labeled 0
df = pd.read_csv('file.txt', header=None)

# extract the first character of each string; it becomes the column key
df['start'] = df[0].str[0]

# group by the first character of the string
# cumcount numbers the rows within each group (0, 1, 2, ...),
# which gives each row its position in the reshaped table
df['idx'] = df.groupby('start').cumcount()

# pivot - search StackOverflow for 47152691
df.pivot(index='idx', columns='start', values=0)

Output:

start       A       B       C       D       E       F
idx                                                  
0      AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
1      AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
2      AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
3      AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
4      AAAAAA  BBBBBB  CCCCCC  DDDDDD  EEEEEE  FFFFFF
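
The output above still carries the start/idx labels, while the question asks for COL_1 ... COL_6 headers. A minimal sketch of one way to get there, assuming the same one-name-per-line input; the in-memory `text` stand-in for file.txt, the column label `name`, and the variable `wide` are illustrative choices, not part of the original answer:

```python
import io

import pandas as pd

# Stand-in for file.txt (assumption: one name per line, no header row)
text = "\n".join(letter * 6 for letter in "ABCDEF" for _ in range(5))

# Same steps as the answer, with a named column instead of the default 0
df = pd.read_csv(io.StringIO(text), header=None, names=["name"])
df["start"] = df["name"].str[0]
df["idx"] = df.groupby("start").cumcount()
wide = df.pivot(index="idx", columns="start", values="name")

# Replace the first-character labels with COL_1..COL_n and drop the idx index
wide.columns = [f"COL_{i}" for i in range(1, wide.shape[1] + 1)]
wide = wide.reset_index(drop=True)

print(wide.to_string(index=False))
```

If the groups have different sizes, the shorter columns are padded with NaN, since pivot aligns rows on idx.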
