Hi, I'm looking to see if we can read a text file with pandas and place its lines into separate columns based on the first character of each line.
Below is the text file:
$ cat file.txt
AAAAAA
AAAAAA
AAAAAA
AAAAAA
AAAAAA
BBBBBB
BBBBBB
BBBBBB
BBBBBB
BBBBBB
CCCCCC
CCCCCC
CCCCCC
CCCCCC
CCCCCC
DDDDDD
DDDDDD
DDDDDD
DDDDDD
DDDDDD
EEEEEE
EEEEEE
EEEEEE
EEEEEE
EEEEEE
FFFFFF
FFFFFF
FFFFFF
FFFFFF
FFFFFF
Expected output:
COL_1 COL_2 COL_3 COL_4 COL_5 COL_6
AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
Probably not the best way:
import pandas as pd

# notice the header=None option: the file has no header row
df = pd.read_csv('file.txt', header=None)
# extract the first character of the string
df['start'] = df[0].str[0]
# group by the first character of the string
# cumcount gives you the order/rank of the row within its group
df['idx'] = df.groupby('start').cumcount()
# pivot - search StackOverflow for 47152691
df.pivot(index='idx', columns='start', values=0)
Output:
start A B C D E F
idx
0 AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
1 AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
2 AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
3 AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
4 AAAAAA BBBBBB CCCCCC DDDDDD EEEEEE FFFFFF
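If you want the COL_1 ... COL_6 headers from the question instead of the first letters, here is a rough sketch that builds on the same groupby/cumcount/pivot idea. The name `wide` and the inline sample data (a stand-in for file.txt) are my own assumptions for illustration, not part of the original answer:

```python
import io

import pandas as pd

# Assumption: inline stand-in for file.txt, five lines each of AAAAAA..FFFFFF.
text = "\n".join(s * 6 for s in "ABCDEF" for _ in range(5))

# header=None because the file has no header row; the single column is named 0.
df = pd.read_csv(io.StringIO(text), header=None)

# First character of each line decides which column the line belongs to.
df['start'] = df[0].str[0]

# Position of each line within its group becomes the row index of the result.
df['idx'] = df.groupby('start').cumcount()

# Reshape from long to wide: one column per first character.
wide = df.pivot(index='idx', columns='start', values=0)

# Rename the columns to COL_1, COL_2, ... and drop the helper index.
wide.columns = [f"COL_{i}" for i in range(1, len(wide.columns) + 1)]
wide = wide.reset_index(drop=True)

print(wide.to_string(index=False))
```

This assumes every letter has the same number of lines. If the groups differ in size, pivot fills the missing cells with NaN.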