I've got some data that looks like
tweet_id worker_id option
397921751801147392 A1DZLZE63NE1ZI pro-vaccine
397921751801147392 A3UJO2A7THUZTV pro-vaccine
397921751801147392 A3G00Q5JV2BE5G pro-vaccine
558401694862942208 A1G94QON7A9K0N other
558401694862942208 ANMWPCK7TJMZ8 other
What I would like is a single row for each tweet id, with six columns holding the three worker ids and their options.
The desired output is something like
tweet_id worker_id_1 option_1 worker_id_2 option_2 worker_id_3 option_3
397921751801147392 A1DZLZE63NE1ZI pro-vaccine A3UJO2A7THUZTV pro-vaccine A3G00Q5JV2BE5G pro-vaccine
How can I achieve this with pandas?
This is a reshape from long to wide format. Create a per-group counter column (the position of each row within its tweet_id), spread it into new column headers with pivot_table(), and finally flatten the resulting two-level column names by joining the levels with an underscore.
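df['count'] = df.groupby('tweet_id').cumcount() + 1  # 1, 2, 3, ... within each tweet
# each (tweet_id, count) cell holds exactly one value, so 'first' just passes it through
df1 = df.pivot_table(values=['worker_id', 'option'], index='tweet_id',
                     columns='count', aggfunc='first')
df1.columns = [x + "_" + str(y) for x, y in df1.columns]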
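For anyone who wants to try this end to end, here is a minimal self-contained sketch of the pivot_table() approach. The DataFrame is rebuilt by hand from the rows in the question, and the name wide is just an illustrative choice. It uses aggfunc="first" on the assumption that each (tweet_id, count) pair occurs only once, which the cumcount() column guarantees.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "tweet_id": [397921751801147392] * 3 + [558401694862942208] * 2,
    "worker_id": ["A1DZLZE63NE1ZI", "A3UJO2A7THUZTV", "A3G00Q5JV2BE5G",
                  "A1G94QON7A9K0N", "ANMWPCK7TJMZ8"],
    "option": ["pro-vaccine"] * 3 + ["other"] * 2,
})

# Number the rows within each tweet: 1, 2, 3, ...
df["count"] = df.groupby("tweet_id").cumcount() + 1

# Spread the counter into columns; "first" keeps the single value per cell.
wide = df.pivot_table(values=["worker_id", "option"], index="tweet_id",
                      columns="count", aggfunc="first")

# Flatten the (name, count) column pairs into names like "worker_id_1".
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

Tweets with fewer than three workers get NaN in the unused columns, as the second tweet does here.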
An alternative to pivot_table() is unstack():
df['count'] = df.groupby('tweet_id').cumcount() + 1
df1 = df.set_index(['tweet_id', 'count']).unstack(level = 1)
df1.columns = [x + "_" + str(y) for x, y in df1.columns]
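Neither approach puts the columns in the interleaved order shown in the question (worker_id_1, option_1, worker_id_2, ...). One possible way to get that order, sketched here with the unstack() version, is to reindex the two-level columns before flattening them. The sample frame is rebuilt from the question, and the names wide, n_max, and order are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "tweet_id": [397921751801147392] * 3 + [558401694862942208] * 2,
    "worker_id": ["A1DZLZE63NE1ZI", "A3UJO2A7THUZTV", "A3G00Q5JV2BE5G",
                  "A1G94QON7A9K0N", "ANMWPCK7TJMZ8"],
    "option": ["pro-vaccine"] * 3 + ["other"] * 2,
})

df["count"] = df.groupby("tweet_id").cumcount() + 1
wide = df.set_index(["tweet_id", "count"]).unstack(level=1)

# Build the desired column order: (worker_id, 1), (option, 1), (worker_id, 2), ...
n_max = int(df["count"].max())
order = pd.MultiIndex.from_tuples(
    [(field, n) for n in range(1, n_max + 1) for field in ("worker_id", "option")]
)
wide = wide.reindex(columns=order)

# Flatten the column pairs and turn tweet_id back into a regular column.
wide.columns = [f"{field}_{n}" for field, n in wide.columns]
wide = wide.reset_index()
print(wide)
```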