Dynamically merge lines that share the same key into one

Revolucion for Monica

I have a Dataframe and would like to make another column that combines the columns whose name begins with the same value in Answer and QID.

That is to say, here is an exerpt of the dataframe:

    QID     Category    Text    QType   Question    Answer0     Answer1
0   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I own a car/cars
1   16  Automotive  Access to car   Single  Do you have access to a car?    I lease/ have a company car     I lease/have a company car
2   16  Automotive  Access to car   Single  Do you have access to a car?    I have access to a car/cars     I have access to a car/cars
3   16  Automotive  Access to car   Single  Do you have access to a car?    No, I don’t have access to a car/cars   No, I don't have access to a car
4   16  Automotive  Access to car   Single  Do you have access to a car?    Prefer not to say   Prefer not to say
5   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Audi    Audi
6   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Alfa Romeo  Alfa Romeo
7   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    BMW     BMW
8   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Cadillac    Cadillac
9   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Chevrolet   Chevrolet
10  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Chrysler    Chrysler
11  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Citroen     Citroen
12  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Daihatsu    Daihatsu
13  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Fiat    Fiat
14  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Ford    Ford
15  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Honda   Honda
16  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Hyundai     Hyundai
...

And I would like to obtain something like this:

    QID     Category    Text    QType   Question    Answer0     Answer1     Answer3     Answer4     Answer5     Answer6     Answer7     Answer8     Answer9     Answer10    Answer11     Answer12     ...      
4   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I lease/ have a company car     I have access to a car/cars     No, I don’t have access to a car/cars   Prefer not to say       
5   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Audi    Alfa Romeo  BMW     Cadillac    Chevrolet   Chrysler    Citroen     ...

Thanks to Rob Raymond I can combine a given/static number of columns whose name begins with the same value in Answer and QID:

df = pd.DataFrame('path/to/file')

# lazy - want first of all attributes except QID and Answer columns
agg = {col:"first" for col in list(df.columns) if col!="QID" and "Answer" not in col}
# get a list of all answers in Answer0 for a QID
agg = {**agg, **{"Answer0":lambda s: list(s)}}

# helper function for row call.  not needed but makes more readable
def ans(r, i):
    return "" if i>=len(r["AnswerT"]) else r["AnswerT"][i]

# split list from aggregation back out into columns using assign
# rename Answer0 to AnserT from aggregation so that it can be referred to.  
# AnswerT drop it when don't want it any more
dfgrouped = df.groupby("QID").agg(agg).reset_index().rename(columns={"Answer0":"AnswerT"}).assign(
    Answer0=lambda dfa: dfa.apply(lambda r: ans(r, 0), axis=1),
    Answer1=lambda dfa: dfa.apply(lambda r: ans(r, 1), axis=1),
    Answer2=lambda dfa: dfa.apply(lambda r: ans(r, 2), axis=1),
    Answer3=lambda dfa: dfa.apply(lambda r: ans(r, 3), axis=1),
    Answer4=lambda dfa: dfa.apply(lambda r: ans(r, 4), axis=1),
    Answer5=lambda dfa: dfa.apply(lambda r: ans(r, 5), axis=1),
    Answer6=lambda dfa: dfa.apply(lambda r: ans(r, 6), axis=1),
).drop("AnswerT", axis=1)

print(dfgrouped.to_string(index=False))

How can I can combine a dynamic number of columns, where these have names which begin with the same value in Answer and QID?

Rob Raymond
  1. generate a list of answers that belong to same QID
  2. expand this AnswerT list out by building a new dataframe dynamically
  3. merge() it back using an inner join

This is dynamic - columns built in dataframe are fully based on list size

data = """    QID     Category    Text    QType   Question    Answer0     Answer1
0   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I own a car/cars
1   16  Automotive  Access to car   Single  Do you have access to a car?    I lease/ have a company car     I lease/have a company car
2   16  Automotive  Access to car   Single  Do you have access to a car?    I have access to a car/cars     I have access to a car/cars
3   16  Automotive  Access to car   Single  Do you have access to a car?    No, I don’t have access to a car/cars   No, I don't have access to a car
4   16  Automotive  Access to car   Single  Do you have access to a car?    Prefer not to say   Prefer not to say
5   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Audi    Audi
6   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Alfa Romeo  Alfa Romeo
7   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    BMW     BMW
8   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Cadillac    Cadillac
9   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Chevrolet   Chevrolet
10  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Chrysler    Chrysler
11  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Citroen     Citroen
12  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Daihatsu    Daihatsu
13  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Fiat    Fiat
14  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Ford    Ford
15  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Honda   Honda
16  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Hyundai     Hyundai"""

a = [[t.strip() for t in re.split("  ",l) if t!=""]  for l in [re.sub("([0-9]+[ ])*(.*)", r"\2", l) for l in data.split("\n")]]

df = pd.DataFrame(data=a[1:], columns=a[0])

# lazy - want first of all attributes except QID and Answer columns
agg = {col:"first" for col in list(df.columns) if col!="QID" and "Answer" not in col}
# get a list of all answers in Answer0 for a QID
agg = {**agg, **{"Answer0":lambda s: list(s)}}

# helper function for row call.  not needed but makes more readable
def ans(r, i):
    return "" if i>=len(r["AnswerT"]) else r["AnswerT"][i]

# group by QID and construct new column AnswerT which is list of answers
dfgrouped = df.groupby("QID").agg(agg).reset_index().rename(columns={"Answer0":"AnswerT"})#.assign(

# build a new dataframe from AnswerT by building up standard list / dict structure to constructor
# merge on QID and finally drop the temporary AnswerT columns
dfgrouped = dfgrouped.merge(
    pd.DataFrame(
        [{**{"QID":r[0]},**{f"Answer{i}":v for i,v in enumerate(r[1])}} 
         for r in dfgrouped[["QID","AnswerT"]].values.tolist()]
    ), on="QID", how="inner").drop(columns="AnswerT")

print(dfgrouped.to_string(index=False))

output

QID    Category              Text     QType                                          Question           Answer0                      Answer1                      Answer2                                Answer3            Answer4   Answer5  Answer6   Answer7 Answer8 Answer9 Answer10 Answer11
 16  Automotive     Access to car    Single                      Do you have access to a car?  I own a car/cars  I lease/ have a company car  I have access to a car/cars  No, I don’t have access to a car/cars  Prefer not to say       NaN      NaN       NaN     NaN     NaN      NaN      NaN
 17  Automotive  Make of car/cars  Multiple  If you own/lease a car(s), which brand are they?              Audi                   Alfa Romeo                          BMW                               Cadillac          Chevrolet  Chrysler  Citroen  Daihatsu    Fiat    Ford    Honda  Hyundai

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

TOP Ranking

HotTag

Archive