Incompatible index of inserted column with frame index when using groupby and count

Eliza Romanski

I have data that looks like this:

  CHROM    POS REF ALT  ...  is_sever_int is_sever_str is_sever_f encoding_str
0  chr1  14907   A   G  ...             1            1        one          one
1  chr1  14930   A   G  ...             1            1        one          one

These are the columns I want to perform calculations on (example):

is_severe  snp_id  encoding
        1       1       one
        1       1       two
        0       1       one
        1       2       two
        0       2       two
        0       2       one

What I want to do is count, for each combination of snp_id and is_severe, how many "one" and "two" values appear in the encoding column:

snp_id  is_severe  encoding_one  encoding_two
     1          1             1             1
     1          0             1             0
     2          1             0             1
     2          0             1             1

I tried this:

df.groupby(["snp_id","is_sever_f","encoding_str"])["encoding_str"].count()

but it gave the error:

 incompatible index of inserted column with frame index

Then I tried this:

df["count"]=df.groupby(["snp_id","is_sever_f","encoding_str"],as_index=False)["encoding_str"].count()

and it returned:

Expected a 1D array, got an array with shape (2532831, 3)

How can I fix this? Thank you :)

Ynjxsjmh

Group by all three columns, take the size of each group, then unstack the encoding level of the index so that each encoding value becomes its own column.

out = (df.groupby(['is_severe', 'snp_id', 'encoding']).size()
       .unstack(fill_value=0)
       .add_prefix('encoding_')
       .reset_index())
print(out)

encoding  is_severe  snp_id  encoding_one  encoding_two
0                 0       1             1             0
1                 0       2             1             1
2                 1       1             1             1
3                 1       2             0             1
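
As for the errors in the question, a likely explanation is that the aggregated groupby result has one row per group and is indexed by the group keys, so pandas cannot align it with the row index of df when it is assigned to df["count"]. With as_index=False the result is a DataFrame with several columns, which does not fit into a single column either. A minimal sketch, assuming the simplified column names from the example table (is_severe, snp_id, encoding) rather than the real ones, and using transform to get a per-row count that can be assigned back:

```python
import pandas as pd

# Sample data copied from the example table in the question.
df = pd.DataFrame({
    "is_severe": [1, 1, 0, 1, 0, 0],
    "snp_id":    [1, 1, 1, 2, 2, 2],
    "encoding":  ["one", "two", "one", "two", "two", "one"],
})

keys = ["snp_id", "is_severe", "encoding"]

# Aggregation: one row per group, indexed by the group keys.
# This does not line up with df's row index, so it should not be
# assigned directly to a new column of df.
per_group = df.groupby(keys).size()

# transform returns one value per original row, indexed like df,
# so it can be stored as a new column.
df["count"] = df.groupby(keys)["encoding"].transform("size")

# Wide table: one row per (snp_id, is_severe), one column per encoding.
wide = (df.groupby(keys).size()
          .unstack(fill_value=0)
          .add_prefix("encoding_")
          .reset_index())
print(wide)
```

Use the transform form when every row should carry its group's count, and the unstacked form when you want the summary table from the question.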
