I have data that looks like this:
CHROM POS REF ALT ... is_sever_int is_sever_str is_sever_f encoding_str
0 chr1 14907 A G ... 1 1 one one
1 chr1 14930 A G ... 1 1 one one
These are the columns that I'm interested to perform calculations on (example) :
is_severe snp _id encoding
1 1 one
1 1 two
0 1 one
1 2 two
0 2 two
0 2 one
what I want to do is to count for each snp_id and severe_id how many ones and twos are in the encoding column :
snp_id is_svere encoding_one encoding_two
1 1 1 1
1 0 1 0
2 1 0 1
2 0 1 1
I tried this :
df.groupby(["snp_id","is_sever_f","encoding_str"])["encoding_str"].count()
but it gave the error :
incompatible index of inserted column with frame index
then i tried this:
df["count"]=df.groupby(["snp_id","is_sever_f","encoding_str"],as_index=False)["encoding_str"].count()
and it returned:
Expected a 1D array, got an array with shape (2532831, 3)
how can i fix this? thank you:)
Let's try groupby
with whole columns and get size of each group then unstack
the encoding
index.
out = (df.groupby(['is_severe', 'snp_id', 'encoding']).size()
.unstack(fill_value=0)
.add_prefix('encoding_')
.reset_index())
print(out)
encoding is_severe snp_id encoding_one encoding_two
0 0 1 1 0
1 0 2 1 1
2 1 1 1 1
3 1 2 0 1
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments