Drop columns and rows with a certain percentage of 0s in pandas

user2998764

I have two-dimensional data (columns are cells: Cell1, Cell2, ...; rows are genes: Gene1, Gene2, ...). I want to delete the rows that are at least 99% zeros and then, from the resulting matrix, drop the columns that are at least 99% zeros. I have written the following code to do this, but because the matrix is very large it takes a long time to run. Is there a better way to approach this?

import pandas as pd
import numpy as np

def read_in(matrix_file):
    matrix_df=pd.read_csv(matrix_file,index_col=0)
    return(matrix_df)

def genes_less_exp(matrix_df):
    num_columns=matrix_df.shape[1]
    for index, row in matrix_df.iterrows():
        zero_els=np.count_nonzero(row.values==0)
        gene_per_zero=(float(zero_els)/float(num_columns))*100
        if gene_per_zero >= 99:
            matrix_df.drop([index],axis=0,inplace=True)
    return(matrix_df)

def cells_less_exp(matrix_df):
    num_rows=matrix_df.shape[0]
    for label,content in matrix_df.items():
        zero_els=np.count_nonzero(content.values==0)
        cells_per_zero=(float(zero_els)/float(num_rows))*100
        if cells_per_zero >= 99:
            matrix_df.drop(label,axis=1,inplace=True)
    return(matrix_df)


if __name__ == "__main__":
    matrix_df=read_in("Data/big-matrix.csv")
    print("original:"+str(matrix_df.shape))
    filtered_genes=genes_less_exp(matrix_df)
    print("filtered_genes:"+str(filtered_genes.shape))
    filtered_cells=cells_less_exp(filtered_genes)
    print("filtered_cells:"+str(filtered_cells.shape))
    filtered_cells.to_csv("abi.99.percent.filtered.csv", sep=',')
Christian Sloper

It's easier if you reframe your question as "keep the rows and columns with less than 99% zeros".

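def drop_almost_zero(df, percentage):
    # Keep rows whose share of zeros is strictly below `percentage`
    # (integer comparison, so no rounding at the threshold).
    zeros_per_row = (df == 0).sum(axis='columns')
    df = df[zeros_per_row * 100 < percentage * len(df.columns)]

    # Apply the same rule to the columns of the row-filtered frame.
    zeros_per_column = (df == 0).sum(axis='rows')
    df = df.loc[:, zeros_per_column * 100 < percentage * len(df)]

    return df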


#test
size = 50
percentage = 90

rows = size//2
columns = size

a = np.random.choice(2, size=(rows, columns), p=[(1-0.1), 0.1]) 
df = pd.DataFrame(a, columns=[f'c{i}' for i in range(size)])

df = drop_almost_zero(df,percentage)

assert (df == 0).sum(axis='rows').max() <= percentage/100*rows
assert (df == 0).sum(axis='columns').max() <=  percentage/100*columns
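
For a very large matrix it may also help to count zeros on the underlying NumPy array and index the frame once at the end. The following is only a sketch of that idea, not the code from the answer above: the function name `filter_sparse` and the small `demo` frame are made up for illustration, and it assumes the whole matrix fits in memory. It drops rows that are at least `percentage` percent zeros, then drops columns by the same rule on the rows that remain, as the question describes.

```python
import numpy as np
import pandas as pd


def filter_sparse(df, percentage=99):
    """Hypothetical helper: drop rows, then columns, that are at least
    `percentage` percent zeros."""
    values = df.to_numpy()
    # Count zeros per row in one vectorized pass.
    row_zeros = np.count_nonzero(values == 0, axis=1)
    keep_rows = row_zeros * 100 < percentage * values.shape[1]
    values = values[keep_rows]
    # Column counts are taken on the row-filtered matrix.
    col_zeros = np.count_nonzero(values == 0, axis=0)
    keep_cols = col_zeros * 100 < percentage * values.shape[0]
    # Select the surviving rows and columns with boolean masks.
    return df.loc[keep_rows, keep_cols]


# Small hand-checkable example (made-up data) with a 75% threshold.
demo = pd.DataFrame(
    [[0, 0, 0, 0],
     [1, 0, 2, 0],
     [3, 0, 0, 4],
     [5, 0, 6, 7]],
    index=["Gene1", "Gene2", "Gene3", "Gene4"],
    columns=["Cell1", "Cell2", "Cell3", "Cell4"],
)
result = filter_sparse(demo, percentage=75)
print(result)
```

With the 75% threshold, Gene1 (all zeros) is removed first; Cell2 is then all zeros in the remaining three rows, so it is removed as well. Note that the order matters: the column shares are computed after the rows have been filtered.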


