Python Pandas Dataframe manipulation (Excel File)

BrunoA

I'm fairly new to Python and I have an issue with DataFrame manipulation on data read from an Excel file:

This is a snippet of the Excel sheet:

Excel Snippet

I was able to drop the duplicate datetime rows and get one DataFrame with only the datetime rows and another with only the descriptions.

I was also able to drop the last row:

drop duplicates
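A minimal sketch of that split on hypothetical toy data, assuming the first column has been read as text (as with `dtype=str` in the answer below) and that date cells look like `2021-01-01`; adjust `format` to whatever the real cells contain. `pd.to_datetime(..., errors="coerce")` turns every non-date value into `NaT`, which gives a boolean mask for both frames. The column name `col_a` is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the first column of the sheet:
# each description row is followed by the dates that belong to it.
df = pd.DataFrame({
    "col_a": ["Labour rate A", "2021-01-01", "2021-01-01",
              "Labour rate B",
              "Labour rate C", "2021-06-01"],
})

# Values that parse as dates become Timestamps; everything else becomes NaT.
parsed = pd.to_datetime(df["col_a"], format="%Y-%m-%d", errors="coerce")
is_date = parsed.notna()

# One frame with only the (de-duplicated) date rows, one with only descriptions.
dates = df.loc[is_date].assign(col_a=parsed[is_date]).drop_duplicates()
descriptions = df.loc[~is_date]

print(dates)
print(descriptions)
```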

What I want to do is "shift" the dates in column A into column B of the row above.

If both DataFrames matched 1-to-1 this would be easy, but one row (highlighted in yellow) has no datetime below it.

Does anyone have an idea how to do this?

The result should look something like this:

(image: expected output)
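One way around the mismatch, sketched on hypothetical toy data under the same assumptions (text cells, `YYYY-MM-DD` dates, made-up column names): instead of shifting rows, forward-fill each description down onto the date rows beneath it, then keep one row per (description, date) pair. A description with no date below it, like the yellow row, keeps its own row with a missing date instead of breaking the alignment.

```python
import pandas as pd

# Hypothetical single-column layout: each description is followed by its dates.
df = pd.DataFrame({
    "col_a": ["Labour rate A", "2021-01-01", "2021-01-01",
              "Labour rate B",            # like the yellow row: no date below it
              "Labour rate C", "2021-06-01"],
})

parsed = pd.to_datetime(df["col_a"], format="%Y-%m-%d", errors="coerce")
is_date = parsed.notna()

# "description": the text, copied down onto the date rows that follow it.
# "date": the date on date rows, NaT on description rows.
paired = pd.DataFrame({
    "description": df["col_a"].where(~is_date).ffill(),
    "date": parsed,
})

# A description keeps its own dateless row only if no date row follows it.
has_date = paired.groupby("description")["date"].transform("count") > 0
result = (paired[paired["date"].notna() | ~has_date]
          .drop_duplicates()
          .reset_index(drop=True))

print(result)
```

The `groupby(...).transform("count")` step counts non-missing dates per description, so it is what distinguishes the yellow-row case from descriptions that do have dates.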

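    import pandas as pd
    from datetime import datetime

    # test_cdms is the path to the Excel workbook
    df_cdms_labour = pd.read_excel(test_cdms,
                                   header=None,
                                   names=['start_date', 'end_date', 'price', 'percent',
                                          'comment', 'rate', 'rate_comment', 'number_1',
                                          'markup', 'markup_number'])

    # Drop the trailing rows; note that tail() defaults to the last 5 rows,
    # so use tail(1) to drop only the final row
    df_cdms_labour.drop(df_cdms_labour.tail().index, inplace=True)
    df_cdms_labour  # displays the frame in Jupyter

    def get_rate_text(df):
        # The description text sits at row label 4 of the first column
        return df.loc[4, 'start_date']

    def get_rates(df):
        # Keep only the rows whose first column holds a datetime
        flt = df.loc[:, 'start_date'].apply(lambda x: isinstance(x, datetime))
        return (df[flt]
                .drop_duplicates()
                .reset_index(drop=True))

    rates = get_rates(df_cdms_labour)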
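The filter in `get_rates` works because `pd.Timestamp` is a subclass of `datetime.datetime`, so date cells pass `isinstance(x, datetime)` whether pandas hands them back as `Timestamp` or as plain `datetime` objects. A quick check on a hypothetical mixed column:

```python
from datetime import datetime

import pandas as pd

# Hypothetical mixed object column: one text cell and two date cells.
col = pd.Series(["Labour rate A",
                 pd.Timestamp("2021-01-01"),
                 pd.Timestamp("2021-01-01")],
                dtype=object)

# pd.Timestamp subclasses datetime.datetime, so the isinstance test
# used in get_rates() selects the date cells and skips the text.
mask = col.apply(lambda x: isinstance(x, datetime))
print(mask.tolist())
```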
abokey

Here is a proposal using standard pandas DataFrame methods:

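import pandas as pd
import numpy as np

# ISO dates such as 2021-01-01 (raw string avoids invalid "\d" escape warnings)
DATE_RE = r"\d{4}-\d{2}-\d{2}"

def flag_delete(df):
    # Count the rows sharing each description; a count of 1 means no dates follow it
    df.insert(0, "temp_col", df.groupby("Col_A")["Col_A"].transform("count"))
    # Mark those rows (all columns except Col_A) so the later dropna() keeps them
    df.loc[df.pop("temp_col").eq(1), df.columns != "Col_A"] = "DELETE"
    return df

def format_dates(df):
    # Render every datetime column as e.g. 01-Jan-2021
    temp_df = df.select_dtypes('datetime64')
    df[temp_df.columns] = temp_df.apply(lambda x: x.dt.strftime('%d-%b-%Y'))
    return df

def to_datetime_or_keep(s):
    # Same effect as errors="ignore" (deprecated in pandas 2.2): parse, or return unchanged
    try:
        return pd.to_datetime(s, format='%Y-%m-%d')
    except (ValueError, TypeError):
        return s


df = (
        pd.read_excel("BrunoA.xlsx", header=None, dtype=str)
            # Col_A: description text, forward-filled onto the date rows below it
            # Col_B: the date on date rows, NaN elsewhere (na=False handles empty cells)
            .assign(Col_A=lambda x: pd.Series(np.where(~x[0].str.contains(DATE_RE, regex=True, na=False), x[0], np.nan)).ffill(),
                    Col_B=lambda x: np.where(x[0].str.contains(DATE_RE, regex=True, na=False), x[0], np.nan))
            .drop(columns=0)
            .drop_duplicates()
            .apply(to_datetime_or_keep)
            .pipe(format_dates)
            .pipe(flag_delete)
            .dropna()
            .rename(columns={"Col_A": -1, "Col_B": 0})
            .sort_index(axis=1)
            .reset_index(drop=True)
     )

display(df)  # Jupyter; use print(df) in a plain script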

# Output:

(image: resulting DataFrame)
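The key step in `flag_delete` is `groupby(...).transform("count")`: after forward-filling, a description that has dates beneath it appears on several rows, while a description with no dates appears exactly once. A small sketch on hypothetical toy data (already forward-filled, reusing the answer's `Col_A`/`Col_B` names) showing that count and the resulting marker:

```python
import numpy as np
import pandas as pd

# Hypothetical frame after the forward-fill step: Col_A holds the description,
# Col_B the date (NaN on each description's own row).
df = pd.DataFrame({
    "Col_A": ["A", "A", "B", "C", "C"],
    "Col_B": [np.nan, "01-Jan-2021", np.nan, np.nan, "01-Jun-2021"],
})

# Number of rows sharing each description: 1 means "no dates below it".
counts = df.groupby("Col_A")["Col_A"].transform("count")

# Mark the lonely descriptions so that dropna() keeps them.
df.loc[counts.eq(1), df.columns != "Col_A"] = "DELETE"
result = df.dropna().reset_index(drop=True)

print(result)
```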
