Handling appending to abstraction of dataframe

Phil Published at Dev

Phil

If I have a "reference" to a dataframe, there appears to be no way to append to it in pandas, because neither append nor concat supports an inplace=True parameter.

An (overly) simple example:

chosen_df, chosen_row = (candidate_a_df, candidate_a_row) if some_test else (candidate_b_df, candidate_b_row)
chosen_df = chosen_df.append(chosen_row)

Because Python assignment copies the reference rather than the object, chosen_df will initially refer to whichever candidate dataframe passed some_test.

But the update semantics of pandas mean that the referenced dataframe is not updated by the append function. append returns a new dataframe, and the assignment simply rebinds the label chosen_df to it, leaving the candidate dataframe unchanged. I believe this would work if it were possible to use inplace=True, but that looks unlikely to happen, given the discussion here: https://github.com/pandas-dev/pandas/issues/14796
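The rebinding behaviour described above can be reproduced with a minimal sketch. It uses pd.concat instead of DataFrame.append, since append was removed in pandas 2.0; the column name x and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the names follow the question.
candidate_a_df = pd.DataFrame({"x": [1, 2]})
candidate_a_row = pd.DataFrame({"x": [3]})

chosen_df = candidate_a_df  # both names refer to the same object

# concat (like the old append) builds a new dataframe;
# the assignment rebinds only the name chosen_df.
chosen_df = pd.concat([chosen_df, candidate_a_row], ignore_index=True)

print(len(chosen_df))               # 3
print(len(candidate_a_df))          # 2, the original is untouched
print(chosen_df is candidate_a_df)  # False
```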

It's worth noting that a simpler example using lists rather than dataframes does work, because the contents of a list are mutated directly by append().
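As a rough illustration of the difference, here is a sketch with plain lists (the list names and values are invented for the example). Mutating through the chosen name is visible through the original name, whereas rebinding the chosen name is not.

```python
candidate_a = [1, 2]
candidate_b = [10, 20]
some_test = True

chosen = candidate_a if some_test else candidate_b

# list.append mutates the object that both names refer to.
chosen.append(3)
print(candidate_a)  # [1, 2, 3]

# The + operator builds a new list, so this only rebinds `chosen`,
# which mirrors what happens with the dataframe append.
chosen = chosen + [4]
print(candidate_a)  # [1, 2, 3]
print(chosen)       # [1, 2, 3, 4]
```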

So my question is --- How could an updatable abstraction over N dataframes be achieved in Python?

The idiom is commonplace, useful and trivial in languages that allow references, so I'm guessing I'm missing a Pythonic trick, or thinking about the whole problem with the wrong hat on!

Obviously the purely illustrative example can be resolved by duplicating the append in the body of an if...else and concretely referencing each underlying dataframe in turn. But this doesn't scale to more complex examples, and it's a generic solution akin to references that I'm looking for.

Any ideas?

Phil

There is a simple way to do this specifically for pandas dataframes - so I'll answer my own question.

chosen_df, chosen_row = (candidate_a_df, candidate_a_row) if some_test else (candidate_b_df, candidate_b_row)
chosen_df.loc[max_idx+1] = chosen_row

The calculation of max_idx depends very much on the structure of chosen_df. In the simplest case, a dataframe with a sequential index starting at 0, max_idx + 1 is simply the length of the index.

If the index of chosen_df is non-sequential, you'll need to call max() on the index rather than rely on its length.

If chosen_df is a slice or a group taken from a larger dataframe, then you'll need to calculate max_idx from the index of the parent dataframe to ensure it's truly the max across all rows.
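The first two cases above can be sketched as follows. The helper append_in_place, the column name x and the sample values are all hypothetical, and the sketch assumes an integer index; it does not cover the slice or groupby case.

```python
import pandas as pd

def append_in_place(df, row):
    """Hypothetical helper: add `row` under a new integer label, mutating df."""
    # max() + 1 works for sequential and non-sequential integer indexes;
    # for a sequential index starting at 0 it equals len(df.index).
    next_idx = 0 if df.empty else df.index.max() + 1
    df.loc[next_idx] = row  # setting a missing label enlarges df in place

# Sample data invented for the example.
candidate_a_df = pd.DataFrame({"x": [1, 2]})                  # index 0, 1
candidate_b_df = pd.DataFrame({"x": [10, 20]}, index=[5, 9])  # non-sequential
candidate_a_row = pd.Series({"x": 3})
candidate_b_row = pd.Series({"x": 30})

some_test = False
chosen_df, chosen_row = (
    (candidate_a_df, candidate_a_row) if some_test
    else (candidate_b_df, candidate_b_row)
)
append_in_place(chosen_df, chosen_row)

# No rebinding took place, so the underlying candidate sees the new row.
print(candidate_b_df.index.tolist())  # [5, 9, 10]
```

Because the helper assigns through .loc instead of returning a new object, chosen_df and the candidate it was picked from remain the same dataframe after the call.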


Edited at 2020-12-8
