Pandas' expanding with apply function on multiple columns

Iqigai

Is it possible to use pandas' expanding function to calculate the coefficients of a polynomial regression using several columns of the window object?

I have a data frame which has two columns, a predictor and a response. I want to use pandas' expanding() function to calculate the corresponding coefficients of a second order polynomial regression for each expanding pair of series. For each row I would like to get the updated coefficients from the regression applied to all previous rows.

import pandas as pd
import numpy as np

def func1(df):
   # some processing
   return np.polyfit(df['Input'], df['Response'], 2)
   
def func2(x, y):
   # some processing
   return np.polyfit(x, y, 2)

np.random.seed(0)
df = pd.DataFrame(np.random.rand(10, 2).round(2), 
                  columns=['Input', 'Response'])

df[['Coef1', 'Coef2', 'Coef3']] = df.expanding(min_periods=3).apply(func1)

I'd like the output to look like this:

>>> df

   Input  Response  Coef1  Coef2  Coef3
0   0.63      0.23    NaN    NaN    NaN
1   0.45      0.11    NaN    NaN    NaN
2   0.17      0.71    NaN    NaN    NaN
3   0.17      0.32   0.19   0.54   0.50
4   0.65      0.99   0.48   0.23   0.60
5   0.21      0.54   0.71   0.89   0.97
6   0.63      0.73   0.22   0.05   0.80
7   0.54      0.23   0.87   0.01   0.25
8   0.33      0.06   0.18   0.96   0.03
9   0.18      0.72   0.13   0.38   0.13

My attempts have led to two types of error. If I use the function that takes the data frame as a parameter, as in

df[['Coef1', 'Coef2', 'Coef3']] = df.expanding(min_periods=3).apply(func1)

I get KeyError: 'Input'. If I use the second function and extract the columns first, as in

df[['Coef1', 'Coef2', 'Coef3']] = df.expanding(min_periods=3).apply(lambda x: func2(x['Input'], x['Response']))

I get DataError: No numeric types to aggregate.

However, if I try, for instance, df.expanding().cov(pairwise=True), it shows that calculations can be performed across the different columns of the object returned by expanding.

There is a similar question here: Apply expanding function on dataframe. However, the solution, which consists of calling expanding() inside the function, does not seem to apply in this case. I would appreciate any pointers or suggestions.
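A minimal sketch of why the first attempt raises KeyError, assuming the default behaviour of expanding().apply(): the function is called once per column, and each call receives a one-dimensional window of that single column rather than the whole data frame. The probe helper and the small hard-coded frame are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Input': [0.55, 0.60, 0.42, 0.44],
    'Response': [0.72, 0.54, 0.65, 0.89],
})

seen = []  # records what each call receives

def probe(window):
    # Hypothetical helper: note the type and dimensionality of the argument
    seen.append((type(window).__name__, window.ndim))
    return 0.0  # apply() expects a single number back from each call

df.expanding(min_periods=3).apply(probe)
print(set(seen))

# Looking up a column label on such a 1-D window fails,
# because its index holds row labels, not column names
try:
    df.expanding(min_periods=3).apply(lambda w: w['Input'])
    raised_key_error = False
except KeyError:
    raised_key_error = True
print(raised_key_error)
```

Since each call sees only one column and must return one number, a function that needs both columns and returns three coefficients does not fit this interface.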

I found a package that does that with numpy: 3jane.github.io/numpy_ext so it inspired me to do it manually:

def func_np(df):
    length = len(df)
    if length == 1:
        return [[0], [0], [0]]

    coef1, coef2, coef3 = [], [], []

    x = df['Input'].to_numpy()  # This is the predictor column
    y = df['Response'].to_numpy()  # This is the response column

    for step in range(1, length + 1):
        weights = np.polyfit(x[: step], y[: step], 2)  # 2 is the polynomial's order
        coef1.append(weights[0])
        coef2.append(weights[1])
        coef3.append(weights[2])
    # Note that coef1, coef2, coef3 correspond to the polynomial terms from highest to lowest

    # It is easier to return a data frame, so that we can reassign the result to the initial one
    return pd.DataFrame({'Coef1': coef1, 'Coef2': coef2, 'Coef3': coef3})

I wanted to do it with Numba to speed up the execution, but it does not recognize the np.polyfit function. Also, I have not found a neat way to assign the results back to the initial data frame. That is why I am still interested in seeing a simple and more "pythonic" solution with expanding().
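One possible way around the missing np.polyfit is to write the fit as an ordinary least-squares problem on a Vandermonde matrix, which is the same model np.polyfit solves. The sketch below only checks that the two agree in plain NumPy; whether a given Numba version compiles np.vander and np.linalg.lstsq in this form is an assumption that has not been tested here. The name polyfit_lstsq is made up for illustration.

```python
import numpy as np

def polyfit_lstsq(x, y, deg=2):
    # Columns are x**deg, ..., x, 1, so the coefficients come back
    # from highest to lowest power, like np.polyfit
    A = np.vander(x, deg + 1)
    coefs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return coefs

x = np.array([0.55, 0.60, 0.42, 0.44, 0.96])
y = np.array([0.72, 0.54, 0.65, 0.89, 0.38])

same = np.allclose(polyfit_lstsq(x, y), np.polyfit(x, y, 2))
print(same)
```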

Pierre D

I suspect what you are looking for is the new df.expanding(..., method='table') in the upcoming pandas=1.3 (see "Other enhancements").

In the meantime, you can do it "by hand", using a loop (sorry):

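# Use only the two data columns, as a 2-D array of (x, y) rows
xy = df[['Input', 'Response']].to_numpy()
# Row n gets the fit over the n previous rows; NaN until 3 rows are available
df['c1 c2 c3'.split()] = np.stack([
    func2(*xy[:n].T) if n >= 3 else np.full(3, np.nan)
    for n in range(xy.shape[0])
])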

Example:

np.random.seed(0)
df = pd.DataFrame(np.random.rand(10, 2).round(2), 
                  columns=['Input', 'Response'])

# the code above, then

>>> df
   Input  Response         c1         c2        c3
0   0.55      0.72        NaN        NaN       NaN
1   0.60      0.54        NaN        NaN       NaN
2   0.42      0.65        NaN        NaN       NaN
3   0.44      0.89 -22.991453  22.840171 -4.887179
4   0.96      0.38 -29.759096  29.213620 -6.298277
5   0.79      0.53   0.454036  -1.369701  1.272156
6   0.57      0.93   0.122450  -0.874260  1.113586
7   0.07      0.09  -1.010312   0.623331  0.696287
8   0.02      0.83  -2.687387   2.995143 -0.079214
9   0.78      0.87  -1.425030   1.294210  0.442684
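The loop above can also be wrapped in a small helper that returns a data frame sharing the original index, so the coefficients can be attached with df.join(). This is only a sketch: the function name expanding_polyfit and its parameters are made up for illustration, and it keeps the same convention as the loop, where each row is fitted on the rows before it.

```python
import numpy as np
import pandas as pd

def expanding_polyfit(df, x_col, y_col, deg=2, min_periods=3):
    # Hypothetical helper: row n holds the fit over the n previous rows,
    # and NaN while fewer than min_periods previous rows exist
    x = df[x_col].to_numpy()
    y = df[y_col].to_numpy()
    rows = [
        np.polyfit(x[:n], y[:n], deg) if n >= min_periods
        else np.full(deg + 1, np.nan)
        for n in range(len(df))
    ]
    cols = [f'c{i + 1}' for i in range(deg + 1)]  # highest power first
    return pd.DataFrame(rows, index=df.index, columns=cols)

# First five rows of the example data above
df = pd.DataFrame({
    'Input': [0.55, 0.60, 0.42, 0.44, 0.96],
    'Response': [0.72, 0.54, 0.65, 0.89, 0.38],
})

result = df.join(expanding_polyfit(df, 'Input', 'Response'))
print(result)
```

Because the helper reuses df.index, the join aligns row by row even when the index is not a plain 0..n-1 range.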
