when using group_by: TypeError: incompatible index of inserted column with frame index

A7DC

I have a df that I've read from sql:

      id  stock_id symbol        date   open   high      low  close  volume
0      1        35   ABSI  2022-09-28   3.06   3.33   3.0400   3.27  217040
1      2        35   ABSI  2022-09-29   3.19   3.19   3.0300   3.12  187309
2      3        35   ABSI  2022-09-30   3.11   3.27   3.0700   3.13  196566
3      4        35   ABSI  2022-10-03   3.16   3.16   2.8600   2.97  310441
4      5        35   ABSI  2022-10-04   3.04   3.37   2.9600   3.27  361082
..   ...       ...    ...         ...    ...    ...      ...    ...     ...
383  384        16    VVI  2022-10-03  31.93  33.85  31.3050  33.60  151357
384  385        16    VVI  2022-10-04  34.41  35.46  34.1900  35.39  105773
385  386        16    VVI  2022-10-05  34.67  35.30  34.5000  34.86   59605
386  387        16    VVI  2022-10-06  34.80  35.14  34.3850  34.50   55323
387  388        16    VVI  2022-10-07  33.99  33.99  33.3409  33.70   45187

[388 rows x 9 columns]

I then try and get the average of the last 5 days and add it to a new column:

df['volume_5_day'] = df.groupby('stock_id')['volume'].rolling(5).mean()

Which gives me the following error:

Traceback (most recent call last):
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 11003, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/series.py", line 4672, in reindex
    return super().reindex(**kwargs)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 4966, in reindex
    return self._reindex_axes(
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 4981, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 4237, in reindex
    target = self._wrap_reindex_result(target, indexer, preserve_names)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 2520, in _wrap_reindex_result
    target = MultiIndex.from_tuples(target)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 204, in new_meth
    return meth(self_or_cls, *args, **kwargs)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/indexes/multi.py", line 559, in from_tuples
    arrays = list(lib.tuples_to_object_array(tuples).T)
  File "pandas/_libs/lib.pyx", line 2930, in pandas._libs.lib.tuples_to_object_array
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dan/Documents/code/wolfhound/add_indicators_daily.py", line 10, in <module>
    df['volume_10_day'] = df.groupby('stock_id')['volume'].rolling(1).mean()
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3655, in __setitem__
    self._set_item(key, value)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3832, in _set_item
    value = self._sanitize_column(value)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 4535, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/home/dan/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 11010, in _reindex_for_setitem
    raise TypeError(
TypeError: incompatible index of inserted column with frame index

Any ideas what's going wrong here? Previously this worked and now it's throwing an error - and I can't seem to figure out why

ouroboros1

Chain Series.to_numpy to add the values as a np.array and make sure to add sort=False inside df.groupby:

df['volume_5_day'] = df.groupby('stock_id', sort=False)['volume']\
    .rolling(5).mean().to_numpy()

print(df)

      id  stock_id symbol        date  ...      low  close  volume  volume_5_day
0      1        35   ABSI  2022-09-28  ...   3.0400   3.27  217040           NaN
1      2        35   ABSI  2022-09-29  ...   3.0300   3.12  187309           NaN
2      3        35   ABSI  2022-09-30  ...   3.0700   3.13  196566           NaN
3      4        35   ABSI  2022-10-03  ...   2.8600   2.97  310441           NaN
4      5        35   ABSI  2022-10-04  ...   2.9600   3.27  361082      254487.6
383  384        16    VVI  2022-10-03  ...  31.3050  33.60  151357           NaN
384  385        16    VVI  2022-10-04  ...  34.1900  35.39  105773           NaN
385  386        16    VVI  2022-10-05  ...  34.5000  34.86   59605           NaN
386  387        16    VVI  2022-10-06  ...  34.3850  34.50   55323           NaN
387  388        16    VVI  2022-10-07  ...  33.3409  33.70   45187       83449.0

Your initial approach fails, because the df.groupby method that you are using, returns a pd.Series with a different index than your df. E.g.:

print(df.groupby('stock_id')['volume'].rolling(5).mean().index)
MultiIndex([(16, 383),
            (16, 384),
            (16, 385),
            (16, 386),
            (16, 387),
            (35,   0),
            (35,   1),
            (35,   2),
            (35,   3),
            (35,   4)],
           names=['stock_id', None])

So, it is saying it is unable to map this onto:

print(df.index)
Int64Index([0, 1, 2, 3, 4, 383, 384, 385, 386, 387], dtype='int64')

With a np.array you don't have this problem. You could also have used:

df['volume_5_day'] = df.groupby('stock_id', as_index=False)['volume']\
    .rolling(5).mean()['volume']

In this case, you don't need to add sort=False, as it will match correctly on the index values.

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

incompatible index of inserted column with frame index with group by and count

incompatible index of inserted column with frame index

Why is there a "TypeError: incompatible index of inserted column with frame index" error in Pandas version 2.0.0 only

How to assign Pandas.Series.str.extractall() result back to original dataset? (TypeError: incompatible index of inserted column with frame index)

Use column index instead of name in group_by

Setting index using commonly occurring column value as index of the data frame

R - refer to column names rather than column index when using lapply with data frame

PostgreSQL multi-column group by not using index when selecting minimum

Using column values of a data frame to index rows of a multiindex data frame

Split a data frame by column using a list of vectors as the column index

Using a column as a column index to extract value from a data frame in R

Group_by with summarize_each changes the original order of index column

Call the column of a data frame by position using group_by function in R

Add a column to a data frame that index the number of occurrences in a group

Python: new data frame from group_by with new index in one line

TypeError: unhashable type: 'dict' when using pandas Multi Index

R predict glm fit on each column in data frame using column index number

Select column and row values with an index in data frame

Adding a column based on index to a data frame in Pandas

compare index and column in data frame with dictionary

Filter pandas multi index data frame based on index column values

Get column names as index in multi index data frame

How to skip the default data frame index, when printing a python data frame in an excel worksheet, using openpyxl

rename index using index and name column

Index when using OR in query

Remove index column when exporting image using dataframe_image

How to stop returning the index and column name when using loc

Group by query not using index

Group By not using index