I have dataframes that I want to concatenate horizontally while ignoring the index.
I know that for arithmetic operations, ignoring the index can give a substantial speedup if you use the numpy array (.values) instead of the pandas Series. Is it possible to horizontally concatenate or merge pandas dataframes while ignoring the index? (To my dismay, ignore_index=True does something else.) And if so, does it give a speed gain?
import pandas as pd
df1 = pd.Series(range(10)).to_frame()
df2 = pd.Series(range(10), index=range(10, 20)).to_frame()
pd.concat([df1, df2], axis=1)
#       0    0
# 0   0.0  NaN
# 1   1.0  NaN
# 2   2.0  NaN
# 3   3.0  NaN
# 4   4.0  NaN
# 5   5.0  NaN
# 6   6.0  NaN
# 7   7.0  NaN
# 8   8.0  NaN
# 9   9.0  NaN
# 10  NaN  0.0
# 11  NaN  1.0
# 12  NaN  2.0
# 13  NaN  3.0
# 14  NaN  4.0
# 15  NaN  5.0
# 16  NaN  6.0
# 17  NaN  7.0
# 18  NaN  8.0
# 19  NaN  9.0
I know I can get the result I want by resetting the index of df2, but I wonder whether there is a faster way (perhaps a numpy method) to do this?
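For reference, the index-resetting approach mentioned above might look like the following. This is only a minimal sketch, assuming both frames have the same number of rows; the name `aligned` is introduced here for illustration.

```python
import pandas as pd

df1 = pd.Series(range(10)).to_frame()
df2 = pd.Series(range(10), index=range(10, 20)).to_frame()

# Drop df2's index so its rows line up positionally with df1's default index.
aligned = pd.concat([df1, df2.reset_index(drop=True)], axis=1)
print(aligned)
```

Because both frames then share the same 0..9 index, the concatenation produces 10 rows with no NaN padding.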
A pure numpy method would be to use np.hstack:
In [33]: np.hstack([df1, df2])
Out[33]:
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5],
       [6, 6],
       [7, 7],
       [8, 8],
       [9, 9]], dtype=int64)
This can easily be converted to a DataFrame by passing it as the data argument to the DataFrame constructor:
In [34]: pd.DataFrame(np.hstack([df1, df2]))
Out[34]:
   0  1
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
6  6  6
7  7  7
8  8  8
9  9  9
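Note that going through numpy discards both the row index and the column labels. If you want to keep them, you can pass them back to the constructor. The sketch below assumes the column names "a" and "b" (not in the original example) so the labels are distinguishable, and assumes you want df1's index on the result.

```python
import numpy as np
import pandas as pd

# Hypothetical column names "a" and "b", chosen only for illustration.
df1 = pd.Series(range(10), name="a").to_frame()
df2 = pd.Series(range(10), index=range(10, 20), name="b").to_frame()

# Stack the underlying values side by side; row labels play no part here.
values = np.hstack([df1.to_numpy(), df2.to_numpy()])

# Reattach df1's index and the column labels from both frames.
result = pd.DataFrame(values, index=df1.index, columns=[*df1.columns, *df2.columns])
print(result)
```

Keep in mind that np.hstack produces one array with a single dtype, so frames with mixed dtypes will be coerced to a common dtype.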
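With respect to whether the data is contiguous: a DataFrame is essentially a dict of Series, so the individual columns are treated as separate arrays. np.hstack itself does allocate a new array and copy the values into it, but because you're passing a single numpy array with a simple, homogeneous dtype, the DataFrame constructor generally has little extra allocation or copying to do (the exact behavior depends on the pandas version), so it should be fast.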
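Whether this is actually faster is best checked by measuring. Here is a rough timing sketch; the frame size, the column names "a" and "b", the helper names `via_concat` and `via_hstack`, and the repeat count are all arbitrary choices made for illustration, and the numbers will vary with your machine and your pandas/numpy versions.

```python
import timeit

import numpy as np
import pandas as pd

n = 100_000  # arbitrary size, chosen only for illustration
df1 = pd.DataFrame({"a": np.arange(n)})
df2 = pd.DataFrame({"b": np.arange(n)}, index=np.arange(n, 2 * n))


def via_concat():
    # pandas route: drop df2's index, then concatenate column-wise.
    return pd.concat([df1, df2.reset_index(drop=True)], axis=1)


def via_hstack():
    # numpy route: stack the raw values, then rebuild a DataFrame.
    stacked = np.hstack([df1.to_numpy(), df2.to_numpy()])
    return pd.DataFrame(stacked, columns=["a", "b"])


t_concat = timeit.timeit(via_concat, number=20)
t_hstack = timeit.timeit(via_hstack, number=20)
print(f"concat + reset_index: {t_concat:.4f}s")
print(f"np.hstack:            {t_hstack:.4f}s")
```

Both functions return frames holding the same values, so the comparison is like for like.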