Merging two DataFrames

Posted by Joho on Dev

Joho

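I am trying to merge two DataFrames by appending the first row of the second df to the first row of the first df, and so on for each following row. I also tried concatenating them, but either approach failed. The data has this format: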

1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,

The expected output format is:

1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,0.000,,,,,,,
2,3,N0129,Position,62.2,0.376,62.238,0.136,***---,76.1,-36.000,0.300,-36.057,,,,
3,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,0.000,,,,,,,

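I have already split the DataFrame above into two frames. The first contains only the rows with odd indices, the second the rows with even indices. My problem now is to merge/concatenate the two frames by appending the first row of the second df to the first row of the first df. I have tried several merge/concat approaches, but all of them failed. None of the print calls are required; I only use them to get a quick overview in the console. The code I feel most comfortable with is: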

import os
import pandas as pd

# `output` is not defined in this snippet; it is used as the working directory.
os.chdir(output)
csv_files = os.listdir('.')
for csv_file in csv_files:
    if csv_file.endswith(".asc.csv"):
        df = pd.read_csv(csv_file)
        keep_col = ['Messpunkt', 'Zeichnungspunkt', 'Eigenschaft', 'Position', 'Sollmass', 'Toleranz', 'Abweichung', 'Lage']
        new_df = df[keep_col]
        # Filter out rows whose 'Messpunkt' is one of the marker values.
        new_df = new_df[~new_df['Messpunkt'].isin(['**Teil'])]
        new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Oben'])]
        new_df = new_df[~new_df['Messpunkt'].isin(['**KS-Unten'])]
        new_df = new_df[~new_df['Messpunkt'].isin(['**N'])]
        print(new_df)
        new_df.to_csv(output+csv_file)

        # Split into rows with odd and rows with even index labels.
        df1 = new_df[new_df.index % 2 == 1]
        df2 = new_df[new_df.index % 2 == 0]
        df1.reset_index()
        df2.reset_index()
        print(df1)
        print(df2)
        merge_df = pd.concat([df1, df2], axis=1)
        print(merge_df)
        merge_df.to_csv(output+csv_file)

I would really appreciate some help.

With this code, the output is:

1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--,,,,,,,,
2,,,,,,,,,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---,,,,,,,,
4,,,,,,,,,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---,,,,,,,,
6,,,,,,,,,0.000,,,,,,,

简单

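I get the expected result when I use reset_index() so that both DataFrames have the same index. pd.concat with axis=1 aligns rows by index label, and df1 (odd labels) and df2 (even labels) have no labels in common.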
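To illustrate the alignment behaviour, here is a minimal sketch with two made-up frames (the names left, right and the column names a, b are only for illustration, they are not from the question). It should show that frames without shared index labels end up staggered, while frames with identical labels pair up row by row.

```python
import pandas as pd

# Hypothetical toy frames: odd labels on the left, even labels on the right.
left = pd.DataFrame({"a": [10, 20, 30]}, index=[1, 3, 5])
right = pd.DataFrame({"b": [0.1, 0.2, 0.3]}, index=[2, 4, 6])

# axis=1 aligns rows by index label; no label is shared, so nothing lines up.
staggered = pd.concat([left, right], axis=1)
print(staggered.shape)

# Once both sides carry the labels 0..2, the rows pair up.
paired = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)], axis=1
)
print(paired.shape)
```

The first frame has six rows with a missing value in every row, which matches the staggered output in the question; the second has three complete rows.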

It may also need drop=True so that the old index is not added as a new column:

pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
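As a small sketch of what reset_index() does here (the frame and its column name x are made up for illustration): the method returns a new DataFrame instead of changing the original, and without drop=True the old labels come back as a column named index.

```python
import pandas as pd

# Hypothetical frame with odd index labels, similar to df1 in the question.
df1 = pd.DataFrame({"x": ["a", "b", "c"]}, index=[1, 3, 5])

# The result is discarded here, so df1 keeps its old labels.
df1.reset_index()
print(list(df1.index))

# Without drop=True the old labels become a regular column called "index".
kept = df1.reset_index()
print(list(kept.columns))

# With drop=True the old labels are thrown away.
dropped = df1.reset_index(drop=True)
print(list(dropped.columns))
print(list(dropped.index))
```

This is likely why the bare df1.reset_index() and df2.reset_index() calls in the question had no visible effect: their return values were never assigned.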

Minimal working example.

I use io only to simulate a file in memory.

text = '''1,3,N0128,Durchm.,5.0,0.1,5.0760000000000005,0.076,-----****--
2,0.000,,,,,,,
3,3,N0129,Position,62.2,0.376,62.238,0.136,***---
4,76.1,-36.000,0.300,-36.057,,,,
5,2,N0130,Durchm.,5.0,0.1,5.067,0.067,-----***---
6,0.000,,,,,,,'''

import pandas as pd
import io

pd.options.display.max_columns = 20  # to display all columns

df = pd.read_csv(io.StringIO(text), header=None, index_col=0)

#print(df)

df1 = df[df.index % 2 == 1] # .reset_index(drop=True)
df2 = df[df.index % 2 == 0] # .reset_index(drop=True)

#print(df1)
#print(df2)

merge_df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)

print(merge_df)

Result:

     1      2         3     4      5       6      7            8     1        2      3       4   5   6   7    8
0  3.0  N0128   Durchm.   5.0  0.100   5.076  0.076  -----****--   0.0      NaN    NaN     NaN NaN NaN NaN  NaN
1  3.0  N0129  Position  62.2  0.376  62.238  0.136       ***---  76.1  -36.000  0.300 -36.057 NaN NaN NaN  NaN
2  2.0  N0130   Durchm.   5.0  0.100   5.067  0.067  -----***---   0.0      NaN    NaN     NaN NaN NaN NaN  NaN
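Note that the result above contains every column label from 1 to 8 twice. If that gets in the way, one possible option is to prefix the labels before concatenating. The sketch below uses small stand-in frames, and the prefixes main_ and extra_ are arbitrary names chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2, both with integer column labels 1 and 2.
df1 = pd.DataFrame(
    {1: [3, 3, 2], 2: ["N0128", "N0129", "N0130"]}, index=[1, 3, 5]
)
df2 = pd.DataFrame(
    {1: [0.0, 76.1, 0.0], 2: [float("nan"), -36.0, float("nan")]},
    index=[2, 4, 6],
)

# add_prefix turns each column label into a string with the prefix in front.
merged = pd.concat(
    [
        df1.reset_index(drop=True).add_prefix("main_"),
        df2.reset_index(drop=True).add_prefix("extra_"),
    ],
    axis=1,
)
print(list(merged.columns))
print(merged)
```

With unique labels, a single column can be selected by name, for example merged["extra_1"].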

EDIT:

It may need