我有2个大型的csv文件(都有大约一百万行具有不同的列名,单个文件中大约有70列)。我想使用python pandas执行左连接(类似sql),并使用结果创建一个新的csv文件。
使用sql与以下查询可以实现相同的操作-
select opportunities.* , data_dump.OpportunityID
from opportunities
left join data_dump on (opportunities.LeadIdentifier=data_dump.LeadId and opportunities.ProductSku=data_dump.ProductName)
我当时想做这样的事情,但这对于这么大的数据来说效率很低,
fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
data_obj = fetched_opportunities.to_dict(orient='records')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')
data_obj2 = fetched_opportunities2.to_dict(orient='records')
for opportunity_detail2 in data_obj:
for opportunity_detail1 in data_obj:
if opportunity_detail2['LeadIdentifier'] == opportunity_detail1['LeadId'] & opportunity_detail2['ProductSku'] == opportunity_detail1['ProductName']:
尝试使用merge
如下功能:
fetched_opportunities = pd.read_csv(path + "/data_dump.csv").fillna('')
fetched_opportunities2 = pd.read_csv(path + "/opportunities.csv").fillna('')
out=fetched_opportunities[["OpportunityID","LeadId","ProductName"]].merge(fetched_opportunities2,how='left',left_on=['LeadId','ProductName'],right_on=['LeadIdentifier','ProductSku']).drop(["LeadId","ProductName"],axis=1)
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句