I'm trying to add rows to a DataFrame as part of a loop.
The program loops through URLs and extracts the data as DataFrames:
for id in game_ids:
    df_team_final = []
    df_player_final = []
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)
At the end of the loop body, I use concat to combine the two DataFrames for the away/home teams (and the two for the players):
team_full = pd.concat([df_home_team, df_away_team])
player_full = pd.concat([df_home_player_merge, df_away_player_merge])
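For reference, `pd.concat` given a list of DataFrames stacks them row-wise, so the combined frame holds the rows of both inputs. A minimal sketch with made-up rows (the `team` and `pts` columns and their values are hypothetical, not the real FIBA LiveStats fields):

```python
import pandas as pd

# Hypothetical one-row frames standing in for the scraped home/away team stats
df_home_team = pd.DataFrame({"team": ["Home"], "pts": [88]})
df_away_team = pd.DataFrame({"team": ["Away"], "pts": [81]})

# Stack vertically; ignore_index=True renumbers the result 0..n-1
team_full = pd.concat([df_home_team, df_away_team], ignore_index=True)
print(team_full)
```

Without `ignore_index=True` both rows would keep the index label 0, which is harmless here because the sheets are later written with `index=False`.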
Then, outside the loop, I save everything to Excel:
# if it can't find the file, create a new spreadsheet
writer = pd.ExcelWriter('Box Data.xlsx', engine='openpyxl')
team_full.to_excel(writer, sheet_name='Team Stats', index=False)
player_full.to_excel(writer, sheet_name='Player Stats', index=False)
writer.close()
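Because I'm looping through several web pages, I need to keep updating the DataFrame as I go. Obviously, in its current form, the second iteration just overwrites the data from the first URL.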
What is the best way to append or add to the DataFrame at the end of each loop iteration?
Thanks
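Since we can't see your full code, I can only give a rough outline here.
I assume you are not appending the scraped data to any kind of container, so it is lost on the next iteration.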
# empty lists outside of loop to store data
df_team_final = []
df_player_final = []
for id in game_ids:
    url = 'https://www.fibalivestats.com/data/' + id + '/data.json'
    content = requests.get(url)
    data = json.loads(content.content)
    # create the dataframes that you need
    # (df_home_team, df_away_team, etc.)
    # and append the data to the containers
    team_full = pd.concat([df_home_team, df_away_team])
    player_full = pd.concat([df_home_player_merge, df_away_player_merge])
    df_team_final.append(team_full)
    df_player_final.append(player_full)
Now that your DataFrames are stored in lists, you can combine them with pandas.concat:
# outside of the loop
team_full = pd.concat(df_team_final)
player_full = pd.concat(df_player_final)
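The whole accumulate-then-concat pattern can be run offline with a stub in place of the HTTP request. This is a sketch under assumptions: `scrape_game`, its columns, and the game IDs are hypothetical stand-ins for the requests/json parsing code. Appending to a Python list is cheap, and a single `pd.concat` at the end avoids re-copying the growing frame on every iteration (note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so it is not an option on current versions).

```python
import pandas as pd

def scrape_game(game_id):
    """Hypothetical stand-in for the scraping code: returns (team_df, player_df)."""
    team = pd.DataFrame({"game_id": [game_id] * 2, "side": ["home", "away"]})
    player = pd.DataFrame({"game_id": [game_id] * 3, "player": ["A", "B", "C"]})
    return team, player

game_ids = ["1001", "1002", "1003"]  # hypothetical IDs

# Containers are created OUTSIDE the loop so they survive every iteration
df_team_final = []
df_player_final = []

for game_id in game_ids:
    team_full, player_full = scrape_game(game_id)
    df_team_final.append(team_full)
    df_player_final.append(player_full)

# One concat at the end turns each list of frames into a single DataFrame
team_all = pd.concat(df_team_final, ignore_index=True)
player_all = pd.concat(df_player_final, ignore_index=True)
print(team_all.shape, player_all.shape)
```

Each game contributes 2 team rows and 3 player rows, so three games give 6 and 9 rows respectively.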
And then save everything in one go:
# the with-block closes (and saves) the file; ExcelWriter.save() was removed in pandas 2.0
with pd.ExcelWriter('Box Data.xlsx', engine='openpyxl') as writer:
    team_full.to_excel(writer, sheet_name='Team Stats', index=False)
    player_full.to_excel(writer, sheet_name='Player Stats', index=False)
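From the file you shared, I can see that you create the containers inside the loop:
for id in game_ids:
    df_team_final = []
    df_player_final = []
But you should create them before the loop starts: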
# initialize them here
df_team_final = []
df_player_final = []
for id in game_ids:
    ...  # rest of the loop body unchanged
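To see why the placement matters, here is a stripped-down sketch with plain lists (the `collect` function and its `reset_inside_loop` flag are hypothetical, for illustration only). Assigning `[]` inside the loop rebinds the name to a fresh empty list on every iteration, so only the last game's data survives.

```python
def collect(game_ids, reset_inside_loop):
    """Hypothetical demo: gather one item per game, optionally resetting the list each time."""
    results = []
    for game_id in game_ids:
        if reset_inside_loop:
            results = []  # rebinding the name discards everything gathered so far
        results.append(game_id)
    return results

inside = collect(["a", "b", "c"], reset_inside_loop=True)
outside = collect(["a", "b", "c"], reset_inside_loop=False)
print(inside)   # ['c']
print(outside)  # ['a', 'b', 'c']
```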