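I need to create a new column based on 2 conditions: countries/regions with a population over 50,000 (the column is in thousands, so 50 million people) and recovery rate in descending order.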
df1['Recovery Rate'] = df1.apply(lambda x: (x['Total Recovered']/x['Total Infected']), axis = 1)
df1['Populated Country'] = df1.apply(if lambda row: row.Country == Country and (row: row.Population 2020 (in thousands) >= 50000), axis = 1)
df1.sort_values(['Recovery Rate'], ascending = [False])
print(df1[['Populated Country','Recovery Rate']].head(10))
But I get the following error on the new-column line:
File "<ipython-input-25-ab35558abd61>", line 4
df1['Populated Country'] = df1.apply(if lambda row: row.Country == Country and (row: row.Population 2020 (in thousands) >= 50000), axis = 1)
^
SyntaxError: invalid syntax
>Country Daily Tests Daily Tests per 100000 people Pop density per sq. km Urban Population (%) Start Date of Quarantine/Lockdown Start Date of Schools Closure Start Date of Public Place Restrictions Hospital Beds per 1000 people M-to-F Gender Ratio at Birth ... Death rate from lung diseases per 100k people for male Median Age GDP 2018 Crime Index Population 2020 (in thousands) Smokers in Population (%) % of Females in Population Total Infected Total Deaths Total Recovered
>0 Albania NaN NaN 105 63 NaN NaN NaN 2.9 1.08 ... 17.04 32.9 1.510250e+10 40.02 2877.797 28.7 49.063095 949 31 742
>1 Algeria NaN NaN 18 73 NaN NaN NaN 1.9 1.05 ... 12.81 28.1 1.737580e+11 54.41 43851.044 15.6 49.484268 7377 561 3746
>2 Argentina NaN NaN 17 93 3/20/2020 NaN NaN 5.0 1.05 ... 42.59 31.7 5.198720e+11 62.96 45195.774 21.8 51.237348 8809 393 2872
>3 Armenia 694.0 2.342029 104 63 NaN NaN NaN 4.2 1.13 ... 35.99 35.1 1.243309e+10 20.78 2963.243 24.1 52.956577 5041 64 2164
>4 Australia 31635.0 12.405939 3 86 NaN NaN 3/23/2020 3.8 1.06 ... 22.16 38.7 1.433900e+12 42.70 25499.884 14.7 50.199623 7072 100 6431
Here is the data: https://raw.githubusercontent.com/ptw2/PRGA/main/covid19_by_country.csv
This is the result I should get:
> Country Recovery Rate
>17 China 0.943459
>87 Thailand 0.941972
>47 South Korea 0.906031
>32 Germany 0.875705
>95 Vietnam 0.811728
Can anyone help?
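For reference, the SyntaxError has several causes: "if lambda ..." is not valid Python (a lambda body can only hold an expression, such as a conditional "a if cond else b"), "row: row.X" inside the parentheses is not an expression, and a column name with spaces and parentheses like Population 2020 (in thousands) cannot be read with dot access. Below is a minimal sketch of the population condition written as a valid pandas expression. It uses the five sample rows from the question plus two made-up rows (Bigland and Hugeland are hypothetical) so that the mask is not all False.

```python
import pandas as pd

# Sample rows from the question; Bigland and Hugeland are hypothetical rows
df1 = pd.DataFrame({
    'Country': ['Albania', 'Algeria', 'Argentina', 'Armenia', 'Australia',
                'Bigland', 'Hugeland'],
    'Population 2020 (in thousands)': [2877.797, 43851.044, 45195.774,
                                       2963.243, 25499.884, 60000.0, 120000.0],
    'Total Infected': [949, 7377, 8809, 5041, 7072, 1000, 2000],
    'Total Recovered': [742, 3746, 2872, 2164, 6431, 900, 1000],
})

# Bracket indexing works for any column name, including ones with spaces.
# Comparing the whole column gives one True/False per row (no apply needed).
is_populated = df1['Population 2020 (in thousands)'] >= 50000
print(df1.loc[is_populated, 'Country'].tolist())
```

With these rows only the two hypothetical countries pass the 50,000 (thousand) threshold.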
In this case it is cleaner to define a function that does the calculation and then apply that function row by row with df1.apply:
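import numpy as np
import pandas as pd
df1 = pd.read_csv('https://raw.githubusercontent.com/ptw2/PRGA/main/covid19_by_country.csv')

def compute_rr(row):  # recovery rate only for countries with >= 50,000 thousand people
    if row['Population 2020 (in thousands)'] >= 50000:
        return row['Total Recovered'] / row['Total Infected']
    return np.nan  # smaller countries get NaN

df1['Recovery Rate'] = df1.apply(compute_rr, axis=1)
df1 = df1.sort_values('Recovery Rate', ascending=False)  # NaN rows are placed last
print(df1[['Country', 'Total Recovered', 'Total Infected', 'Recovery Rate']].head())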
#Output:
Country Total Recovered Total Infected Recovery Rate
17 China 79310 84063 0.943459
87 Thailand 2857 3033 0.941972
47 South Korea 10066 11110 0.906031
32 Germany 155681 177778 0.875705
95 Vietnam 263 324 0.811728
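The same column can also be built without apply, by dividing whole columns at once and blanking out the smaller countries with Series.where; vectorized column arithmetic is typically faster than a row-wise apply on larger frames. A sketch on the question's sample rows plus two hypothetical rows (Bigland and Hugeland are invented for illustration):

```python
import pandas as pd

# Sample rows from the question; Bigland and Hugeland are hypothetical rows
df1 = pd.DataFrame({
    'Country': ['Albania', 'Algeria', 'Argentina', 'Armenia', 'Australia',
                'Bigland', 'Hugeland'],
    'Population 2020 (in thousands)': [2877.797, 43851.044, 45195.774,
                                       2963.243, 25499.884, 60000.0, 120000.0],
    'Total Infected': [949, 7377, 8809, 5041, 7072, 1000, 2000],
    'Total Recovered': [742, 3746, 2872, 2164, 6431, 900, 1000],
})

# Element-wise division of two columns gives the rate for every row
rate = df1['Total Recovered'] / df1['Total Infected']
# Keep the rate where population >= 50,000 thousand; all other rows become NaN
df1['Recovery Rate'] = rate.where(df1['Population 2020 (in thousands)'] >= 50000)
# sort_values returns a new frame, so assign it back; NaN rows go last by default
df1 = df1.sort_values('Recovery Rate', ascending=False)
print(df1[['Country', 'Recovery Rate']].head())
```

Here Bigland (0.9) comes first, then Hugeland (0.5), followed by the five NaN rows.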
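If you actually want to change the DataFrame to drop the countries with a population under 50,000 (thousand), just add the following line at the bottom of the previous code. It removes every row that has NaN in the 'Recovery Rate' column.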
df1 = df1[df1['Recovery Rate'].notna()]
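Note that sort_values returns a new DataFrame (unless inplace=True is passed), which is why the answer assigns the result back to df1; in the question's code the sorted result is discarded. If only the top 10 populous countries by recovery rate are needed, dropna plus nlargest is another option. A sketch on the same illustrative rows (Bigland and Hugeland are hypothetical):

```python
import pandas as pd

# Sample rows from the question; Bigland and Hugeland are hypothetical rows
df1 = pd.DataFrame({
    'Country': ['Albania', 'Algeria', 'Argentina', 'Armenia', 'Australia',
                'Bigland', 'Hugeland'],
    'Population 2020 (in thousands)': [2877.797, 43851.044, 45195.774,
                                       2963.243, 25499.884, 60000.0, 120000.0],
    'Total Infected': [949, 7377, 8809, 5041, 7072, 1000, 2000],
    'Total Recovered': [742, 3746, 2872, 2164, 6431, 900, 1000],
})

# Rate for countries with >= 50,000 thousand people, NaN otherwise
df1['Recovery Rate'] = (df1['Total Recovered'] / df1['Total Infected']).where(
    df1['Population 2020 (in thousands)'] >= 50000)

# Same effect as df1[df1['Recovery Rate'].notna()]
populated = df1.dropna(subset=['Recovery Rate'])

# nlargest returns the n rows with the largest values, in descending order
top = populated.nlargest(10, 'Recovery Rate')
print(top[['Country', 'Recovery Rate']])
```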