I have data containing fertility rates for different countries, and I'd like to: 1. rename the columns; 2. print only specific countries (selected by name, not by index).
Here I import the data from a website:
df = pd.read_html('https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html')
Then I try to rename columns (from '0' to 'Country' and from '1' to 'TFR'):
df= df.rename(index=str, columns ={'0':'Country', '1':'TFR'})
But I get an error message:
df = df.rename(index=str, columns ={'0':'Country', '1':'TFR'})
AttributeError: 'list' object has no attribute 'rename'
This is how I try to look up a specific country:
print(df[df['0'].str.contains("Tanzan")])
And I get the following error:
TypeError: list indices must be integers or slices, not str
What am I doing wrong? How can I fix it (if that is possible)? Thank you for your help!
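First, pass the parameter header=0 so that the first row of the page's table becomes the header of the DataFrame. Then add [0] to select the first DataFrame, because read_html returns a list of DataFrames (which is why calling rename on the result raised the AttributeError):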
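import pandas as pd

url = 'https://www.cia.gov/library/publications/the-world-factbook/fields/2127.html'
# Map the long header text from the page to a short column name
d = {'TOTAL FERTILITY RATE(CHILDREN BORN/WOMAN)':'TFR'}
# header=0 uses the first row as column names; [0] picks the first table
df = pd.read_html(url, header=0)[0].rename(columns=d)
print(df.head())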
Country TFR
0 Afghanistan 5.12 children born/woman (2017 est.)
1 Albania 1.51 children born/woman (2017 est.)
2 Algeria 2.7 children born/woman (2017 est.)
3 American Samoa 2.68 children born/woman (2017 est.)
4 Andorra 1.4 children born/woman (2017 est.)
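To see why the original attempt failed, here is a minimal sketch that does not touch the network. The names raw and tables are hypothetical stand-ins for what read_html returns without header=0, and it assumes the columns came back labelled with the integers 0 and 1, as the question suggests. In that case the rename keys must be the integers 0 and 1, not the strings '0' and '1':

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_html(url) without header=0:
# a list holding one DataFrame whose column labels are the integers 0 and 1.
raw = pd.DataFrame([
    ["Country", "TOTAL FERTILITY RATE(CHILDREN BORN/WOMAN)"],
    ["Afghanistan", "5.12 children born/woman (2017 est.)"],
    ["Tanzania", "4.77 children born/woman (2017 est.)"],
])
tables = [raw]

print(type(tables).__name__)    # a list, so tables.rename(...) cannot work
print(list(tables[0].columns))  # integer labels, not strings

# Select the DataFrame first, then rename using the integer labels.
df = tables[0].rename(columns={0: "Country", 1: "TFR"})

# In this layout the first row holds the page's header text, so drop it.
df = df.iloc[1:].reset_index(drop=True)
print(df)
```

Passing header=0 as in the answer avoids the extra step of dropping the first row.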
Finally, filter by the new column name:
print(df[df['Country'].str.contains("Tanzan")])
Country TFR
204 Tanzania 4.77 children born/woman (2017 est.)
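If you need more than one country, or want the match to ignore case, something along these lines may help. It is only a sketch: the small DataFrame below is rebuilt by hand from three of the rows shown above rather than downloaded, and the variable names partial and exact are made up for illustration.

```python
import pandas as pd

# Hand-built sample using rows from the output above (not fetched from the page).
df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Tanzania"],
    "TFR": [
        "5.12 children born/woman (2017 est.)",
        "1.51 children born/woman (2017 est.)",
        "4.77 children born/woman (2017 est.)",
    ],
})

# Substring match, ignoring case; na=False treats missing names as non-matches.
partial = df[df["Country"].str.contains("tanzan", case=False, na=False)]

# Exact match against a list of full country names.
exact = df[df["Country"].isin(["Albania", "Tanzania"])]

print(partial)
print(exact)
```

str.contains is convenient when you only know part of the name, while isin is safer when the full names are known, since it cannot match unrelated countries that happen to share a substring.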