I'm trying to scrape data from elections.in. There are three tables with the same class. Here is the site's HTML:
<h3 class="blmap">17th General (Lok Sabha) Election Results 2019 – State Wise</h3>
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andaman & Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>
<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>
<tr><td>Andhra Pradesh</td><td>Telugu Desam</td><td>3</td></tr>
<tr><td>Arunachal Pradesh</td><td>Bharatiya Janata Party</td><td>2</td></tr>
<tr><td>Assam</td><td>Bharatiya Janata Party</td><td>9</td></tr>
<tr><td>Assam</td><td>Indian National Congress</td><td>3</td></tr>
<tr><td>Assam</td><td>All India United Democratic Front</td><td>1</td></tr>
I was able to fetch the data, but it looks like this:
StatePartyNumber of Seats
Andaman & Nicobar IslandsIndian National Congress1
Andhra PradeshYuvajana Sramika Rythu Congress Party22
Andhra PradeshTelugu Desam3
Arunachal PradeshBharatiya Janata Party2
AssamBharatiya Janata Party9
AssamIndian National Congress3
AssamAll India United Democratic Front1
AssamIndependent1
BiharBharatiya Janata Party17
I want output like this:
State,Party,Number of Seats
Andaman & Nicobar Islands,Indian National Congress,1
Andhra Pradesh,Yuvajana Sramika Rythu Congress Party,22
Or as a list.
This line of code gives me the output above:
soup.find_all('table')[1].get_text()
Here is my code on GitHub.
Please suggest how to achieve this.
Thanks.
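If you want to parse <table> tags, go with pandas' .read_html(). It does most of the heavy lifting for you and returns a list of DataFrames. The table you're referring to is the third table on the page (hence index position 2).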
import pandas as pd
url="http://www.elections.in/"
tables = pd.read_html(url)
Output:
print (tables[2].to_string())
State Party Number of Seats
0 Andaman & Nicobar Islands Indian National Congress 1
1 Andhra Pradesh Yuvajana Sramika Rythu Congress Party 22
2 Andhra Pradesh Telugu Desam 3
3 Arunachal Pradesh Bharatiya Janata Party 2
4 Assam Bharatiya Janata Party 9
5 Assam Indian National Congress 3
6 Assam All India United Democratic Front 1
7 Assam Independent 1
8 Bihar Bharatiya Janata Party 17
9 Bihar Janata Dal (United) 16
10 Bihar Lok Jan Shakti Party 6
11 Bihar Indian National Congress 1
12 Chandigarh Bharatiya Janata Party 1
13 Chhattisgarh Bharatiya Janata Party 9
14 Chhattisgarh Indian National Congress 2
15 Dadra & Nagar Haveli Independent 1
16 Daman & Diu Bharatiya Janata Party 1
17 Goa Bharatiya Janata Party 1
18 Goa Indian National Congress 1
19 Gujarat Bharatiya Janata Party 26
20 Haryana Bharatiya Janata Party 10
21 Himachal Pradesh Bharatiya Janata Party 4
22 Jammu & Kashmir Bharatiya Janata Party 3
23 Jammu & Kashmir Jammu & Kashmir National Conference 3
24 Jharkhand Bharatiya Janata Party 11
25 Jharkhand Ajsu Party 1
26 Jharkhand Indian National Congress 1
27 Jharkhand Jharkhand Mukti Morcha 1
28 Karnataka Bharatiya Janata Party 25
29 Karnataka Independent 1
30 Karnataka Indian National Congress 1
31 Karnataka Janata Dal (Secular) 1
32 Kerala Indian National Congress 15
33 Kerala Indian Union Muslim League 2
34 Kerala Communist Party Of India (Marxist) 1
35 Kerala Kerala Congress (M) 1
36 Kerala Revolutionary Socialist Party 1
37 Lakshadweep Nationalist Congress Party 1
38 Madhya Pradesh Bharatiya Janata Party 28
39 Madhya Pradesh Indian National Congress 1
40 Maharashtra Bharatiya Janata Party 23
41 Maharashtra Shivsena 18
42 Maharashtra Nationalist Congress Party 4
43 Maharashtra All India Majlis-E-Ittehadul Muslimeen 1
44 Maharashtra Independent 1
45 Maharashtra Indian National Congress 1
46 Manipur Bharatiya Janata Party 1
47 Manipur Naga Peoples Front 1
48 Meghalaya Indian National Congress 1
49 Meghalaya National People'S Party 1
50 Mizoram Mizo National Front 1
51 Nagaland Nationalist Democratic Progressive Party 1
52 NCT OF Delhi Bharatiya Janata Party 7
53 Odisha Biju Janata Dal 12
54 Odisha Bharatiya Janata Party 8
55 Odisha Indian National Congress 1
56 Puducherry Indian National Congress 1
57 Punjab Indian National Congress 8
58 Punjab Bharatiya Janata Party 2
59 Punjab Shiromani Akali Dal 2
60 Punjab Aam Aadmi Party 1
61 Rajasthan Bharatiya Janata Party 24
62 Rajasthan Rashtriya Loktantrik Party 1
63 Sikkim Sikkim Krantikari Morcha 1
64 Tamil Nadu Dravida Munnetra Kazhagam 23
65 Tamil Nadu Indian National Congress 8
66 Tamil Nadu Communist Party Of India 2
67 Tamil Nadu Communist Party Of India (Marxist) 2
68 Tamil Nadu All India Anna Dravida Munnetra Kazhagam 1
69 Tamil Nadu Indian Union Muslim League 1
70 Tamil Nadu Viduthalai Chiruthaigal Katchi 1
71 Telangana Telangana Rashtra Samithi 9
72 Telangana Bharatiya Janata Party 4
73 Telangana Indian National Congress 3
74 Telangana All India Majlis-E-Ittehadul Muslimeen 1
75 Tripura Bharatiya Janata Party 2
76 Uttar Pradesh Bharatiya Janata Party 62
77 Uttar Pradesh Bahujan Samaj Party 10
78 Uttar Pradesh Samajwadi Party 5
79 Uttar Pradesh Apna Dal (Soneylal) 2
80 Uttar Pradesh Indian National Congress 1
81 Uttarakhand Bharatiya Janata Party 5
82 West Bengal All India Trinamool Congress 22
83 West Bengal Bharatiya Janata Party 18
84 West Bengal Indian National Congress 2
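If what you ultimately want is the CSV text or the list of rows from the question rather than a printed table, the DataFrame can be converted directly. A minimal sketch follows; the DataFrame here is a hand-built stand-in for tables[2] (only its first two rows), so it runs without fetching the page.

```python
import pandas as pd

# Hypothetical stand-in for tables[2] from the read_html() call above,
# rebuilt by hand from its first two rows so the sketch runs offline.
df = pd.DataFrame(
    {
        "State": ["Andaman & Nicobar Islands", "Andhra Pradesh"],
        "Party": ["Indian National Congress", "Yuvajana Sramika Rythu Congress Party"],
        "Number of Seats": [1, 22],
    }
)

# CSV text in the "State,Party,Number of Seats" layout from the question;
# index=False drops the 0, 1, 2, ... row labels seen in the printed frame.
csv_text = df.to_csv(index=False)
print(csv_text)

# Or as a list of rows, keeping the header row separately.
headers = df.columns.tolist()
rows = df.values.tolist()
print(headers)
print(rows)
```

To write straight to disk instead, df.to_csv("results.csv", index=False) works the same way; the file name is only a placeholder.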
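To do this with BeautifulSoup, you have to iterate over each row (the <tr> tags), then over each data cell (the <td> tags) in that row, and append the values to a list, a DataFrame, or whatever structure you want to store them in.
So something like this: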
import requests
from bs4 import BeautifulSoup

url = "http://www.elections.in/"
r = requests.get(url).content
htmlDoc = r.decode("utf-8")
soup = BeautifulSoup(htmlDoc, 'html.parser')

# The state-wise results are in the third <table> on the page
table = soup.find_all('table')[2]
rows = table.find_all('tr')

# Header cells (<th>) become the column names
headers = table.find_all('th')
headers = [each.text for each in headers]

list_of_rows = []
for row in rows:
    data = row.find_all('td')
    # The header row has no <td> cells, so it is skipped here
    if data != []:
        data = [each.text for each in data]
        list_of_rows.append(data)
Output:
print (headers)
['State', 'Party', 'Number of Seats']
print (list_of_rows)
[['Andaman & Nicobar Islands', 'Indian National Congress', '1'], ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'], ['Andhra Pradesh', 'Telugu Desam', '3'], ['Arunachal Pradesh', 'Bharatiya Janata Party', '2'], ['Assam', 'Bharatiya Janata Party', '9'], ['Assam', 'Indian National Congress', '3'], ['Assam', 'All India United Democratic Front', '1'], ['Assam', 'Independent', '1'], ['Bihar', 'Bharatiya Janata Party', '17'], ['Bihar', 'Janata Dal (United)', '16'], ['Bihar', 'Lok Jan Shakti Party', '6'], ['Bihar', 'Indian National Congress', '1'], ['Chandigarh', 'Bharatiya Janata Party', '1'], ['Chhattisgarh', 'Bharatiya Janata Party', '9'], ['Chhattisgarh', 'Indian National Congress', '2'], ['Dadra & Nagar Haveli', 'Independent', '1'], ['Daman & Diu', 'Bharatiya Janata Party', '1'], ['Goa', 'Bharatiya Janata Party', '1'], ['Goa', 'Indian National Congress', '1'], ['Gujarat', 'Bharatiya Janata Party', '26'], ['Haryana', 'Bharatiya Janata Party', '10'], ['Himachal Pradesh', 'Bharatiya Janata Party', '4'], ['Jammu & Kashmir', 'Bharatiya Janata Party', '3'], ['Jammu & Kashmir', 'Jammu & Kashmir National Conference', '3'], ['Jharkhand', 'Bharatiya Janata Party', '11'], ['Jharkhand', 'Ajsu Party', '1'], ['Jharkhand', 'Indian National Congress', '1'], ['Jharkhand', 'Jharkhand Mukti Morcha', '1'], ['Karnataka', 'Bharatiya Janata Party', '25'], ['Karnataka', 'Independent', '1'], ['Karnataka', 'Indian National Congress', '1'], ['Karnataka', 'Janata Dal (Secular)', '1'], ['Kerala', 'Indian National Congress', '15'], ['Kerala', 'Indian Union Muslim League', '2'], ['Kerala', 'Communist Party Of India (Marxist)', '1'], ['Kerala', 'Kerala Congress (M)', '1'], ['Kerala', 'Revolutionary Socialist Party', '1'], ['Lakshadweep', 'Nationalist Congress Party', '1'], ['Madhya Pradesh', 'Bharatiya Janata Party', '28'], ['Madhya Pradesh', 'Indian National Congress', '1'], ['Maharashtra', 'Bharatiya Janata Party', '23'], ['Maharashtra', 'Shivsena', '18'], ['Maharashtra', 'Nationalist Congress Party', '4'], ['Maharashtra', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Maharashtra', 'Independent', '1'], ['Maharashtra', 'Indian National Congress', '1'], ['Manipur', 'Bharatiya Janata Party', '1'], ['Manipur', 'Naga Peoples Front', '1'], ['Meghalaya', 'Indian National Congress', '1'], ['Meghalaya', "National People'S Party", '1'], ['Mizoram', 'Mizo National Front', '1'], ['Nagaland', 'Nationalist Democratic Progressive Party', '1'], ['NCT OF Delhi', 'Bharatiya Janata Party', '7'], ['Odisha', 'Biju Janata Dal', '12'], ['Odisha', 'Bharatiya Janata Party', '8'], ['Odisha', 'Indian National Congress', '1'], ['Puducherry', 'Indian National Congress', '1'], ['Punjab', 'Indian National Congress', '8'], ['Punjab', 'Bharatiya Janata Party', '2'], ['Punjab', 'Shiromani Akali Dal', '2'], ['Punjab', 'Aam Aadmi Party', '1'], ['Rajasthan', 'Bharatiya Janata Party', '24'], ['Rajasthan', 'Rashtriya Loktantrik Party', '1'], ['Sikkim', 'Sikkim Krantikari Morcha', '1'], ['Tamil Nadu', 'Dravida Munnetra Kazhagam', '23'], ['Tamil Nadu', 'Indian National Congress', '8'], ['Tamil Nadu', 'Communist Party Of India', '2'], ['Tamil Nadu', 'Communist Party Of India (Marxist)', '2'], ['Tamil Nadu', 'All India Anna Dravida Munnetra Kazhagam', '1'], ['Tamil Nadu', 'Indian Union Muslim League', '1'], ['Tamil Nadu', 'Viduthalai Chiruthaigal Katchi', '1'], ['Telangana', 'Telangana Rashtra Samithi', '9'], ['Telangana', 'Bharatiya Janata Party', '4'], ['Telangana', 'Indian National Congress', '3'], ['Telangana', 'All India Majlis-E-Ittehadul Muslimeen', '1'], ['Tripura', 'Bharatiya Janata Party', '2'], ['Uttar Pradesh', 'Bharatiya Janata Party', '62'], ['Uttar Pradesh', 'Bahujan Samaj Party', '10'], ['Uttar Pradesh', 'Samajwadi Party', '5'], ['Uttar Pradesh', 'Apna Dal (Soneylal)', '2'], ['Uttar Pradesh', 'Indian National Congress', '1'], ['Uttarakhand', 'Bharatiya Janata Party', '5'], ['West Bengal', 'All India Trinamool Congress', '22'], ['West Bengal', 'Bharatiya Janata Party', '18'], ['West Bengal', 'Indian National Congress', '2']]
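Since the question asks for comma-separated output, headers and list_of_rows from the loop above can be handed to Python's standard csv module. A minimal sketch, assuming the two lists look like the printed output (only two data rows are copied in here so it runs on its own):

```python
import csv
import io

# Stand-ins for the `headers` and `list_of_rows` built by the BeautifulSoup
# loop above (only the first two data rows are reproduced here).
headers = ['State', 'Party', 'Number of Seats']
list_of_rows = [
    ['Andaman & Nicobar Islands', 'Indian National Congress', '1'],
    ['Andhra Pradesh', 'Yuvajana Sramika Rythu Congress Party', '22'],
]

# csv.writer joins fields with commas and quotes any field that itself
# contains a comma, so every value stays in its own column.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(headers)
writer.writerows(list_of_rows)
csv_text = buf.getvalue()
print(csv_text, end="")
```

To write a file instead of a string, open it with open("results.csv", "w", newline="") and pass that file object to csv.writer; the file name is only a placeholder.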
But like I said, pandas' .read_html() will do this for you.
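For reference, the cells in the question's get_text() output ran together because get_text() joins the strings it finds with an empty separator by default. Calling it once per <tr> with a separator and strip=True gives one comma-separated line per row. A sketch run against a trimmed copy of the HTML from the question (not the live page):

```python
from bs4 import BeautifulSoup

# A trimmed copy of the table markup shown in the question
html = """
<table class="tableizer-table">
<thead><tr class="tableizer-firstrow"><th>State</th><th>Party</th><th>Number of Seats</th></tr></thead><tbody>
<tr><td>Andaman & Nicobar Islands</td><td>Indian National Congress</td><td>1</td></tr>
<tr><td>Andhra Pradesh</td><td>Yuvajana Sramika Rythu Congress Party</td><td>22</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="tableizer-table")

# One line per <tr>: each cell's text is stripped, then joined with ","
lines = [tr.get_text(",", strip=True) for tr in table.find_all("tr")]
print("\n".join(lines))
```

This does no CSV quoting, so a cell whose text contains a comma would end up split across two columns.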