So I have a <tr> tag with multiple <td> tags as its children.
<tr>
    <td align='center' class="row2">
        <a href="javascript:who_posted(4713426);">10</a>
    </td>
    <td align="center" class="row2">
        <a href='https://forum.net/index.php?;showuser=17311'>xxboxx</a>
    </td>
    <td align="center" class="row2">
        <!--script type="text/javascript">
        s = "236".replace(/,/g,'');
        document.write(abbrNum(s,1));
        </script-->
        236
    </td>
</tr>
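Here is my current code. I have no trouble extracting the first two values, but I also want the value that sits outside the commented-out script in the last <td>. I tried the approaches suggested in similar Stack Overflow questions, without success.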
# list_1 and list_2 are assumed to be lists defined elsewhere in the script
def extractDataFromRow2(_url):
    try:
        for container in _url.find_all('td', {'class': 'row2', 'align': 'center'}):
            # get data from topic title in table cell
            replies_numb = container.select_one(
                'a[href^="javascript:"]').text
            print('there are ' + replies_numb + ' replies')
            topic_starter = container.next_sibling.text
            print('the owner of this topic is ' + topic_starter)
            for total_view in container.find('a', href=True, style=True):
                #total_view = container.select_one(style="background-color:").text
                #total_view = container.find(("td")["style"])
                #total_view = container.next_sibling.find_next_sibling/next_sibling
                #but they're not able to access the last one within <tr> tag
                print(total_view)
            if replies_numb and topic_starter is not None:
                dict_replies = {'Replies': replies_numb}
                dict_topic_S = {'Topic_Starter': topic_starter}
                list_1.append(dict_replies)
                list_2.append(dict_topic_S)
            else:
                print('no data')
    except Exception as e:
        print('Error.extractDataFromRow2:', e)
        return None
Is there a cleaner way to do this? I'd be glad to learn from any feedback.
The HTML you shared may not be enough to answer the question, so I checked out the URL you shared. Here is one way to scrape the table.
from bs4 import BeautifulSoup
import requests

r = requests.get("https://forum.lowyat.net/ReviewsandGuides")
soup = BeautifulSoup(r.text, 'lxml')
index = 0
# The first two rows of the table are not data, so we skip them.
# The last row is for searching, so we skip it as well.
# The table contains 30 rows of data, which is why the list is sliced [2:32].
for row in soup.select('table[cellspacing="1"] > tr')[2:32]:
    replies = row.select_one('td:nth-of-type(4)').text.strip()
    topic_started = row.select_one('td:nth-of-type(5)').text.strip()
    total_views = row.select_one('td:nth-of-type(6)').text.strip()
    index += 1
    print(index, replies, topic_started, total_views)
The result is:
1 148 blurjoey 9,992
2 10 xxboxx 263
3 18 JayceOoi 1,636
4 373 idoblu 54,589
5 237 blurjoey 16,101
6 526 JayceOoi 57,577
7 131 JayceOoi 34,354
8 24 blurjoey 4,261
9 2 JayceOoi 249
10 72 KeyMochi 26,622
11 7 champu 331
12 0 asunakirito 210
13 0 asunakirito 172
14 0 asunakirito 199
15 17 blurjoey 3,351
16 860 blurjoey 112,556
17 0 chennegan 174
18 0 goldfries 185
19 4 JayceOoi 601
20 2 JayceOoi 309
21 10 blurjoey 1,826
22 3 JayceOoi 398
23 4 squallz05 310
24 0 asunakirito 265
25 25 asunakirito 12,326
26 0 blurjoey 279
27 14 JayceOoi 2,092
28 0 chennegan 305
29 8 Pharamain 732
30 19 idoblu 1,273
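For the three-cell snippet in the question itself, a minimal sketch along the following lines may also work. It assumes every data row has three `td.row2` cells in the order replies, topic starter, views. It uses the built-in `html.parser` backend instead of `lxml`, and the helper names `visible_text` and `parse_rows`, as well as the `Views` key, are made up for this illustration. The idea is to read only the real text nodes of each cell and skip `Comment` nodes, so the commented-out script never gets in the way of the `236`.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Sample row copied from the question, wrapped in a <table> for completeness.
HTML = """
<table>
<tr>
<td align='center' class="row2">
<a href="javascript:who_posted(4713426);">10</a>
</td>
<td align="center" class="row2">
<a href='https://forum.net/index.php?;showuser=17311'>xxboxx</a>
</td>
<td align="center" class="row2">
<!--script type="text/javascript">
s = "236".replace(/,/g,'');
document.write(abbrNum(s,1));
</script-->
236
</td>
</tr>
</table>
"""


def visible_text(tag):
    """Join the text nodes under `tag`, skipping HTML comments (hypothetical helper)."""
    parts = [
        str(node).strip()
        for node in tag.descendants
        if isinstance(node, NavigableString) and not isinstance(node, Comment)
    ]
    return " ".join(p for p in parts if p)


def parse_rows(html):
    """Return one dict per <tr> that has at least three td.row2 cells (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr"):
        cells = row.find_all("td", class_="row2")
        if len(cells) < 3:
            continue  # header, footer or otherwise not a data row
        replies, starter, views = (visible_text(c) for c in cells[:3])
        records.append({
            "Replies": int(replies.replace(",", "")),  # "9,992" style values become ints
            "Topic_Starter": starter,
            "Views": int(views.replace(",", "")),
        })
    return records


print(parse_rows(HTML))
```

Working row by row rather than cell by cell keeps the three values of one topic together, so a single list of dicts can replace the separate `list_1` and `list_2`.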