I have the following HTML:
<div class="info">
<h5>
<a href="/aaa/">aaa </a>
</h5>
<span class="date">
8:27AM, Sep 30</span>
</div>
I am using Ruby and want to get the text "8:27AM, Sep 30" that is inside <span class="date">. I cannot find it with the command below.
find('div.info span.date').text
Can you tell me why it doesn't work? If I use the following command to find the text in the h5, it correctly returns "aaa".
find('div.info h5').text
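As a sanity check on the markup itself, independent of Capybara, the fragment above can be parsed directly. This is only a sketch: it uses REXML as a stand-in parser (which works here because the fragment happens to be well-formed XML), and `FRAGMENT` and `text_at` are hypothetical names introduced for illustration. It shows that the span and its text are present in the markup; it does not by itself explain why the Capybara `find` fails.

```ruby
require 'rexml/document'

# The HTML fragment from the question, copied verbatim.
FRAGMENT = <<~HTML
  <div class="info">
  <h5>
  <a href="/aaa/">aaa </a>
  </h5>
  <span class="date">
  8:27AM, Sep 30</span>
  </div>
HTML

# Hypothetical helper: return the stripped text of the first element
# matching the XPath, or nil when nothing matches.
def text_at(xml, xpath)
  node = REXML::XPath.first(REXML::Document.new(xml), xpath)
  node && node.text.to_s.strip
end

puts text_at(FRAGMENT, "//div[@class='info']/h5/a")
puts text_at(FRAGMENT, "//div[@class='info']/span[@class='date']")
```

Both lookups find their element in the static markup, so a failure in the browser-driven test is more likely related to the rendered page (for example scoping or visibility) than to the selector text.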
Full Ruby code:
Then(/^you should see (\d+) latest items$/) do |arg1|
  within("div.top-feature-list") do
    # Validate images of those items exist, print report
    expect(all("img").size.to_s).to eq(arg1)
    puts "The number of items on the current site is " + (all("img").size.to_s)
    # List of all items' details (Image, Headline, Introduction, Identifier, Url)
    $i = 1
    while $i <= arg1.to_i do
      puts "Item no." + $i.to_s
      puts " - Image: " + find('ul.category-index li.item-' + $i.to_s + ' img')[:src].to_s
      puts " - Headline: " + find('ul.category-index li.item-' + $i.to_s + ' div.info h5').text
      puts " - Introduction: " + find('ul.category-index li.item-' + $i.to_s + ' div.summary').text
      puts " - Url: " + find('ul.category-index li.item-' + $i.to_s + ' div.info h5 a')[:href].to_s
      puts " - Created Date " + find('ul.category-index li.item-' + $i.to_s + ' div.info span.date').text
      puts " - Identifier: " + find('ul.category-index li.item-' + $i.to_s + ' div.img a.section-name').text
      puts " - Subsection: " + find('ul.category-index li.item-' + $i.to_s + ' div.img a.section-name')[:href].to_s
      $i += 1
    end
  end
end
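The step above rebuilds the same selector prefix on every line and counts with a global `$i`. One possible tidy-up, shown here as a sketch in plain Ruby without Capybara, is to build the selectors in a small helper and iterate with a block-local index. The names `item_selector`, `ITEM_PARTS`, and `item_selectors` are hypothetical and not part of the original code.

```ruby
# Hypothetical helper: builds the CSS selector for one part of the
# n-th item in the category list.
def item_selector(index, suffix)
  "ul.category-index li.item-#{index} #{suffix}"
end

# A few of the item parts the step reports on, keyed by label.
ITEM_PARTS = {
  'Headline'     => 'div.info h5',
  'Introduction' => 'div.summary',
  'Created Date' => 'div.info span.date'
}.freeze

# Returns one hash of label => selector per item, using a block-local
# index instead of a global counter.
def item_selectors(count)
  1.upto(count).map do |i|
    ITEM_PARTS.transform_values { |suffix| item_selector(i, suffix) }
  end
end

item_selectors(2).each_with_index do |selectors, idx|
  puts "Item no.#{idx + 1}"
  selectors.each { |label, css| puts " - #{label}: #{css}" }
end
```

Inside the step definition, each generated selector could then be passed to `find(css).text` in place of the concatenated strings.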
More HTML:
<div class="top-feature-list">
<ul class="category-index">
<li class="group">
<ul>
<li class="item-1 left ">
<a name="item-1"></a>
<div class="img">
<a href="/health-lifestyle/item1.html">
<img alt="How to" src="//image_url">
</a>
<a class="section-name test" href="/health-lifestyle/">
LIFESTYLE </a>
</div>
<div class="info">
<h5>
<a href="/health-lifestyle/item1.html">
How to </a>
</h5>
<span class="date">
10:20AM, Sep 30</span>
</div>
<div class="summary">
<p>
Summary text</p>
</div>
</li>
....
env.rb
require 'parallel_tests'
require 'capybara/cucumber'
require 'capybara/poltergeist'
require 'rspec'
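Parsing HTML in Ruby is straightforward. All you need to do is require two libraries in your program, open-uri (from the standard library) and the nokogiri gem: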
require 'open-uri'
require 'nokogiri'
# Set the page you are going to scan.
# URI.open is used because Kernel#open no longer opens URLs in Ruby 3.0+.
page = Nokogiri::HTML(URI.open("http://google.com/"))
# (Updated to reflect the date class provided in the question)
# Extract specific elements via CSS selector:
# select every span element that has the class "date",
# then use .strip to remove surrounding whitespace.
# (strip is used rather than strip!, which returns nil when nothing changes.)
page.css('span.date').text.strip
# => "8:27AM, Sep 30" when the page contains the HTML from the question
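One detail worth knowing when trimming extracted text: `String#strip` always returns a string, while `String#strip!` modifies the receiver in place and returns nil when there was nothing to remove. The following sketch illustrates the difference on the raw span text from the question; `RAW_DATE` and `clean` are hypothetical names used only for this example.

```ruby
# Raw text as it appears inside <span class="date"> in the question's HTML:
# a leading newline followed by the timestamp.
RAW_DATE = "\n8:27AM, Sep 30"

# strip returns a new, trimmed string every time.
def clean(text)
  text.strip
end

puts clean(RAW_DATE)          # leading newline removed
puts clean('8:27AM, Sep 30')  # already clean, the string is still returned

# strip! returns nil when the string had no surrounding whitespace.
already_clean = '8:27AM, Sep 30'.dup
p already_clean.strip!
```

Chaining further calls onto the result of `strip!` can therefore raise NoMethodError on nil, which is why the non-bang form is the safer default for scraped text.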
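If you want to learn more about parsing HTML with Ruby, it is worth searching and reading further; the original answer linked to a resource for getting started.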