Cannot find the text inside `<span>`

Posted by JoeW in Dev

Joe

I have the following HTML:

<div class="info">
 <h5>
   <a href="/aaa/">aaa </a>
 </h5>
 <span class="date">
       8:27AM, Sep 30</span>     
</div>

I am using Ruby, and I want to get the text "8:27AM, Sep 30" that is inside the span. I cannot find it with the command below.

find('div.info span.date').text

Could you tell me why it does not work? If I use the following command to find the text in the h5 instead, "aaa" is returned correctly.

find('div.info h5').text

Full Ruby code

Then(/^you should see (\d+) latest items$/) do |arg1|
    within("div.top-feature-list") do
       # Validate images of those items exist, print report
       expect(all("img").size.to_s).to eq(arg1)
       puts "The number of items on the current site is " + (all("img").size.to_s)
       # List of all items' details (Image, Headline, Introduction, Identifier, Url)
       $i = 1
       while $i <= arg1.to_i do
          puts "Item no." + $i.to_s
          puts "        - Image:        " + find('ul.category-index li.item-' + $i.to_s + ' img')[:src].to_s
          puts "        - Headline: " + find('ul.category-index li.item-' + $i.to_s + ' div.info h5').text
          puts "        - Introduction: " + find('ul.category-index li.item-' + $i.to_s + ' div.summary').text
          puts "        - Url:      " + find('ul.category-index li.item-' + $i.to_s + ' div.info h5 a')[:href].to_s
          puts "        - Created Date " + find('ul.category-index li.item-' + $i.to_s + ' div.info span.date').text
          puts "        - Identifier:   " + find('ul.category-index li.item-' + $i.to_s + ' div.img a.section-name').text
          puts "        - Subsection:   " + find('ul.category-index li.item-' + $i.to_s + ' div.img a.section-name')[:href].to_s
          $i +=1
      end
    end
  end

More HTML

<div class="top-feature-list">  
 <ul class="category-index">
    <li class="group">
           <ul>
    <li class="item-1 left ">
        <a name="item-1"></a>
        <div class="img">
            <a href="/health-lifestyle/item1.html">
                <img alt="How to" src="//image_url">     
            </a>

            <a class="section-name test" href="/health-lifestyle/">
                LIFESTYLE </a>
        </div>
        <div class="info">
            <h5>

                <a href="/health-lifestyle/item1.html">
                    How to </a>

            </h5>
            <span class="date">
                10:20AM, Sep 30</span>

        </div>
        <div class="summary">

            <p>
                Summary text</p>

        </div>


    </li>
    ....

env.rb

require 'parallel_tests'
require 'capybara/cucumber'
require 'capybara/poltergeist'
require 'rspec'

Binary Mason

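Parsing HTML in Ruby is easy. All you need to do is require two libraries in your program (open-uri ships with Ruby; nokogiri is a gem):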

require 'open-uri'
require 'nokogiri'

# Set the page you are going to scan (placeholder URL; substitute your own page).
# URI.open is used because plain open no longer accepts URLs in Ruby 3.0+.
page = Nokogiri::HTML(URI.open("http://google.com/"))

# (Updated to reflect the date class provided in the question)
# Extract specific elements via CSS selector:
# select every span element that has the class "date".
# Use .strip (not .strip!, which returns nil when nothing is stripped)
# to remove the surrounding whitespace from the HTML.

page.css('span.date').text.strip

# => "8:27AM, Sep 30" when run against the HTML in the question
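
A small aside on the trimming step: String#strip returns a trimmed copy, whereas String#strip! trims in place and returns nil when there was nothing to trim, so chaining on strip! can surprise. Below is a minimal sketch in plain Ruby; the helper names trimmed and trimmed_bang and the sample strings are made up for this illustration.

```ruby
# Hypothetical helper: returns a trimmed copy and leaves the original alone.
def trimmed(text)
  text.strip
end

# Hypothetical helper: trims a duplicate in place and passes through whatever
# strip! returns, which is the trimmed string or nil if nothing was trimmed.
def trimmed_bang(text)
  text.dup.strip!
end

raw_date   = "\n       8:27AM, Sep 30"  # sample span text with leading whitespace
clean_date = "8:27AM, Sep 30"           # sample text that is already trimmed

p trimmed(raw_date)
p trimmed(clean_date)
p trimmed_bang(raw_date)
p trimmed_bang(clean_date)  # nil, because there was nothing to strip
```

For extracted HTML text, where the amount of surrounding whitespace is not known in advance, strip is usually the safer choice.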

If you want to learn more about parsing HTML with Ruby, some searching and reading will help. The original answer linked to a good introductory resource.
