How to get a value from inside an href in the HTML structure

NMCherry132

I am using the following code to scrape values from a site:

import scrapy

class scraping(scrapy.Spider):
    name = 'NewsSpider'
    start_urls = ['https://www.uol.com.br/']

    def parse(self, response):
        news = response.xpath('//article')
        for n in news:
            print({
                'Link': n.xpath("//a[@class='hyperlink headlineSub__link']").get(),
                'Title': n.xpath('//a/div/h3/text()').get(),
            })

For "Link" I get a lot of information (the whole <a> element), but I only want the URL inside its href. Is it possible to get just that value?

mohammad hosein bahmani

I have done this exact thing before. Use a selector like this one:

.css('a[href*=topic]::attr(href)')

In my case the a tag looked like <a ... href="topic/1321343">something</a>.
The key is a::attr(href): narrow your selector down as far as you can, then read the href value you want from it.

I used this in a project that scrapes Microsoft Academic articles, where that selector collects the items in the "Related Topics" section.

Here is another example:

<span class="title">
  <a href="https://www.example.com"></a>
</span>

Parse it with:

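# Link1 is the response (or sub-selector) that contains the span above
Link = Link1.css('span.title a::attr(href)').get()  # first match, or None instead of an IndexError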

