How to get a value from inside an href in the HTML structure

NMCherry132

I am using the following code to get values from a site:

import scrapy

class scraping(scrapy.Spider):
    name = 'NewsSpider'
    start_urls = ['https://www.uol.com.br/']

    def parse(self, response):
        news = response.xpath('//article')
        for n in news:
            print({
                'Link': n.xpath("//a[@class='hyperlink headlineSub__link']").get(),
                'Title': n.xpath('//a/div/h3/text()').get(),
            })

For "Link" I am getting a lot of information, but I only want the link inside the href. Is it possible to get only that?

mohammad hosein bahmani

I have an example of doing exactly this. You should use a selector like this:

.css('a[href*=topic]::attr(href)')

In my case, the a tag looked something like <a ... href="topic/1321343">something</a>.
The key part is a::attr(href).
Narrow the response down to the smallest element that contains the link, then extract the href value you want from it.

This is my solution from a project for scraping Microsoft Academia articles. The linked line gets the items in the "Related Topics" section.

Here is another example:

<span class="title">
  <a href="https://www.example.com"></a>
</span>

Parse it with:

Link = Link1.css('span.title a::attr(href)').extract()[0]

Edited at 2021-11-3
