Download PDFs under a specific header on webpage through Selenium Python

Shankar

How can I download only the PDF documents under the 'Design Review' header from the URL below, using Selenium in Python?

https://platform.sustain-cert.com/public-project/2756

The Design Review header can appear anywhere on the page (top, middle, or bottom), and the page may contain many other headers besides it.

Ajeet Verma

Here is one approach you can try:

import time
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = Chrome()

url = "https://platform.sustain-cert.com/public-project/2756"
driver.get(url)

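# Wait until the file entries are rendered. The css-* class names appear to be
# generated by the site's styling library and may change between deployments.
files = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'div.MuiBox-root.css-16uqhx7')))
print(f"total files: {len(files)}")

# Locate the main container and collect every section header (h6) inside it.
container = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.MuiContainer-root.MuiContainer-maxWidthLg.css-got2s4')))
categories = container.find_elements(By.CSS_SELECTOR, 'div>h6')

for category in categories:

    # Only the section titled "Design Review" is of interest.
    if category.text == "Design Review":
        # Step up to the header's parent and collect the file entries inside it.
        design_files = category.find_element(By.XPATH, "parent::*").find_elements(By.CSS_SELECTOR, 'div.MuiBox-root.css-16uqhx7')
        print(f"total files under Design Review:: {len(design_files)}")

        delay = 5
        for file in design_files:
            # The first line of the entry's text is expected to be the file name in parentheses.
            file_detail = file.text.split('\n')

            if file_detail[0].endswith('.pdf)'):
                print("pdf files under Design Review:")
                print(file_detail[0].replace('(', '').replace(')', ''))
                # click button to download the pdf file
                file.find_element(By.TAG_NAME, 'button').click()
                # give the download time to finish before moving on
                time.sleep(delay)

            # lengthen the wait for each subsequent entry
            delay += 10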

Output:

total files: 12
total files under Design Review:: 6
pdf files under Design Review:
03 Deviation Request Form-Zengjiang wind power project-20220209-V01.pdf
pdf files under Design Review:
20220901_GS4GG VAL FVR_Yunxiao Wind_clean.pdf

A few things to note:

  1. Since you are only interested in the PDF files in the Design Review section, we first locate the section headers, which are h6 elements.
  2. Next, we iterate over all h6 elements and pick only the one whose text is Design Review.
  3. Then we step up to the parent element of that h6, find all the file entries inside it, and store them in the variable design_files.
  4. With all the files under Design Review in hand, we keep only those whose names end with .pdf.
  5. Finally, we click the button of each PDF entry to download it.
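
Steps 2 to 4 can also be written as two small helpers, which makes the filtering logic easy to check without a browser. This is only a sketch: the names section_xpath and pdf_name are hypothetical, and the "(file name)" label format on the first line of each entry is inferred from the code above rather than confirmed against the page.

```python
from typing import Optional


def section_xpath(header_text: str) -> str:
    """Build an XPath selecting the parent element of the h6 whose text is header_text.

    Assumption: section headers are h6 elements, as in the answer above.
    """
    if '"' in header_text:
        # Quoting rules for XPath string literals are left out of this sketch.
        raise ValueError("header text containing double quotes is not handled here")
    return f'//h6[normalize-space()="{header_text}"]/parent::*'


def pdf_name(label: str) -> Optional[str]:
    """Return the PDF file name from a file entry's text, or None if it is not a PDF.

    Assumption: the first line of the entry looks like "(some file.pdf)".
    """
    first_line = label.split("\n")[0].strip()
    # Remove only the outer parentheses, so parentheses inside the name survive.
    if first_line.startswith("(") and first_line.endswith(")"):
        first_line = first_line[1:-1]
    return first_line if first_line.lower().endswith(".pdf") else None
```

With a live driver, passing section_xpath("Design Review") to driver.find_element(By.XPATH, ...) should reach the same parent element as the loop over h6 elements, wherever the header sits on the page, and pdf_name(file.text) could replace the endswith('.pdf)') check.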

Downloading a file takes some time, so we add an incremental delay to let the current file finish downloading before starting the next one.
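
A fixed delay can be too short for large files and needlessly long for small ones. As a possible alternative, the download folder can be polled until a new, completed file shows up. The sketch below assumes Chrome, which keeps in-progress downloads under a .crdownload suffix; the function name wait_for_new_download and its parameters are hypothetical, and the download folder path is something you would need to supply yourself.

```python
import time
from pathlib import Path


def wait_for_new_download(download_dir, before, timeout=60.0, poll_interval=0.5):
    """Wait until a file not listed in `before` appears and no partial files remain.

    download_dir: folder the browser saves into (assumed known to the caller).
    before: set of file names present before the download button was clicked.
    Returns the set of new file names, or an empty set if the timeout expires.
    """
    directory = Path(download_dir)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = {p.name for p in directory.iterdir()}
        # Chrome writes in-progress downloads with a .crdownload suffix.
        partial = {n for n in names if n.endswith(".crdownload")}
        new = names - set(before) - partial
        if new and not partial:
            return new
        time.sleep(poll_interval)
    return set()
```

In the loop above, you would take a snapshot such as before = set(os.listdir(download_dir)) just ahead of the click() call and then call wait_for_new_download(download_dir, before) in place of time.sleep(delay). An empty result means the timeout expired, which you may want to log or retry.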

I hope this answers your problem.
