Download txt files using Python requests

饼干

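I want to download multiple txt files from an API. I can download the pdf files with the code below. Could someone help me customize the request so that it downloads the txt version of each document instead? Many thanks.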

import time
from pathlib import Path

import requests

links = ["P167897", "P173997", "P166309"]

for link in links:
    end_point = f"https://search.worldbank.org/api/v2/wds?" \
                f"format=json&includepublicdocs=1&" \
                f"fl=docna,lang,docty,repnb,docdt,doc_authr,available_in&" \
                f"os=0&rows=20&proid={link}&apilang=en"
    documents = requests.get(end_point).json()["documents"]
    for document_data in documents.values():
        try:
            pdf_url = document_data["pdfurl"]
            file_path = Path(f"K:/downloading_text/{link}/{pdf_url.rsplit('/')[-1]}")
            file_path.parent.mkdir(parents=True, exist_ok=True)
            with file_path.open("wb") as f:
                f.write(requests.get(pdf_url).content)
            time.sleep(1)
        except KeyError:
            continue
Maurice Meyer

You only need to change the URL from:

.../pdf/Sierra-Leone-AFRICA-WEST-P167897-Sierra-Leone-Free-Education-Project-Procurement-Plan.pdf

to:

.../text/Sierra-Leone-AFRICA-WEST-P167897-Sierra-Leone-Free-Education-Project-Procurement-Plan.txt

This is easily done with str.replace(), as follows:

import time
from pathlib import Path

import requests

links = ["P167897", "P173997", "P166309"]

for link in links:
    end_point = f"https://search.worldbank.org/api/v2/wds?" \
                f"format=json&includepublicdocs=1&" \
                f"fl=docna,lang,docty,repnb,docdt,doc_authr,available_in&" \
                f"os=0&rows=20&proid={link}&apilang=en"
    # Uncomment to inspect the raw API response for one project:
    # print(requests.get(end_point).json())
    # break
    documents = requests.get(end_point).json()["documents"]
    for document_data in documents.values():
        try:
            pdf_url = document_data["pdfurl"]
            # Same document, but served from the /text/ folder as .txt
            txt_url = pdf_url.replace('.pdf', '.txt')
            txt_url = txt_url.replace('/pdf/', '/text/')
            print(f"Downloading: {txt_url}")
            # URL shape is .../<id>/text/<file name>; take the <id> segment
            # so that files with the same name do not overwrite each other.
            unique_id = txt_url.rsplit('/', 3)[-3]
            file_path = Path(
                f"/tmp/{link}/{unique_id}-{txt_url.rsplit('/')[-1]}"
            )
            file_path.parent.mkdir(parents=True, exist_ok=True)
            with file_path.open("wb") as f:
                f.write(requests.get(txt_url).content)
            time.sleep(1)  # pause between downloads
        except KeyError:
            # Entries without a "pdfurl" key are skipped
            continue

Output:

Downloading: http://documents.worldbank.org/curated/en/106981614570591392/text/Official-Documents-Grant-Agreement-for-Additional-Financing-Grant-TF0B4694.txt
Downloading: http://documents.worldbank.org/curated/en/331341614570579132/text/Official-Documents-First-Restatement-to-the-Disbursement-Letter-for-Grant-D6810-SL-and-for-Additional-Financing-Grant-TF0B4694.txt
...
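
The two str.replace() calls rewrite every occurrence of '.pdf' and '/pdf/' in the URL, which could go wrong if one of those strings also appeared elsewhere in it. As a rough sketch only, the same conversion can be limited to the last folder and the file extension using the standard library. The helper names pdf_url_to_txt_url and document_id are hypothetical (they are not part of the API or of the code above), and the sketch assumes the URLs always look like .../<id>/pdf/<name>.pdf, as in the output shown:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def pdf_url_to_txt_url(pdf_url: str) -> str:
    """Turn .../<id>/pdf/<name>.pdf into .../<id>/text/<name>.txt.

    Hypothetical helper: assumes the URL layout seen in the output above.
    """
    parts = urlsplit(pdf_url)
    path = PurePosixPath(parts.path)
    if path.suffix.lower() != ".pdf" or path.parent.name != "pdf":
        raise ValueError(f"not a recognised pdf URL: {pdf_url}")
    # Rename the last folder to "text" and swap only the final extension.
    txt_path = path.parent.with_name("text") / path.with_suffix(".txt").name
    return urlunsplit(parts._replace(path=str(txt_path)))


def document_id(url: str) -> str:
    """Return the path segment two levels above the file name."""
    return PurePosixPath(urlsplit(url).path).parent.parent.name
```

In the loop above, txt_url = pdf_url_to_txt_url(pdf_url) would take the place of the two replace() calls, and document_id(txt_url) the place of the index-based id extraction. A URL that does not match the assumed layout raises ValueError, so the except clause would need to catch that as well if such entries should be skipped.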
