How do I extract JSON from a BeautifulSoup object?

恰兰

I downloaded a web page's HTML with python-requests and now need to extract a JSON object from it. I can locate the JSON with BS4 methods, but I don't know how to get it out of the BS4 object. Here is my code:

from bs4 import BeautifulSoup
import requests
import json

url = "https://matmatch.com/materials?materialPath=mitf1194-astm-b196-grade-c17200-tb00"

html_content = requests.get(url).text
soup = BeautifulSoup(html_content,features="html.parser")
body = soup.find('body')
the_contents_of_body_without_body_tags = body.findChildren(recursive=False)
#print(the_contents_of_body_without_body_tags)


element = soup.find_all("script",type="application/ld+json")
print(element[2])
#print(type(soup.find_all("script", {"type":"application/ld+json"})[2]))
js = json.loads(element[2])

This is the output of print(element[2]):

<script type="application/ld+json">{
      "@context": ["https://schema.org", {"csvw": "http://www.w3.org/ns/csvw#"}],
      "@type": "Dataset",
      "name":"ASTM B196 Grade C17200 TB00",
      "description": "Chemical composition and material properties of ASTM B196 Grade C17200 TB00. Also available for download in XLSX and PDF. Data provided by MakeItFrom.com,Matmatch,Materion Brush GmbH",
      "license": "https://matmatch.com/imprint",
      "publisher": {
        "@type": "Organization",
        "name": "Matmatch"
      },
      "mainEntity" : {
        "@type" : "csvw:Table",
        "csvw:tableSchema": {
          "csvw:columns": [
            {
              "csvw:name": "Property Name",
              "csvw:datatype": "string",
              "csvw:cells": [{"csvw:value":"Density","csvw:primaryKey":"Density"},{"csvw:value":"Outside diameter","csvw:primaryKey":"Outside diameter"},{"csvw:value":"Thickness","csvw:primaryKey":"Thickness"},{"csvw:value":"Width","csvw:primaryKey":"Width"},{"csvw:value":"Bendability 90°, bw","csvw:primaryKey":"Bendability 90°, bw"},{"csvw:value":"Bendability 90°, gw","csvw:primaryKey":"Bendability 90°, gw"},{"csvw:value":"Elastic modulus","csvw:primaryKey":"Elastic modulus"},{"csvw:value":"Elongation","csvw:primaryKey":"Elongation"},{"csvw:value":"Hardness, Rockwell C","csvw:primaryKey":"Hardness, Rockwell C"},{"csvw:value":"Hardness, Vickers","csvw:primaryKey":"Hardness, Vickers"},{"csvw:value":"Shear modulus","csvw:primaryKey":"Shear modulus"},{"csvw:value":"Tensile strength","csvw:primaryKey":"Tensile strength"},{"csvw:value":"Yield strength","csvw:primaryKey":"Yield strength"},{"csvw:value":"Yield strength Rp0.2","csvw:primaryKey":"Yield strength Rp0.2"},{"csvw:value":"Coefficient of thermal expansion","csvw:primaryKey":"Coefficient of thermal expansion"},{"csvw:value":"Melting point","csvw:primaryKey":"Melting point"},{"csvw:value":"Specific heat capacity","csvw:primaryKey":"Specific heat capacity"},{"csvw:value":"Thermal conductivity","csvw:primaryKey":"Thermal conductivity"},{"csvw:value":"Electrical resistivity","csvw:primaryKey":"Electrical resistivity"},{"csvw:value":"Specific Electrical conductivity","csvw:primaryKey":"Specific Electrical conductivity"},{"csvw:value":"Relative magnetic permeability","csvw:primaryKey":"Relative magnetic permeability"}]
            },
            {
              "csvw:name": "Value",
              "csvw:datatype": "string",
              "csvw:cells": [{"csvw:value":8.26,"csvw:primaryKey":"Density"},{"csvw:value":19.1,"csvw:primaryKey":"Outside diameter"},{"csvw:value":0.05,"csvw:primaryKey":"Thickness"},{"csvw:value":1.27,"csvw:primaryKey":"Width"},{"csvw:value":0,"csvw:primaryKey":"Bendability 90°, bw"},{"csvw:value":0,"csvw:primaryKey":"Bendability 90°, gw"},{"csvw:value":130,"csvw:primaryKey":"Elastic modulus"},{"csvw:value":1,"csvw:primaryKey":"Elongation"},{"csvw:value":36,"csvw:primaryKey":"Hardness, Rockwell C"},{"csvw:value":210,"csvw:primaryKey":"Hardness, Vickers"},{"csvw:value":50,"csvw:primaryKey":"Shear modulus"},{"csvw:value":410,"csvw:primaryKey":"Tensile strength"},{"csvw:value":220,"csvw:primaryKey":"Yield strength"},{"csvw:value":130,"csvw:primaryKey":"Yield strength Rp0.2"},{"csvw:value":0.0000175,"csvw:primaryKey":"Coefficient of thermal expansion"},{"csvw:value":870,"csvw:primaryKey":"Melting point"},{"csvw:value":360,"csvw:primaryKey":"Specific heat capacity"},{"csvw:value":84,"csvw:primaryKey":"Thermal conductivity"},{"csvw:value":6.2e-8,"csvw:primaryKey":"Electrical resistivity"},{"csvw:value":17,"csvw:primaryKey":"Specific Electrical conductivity"},{"csvw:value":1.0006,"csvw:primaryKey":"Relative magnetic permeability"}]
            },
            {
              "csvw:name": "Unit",
              "csvw:datatype": "string",
              "csvw:cells": [{"csvw:value":"g/cm³","csvw:primaryKey":"Density"},{"csvw:value":"mm","csvw:primaryKey":"Outside diameter"},{"csvw:value":"mm","csvw:primaryKey":"Thickness"},{"csvw:value":"mm","csvw:primaryKey":"Width"},{"csvw:value":"[-]","csvw:primaryKey":"Bendability 90°, bw"},{"csvw:value":"[-]","csvw:primaryKey":"Bendability 90°, gw"},{"csvw:value":"GPa","csvw:primaryKey":"Elastic modulus"},{"csvw:value":"%","csvw:primaryKey":"Elongation"},{"csvw:value":"[-]","csvw:primaryKey":"Hardness, Rockwell C"},{"csvw:value":"[-]","csvw:primaryKey":"Hardness, Vickers"},{"csvw:value":"GPa","csvw:primaryKey":"Shear modulus"},{"csvw:value":"MPa","csvw:primaryKey":"Tensile strength"},{"csvw:value":"MPa","csvw:primaryKey":"Yield strength"},{"csvw:value":"MPa","csvw:primaryKey":"Yield strength Rp0.2"},{"csvw:value":"1/K","csvw:primaryKey":"Coefficient of thermal expansion"},{"csvw:value":"°C","csvw:primaryKey":"Melting point"},{"csvw:value":"J/(kg·K)","csvw:primaryKey":"Specific heat capacity"},{"csvw:value":"W/(m·K)","csvw:primaryKey":"Thermal conductivity"},{"csvw:value":"Ω·m","csvw:primaryKey":"Electrical resistivity"},{"csvw:value":" % IACS","csvw:primaryKey":"Specific Electrical conductivity"},{"csvw:value":"[-]","csvw:primaryKey":"Relative magnetic permeability"}]
            }]
        }
      }
    }</script>

The last line of the code, the json.loads call, raises this error:

TypeError: the JSON object must be str, bytes or bytearray, not 'Tag'

I also tried .text and .content on the BS4 object, but that raised an error as well.

How can I extract the JSON object from this output?

孟德尔

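Use the .string attribute (it is an attribute, not a method). From the Beautiful Soup documentation:

"If a tag has only one child, and that child is a NavigableString, the child is made available as .string"

A NavigableString is a subclass of str, so json.loads accepts it directly, whereas the Tag itself triggers the TypeError.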
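As a rough illustration of how .string behaves, the sketch below parses a few lines of made-up markup (the ids and contents are assumptions for the demo, not taken from the page in the question) and checks .string on three tags: one with a single text child, one with mixed children, and one that is empty.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; it is not from the page in the question.
html = """
<script id="one">{"a": 1}</script>
<p id="mixed">text <b>bold</b></p>
<div id="empty"></div>
"""
soup = BeautifulSoup(html, features="html.parser")

single = soup.find(id="one").string    # exactly one text child
mixed = soup.find(id="mixed").string   # a text child plus a <b> tag
empty = soup.find(id="empty").string   # no children at all

print(str(single))              # {"a": 1}
print(isinstance(single, str))  # True: NavigableString subclasses str
print(mixed)                    # None
print(empty)                    # None
```

Because .string can be None, it is worth checking the value before handing it to json.loads.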


In your example:

from bs4 import BeautifulSoup
import requests
import json

url = "https://matmatch.com/materials?materialPath=mitf1194-astm-b196-grade-c17200-tb00"

html_content = requests.get(url).text
soup = BeautifulSoup(html_content,features="html.parser")
body = soup.find('body')
the_contents_of_body_without_body_tags = body.findChildren(recursive=False)

element = soup.find_all("script",type="application/ld+json")

js = json.loads(element[2].string) # <- Calling `.string` to get the JSON
print(js)

Sample output (truncated):

 {'@context': ['https://schema.org', {'csvw': 'http://www.w3.org/ns/csvw#'}], '@type': 'Dataset', 'name': 'ASTM B196 Grade C17200 TB00', ...., {'csvw:value': '[-]', 'csvw:primaryKey': 'Relative magnetic permeability'}]}]}}}
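Indexing with element[2] depends on the order of the script tags on the page, which may change. One possible, more defensive variant is sketched below: it parses every JSON-LD block and then selects the one whose "@type" is "Dataset". The inline HTML and the helper name extract_json_ld are assumptions made so the example runs without a network request; they are not part of the original code.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
<script type="application/ld+json">{"@type": "Organization", "name": "Matmatch"}</script>
<script type="application/ld+json">{"@type": "Dataset", "name": "ASTM B196 Grade C17200 TB00"}</script>
<script type="application/ld+json"></script>
</body></html>
"""

def extract_json_ld(html):
    """Return the parsed content of every JSON-LD <script> block in html."""
    soup = BeautifulSoup(html, features="html.parser")
    parsed = []
    for tag in soup.find_all("script", type="application/ld+json"):
        raw = tag.string  # None when the tag is empty or has several children
        if not raw or not raw.strip():
            continue
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
    return parsed

blocks = extract_json_ld(SAMPLE_HTML)
# Pick the block by its declared type instead of by position.
datasets = [b for b in blocks if isinstance(b, dict) and b.get("@type") == "Dataset"]
print(datasets[0]["name"])  # ASTM B196 Grade C17200 TB00
```

With the real page, the same function could be called on requests.get(url).text in place of SAMPLE_HTML.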
