I am trying to scrape the Myntra website. The link is here.
I am using Puppeteer and Node JS to scrape it. It was working fine, but now I get this error:
Error: Evaluation failed: TypeError: Cannot read property 'textContent' of null
at __puppeteer_evaluation_script__:2:55
The function returns an empty object. I have attached my code below.
const puppeteer = require('puppeteer');

(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.myntra.com/jeans/only/only-women-black-skinny-fit-mid-rise-low-distress-stretchable-cropped-jeans/10973332/buy');
    const body = await page.evaluate(() => {
      // Note: this returns the DOM node itself, not its text;
      // a node does not survive serialization out of page.evaluate.
      return document.querySelector('.pdp-price');
    });
    console.log(body);
    await browser.close();
  } catch (error) {
    console.log(error);
  }
})();
It seems the site is blocking requests whose user-agent contains HeadlessChrome, so I changed the user-agent and now everything works. Try this code:
const puppeteer = require('puppeteer');

(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    // Send a regular desktop Chrome user-agent header instead of the HeadlessChrome default.
    await page.setExtraHTTPHeaders({
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36'
    });
    await page.goto('https://www.myntra.com/jeans/only/only-women-black-skinny-fit-mid-rise-low-distress-stretchable-cropped-jeans/10973332/buy');
    const body = await page.evaluate(() => {
      // Return the text, which is a plain string and can be passed back to Node.
      return document.querySelector('.pdp-price').textContent;
    });
    console.log(body);
    await browser.close();
  } catch (error) {
    console.log(error);
  }
})();
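
As a variation on the code above, here is a minimal sketch that sets the user-agent with page.setUserAgent (which, unlike an extra HTTP header, also changes what navigator.userAgent reports inside the page) and waits for the element before reading it, so a missing element becomes a timeout rather than "Cannot read property 'textContent' of null". The helper names toRegularUserAgent and scrapePrice are made up for this illustration, and the 10 second timeout is an arbitrary assumption. Whether this is enough for Myntra today is not something I have verified.

```javascript
// Hypothetical helper: turn the default headless user-agent into a
// regular-looking one by dropping the "Headless" marker.
function toRegularUserAgent(userAgent) {
  return userAgent.replace('HeadlessChrome', 'Chrome');
}

// Hypothetical helper: open the page with a non-headless user-agent and
// return the trimmed text of the first element matching `selector`.
async function scrapePrice(url, selector) {
  // Required lazily so the pure helper above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // browser.userAgent() resolves to the browser's original user-agent string.
    const defaultUserAgent = await browser.userAgent();
    await page.setUserAgent(toRegularUserAgent(defaultUserAgent));

    await page.goto(url);

    // Rejects with a timeout error if the element never appears (assumed limit: 10 s).
    await page.waitForSelector(selector, { timeout: 10000 });

    // $eval runs the callback in the page on the matched element
    // and returns the resulting string to Node.
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Close the browser even when navigation or the selector wait fails.
    await browser.close();
  }
}
```

It could be called as scrapePrice(url, '.pdp-price').then(console.log).catch(console.error), with url set to the product page from the question. Closing the browser in a finally block avoids leaving a Chromium process behind when an error is thrown.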