I am trying to maintain some PHP code that scrapes a web page. The page has changed, so an update is needed, but I'm not very experienced with XPath and am struggling.
This is the relevant section of HTML:
<div class="carousel-item-wrapper">
<picture class="">
<source srcset="/medias/tea-tree-skin-clearing-foaming-cleanser-1-640x640.jpg?context=product-images/h3b/hd3/8796813918238/tea-tree-skin-clearing-foaming-cleanser_1-640x640.jpg" media="(min-width: 641px) and (max-width: 1024)">
<source srcset="/medias/tea-tree-skin-clearing-foaming-cleanser-1-320x320.jpg?context=product-images/h09/h9a/8796814049310/tea-tree-skin-clearing-foaming-cleanser_1-320x320.jpg" media="(max-width: 640px)">
<img srcset="/medias/myimage.jpg" alt="150 ML" class="">
</picture>
</div>
I am trying to extract the srcset attribute from the img tag, whose value is "/medias/myimage.jpg". I'm using the XPath Helper Chrome plugin to help me, and I have the following XPath:
//div[@class="carousel-item-wrapper"]/picture/img/@srcset
In the plugin, it returns exactly what I expect, so it appears to work fine.
It also works in an online XPath tester, http://www.online-toolz.com/tools/xpath-editor.php.
But in my PHP code I get a null value:
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->strictErrorChecking = false;
$dom->recover = true;
@$dom->loadHtml($html);
$xPath = new DOMXPath($dom);
//Other xPath queries executed OK.
$node = $xPath->query('//div[@class="carousel-item-wrapper"]/picture/img/@srcset')->item(0);
if ($node === NULL) {
    writelog("Node is NULL"); // <-- Writes NULL to the log file!
}
I have of course tried a lot of different variations on this, such as not specifying the attribute name, but all with no luck.
What am I doing wrong? I'm sure it must be something simple, but I can't spot it.
Other extracts using my PHP code on the same HTML document are working OK. So it is just this element causing me trouble.
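PHP's DOM extension seems to have trouble with tags that have no closing tag, such as the source elements here. A likely explanation (an assumption, not verified against your document) is that the underlying libxml2 HTML parser does not treat source as an empty element, so the img ends up nested inside a source instead of being a direct child of picture. Browsers parse it differently, which is why the plugin finds it. Use a double forward slash (the descendant axis) before img so it matches at any depth: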
//div[@class="carousel-item-wrapper"]/picture//img/@srcset
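To check where the parser actually puts the img element, a minimal sketch along these lines may help. The sample markup is trimmed from the question and its source URLs are shortened placeholders; libxml_use_internal_errors() is used here instead of @ only so that parser warnings are collected rather than hidden. The printed node path depends on the libxml2 version, so no particular path is assumed.

```php
<?php
// Sample markup trimmed from the question; the source srcset values are placeholders.
$html = <<<'HTML'
<div class="carousel-item-wrapper">
<picture class="">
<source srcset="/medias/a-640x640.jpg" media="(min-width: 641px)">
<source srcset="/medias/a-320x320.jpg" media="(max-width: 640px)">
<img srcset="/medias/myimage.jpg" alt="150 ML" class="">
</picture>
</div>
HTML;

// Collect parser warnings instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xPath = new DOMXPath($dom);

// Descendant axis (//) matches <img> at any depth below <picture>.
$attr = $xPath->query('//div[@class="carousel-item-wrapper"]/picture//img/@srcset')->item(0);
$srcset = $attr !== null ? $attr->nodeValue : null;
echo "srcset: ", var_export($srcset, true), PHP_EOL;

// Print where the parser placed the <img> element in the tree.
$img = $xPath->query('//img')->item(0);
if ($img !== null) {
    echo "path: ", $img->getNodePath(), PHP_EOL;
}
```

If the printed path shows source between picture and img, that confirms the nesting and explains why the child-axis query (picture/img) returns nothing.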