OK, so I have a page with images on it that I'm looking to scrape, returning some information about each image.
I have it working to return ONE image, but I need it to return all of them (there are about 5).
This is what I have at the moment:
function getMostRecentScreenshot($url) {
    $content = file_get_contents($url);
    $first_step = explode('<div class="imageWall5Floaters">' , $content );
    $second_step = explode('<div style="clear: left;"></div>' , $first_step[1] );
    return $second_step[0];
}
This is what it returns
<div class="floatHelp">
<a href="websiteurl.com/imagepage" onclick="return OnScreenshotClicked(9384938);" class="profile_media_item modalContentLink " data-desired-aspect="1.77777777778">
<div style="background-image: url('website.com/image');" class="imgWallItem " id="imgWallItem_757249198">
<div style="position: relative;">
<input type="checkbox" style="position: absolute; display: none;" name="screenshots[9384938]" class="screenshot_checkbox" id="screenshot_checkbox_9384938" />
</div>
<div class="imgWallHover" id="imgWallHover9384938">
<div class="imgWallHoverBottom">
<div class="imgWallHoverDescription ">
<q class="ellipsis">Quote about the image</q>
</div>
</div>
</div>
</div>
</a>
The five images have different IDs (the 9384938 part).
How would I get the information needed from what it returns?
I have another function at the moment that returns the data for one of the images (kind of), but it's basically just the exact same thing with code between the explode, which is very messy.
You could use PHP's DOMDocument class with this function:
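function getDataFromHTML($html) {
    $result = []; // always return an array, even when nothing matches
    // Scraped HTML may be malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (strpos($a->getAttribute('class'), 'profile_media_item') !== false) {
            $row = [];
            $row['baseURL'] = $a->getAttribute('href');
            // Only the first <div> inside the link carries the background image.
            foreach ($a->getElementsByTagName('div') as $div) {
                preg_match("~(?<=url\(['\"]).*?(?=['\"])~",
                    $div->getAttribute('style'), $attr);
                $row['imageURL'] = reset($attr);
                break;
            }
            // Only the first <q> inside the link holds the quote.
            foreach ($a->getElementsByTagName('q') as $q) {
                $row['quote'] = $q->textContent;
                break;
            }
            $result[] = $row;
        }
    }
    return $result;
}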
Call it as:
$result = getDataFromHTML($html);
Output for the sample data is:
array (
array (
'baseURL' => 'websiteurl.com/imagepage',
'imageURL' => 'website.com/image',
'quote' => 'Quote about the image'
)
)
The outer array would have more such entries if run on an HTML string that has several of those DOM structures.
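Since the question also mentions that each image has its own ID, here is a rough sketch of a variant that pulls that ID out as well. It is not part of the original answer: the function name getScreenshotRows and the 'id' key are made up for illustration, and it assumes the ID always appears as the argument of OnScreenshotClicked(...) in the onclick attribute, as it does in the sample markup. It uses DOMXPath instead of nested loops, which is a swapped-in technique rather than what the function above does.

```php
<?php
// Hypothetical helper (not from the original answer): returns one row per
// screenshot link, including the numeric ID taken from the onclick handler.
function getScreenshotRows(string $html): array {
    $rows = [];
    // Scraped HTML may be malformed; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match <a> elements whose class list contains "profile_media_item".
    $links = $xpath->query(
        '//a[contains(concat(" ", normalize-space(@class), " "), " profile_media_item ")]'
    );
    foreach ($links as $a) {
        $row = [
            'baseURL'  => $a->getAttribute('href'),
            'id'       => null,
            'imageURL' => null,
            'quote'    => null,
        ];
        // Assumption: the ID is the argument of OnScreenshotClicked(...).
        if (preg_match('~OnScreenshotClicked\((\d+)\)~', $a->getAttribute('onclick'), $m)) {
            $row['id'] = $m[1];
        }
        // The first descendant <div> with a style attribute holds the image URL.
        $div = $xpath->query('.//div[@style]', $a)->item(0);
        if ($div !== null
            && preg_match('~url\([\'"]?(.*?)[\'"]?\)~', $div->getAttribute('style'), $m)) {
            $row['imageURL'] = $m[1];
        }
        // The first <q> inside the link holds the quote.
        $q = $xpath->query('.//q', $a)->item(0);
        if ($q !== null) {
            $row['quote'] = trim($q->textContent);
        }
        $rows[] = $row;
    }
    return $rows;
}
```

You would call it the same way, e.g. $rows = getScreenshotRows(getMostRecentScreenshot($url)); and then loop over $rows. Fields that cannot be found stay null, so it is worth checking them before use.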