I am trying to get the content under a "ul". I tried the code below; when I execute the script it reports the total number of links, but it doesn't scrape any further than that.

If the ul had a class, say <ul class="list-group">, I would simply do this:
PHP
foreach($html->find('.list-group a') as $element)
So my question is: how do I get the scraper to scrape the links under the "ul" if the "ul" doesn't have a class? Thanks in advance.

Here is the content snippet I am trying to scrape.

Snipt site · GitHub[^]

What I have tried:

PHP
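else {
    // Fetch the page and parse it into a DOM object.
    $html = get_web_page(SITE2_BASE_URL . $path);
    $html = str_get_html($html);
    $result = array();
    // Collect the text and URL of every link inside .series.
    foreach ($html->find('.series a') as $element) {
        $result[] = array('title' => $element->plaintext, 'href' => $element->href);
    }
}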
Posted
Updated 28-Jan-18 19:26pm

1 solution

I haven't run your code, but I think you either need to change foreach($html->find('.list-group a') as $element) to

foreach($html->find('ul') as $element)


or, if you have more than one <ul> tag on the page, try

foreach($html->find('.menu_series ul') as $element)


This seems pretty straightforward, so I'm not sure if I am missing something.
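
As a rough illustration of the idea above (anchor on a parent that does have a class, then walk down to the classless ul), here is a minimal sketch. It uses PHP's built-in DOMDocument and DOMXPath rather than the simple_html_dom library from the question, so it runs without extra files. The markup is modelled on the nav snippet discussed in this thread; the link text and the second list item are made up for the example.

```php
<?php
// Sample markup: the nav class comes from the thread,
// the link text and the second item are illustrative only.
$markup = <<<HTML
<nav class="menu_series cron">
    <ul>
        <li><a href="/category/25-sai-no-joshikousei">First series</a></li>
        <li><a href="/category/example-two">Second series</a></li>
    </ul>
</nav>
HTML;

$doc = new DOMDocument();
// Older libxml versions warn about HTML5 tags such as <nav>; collect those quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Find the element whose class list contains "menu_series",
// then every <a> at any depth under its <ul>.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " menu_series ")]//ul//a';

$result = array();
foreach ($xpath->query($query) as $a) {
    $result[] = array(
        'title' => trim($a->textContent),
        'href'  => $a->getAttribute('href'),
    );
}

print_r($result);
```

The resulting array has the same title/href shape as the one built in the question, so the rest of a script could consume it unchanged.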
 
Comments
Member 13648817 29-Jan-18 1:33am    
The second solution worked, but when I execute the script it fetches the total number of links and shows "processing link" for the first link, "/category/25-sai-no-joshikousei", without scraping it or any of the other links. Screenshot of results: https://i.stack.imgur.com/ucJlY.png. Full code of my script: https://gist.github.com/PushmeOver/74adf94fb66d8f6287420d6b63f8c942
David_Wimbley 29-Jan-18 1:39am    
Your GitHub link didn't work for me.
David_Wimbley 29-Jan-18 1:41am    
Looking at your code, given that it only executed the loop once, you may need to change foreach($html->find('.menu_series ul') as $element) to foreach($html->find('.menu_series ul li') as $element)
Member 13648817 29-Jan-18 1:44am    
It's still not working when I execute the script. Here is a new link, sorry about that: https://codeshare.io/5PzpJX
David_Wimbley 29-Jan-18 1:50am    
I'm not sure .menu_series ul a would work, as the XPath location of the anchor tag is not /nav/ul/a; the anchor tag is under /nav/ul/li/a, so without the li element in your .find call I would expect it not to work.

One thing to keep in mind is that if you are trying to target nested HTML/XML, you can't necessarily skip intermediate elements unless the library you are using supports matching without the exact location.

What I mean by this is that your current usage of the .find method seems to assume the anchor tag looks something like this:
	<nav class="menu_series cron">
		<ul>
			<a href="url">link</a>
		</ul>
	</nav>


In reality, according to the link you posted in your question, the anchor tag is under an li element, like so:


	<nav class="menu_series cron">
		<ul>
			<li>
				<a href="url">link</a>
			</li>
		</ul>
	</nav>



All this to say, I think you need to change .menu_series ul a to .menu_series ul li a
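
The difference between an exact path and a "skip the levels in between" match can be checked directly. The sketch below uses PHP's built-in DOMXPath (not simple_html_dom) on the nested markup shown above and counts matches for three queries. It is only an illustration of the path rules in XPath; in standard CSS a space between selectors is a descendant combinator, so whether a particular library's .find treats `ul a` as an exact path or as "any depth" is worth confirming in that library's manual.

```php
<?php
// The nested structure from the comment above, on one line.
$markup = '<nav class="menu_series cron"><ul><li><a href="url">link</a></li></ul></nav>';

$doc = new DOMDocument();
// Older libxml versions warn about HTML5 tags such as <nav>; collect those quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Exact child path that leaves out the <li> level.
$direct = $xpath->query('//nav/ul/a')->length;

// Exact child path that includes the <li> level.
$viaLi = $xpath->query('//nav/ul/li/a')->length;

// Descendant step (//): the anchor may sit at any depth under <ul>.
$anyDepth = $xpath->query('//nav/ul//a')->length;

printf("direct: %d, via li: %d, any depth: %d\n", $direct, $viaLi, $anyDepth);
```

Here the first query finds nothing because the anchor is not a direct child of the ul, while the other two each find the single anchor.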

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


