I am trying to get the content under a "ul" element. I tried the code below; when I execute the script it reports the total number of links, but it doesn't scrape any further than that.

If the ul had a class, say <ul class="list-group">, I would simply do this:
PHP
foreach($html->find('.list-group a') as $element)
So my question is: how do I get the scraper to scrape the links under "ul" if the "ul" doesn't have a class? Thanks in advance.

Here is the content snippet I am trying to scrape.

Snipt site · GitHub[^]

What I have tried:

PHP
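else {
    // Fetch the page, then parse the returned markup.
    $html = get_web_page(SITE2_BASE_URL . $path);
    $html = str_get_html($html);
    $result = array();
    // Collect the text and target of every link inside .series.
    foreach ($html->find('.series a') as $element) {
        $result[] = array('title' => $element->plaintext, 'href' => $element->href);
    }
}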
Updated 28-Jan-18 19:26pm

1 solution

I haven't run your code, but I think you either need to change foreach($html->find('.list-group a') as $element) to

foreach($html->find('ul') as $element)


or, if you have more than one <ul> tag on the page, try

foreach($html->find('.menu_series ul') as $element)


This seems pretty straightforward, so I'm not sure if I am missing something.
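
Both selectors above return the <ul> elements themselves rather than the links inside them, so the loop body still has to reach the anchors before it can build title/href pairs. Below is a minimal sketch of that nested loop. To stay self-contained it uses PHP's built-in DOM extension instead of Simple HTML DOM; the sample markup, the link paths, and the function name collectListLinks are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup modelled on the structure discussed in this thread.
$markup = <<<HTML
<nav class="menu_series cron">
    <ul>
        <li><a href="/category/first-series">First series</a></li>
        <li><a href="/category/second-series">Second series</a></li>
    </ul>
</nav>
HTML;

/**
 * Collect title/href pairs for every anchor found inside any <ul>.
 * collectListLinks is an illustrative name, not part of any library.
 */
function collectListLinks(string $markup): array
{
    $doc = new DOMDocument();
    // libxml may warn about HTML5 tags such as <nav>; keep those warnings quiet.
    libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $result = [];
    foreach ($doc->getElementsByTagName('ul') as $list) {
        // $list is the <ul> itself; the links are its descendants.
        foreach ($list->getElementsByTagName('a') as $anchor) {
            $result[] = [
                'title' => trim($anchor->textContent),
                'href'  => $anchor->getAttribute('href'),
            ];
        }
    }
    return $result;
}

$links = collectListLinks($markup);
print_r($links);
```

The same shape should carry over to the original script: loop over the matched lists, then look up the anchors inside each one. Note that this sketch would count a link twice if one <ul> were nested inside another.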
 
 
Comments
Member 13648817 29-Jan-18 1:33am    
The second solution worked, but when I execute the script it fetches the total number of links and shows "processing link" for the first link, "/category/25-sai-no-joshikousei", but doesn't scrape it, or any of the other links for that matter. Screenshot of the results: https://i.stack.imgur.com/ucJlY.png. Full code of my script: https://gist.github.com/PushmeOver/74adf94fb66d8f6287420d6b63f8c942
David_Wimbley 29-Jan-18 1:39am    
Your GitHub link didn't work for me.
David_Wimbley 29-Jan-18 1:41am    
Looking at your code, given that it only executed the loop once, you may need to change foreach($html->find('.menu_series ul') as $element) to foreach($html->find('.menu_series ul li') as $element)
Member 13648817 29-Jan-18 1:44am    
It's still not working when I execute the script. Here is a new link, sorry about that: https://codeshare.io/5PzpJX
David_Wimbley 29-Jan-18 1:50am    
I'm not sure .menu_series ul a would work, as the XPath location of the anchor tag is not /nav/ul/a; the anchor tag is under /nav/ul/li/a, so without the li element in your .find call I would expect it not to work.

One thing to keep in mind is that if you are trying to target nested HTML/XML, you can't necessarily skip elements unless the library you are using supports matching without the exact location.

What I mean by this is that your current usage of the .find method seems to assume the anchor tag looks something like this:
	<nav class="menu_series cron">
		<ul>
			<a href="url">link</a>
		</ul>
	</nav>


When in reality, according to the link you posted in your question, the anchor tag is under an li element, like so:


	<nav class="menu_series cron">
		<ul>
			<li>
				<a href="url">link</a>
			</li>
		</ul>
	</nav>



All this to say, I think you need to change .menu_series ul a to .menu_series ul li a.
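
The difference between naming every level and skipping one can be checked with a short sketch. It runs three XPath queries against the second markup snippet above, using PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM, so it shows how XPath paths behave and says nothing definitive about how .find treats its selectors. The variable names are illustrative only.

```php
<?php
// Markup copied from the second snippet above.
$markup = <<<HTML
<nav class="menu_series cron">
    <ul>
        <li>
            <a href="url">link</a>
        </li>
    </ul>
</nav>
HTML;

$doc = new DOMDocument();
// libxml may warn about HTML5 tags such as <nav>; keep those warnings quiet.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Child steps only (/): every level between nav and a must be named.
$withoutLi = $xpath->query('//nav/ul/a')->length;
$withLi    = $xpath->query('//nav/ul/li/a')->length;

// Descendant step (//): intermediate elements may be skipped.
$anyDepth  = $xpath->query('//nav//a')->length;

// prints: nav/ul/a: 0, nav/ul/li/a: 1, nav//a: 1
printf("nav/ul/a: %d, nav/ul/li/a: %d, nav//a: %d\n", $withoutLi, $withLi, $anyDepth);
```

In standard CSS selector syntax a space is a descendant combinator, so it behaves like the // step here and not like a single /. Whether .menu_series ul a matches therefore depends on how the scraping library implements the space, which is worth testing directly against the page.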

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
