PHP
<code>Hey guys, I'm trying to write a content-specific web crawler in PHP, and I keep getting an undefined offset on line 68. I'm really not sure where I went wrong, so any help would be great!

1    <?php
2   
3    include("C:\Program Files\PHPCrawl_082\libs\PHPCrawler.class.php"); 
4    
5    set_time_limit(0);    
6    
7    $domain = "http://www.carrollcountyohio.us/";
8    
9    
10    $content = "Election Results";
11    
12    
13    $content_tag = "Election Results";
14    
15    
16    $output_file = "ElectionsURL.txt";
17    
18    
19    $max_urls_to_check = 10;
20    
21    $rounds = 0;
22    
23    $domain_stack = array();
24    
25    
26    $max_size_domain_stack = 1000;
27    
28    
29    $checked_domains = array();
30     
31    
32    while ($domain != "" && $rounds < $max_urls_to_check) {
33        $doc = new DOMDocument();
34     
35        @$doc->loadHTMLFile($domain);
36        $found = false;
37     
38    
39        foreach($doc->getElementsByTagName($content_tag) as $tag) {
40            if (strpos($tag->nodeValue, $content)) {
41                $found = true;
42                break;
43            }
44        }
45     
46        $checked_domains[$domain] = $found;
47    
48        foreach($doc->getElementsByTagName('Election') as $link) {
49            $href = $link->getAttribute('href');
50            if (strpos($href, 'http://www.carrollcountyohio.us/') !== false && strpos($href,      $domain) === false) {
52                $href_array = array_pad(explode("/", $href, 2), 2, $href);
53                if (count($domain_stack) < $max_size_domain_stack &&
54                    $checked_domains["http://www.carrollcountyohio.us/".$href_array[2]] === null) {
56                    array_push($domain_stack, "http://www.carrollcountyohio.us/".$href_array[2]);
58                }
59            };
60        }
61    	
62    	if(isset($domain_stack[0])) {
63    	}
64    		
65    		
66    	
67     $domain_stack = array_unique($domain_stack);
68        $domain = $domain_stack[0];
69        unset($domain_stack[0]);
70        $domain_stack = array_values($domain_stack);
70        $rounds++;
72      
73    
74    }
75     
76    $found_domains = $domain_stack;
77    foreach ($checked_domains as $key => $value) {
78        if ($key == 2) {
79            $found_domains .= $key."\n";
80        }
81    }
82     
83    file_put_contents($output_file, $found_domains);
84    ?></code>


Any help is greatly appreciated.
Comments
Sergey Alexandrovich Kryukov 12-Aug-14 11:54am    
What is "undefined offset"? Can you post the exact and complete exception/error information? Are you really sure it's in this line?
The only potential problem I can see is when $domain_stack is empty...
—SA
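
The empty-stack case mentioned in the comment can be guarded against. Below is a minimal sketch, not a tested fix for the whole crawler: `next_domain` is a hypothetical helper name that does not appear in the original post, and the nullable return type assumes PHP 7.1 or later. It relies on `array_shift()` returning `null` for an empty array, whereas reading `$domain_stack[0]` directly raises the "undefined offset" notice.

```php
<?php
// Hypothetical helper (not from the original post): take the next URL off
// the queue, or return null when nothing is left to crawl.
function next_domain(array &$domain_stack): ?string
{
    // Drop duplicates and reindex so the first remaining URL sits at key 0.
    $domain_stack = array_values(array_unique($domain_stack));

    // array_shift() returns null on an empty array, so no offset is read
    // that might not exist.
    return array_shift($domain_stack);
}

// Usage: an empty queue yields null instead of a notice.
$domain_stack = array();
$domain = next_domain($domain_stack);
if ($domain === null) {
    echo "No more URLs to crawl\n";
}
```

In the posted loop, this would likely mean changing the condition from `$domain != ""` to something like `$domain !== null`, so the crawl stops cleanly once the queue runs dry. Note also that `getElementsByTagName('Election')` looks for an element literally named `Election`, which an HTML page will not normally contain, so the queue probably never receives any links in the first place.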

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


