Ok, I'm making a program that will spider through a few sites. The problem I'm having is that some pages return a 404 error. The sites in question have a custom 404 error page, which I NEED to get. I have tried a few different ways of getting the page source, and they all just return a 404 error and no page source. However, when I open the same address in my browser, I get the custom page.

If I run my program with Fiddler as a proxy to see what is returned, it doesn't seem to be requesting the custom page. It just tries, fails, and gives up.

My question:
How can I make my program return the custom 404 pages that the sites have?

Below is the most basic thing I have tried; it gives the same result as some of the more complex examples.

C#
using System;
using System.Net;
class Program
{
    static void Main(string[] args)
    {
        try
        {
            using (WebClient client = new WebClient())
            {
                string value = client.DownloadString("http://www.404errorpages.com/something");
                Console.WriteLine(value);
            }
        }
        finally
        {
            Console.WriteLine("===EOF===");
            Console.ReadLine();
        }
    }
}

1 solution

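When you try to get the data from the site, WebClient throws a WebException because the server answers with a 404 (page not found) status. The custom error page is still sent back as the body of that 404 response, so you need to catch the exception and read the page from the exception's response stream.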

So you'd need something like this:
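// Needs: using System; using System.IO; using System.Net;
string s = null;
using (WebClient client = new WebClient())
{
  try
  {
    s = client.DownloadString("http://www.404errorpages.com/something");
  }
  catch (WebException e)
  {
    // A 404 (or any other HTTP error status) is reported as a ProtocolError,
    // but the server's error page is still available through e.Response
    if (e.Status == WebExceptionStatus.ProtocolError)
    {
      HttpWebResponse response = e.Response as HttpWebResponse;
      if (response != null)
      {
        //response.StatusCode contains the actual error code you will want to check
        //(HttpStatusCode.NotFound, in this case)
        using (StreamReader sr = new StreamReader(response.GetResponseStream()))
        {
          s = sr.ReadToEnd();
        }
      }
    }
  }
}
Console.WriteLine(s);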
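If you want to try this without relying on an outside site, here is a minimal sketch, assuming .NET 5 or later (it uses top-level statements). It starts a throwaway HttpListener on a free localhost port that answers one request with a made-up custom 404 page, then fetches it with WebClient. The helper name FetchIncludingErrorPage and the page text are hypothetical; the catch block follows the same idea as the snippet above, but also returns the status code alongside the page.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical custom 404 page served by the local test server.
const string CustomPage = "<html><body>Sorry, that page is missing.</body></html>";

// Find a free port by briefly binding a TcpListener to port 0.
var probe = new TcpListener(IPAddress.Loopback, 0);
probe.Start();
int port = ((IPEndPoint)probe.LocalEndpoint).Port;
probe.Stop();
string baseUrl = $"http://localhost:{port}/";

var listener = new HttpListener();
listener.Prefixes.Add(baseUrl);
listener.Start();

// Answer exactly one request with status 404 and the custom page as the body.
Task server = Task.Run(() =>
{
    HttpListenerContext ctx = listener.GetContext();
    byte[] body = Encoding.UTF8.GetBytes(CustomPage);
    ctx.Response.StatusCode = 404;
    ctx.Response.ContentType = "text/html; charset=utf-8";
    ctx.Response.ContentLength64 = body.Length;
    ctx.Response.OutputStream.Write(body, 0, body.Length);
    ctx.Response.Close();
});

(HttpStatusCode status, string body) result = FetchIncludingErrorPage(baseUrl + "something");
server.Wait();
listener.Stop();
Console.WriteLine($"{(int)result.status}: {result.body}");

// Returns the status code and the page body, even when the server answers 404.
static (HttpStatusCode, string) FetchIncludingErrorPage(string url)
{
    using (var client = new WebClient())
    {
        try
        {
            // WebClient does not expose the status code on success, so report OK.
            return (HttpStatusCode.OK, client.DownloadString(url));
        }
        catch (WebException e) when (e.Status == WebExceptionStatus.ProtocolError
                                     && e.Response is HttpWebResponse response)
        {
            // The error page is the body of the failed response.
            using (response)
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                return (response.StatusCode, reader.ReadToEnd());
            }
        }
    }
}
```

It should print 404 followed by the custom page text, which is the same thing a spider would want to store for a missing page.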


You might want to take a quick look over this[^], which should give you any info you need on HttpWebRequest and HttpWebResponse.
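On newer .NET, WebClient and HttpWebRequest are marked obsolete (warning SYSLIB0014, from .NET 6 on) in favour of HttpClient, and HttpClient avoids the exception entirely: GetAsync returns a normal HttpResponseMessage for a 404 and only throws if you call EnsureSuccessStatusCode(). This is a swapped-in alternative to the answer's approach, sketched under the same assumptions as before (.NET 5 or later, a throwaway local HttpListener on a free port, and a made-up page text).

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical custom 404 page served by the local test server.
const string CustomPage = "<html><body>Sorry, that page is missing.</body></html>";

// Find a free port by briefly binding a TcpListener to port 0.
var probe = new TcpListener(IPAddress.Loopback, 0);
probe.Start();
int port = ((IPEndPoint)probe.LocalEndpoint).Port;
probe.Stop();
string baseUrl = $"http://localhost:{port}/";

var listener = new HttpListener();
listener.Prefixes.Add(baseUrl);
listener.Start();

// Answer exactly one request with status 404 and the custom page as the body.
Task server = Task.Run(() =>
{
    HttpListenerContext ctx = listener.GetContext();
    byte[] bytes = Encoding.UTF8.GetBytes(CustomPage);
    ctx.Response.StatusCode = 404;
    ctx.Response.ContentLength64 = bytes.Length;
    ctx.Response.OutputStream.Write(bytes, 0, bytes.Length);
    ctx.Response.Close();
});

using var http = new HttpClient();
// No exception here for 4xx/5xx: the status is just a property on the response.
using HttpResponseMessage response = await http.GetAsync(baseUrl + "something");
HttpStatusCode status = response.StatusCode;
string page = await response.Content.ReadAsStringAsync();
await server;
listener.Stop();

Console.WriteLine($"{(int)status}: {page}");
```

For a spider this tends to be simpler than catching exceptions: every page, found or not, comes back through the same code path, and you branch on status afterwards.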
 
Share this answer
 
Comments
mosimo 3-Jun-10 6:50am    
Works great, didn't think of doing it that way. Thanks

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)