How To Submit Links and Data to DZone and Other Sites Programmatically

10 Feb 2009 CPOL
A step-by-step guide to web scraping and the use of HttpWebRequest

Introduction

This article is a guide to performing actions on web sites programmatically. I use link submission to DZone as the example, since there is a lot of emphasis these days on sharing your posts, links and articles with the whole community. At the heart of the implementation are the HttpWebRequest and HttpWebResponse classes, because I am using the .NET Framework. Behind the scenes, though, it is as simple as sending an HTTP request and analyzing the response, so you can use whatever tool you have at hand. I will explain each step that I followed to arrive at the solution. The same steps work for most kinds of applications.

Use Web Site to Perform the Action

First, analyze the action you want to perform: how the web site sends its requests and what kind of response comes back. This analysis drives the whole solution. Take submitting a new link on the DZone site as an example. You click "Add a new link" and are taken to a page that asks you to log in. After you log in, you are sent to a page where you supply values for URL, Title, Description, Tags, etc. Then you click the "Submit" button and you are done. Based on this, the following are the steps that you need to perform programmatically.

  • Submit request to add link
  • Catch redirect to login page
  • Perform login into site
  • Send request to add page after successful login

To see what the browser does to perform these actions, fire up a tool like Fiddler and monitor all the requests and responses. If you can mimic them, you are good to go. Now let's see how to perform each action programmatically.

Submit Add Request

You will be using the HttpWebRequest object to send a request to http://www.dzone.com/links/add.html. At this point, you do not have to worry about specifying any other parameters like Title, URL, etc. as your request is not going to go through because you are not logged into the site. In technical terms, you have not established an authenticated session with the site.

Catch Redirect To Login Page

When you send an unauthenticated request to add a link, the site redirects you to the login page. In other words, when you send an HTTP request for the add.html page, the server sends back an HTTP response with status code 302, which means the request is being redirected to another location. The response carries that location in its Location header. So programmatically you need to submit a request, check the response status code and read the Location header. The code is as shown below:

// Requires: using System; using System.Diagnostics; using System.Net;
static string GetLoginUrl(CookieContainer cookies, string targetUrl)
{
	int hops = 1;
	int maxRedirects = 20;
	bool foundIt = false;
	string loginUrl = targetUrl;
	do
	{
		HttpWebRequest webReq = (HttpWebRequest)WebRequest.Create(loginUrl);
		webReq.CookieContainer = cookies;   // collect cookies on every hop
		webReq.AllowAutoRedirect = false;   // inspect each redirect ourselves
		Debug.WriteLine(string.Format("Hop[{0}] - {1}", hops++, loginUrl));
		// Read the status code and headers before the response is closed.
		using (HttpWebResponse webResp = (HttpWebResponse)webReq.GetResponse())
		{
			if (webResp.StatusCode == HttpStatusCode.Found)
			{
				// Location may be relative, so resolve it against the current URL.
				loginUrl = new Uri(new Uri(loginUrl), webResp.Headers["Location"]).AbsoluteUri;
			}
			else
			{
				foundIt = (webResp.StatusCode == HttpStatusCode.OK);
				break;
			}
		}
	} while (hops <= maxRedirects);
	return foundIt ? loginUrl : string.Empty;
}

Notice that the code runs in a loop, because some sites redirect you through a couple of pages before sending you to the final login page. I have limited the loop to 20 hops. If no page answers with status code 200 within that limit, the method returns an empty string.
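
The loop above follows only status code 302. As a minimal sketch, assuming a site may also answer with 301, 303 or 307 and may send a relative Location value, the two checks could be factored into a small helper. RedirectHelper and its method names are hypothetical and not part of the sample project, and the login path in the usage line is made up for illustration.

```csharp
using System;
using System.Net;

static class RedirectHelper
{
    // True for the status codes that normally carry a Location header.
    public static bool IsRedirect(HttpStatusCode status)
    {
        switch (status)
        {
            case HttpStatusCode.MovedPermanently:   // 301
            case HttpStatusCode.Found:              // 302
            case HttpStatusCode.SeeOther:           // 303
            case HttpStatusCode.TemporaryRedirect:  // 307
                return true;
            default:
                return false;
        }
    }

    // Resolves a Location header value against the URL that was requested,
    // so that relative values such as "/links/login.html" become absolute.
    public static string ResolveLocation(string requestUrl, string location)
    {
        return new Uri(new Uri(requestUrl), location).AbsoluteUri;
    }
}
```

For example, RedirectHelper.ResolveLocation("http://www.dzone.com/links/add.html", "/links/login.html") yields an absolute URL on the same host, which can then be passed to WebRequest.Create on the next hop.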

Cookies

This is the most important part of the whole implementation. When you start a session with a site, it sends some cookies in its response, and it expects some of those cookies back in subsequent requests to confirm that you have an authorized session open. In the code above, I attach a CookieContainer object to each request so that all the cookies sent in responses are collected. The same container can then be attached to subsequent requests.
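
To make the role of the container concrete, here is a small offline sketch. It adds a cookie to a CookieContainer by hand, standing in for one that a server response would set, and then asks the container which Cookie header it would send with a later request to the same site. The class name CookieDemo and the cookie name and value are made up for illustration.

```csharp
using System;
using System.Net;

static class CookieDemo
{
    public static string HeaderForNextRequest()
    {
        CookieContainer cookies = new CookieContainer();

        // Stand-in for a cookie set by the server; name and value are invented.
        Uri firstPage = new Uri("http://www.dzone.com/links/");
        cookies.Add(firstPage, new Cookie("JSESSIONID", "abc123", "/"));

        // The Cookie header the container would attach to a later request
        // to the same host.
        Uri nextPage = new Uri("http://www.dzone.com/links/add.html");
        return cookies.GetCookieHeader(nextPage);
    }
}
```

When the same container instance is assigned to the CookieContainer property of every HttpWebRequest, this bookkeeping happens automatically on each response and request.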

Perform Login

When you log in to the site, the browser submits a FORM to the server with key-value pairs containing the data required to validate the user. You can use the Internet Explorer Developer Toolbar, Firebug or any other tool to inspect the HTML of the page and locate the FORM tag and the values that need to be sent. I used Firebug to inspect that section and find the values that I needed. The following images show the result:

dzonelogin.PNG

dzonelogininspect.PNG

You can see that there is a FORM whose method is POST and whose action points to /links/j_acegi_security_check. It has two text boxes with element names j_username and j_password that take the login information and are submitted with the POST request. These are the pieces of information you need to perform the login. The following code shows how this is accomplished:

// RequestAttributes, HttpProtocol and HttpProtocolOutput are not .NET Framework
// classes; they come with the pre-requisites of the sample project.
RequestAttributes reqAttribs = new RequestAttributes();
reqAttribs.OverrideConfigurationSettings = true;
reqAttribs.AllowSecureSiteCrawl = true;
reqAttribs.AutoRediectEnabled = false;
reqAttribs.MaxRedirects = 100;
reqAttribs.IsPost = true;
reqAttribs.RequestUrl = "http://www.dzone.com/links/j_acegi_security_check";
// container is the CookieContainer that was used for the earlier requests.
reqAttribs.CookieContainer = container;
// Replace the placeholders with your own DZone credentials.
reqAttribs.RequestParameters.Add("j_username", "xxxxxx");
reqAttribs.RequestParameters.Add("j_password", "xxxxxx");
HttpProtocol obHttp = new HttpProtocol(reqAttribs);
HttpProtocolOutput obOutput = obHttp.GetProtocolOutput();
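
If you do not have the helper classes at hand, the same POST can be sketched with HttpWebRequest alone. This is a minimal sketch rather than the code of the sample project: the class FormPost and its method names are hypothetical, and it encodes values with Uri.EscapeDataString, which writes a space as %20 rather than +.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class FormPost
{
    // Encodes key-value pairs as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }

    // Sends the body with POST and returns the response; the caller disposes it.
    public static HttpWebResponse Post(string url, string formBody, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;   // same container as the earlier requests
        request.AllowAutoRedirect = false;   // lets the caller see a 302 after login

        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = payload.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }
        return (HttpWebResponse)request.GetResponse();
    }
}
```

A login call would build the body from the j_username and j_password pairs and post it to http://www.dzone.com/links/j_acegi_security_check with the cookie container used so far.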

Did Login Succeed?

After you execute the above request and get the response back, the big question is how to check whether the login succeeded. You can't rely on the response status code alone, because 200 only means that the request succeeded, not the login. There are a couple of things that you can check. Some sites redirect you to a landing page after login, so you can check whether you got a 302 response code. A surer way is to parse the response and see whether the page still contains a login box. For example, on the DZone.com site you can check for a markup node whose name attribute has the value j_username, or for any markup that is unique to the login page. If you find that node, the login did not work. Here is some sample code that I used for my application.

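// Returns true when the response no longer contains the login form.
// The parser types used here are not .NET Framework classes; they come from
// the HTML parser library among the pre-requisites of the sample project.
static bool CheckLoginStatus(HttpProtocolOutput loginRespOutput)
{
	// Wrap the raw response bytes so that the parser can read them.
	ParserStream obStream =
	  new ParserStream(new System.IO.MemoryStream
		(loginRespOutput.Content.ContentData));
	Source obSource =
	  new InputStreamSource(obStream, null,
		loginRespOutput.Content.ContentData.Length);
	Page obPage = new Page(obSource);
	obPage.Url = "http://www.dzone.com/links/j_acegi_security_check";
	Lexer obLexer = new Lexer(obPage);
	Parser obParser = new Parser(obLexer);

	// Look for any node with name="j_username", which marks the login form.
	HasAttributeFilter filter = new HasAttributeFilter("name", "j_username");
	NodeList oNodes = obParser.ExtractAllNodesThatMatch(filter);
	// No such node means the login page was not served again.
	return (oNodes.Count == 0);
}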
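
When a full HTML parser is not available, a rougher version of the same check can be sketched with a regular expression over the response text. This is an assumption-laden shortcut, not the method used in the sample project: it only looks for a name attribute with the value j_username anywhere in the markup, and the class name LoginCheck is hypothetical.

```csharp
using System.Text.RegularExpressions;

static class LoginCheck
{
    // Matches name="j_username" or name='j_username', ignoring case and spacing.
    static readonly Regex LoginField = new Regex(
        "name\\s*=\\s*[\"']j_username[\"']", RegexOptions.IgnoreCase);

    // True when the response HTML does not contain the login field.
    public static bool LoginSucceeded(string responseHtml)
    {
        return !LoginField.IsMatch(responseHtml);
    }
}
```

A regular expression is more brittle than a parser, since it would also match the attribute inside a comment or script, so treat it as a quick first check only.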

Submit New Request with Authorized Session

Throughout the login and the redirects, make sure that you keep the same cookie container around so that it keeps collecting all the cookies. You need this container to send the request that submits your link. Now you just need to send a new POST request to the target URL with the appropriate FORM parameters, such as title, URL and description.

Sample Project

A sample project and other pre-requisites for the code shown here are available at ByteBlocks.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

ByteBlocks

United States


Article Copyright 2009 by ByteBlocks