Web site crawling using Visual Studio 2008 webtest file

By Masayuki Tanaka

How to run a webtest programmatically
C#, Windows, .NET, ASP.NET, Dev

Posted: 11 Jun 2008
Updated: 11 Jun 2008

Background

If you want to get data from another web site, you can use HttpWebRequest and easily read the page through HttpWebResponse. But if the site requires form-based logon, or you need to perform several operations to reach the desired page, you must manage cookies and hidden variables such as view state yourself. This is fairly complicated and depends on the behavior of each site.

So, if you use a webtest file as the script for crawling, you can simply record your operations and avoid writing code for each site.

Overview

The steps are:

  1. Load the webtest file and parse it.
  2. Send a request to the web site, managing cookies.
  3. Receive the response from the web site and apply the extraction rules, storing the extracted values in the context.
  4. Send the next request. When a recorded parameter value names an entry in the context, replace it with the value stored there.
  5. Receive the response, and repeat from step 4.
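The hand-off between steps 3 and 4 can be sketched without any network access. This is a minimal illustration, not the article's code: the names ContextSketch, Resolve, and BuildBody are hypothetical, and Uri.EscapeDataString stands in for the HttpUtility.UrlEncode call used in the implementation so that the sketch needs no reference to System.Web.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; illustrates how recorded placeholders are resolved.
public static class ContextSketch
{
    // If the recorded value is a key in the context (for example
    // "{{$HIDDEN1.__VIEWSTATE}}"), return the extracted value;
    // otherwise return the recorded value unchanged.
    public static string Resolve(string recordedValue, IDictionary<string, string> context)
    {
        string fromContext;
        return context.TryGetValue(recordedValue, out fromContext) ? fromContext : recordedValue;
    }

    // Builds a form-encoded body from recorded name/value pairs,
    // resolving each value against the context first.
    public static string BuildBody(
        IEnumerable<KeyValuePair<string, string>> recorded,
        IDictionary<string, string> context)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in recorded)
        {
            if (body.Length > 0) body.Append("&");
            body.Append(Uri.EscapeDataString(pair.Key));
            body.Append("=");
            body.Append(Uri.EscapeDataString(Resolve(pair.Value, context)));
        }
        return body.ToString();
    }
}
```

A parameter whose recorded value is an ordinary string, such as a user name, passes through Resolve untouched, so the same code path serves both fixed and extracted values.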

Implementation

Load the webtest file and parse it.

        public void Load(string filename)
        {
            requests = new List<RequestData>();
            
            XmlDocument xmlDoc = new XmlDocument();
            xmlDoc.Load(filename);
            XmlNamespaceManager nsmgr = new XmlNamespaceManager(xmlDoc.NameTable);
            nsmgr.AddNamespace("a",
                "http://microsoft.com/schemas/VisualStudio/TeamTest/2006");
            foreach (XmlNode node in xmlDoc.SelectNodes("//a:Request", nsmgr))
            {
                RequestData reqData = new RequestData();
                reqData.Method = node.Attributes["Method"].Value;
                reqData.Url = node.Attributes["Url"].Value;
                reqData.Encoding = node.Attributes["Encoding"].Value;
                foreach (XmlNode formPostParameterNode in node.SelectNodes(
                    ".//a:FormPostParameter", nsmgr))
                {
                    string paramName = formPostParameterNode.Attributes["Name"].Value;
                    string paramValue = formPostParameterNode.Attributes["Value"].Value;
                    reqData.FormPostParameters.Add(paramName, paramValue);
                }
                foreach (XmlNode queryStringParameterNode in node.SelectNodes( 
                    ".//a:QueryStringParameter", nsmgr))
                {
                    string paramName = queryStringParameterNode.Attributes["Name"].Value;
                    string paramValue = queryStringParameterNode.Attributes["Value"].Value;
                    reqData.QueryStringParameters.Add(paramName,paramValue);
                }
                foreach (XmlNode extractionRuleNode in node.SelectNodes(
                    "./a:ExtractionRules/a:ExtractionRule",nsmgr))
                {
                    ExtractionRuleData ruleData = new ExtractionRuleData();
                    ruleData.ClassName = extractionRuleNode.Attributes["Classname"].Value;
                    ruleData.VariableName =
                        extractionRuleNode.Attributes["VariableName"].Value;
                    reqData.ExtractionRules.Add(ruleData);
                }
                requests.Add(reqData);
            }            
        }
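The article does not show RequestData, ExtractionRuleData, or a sample webtest file. The sketch below fills both in as assumptions: the two classes are inferred only from how Load uses them, and SampleXml is a hand-written, heavily simplified fragment (a recorded file contains many more attributes and elements). It runs the same XPath queries as Load against that fragment.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Assumed shapes, inferred from the members Load() touches.
public class ExtractionRuleData
{
    public string ClassName;
    public string VariableName;
}

public class RequestData
{
    public string Method;
    public string Url;
    public Dictionary<string, string> FormPostParameters = new Dictionary<string, string>();
    public List<ExtractionRuleData> ExtractionRules = new List<ExtractionRuleData>();
}

public static class WebTestParseSketch
{
    // Hypothetical, simplified webtest content for illustration only.
    public const string SampleXml = @"<WebTest xmlns=""http://microsoft.com/schemas/VisualStudio/TeamTest/2006"">
  <Items>
    <Request Method=""GET"" Url=""http://example.com/login.aspx"">
      <ExtractionRules>
        <ExtractionRule Classname=""Microsoft.VisualStudio.TestTools.WebTesting.Rules.ExtractHiddenFields"" VariableName=""1"" />
      </ExtractionRules>
    </Request>
    <Request Method=""POST"" Url=""http://example.com/login.aspx"">
      <FormPostHttpBody>
        <FormPostParameter Name=""__VIEWSTATE"" Value=""{{$HIDDEN1.__VIEWSTATE}}"" />
        <FormPostParameter Name=""user"" Value=""demo"" />
      </FormPostHttpBody>
    </Request>
  </Items>
</WebTest>";

    public static List<RequestData> Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        // The elements live in a default namespace, so XPath needs a prefix for it.
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("a", "http://microsoft.com/schemas/VisualStudio/TeamTest/2006");

        List<RequestData> requests = new List<RequestData>();
        foreach (XmlNode node in doc.SelectNodes("//a:Request", nsmgr))
        {
            RequestData req = new RequestData();
            req.Method = node.Attributes["Method"].Value;
            req.Url = node.Attributes["Url"].Value;
            foreach (XmlNode p in node.SelectNodes(".//a:FormPostParameter", nsmgr))
            {
                req.FormPostParameters.Add(p.Attributes["Name"].Value, p.Attributes["Value"].Value);
            }
            foreach (XmlNode r in node.SelectNodes("./a:ExtractionRules/a:ExtractionRule", nsmgr))
            {
                ExtractionRuleData rule = new ExtractionRuleData();
                rule.ClassName = r.Attributes["Classname"].Value;
                rule.VariableName = r.Attributes["VariableName"].Value;
                req.ExtractionRules.Add(rule);
            }
            requests.Add(req);
        }
        return requests;
    }
}
```

Calling WebTestParseSketch.Parse(WebTestParseSketch.SampleXml) yields two requests: a GET carrying one extraction rule, and a POST whose __VIEWSTATE parameter holds a placeholder to be filled in from the context.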

Send each request to the web site, managing cookies. Then receive the response and apply the extraction rules.

        public string Execute()
        {
            string result = null;
            try
            {
                foreach (RequestData reqData in requests)
                {
                    result = ExecuteRequest(reqData);
                }
            }
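            catch (WebException ex)
            {
                // No response at all (for example a DNS failure or timeout):
                // there is no body to read, so let the caller see the error.
                if (ex.Response == null)
                {
                    throw;
                }
                // Otherwise return the error page sent by the server.
                using (Stream dataStream = ex.Response.GetResponseStream())
                using (StreamReader reader = new StreamReader(dataStream,
                    Encoding.GetEncoding(((HttpWebResponse)ex.Response).CharacterSet)))
                {
                    result = reader.ReadToEnd();
                }
            }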
            return result;
        }
        #region private
        private string ExecuteRequest(RequestData reqData)
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(BuildUrl(reqData));
            request.Method = reqData.Method;
            if (container != null)
            {
                request.CookieContainer = container;
            }
            request.AllowAutoRedirect = true;
            request.Accept = "*/*";
            request.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
            //TODO
            //request.Credentials = CredentialCache.DefaultCredentials;
            if ("POST" == reqData.Method)
            {
                byte[] postData =
                   Encoding.GetEncoding(reqData.Encoding).GetBytes(BuildFormData(reqData));
                request.ContentType = "application/x-www-form-urlencoded";
                request.ContentLength = postData.Length;
                using (Stream stream = request.GetRequestStream())
                {
                    stream.Write(postData, 0, postData.Length);
                }
            }
            
            string responseFromServer;
            // The using blocks release the response, stream, and reader
            // even if reading the body throws.
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream dataStream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(
                dataStream, Encoding.GetEncoding(response.CharacterSet)))
            {
                responseFromServer = reader.ReadToEnd();
            }
            if (reqData.ExtractionRules.Count > 0)
            {
                foreach (ExtractionRuleData ruleData in reqData.ExtractionRules)
                {
                    if (ruleData.ClassName.StartsWith(
                    "Microsoft.VisualStudio.TestTools.WebTesting.Rules.ExtractHiddenFields"))
                    {
                        ExtractHiddenField(responseFromServer,ruleData.VariableName);
                    }
                }
            }
            return responseFromServer;
        }
        private string BuildUrl(RequestData reqData)
        {
            if (reqData.QueryStringParameters.Count == 0)
            {
                return reqData.Url;
            }
            StringBuilder builder = new StringBuilder();
            builder.Append(reqData.Url);
            bool firstParam = true;
            foreach (string key in reqData.QueryStringParameters.Keys)
            {
                if (firstParam)
                {
                    firstParam = false;
                    builder.Append("?");
                }
                else
                {
                    builder.Append("&");
                }
                builder.Append(key);
                builder.Append("=");
                string value =reqData.QueryStringParameters[key];
                if (Context.ContainsKey(value))
                {
                    value = Context[value];
                }
                builder.Append(value);
            }
            return builder.ToString();
        }
        private string BuildFormData(RequestData reqData)
        {
            if (reqData.FormPostParameters.Count == 0)
            {
                return string.Empty;
            }
            StringBuilder builder = new StringBuilder();
            bool firstParam = true;
            foreach (string key in reqData.FormPostParameters.Keys)
            {
                if (firstParam)
                {
                    firstParam = false;
                }
                else
                {
                    builder.Append("&");
                }
               
                builder.Append(HttpUtility.UrlEncode(key));
                builder.Append("=");
                string value = reqData.FormPostParameters[key];
                if (Context.ContainsKey(value))
                {
                    value = Context[value];
                }
                builder.Append(HttpUtility.UrlEncode(value));
            }
            return builder.ToString();
        }
        private void ExtractHiddenField(string source, string variableName)
        {
            string prefix = "{{$HIDDEN" + variableName + ".";
            string suffix = "}}";
            ParseHTML parse = new ParseHTML();
            parse.Source = source;
            while (!parse.Eof())
            {
                char ch = parse.Parse();
                if (ch == 0)
                {
                    AttributeList tag = parse.GetTag();
                    if (tag.Name.ToUpper() == "INPUT")
                    {
                        if (tag["type"] != null && tag["type"].Value.ToUpper() == "HIDDEN")
                        {
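                            // Skip inputs without a name; treat a missing value as empty.
                            if (tag["name"] == null) continue;
                            string name = tag["name"].Value;
                            string value = tag["value"] != null ? tag["value"].Value : string.Empty;
                            // Assign through the indexer: Add would throw when a later
                            // page repeats a field such as __VIEWSTATE.
                            Context[prefix + name + suffix] = value;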
                        }
                    }
                }
            }
        }
        #endregion
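ExtractHiddenField depends on the external ParseHTML class listed under Reference. For readers who want to try the idea without that parser, here is a rough sketch using regular expressions instead. It is a swapped-in technique, not the article's method: HiddenFieldSketch and its members are hypothetical names, the pattern only handles double-quoted attributes, and values are not HTML-decoded, so a real HTML parser remains the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical regex-based stand-in for the ParseHTML-based extraction.
public static class HiddenFieldSketch
{
    // Matches a whole <input ...> tag, regardless of attribute order.
    static readonly Regex InputTag = new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the double-quoted value of one attribute, or null if absent.
    static string GetAttribute(string tag, string attributeName)
    {
        Match m = Regex.Match(tag,
            @"\b" + attributeName + @"\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Stores every named hidden input under the key
    // "{{$HIDDEN<variableName>.<fieldName>}}", the same key format
    // that ExtractHiddenField builds.
    public static void Extract(string html, string variableName,
        IDictionary<string, string> context)
    {
        foreach (Match m in InputTag.Matches(html))
        {
            string tag = m.Value;
            string type = GetAttribute(tag, "type");
            string name = GetAttribute(tag, "name");
            if (name == null ||
                !string.Equals(type, "hidden", StringComparison.OrdinalIgnoreCase))
            {
                continue;
            }
            string value = GetAttribute(tag, "value") ?? string.Empty;
            // Indexer assignment lets a later page overwrite an earlier value.
            context["{{$HIDDEN" + variableName + "." + name + "}}"] = value;
        }
    }
}
```

Because the keys use the same format as the placeholders recorded in the webtest file, a value extracted from one response is picked up by the ContainsKey checks in BuildUrl and BuildFormData when the next request is built.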

Reference

The code above uses the HTML parser (the ParseHTML class) from the following article:
http://www.developer.com/net/csharp/article.php/2230091

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License

About the Author

Masayuki Tanaka


I am a software developer currently working at a global IT consulting firm in Tokyo, Japan.

I have 9 years of experience in software development, especially in web applications.

I want to improve my English writing and speaking skills, so I am looking for an English-Japanese language exchange partner in the Tokyo area.
If you are interested, please email me <m-tanaka@pp.iij4u.or.jp>.
Occupation: Software Developer
Location: Japan

Editor: Sean Ewington
Copyright 2008 by Masayuki Tanaka