
GrabberProxy: Retrieve any web content via AJAX, from any URL(domain), in any browser, without XSS error.

5 Jan 2014, CPOL
Web Proxy provides a way to pull data from anywhere. Possibilities are limitless when everything is local.

Introduction 

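If you've done much web development, you've almost certainly run into the problem where you want to retrieve some content but cannot, because the browser refuses to hand your script data from another domain.  I call this the XSS (Cross-Site Scripting) limitation throughout this article; strictly speaking, the restriction is the browser's same-origin policy.  I have created an extremely easy-to-use proxy which will retrieve the content of any web site, web service call, or other URI resource.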

Write Remote Web Content Into Your DIVs  

For example, suppose you want to load a <DIV> element in your web page with the contents found at Amazon.com.  You can do that with GrabberProxy.  Here's a snapshot of a very simple web page (source included with this article) which shows that in action.

 

When you try this out, load that page and scroll through the remote content (via scrollable DIVs), I'm hoping you'll find it to be as cool as I did.

The Web Is an RPC (Remote Procedure Call) Repository

I also believe that thinking of the web as an RPC Repository is cool. In other words, if I can do a GET on a URL with a QueryString, then I can probably get what I want.  The functionality to get the data is already out there, but it is rarely exposed properly.  Even when it is, I am up against the XSS challenge, since the data is on another domain.  And, yes, this also works with ASP.NET MVC.  Just point it at the Controller URL and get your data.

Testing / Designing / Building 

The main reason I wanted the GrabberProxy is that I wanted to be able to easily retrieve some kind of data and then manipulate it for my purposes.  Instead of working out a new way to do that every time, all I have to do now is run the GrabberProxy and point it at a URL.  Voila!   I have the content I want to examine for testing, building and designing new ideas.

Background 

I'm currently learning AngularJS, which of course means I'm learning more JavaScript, jQuery and BootstrapJS all at once.  AngularJS allows me to retrieve models via services (via XHR), and I wanted to be able to retrieve from any source at any moment while testing.

Yes, Virginia, My Example Web Page Is Ugly 

There are a few things to get past when reading this article: 

  1. My sample web page is ugly.  Graphic design / CSS styling is not the point of this article. 
  2. This issue may be a bit controversial.  Maybe you have thought of a better way to get around this limitation.  I'd really like to learn other ways, so please become a part of the codeproject.com community and write up your own article about how you could/would/should do it.  My main target for the GrabberProxy is as a heuristic* device.
  3. For some reason it does not work with CodeProject.com.  Weird.  Anyone have an answer to that?  Is it a robots.txt thing?  I dunno. 
  4. Some sites overflow -- out of the <DIV> and leak into the page.  Kind of neat.  Yahoo.com is a good example. 
  5. You may see odd rendering, because the injected site's CSS also applies to (and can restyle) the rest of your page.
  6. Yes, you can do something similar to what I've done using <IFRAME>, but your script cannot read the content that a cross-domain <IFRAME> loads.  GrabberProxy can get that data.  Cool, right?  Right.

 * heuristic: serving to indicate or point out; stimulating interest as a means of furthering investigation.

Note On Two Additional JavaScript Libraries

You will notice that my GPSample.htm includes the use of two JavaScript libraries:

  1. Twitter BootstrapJS  v2.3.2 (see http://getbootstrap.com/2.3.2/) I use this simply to make a few things prettier.   It is very small and included in the project so you don't have to worry about getting it.  I've placed it in the \3rdPartyLibs directory. 
  2. jQuery  (see jQuery.com) I use it via the CDN so you don't have to worry about downloading it either. I use this to make development faster.  It's much easier to manipulate the DOM using jQuery. 
Keep in mind, however, that you do not have to use those libraries at all in order to implement the GrabberProxy.  w00t! Freedom!

 

Using the code   

ASP.Net / IIS  

This is an ASP.Net web site, so you need .NET and IIS to run the GrabberProxy.  If you have any version of Visual Studio, you should be able to set it up easily by simply opening the source project and running it.  Visual Studio will do the rest (it starts its own built-in development web server, so you don't have to configure IIS yourself).

There is really only one file we will focus on in the GrabberProxy solution:  GrabIt.aspx.cs 

The GPSample.htm is a sample page to show you how you might use the proxy. 

Simple Code Retrieves Web Resources 

The code in GrabIt.aspx.cs is very simple, and if you've read my other article, DragonSharq Text Web Browser (http://www.codeproject.com/Articles/594154/DragonSharq-Web-Browser-Safe-Browsing-Source-Viewi), the code is almost an exact duplicate.

Let's take a look at the code that does the work.

Step 1 (of 3): Page_Load

When GrabIt.aspx loads, we grab the target URL off of the QueryString and call the GetWebSource(string strUri) method.  Simple.

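// sbOutput is a StringBuilder field on the page; it collects whatever Render() will send back.
protected void Page_Load(object sender, EventArgs e)
{
    // Read the "url" key directly so a QueryString without it cannot pass null into GetWebSource().
    string url = Request.QueryString["url"];
    if (string.IsNullOrEmpty(url))
    {
        sbOutput.Append("<p>Please provide a URL on the QueryString.</p><p>Example: ?url=http://www.amazon.com</p>");
        return;
    }
    GetWebSource(url);
}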

Step 2 (of 3): Retrieve the data from the target URL 

There's not a lot of error checking; however, if the URL is an empty string I just return from the method.

After that we set up an HttpWebRequest.  To do so you call the .Create() factory method.  It is declared to return the base WebRequest type, which is why you have to cast the result to HttpWebRequest before you can use HTTP-specific properties such as UserAgent.

Emulate A Browser 

Next you'll see that I set the webreq object's UserAgent property to a string which I Googled and found online.  Those UserAgent strings will make a web server respond as if the GrabberProxy is that specific web browser.  In this case I simply emulate IE 9.x running on Windows 7. 

I am simply doing it so I can get an expected response from the server. 

Read From the Stream 

Finally, you can see that I simply read the lines as strings as they come in from the server.  As I read each one I append to a StringBuilder object which performs far better than concatenating to a string (but that's a whole other planet to explore). 

When the GetWebSource method returns you've read the entire stream from the target URL, but you still haven't given it to the requester.  That's where the ASP.Net Page object's Render() method comes in. 

// Requires: using System.IO; using System.Text;
private void GetWebSource(string strUri)
{
    if (string.IsNullOrEmpty(strUri))
        return;

    // Create() returns the base WebRequest type, so cast to HttpWebRequest
    // to reach HTTP-specific properties such as UserAgent.
    System.Net.HttpWebRequest webreq = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(strUri);
    // Set UserAgent to emulate MSIE 9 on Windows 7.
    webreq.UserAgent = @"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)";

    // The using blocks release the response and the reader even if reading fails.
    using (System.Net.WebResponse webres = webreq.GetResponse())
    using (StreamReader strrdr = new StreamReader(webres.GetResponseStream()))
    {
        string strLine;
        while ((strLine = strrdr.ReadLine()) != null)
        {
            // ReadLine() strips the line terminator, so AppendLine() puts one back.
            sbOutput.AppendLine(strLine);
        }
    }
    /*sbOutput = sbOutput.Replace("<html>", string.Empty);
    sbOutput = sbOutput.Replace("</html>", string.Empty);
    sbOutput = sbOutput.Replace("<body>", string.Empty);
    sbOutput = sbOutput.Replace("</body>", string.Empty);*/
}

So far, this is all really simple, isn't it?  It only gets simpler from here.  Yes, seriously.

Stick with me and I'll finish up the explanation of what the code does, and then I'll explain how this all helps with the sample (GPSample.htm).

Step 3 (of 3): Write Target URL's Data To Requester

Override Page Object's Render Method 

Sweet! Only one line of code in the Render() method.    

protected override void Render(HtmlTextWriter writer)
{
    Response.Write(sbOutput.ToString());
}

The Render() method is a function of the ASP.Net Page object that most people don't mess with.  That is because they are scaredy-babies who aren't really programming.  Just kidding.  This method is what normally spews the HTML that has been generated during the Page object processing.  But we don't need none o' dat.

We just want to spew the HTML that we retrieved from the target URL.  So I've overridden the method and hijacked the Page object's normal functionality.  It is easy and fun.  But, if you never heared of no such thing as the Render() method, you wouldn't know they was all this good stuff.  Now you know.  :)  Run and tell all your friends so they can start really programming.  Here comes the good part: using the GrabberProxy!

GrabberProxy In Action 

XSS : The Problem 

To understand the solution, you must understand the problem.   It's vice versa too.

The issue is that any time you want script in one page to get data or retrieve the source of another page, the target must be served from the same origin as the requesting page.  Browsers enforce this rule, known as the same-origin policy.

Domain and Port: Must Be Same

That means if you are running in a test web server and the domain / port number looks like localhost:8235, then any other data you want to retrieve must be served from that same domain and port (and over the same scheme, http or https).  Bummer, dude.  Cuz sometimes I just want to grab something else from far off lands.  Now you can, with GrabberProxy.
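
To make the same-domain-and-port rule concrete, here is a minimal sketch (not part of the GrabberProxy download) that checks whether two URLs count as the same origin. The function name isSameOrigin is my own invention for illustration, and the second and third URLs are made-up examples; the comparison relies on the standard URL object available in modern browsers and in Node.

```javascript
// Two URLs share an origin only when scheme, host, and port all match.
// isSameOrigin is a hypothetical helper, not something GPSample.htm contains.
function isSameOrigin(pageUrl, targetUrl) {
  return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

const page = "http://localhost:8235/GPSample.htm";

// Same host and port: script on the page may read the response.
console.log(isSameOrigin(page, "http://localhost:8235/GrabIt.aspx")); // true

// Same host, different port: already a different origin.
console.log(isSameOrigin(page, "http://localhost:9000/data.json"));   // false

// Different domain entirely.
console.log(isSameOrigin(page, "http://www.amazon.com/"));            // false
```

Anything that comes back false here is the kind of request the GrabberProxy is meant to route through your own server.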

jQuery .Load Method 

Let me say all of this in a more specific way.  Have you ever used the jQuery Load method?  It's pretty cool, but it is really just a wrapper for the old XHR (XMLHttpRequest, aka AJAX).  And since it is AJAX (which should never be capitalized) the data you're targeting must be on the same domain.  Don't bum me out, dude.

The jQuery Load Method lets you load content into an element on your web page and the call looks like the following:

 $('#result').load('ajax/test.html'); 

That just means: make a call to load the file found (on the local server) at ajax/test.html, then select the DOM element with id="result" and copy the data found at that URL into the element.  Note that it is a local URL.  It has to be: if you use something like http://www.yahoo.com, it will fail, and fail quietly (nothing is loaded into the element), because the browser will not let your page's script read a response from another domain unless that server explicitly allows it with CORS headers.

Here's Exactly How Grabber Proxy Solves the XSS Problem 

That is why I've created this great Pass-Thru GrabberProxy.  Now, when you request a remote URL, you actually

  1. make that request to a local domain where your GrabberProxy is running
  2. pass the remote URL into the GrabberProxy (so it knows what you want to retrieve)
  3. let the GrabberProxy retrieve the remote data and load it into the StringBuilder object you saw earlier

That StringBuilder object is built in the context of the local GrabIt.aspx page and then spewed to the requester as if it were a local page being requested.  Bing, bang, sha-boom! You have just been hit by a smooth operator.  The remote source has been localized and your XSS problem is solved.

Building Your URLs To Use the GrabberProxy

Once you've set up IIS or you've opened the ASP.Net project (available with this article) then you'll be able to use your browser to test some URLs.

Here are some samples you can try:

http://localhost:<port>/GrabIt.aspx?url=http://www.yahoo.com   // yahoo.com loads in the page.

http://localhost:<port>/GrabIt.aspx?url=http://www.amazon.com   // amazon.com loads in the page.

Note: <port> is the port number that your web project uses (generated by Visual Studio when you run).

Warning: Not Much Error Checking.  Notice that you must pass the "http://" along with the URL.  Handling the other cases (a missing scheme, a malformed URL) is left for you to work on.
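
If you want a head start on that error checking, here is one possible sketch of a client-side helper that tidies up whatever the user typed before it is handed to the proxy. It is an assumption of mine, not code from the download: the name normalizeTargetUrl is hypothetical, and the choice to accept only http and https is a design decision you may want to change.

```javascript
// Hypothetical helper: returns a cleaned-up absolute URL, or null when the
// input cannot be turned into an http/https URL.
function normalizeTargetUrl(input) {
  const trimmed = String(input || "").trim();
  if (trimmed === "") return null;

  // If the text does not start with "scheme://", assume the user meant http.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
  const candidate = hasScheme ? trimmed : "http://" + trimmed;

  let parsed;
  try {
    parsed = new URL(candidate); // throws on malformed input
  } catch (e) {
    return null;
  }

  // Only web addresses make sense for the proxy.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  return parsed.href;
}

console.log(normalizeTargetUrl("www.yahoo.com"));          // http://www.yahoo.com/
console.log(normalizeTargetUrl("http://www.amazon.com"));  // http://www.amazon.com/
console.log(normalizeTargetUrl("ftp://example.com/file")); // null
console.log(normalizeTargetUrl("   "));                    // null
```

The same checks could just as well live on the server side in GrabIt.aspx.cs; doing them in the page simply gives the user faster feedback.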

GPSample.htm

You will also find the GPSample.htm in the project and you can load it from your web server project.  

In that sample I have used the jQuery Load method (which I spoke of above) to load some DIVs with remote web sources and even some JSON data.

Load up that page and you'll see something like what shows up in the image below:

GPSample.htm - Using GrabberProxy 

Load All Button 

If you click the [Load All] button, then by default I will go out and grab Amazon.com and load it into DIV1 and TechnologyReview.com (MIT Tech Magazine) into DIV2 and finally I will request an English dictionary lookup of the word "heuristic" and return JSON into DIV3. 

Asynchronous Too: w00t! 

Keep in mind that those three DIVs all load at (basically) the same time, because they are asynchronous AJAX calls.  Wow.
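
Here is a small sketch of what "all at the same time" means. It does not use jQuery at all: loadInto is a hypothetical stand-in for any load call that returns a promise, and fakeLoadInto is a fake that only pretends to fetch, so you can run this anywhere. The selectors and URLs are the ones GPSample.htm uses by default.

```javascript
// Start every load right away and get one promise for "all finished".
// loadInto(selector, url) is assumed to return a promise.
function loadAll(loadInto, jobs) {
  // map() kicks off each request immediately; none waits for the one before it.
  const pending = jobs.map(job => loadInto(job.selector, job.url));
  return Promise.all(pending);
}

const proxy = "http://localhost:53326/GrabIt.aspx?url=";
const jobs = [
  { selector: "#mainContent", url: proxy + "http://amazon.com" },
  { selector: "#2ndDiv", url: proxy + "http://technologyreview.com" },
  { selector: "#jsonLoader", url: proxy + "http://newtonsaber.com/temp/deflookup.aspx?term=heuristic" }
];

// Fake loader: records when a request starts and "completes" a little later.
const started = [];
function fakeLoadInto(selector, url) {
  started.push(selector);
  return new Promise(resolve => setTimeout(() => resolve(selector), 10));
}

loadAll(fakeLoadInto, jobs).then(done => console.log("finished:", done.join(", ")));

// This line runs before any request has completed, yet all three have started.
console.log("started:", started.join(", "));
```

The "started" line prints first, listing all three DIVs, and the "finished" line arrives only after the simulated responses come back.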

JSON Dictionary Lookup : My Web Remote Procedure Call

That last one comes from my web site: http://NewtonSaber.com where I've written yet another ASP.Net component which looks up any English word in a dictionary and returns the definition.  Freaking amazing, yes? Uh, yes.  :)

Doing Something Different

Text Box 1: Valid URL

If you type a valid URL in the first text box shown, then it will load that target's HTML into DIV1 when you click the [Load All] button. DIV2 will still load the TechnologyReview.com site, unless you alter it on the page. 

Text Box 2: Valid English Term  

If you type any valid English Term into Text Box 2, then the term will be looked up in the English dictionary and returned into DIV3.  

Try it out.  If you are too excited to even get the source and run your own server, you can point your browser at the place where I've deployed this on my own web site, but it won't be nearly as earth-shattering.  You have to see the XSS defeated to really enjoy it.

Too Excited To Wait? : Try It Here

http://NewtonSaber.com/temp/GPSample.htm 

jQuery Load Example 

You can examine GPSample.htm more closely on your own, but here's the jQuery Load call it makes if you don't type a URL in the first text box:

$("#mainContent").load("http://localhost:53326/GrabIt.aspx?url=http://amazon.com");

And here's the jQuery Load call for the second DIV:

$("#2ndDiv").load("http://localhost:53326/GrabIt.aspx?url=http://technologyreview.com");

Finally, here is the one that looks up the term.  You can see that it actually calls a service at my web site to do that.  And on that service it also passes in the term.  Fairly cool.

$("#jsonLoader").load("http://localhost:53326/GrabIt.aspx?url=http://newtonsaber.com/temp/deflookup.aspx?term=" + lookupTerm); 
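
One thing to watch when the target URL carries its own query string: with a single parameter (term) the call above works, but a second parameter would be read as belonging to GrabIt.aspx rather than to the target. The sketch below shows the effect and one way around it using the standard encodeURIComponent function. The helper name proxyUrlFor and the extra lang parameter are hypothetical, added only to demonstrate the problem.

```javascript
// Hypothetical helper: wrap a target URL so it survives as ONE query value.
function proxyUrlFor(proxyBase, targetUrl) {
  return proxyBase + "?url=" + encodeURIComponent(targetUrl);
}

const proxyBase = "http://localhost:53326/GrabIt.aspx";
// "lang" is a made-up second parameter, used only for this demonstration.
const target = "http://newtonsaber.com/temp/deflookup.aspx?term=heuristic&lang=en";

// Unencoded: the query string is split at '&', so "lang" lands on the proxy.
const naive = new URL(proxyBase + "?url=" + target);
console.log(naive.searchParams.get("url"));  // http://newtonsaber.com/temp/deflookup.aspx?term=heuristic
console.log(naive.searchParams.get("lang")); // en

// Encoded: the whole target URL arrives intact in the "url" value.
const safe = new URL(proxyUrlFor(proxyBase, target));
console.log(safe.searchParams.get("url") === target); // true
console.log(safe.searchParams.get("lang"));           // null
```

On the server side nothing should need to change for this, since Request.QueryString["url"] hands back the decoded value. Encoding lookupTerm the same way would also keep terms containing spaces or ampersands from breaking the URL.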

Important Issues For Deployment 

I've set up the Visual Studio solution so that it will always use the same port: 53326.

You can see where you set that in the Visual Studio Project in the following snapshot:

Grabber Proxy setup in Visual Studio (port for localhost) 

I have done that so you will not have to alter the base URLs in the GPSample.htm.  Obviously, if you run the service from any other URL (domain and/or port change) you will have to change those parts of the URL.

For example, if you run it from http://YourSite.com/MyThing/ then you'd have to change all of the URLs in the project from http://localhost:53326/GrabIt.aspx to http://YourSite.com/MyThing/GrabIt.aspx.  I had to do this exact thing for the GPSample.htm that runs at my NewtonSaber.com site.
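
If editing every URL by hand gets old, one alternative is to work out the proxy address from the address of the page itself. This is only a sketch under an assumption: it supposes GrabIt.aspx sits in the same folder as the sample page, which you should check against your own deployment. The function name proxyBaseFrom is hypothetical.

```javascript
// Resolve "GrabIt.aspx" relative to the page that is currently loaded.
// Assumes the proxy page lives in the same folder as the sample page.
function proxyBaseFrom(pageUrl) {
  return new URL("GrabIt.aspx", pageUrl).href;
}

console.log(proxyBaseFrom("http://localhost:53326/GPSample.htm"));
// http://localhost:53326/GrabIt.aspx

console.log(proxyBaseFrom("http://yoursite.com/MyThing/GPSample.htm"));
// http://yoursite.com/MyThing/GrabIt.aspx
```

In the browser you would pass window.location.href as pageUrl, and the same page would then work unchanged on localhost and on your real site.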

You should have everything you need to solve the XSS problem now and I think it is fairly cool.  Hopefully you do too.

Now you can use it to test and learn and do a whole lot more.   

Bonus Quickie: Lookup Any English Term 

Try the following URL and then try altering it with your own term to see how cool it is to look up any English word's definition:

http://newtonsaber.com/temp/deflookup.aspx?term=factorization 

History  

Version 1.0.0.0 released on 08/22/2013


License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

newton.saber
Architect
United States
My newest book is Learn Python, Think Python.

My previous book is Object-Oriented JavaScript (see it at Amazon.com).

My book, Learn JavaScript, is available at Amazon.

My upcoming book, Learn AngularJS - Think AngularJS, will be releasing later in 2014.

You can learn more about me and my other books at NewtonSaber.com
Last Updated 5 Jan 2014
Article Copyright 2014 by newton.saber