A step-by-step guide to web scraping can be seen here.
Introduction
The Facebook Graph API and the many Facebook C# SDKs let you perform Facebook tasks from code. However, all of them need the application key of a Facebook app, require OAuth permissions, and involve keeping track of session tokens. I thought of doing some basic tasks, such as login, search, and status update, from C# code using plain HTTP GET and POST requests. For this purpose, I am using the mobile version of the Facebook website. Each response is obtained with HttpWebRequest/HttpWebResponse, and the HTML is parsed for the required data with Regex. Where required, data taken from the response is sent back with the HTTP POST method.
Background
In my previous article, Multi-Threaded WebScraping in CSharpDotNetTech, I explained web scraping techniques using Regex, WebClient, and HttpWebRequest/HttpWebResponse. The attached code is based on the techniques explained in that article. The purpose of this article is to explain:
- How to use HttpWebRequest and HttpWebResponse
- How to GET and POST data to the web server
- How to collect Cookies
- How to Maintain Session
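As a minimal sketch of the last two points (an illustration, not the code of the attached class): sharing one CookieContainer between all requests is one way to collect cookies and keep a session alive. The class name SessionSketch, the example.com address, and the cookie name and value are placeholders invented for this example.

```csharp
using System;
using System.Net;

public static class SessionSketch
{
    // One container for the whole session; every request created below shares it.
    public static readonly CookieContainer Jar = new CookieContainer();

    // Builds (but does not send) a GET request wired to the shared container.
    public static HttpWebRequest CreateGet(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = WebRequestMethods.Http.Get;
        // Cookies set by the response are stored in Jar and sent again
        // automatically with later requests to the same site.
        request.CookieContainer = Jar;
        return request;
    }

    public static void Main()
    {
        // Placeholder site and cookie, standing in for a real Set-Cookie header.
        Uri site = new Uri("https://example.com/");
        Jar.Add(site, new Cookie("session", "abc123"));

        // Shows the Cookie header value the next request to this site would carry.
        Console.WriteLine(Jar.GetCookieHeader(site));
    }
}
```

Because the container is shared, nothing has to be copied by hand between requests; the explicit CookieCollection approach used later in this article gives the same result with more control over which cookies are sent.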
Class Methods
Name | Description
public static Facebook Login(string username, string password) | Fetch the Cookies of the WebBrowser Control and store in Cookie Collection to maintain session. On Success returns the Facebook Object, by which further public methods can be called. On Failure, Shows the Response HTML in Default Browser
public bool StatusUpdate(string txt) | Updates the Text Status on User’s Timeline
public void UploadPhoto(string filepath, string caption) | Uploads the Picture from given path to the mobile uploads folder of the current user profile with specified caption
public Dictionary<string, string> SearchGroups(string keyword, int scount) | Performs Search for Groups for the given keyword, and returns Dictionary of GroupID and Name from the specified start count of search result
public Dictionary<string, string> SearchPages(string keyword, int scount) | Performs Search for Pages for the given keyword and returns Dictionary of URLs and Name of Pages from the specified start count of the search result
Using the Code
Performing Login
Facebook fb = Facebook.Login("USERNAME", "PASSWORD");
Status
fb.StatusUpdate("Text Status");
Upload Photo
fb.UploadPhoto(@"C:\Users\..\...\...\IMAGE.jpg", "Its the Caption");
Searching For Groups
Dictionary<string, string> groups;
int startCount = 1;
do
{
    // Fetch the next batch of results, starting at startCount.
    groups = fb.SearchGroups("diet coke", startCount);
    startCount += groups.Count;
    string txt = "";
    // Each entry maps a group ID to the group name.
    foreach (string k in groups.Keys)
        txt += k + "\t\t" + groups[k] + Environment.NewLine;
    MessageBox.Show(txt);
} while (groups.Count > 0);
You can search for pages the same way with SearchPages.
How the Code Works
- Before logging in from C#, let's log in with Mozilla Firefox and analyze the HTTP headers with LiveHTTPHeaders. You can install the LiveHTTPHeaders add-on from the Firefox site. Start LiveHTTPHeaders, browse to http://www.facebook.com, enter your username and password, and click the Login button. The HTTP web request sent by the browser will look something like this:
- The first line is the URL to which your username and password are sent (later, in Example 3, we will see how to find this URL). The second line gives the HTTP method and version used, which are POST and 1.1 respectively.
- The remaining fields are normal HTTP headers, as we saw in Example 1. The important part starts at the Cookie header. In Example 1, when we first browsed to http://www.facebook.com, the request had no Cookie header, whereas the response header carried some cookies. Now, when we click the Login button, that previously received set of cookies is sent back in the Cookie header.
- The next header shows the content type. Two major content types are used to POST data: application/x-www-form-urlencoded and multipart/form-data. You can find more information about these here.
- The next header shows the content length, and the last line shows the content itself. You will see your email address and password in this line; it is the data sent to the server by the HTTP POST method.
- There are several other values as well; later in the example, we will see what they are and where to obtain them.
- Let's examine the response header for the above request.
- The response header shows a lot of cookies. These are the cookies issued by the server on successful login; for any subsequent request, the browser sends them back to the server, and this is how the session is maintained.
- Go to Tools->Clear Recent History and delete the cookies, then try to browse to your Facebook profile page. You will be redirected to the Facebook login page.
- Now let's create the same login request header as we saw in the above screenshot and test whether we can log in successfully.
// URL the login form posts to, taken from the captured request.
string getUrl = "https://www.facebook.com/login.php?login_attempt=1";
// Form body copied from the captured request; replace the email and password placeholders.
string postData = "lsd=AVo_jqIy&email=YourEmailAddress"
    + "&pass=YourPassword&default_persistent=0"
    + "&charset_test=%E2%82%AC%2C%C2%B4%2C%E2%82%AC%2C%C2%B4%2C%E6%B0%B4%2C%D0%94%2C%D0%84"
    + "&timezone=-300&lgnrnd=072342_0iYK&lgnjs=1348842228&locale=en_US";
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Firefox 15.0.0.1";
getRequest.CookieContainer = new CookieContainer();
// cookies is the CookieCollection received earlier in response to the request for http://www.facebook.com.
getRequest.CookieContainer.Add(cookies);
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.ProtocolVersion = HttpVersion.Version11;
// Do not follow redirects, so the cookies of the login response stay accessible.
getRequest.AllowAutoRedirect = false;
getRequest.ContentType = "application/x-www-form-urlencoded";
getRequest.Referer = "https://www.facebook.com";
getRequest.KeepAlive = true;
- The getUrl variable holds the address to which the data will be posted, and the postData variable is a copy of the content from the HTTP request packet above. We then create an HttpWebRequest object and set its User-Agent header.
- The cookies we received in response to the request for http://www.facebook.com are added to the HttpWebRequest object. If we don't add these cookies, the server will redirect us to the login page instead of processing our login request. Next, we set the HTTP method to POST and the version to 1.1, matching the request captured from the browser.
- Setting the AllowAutoRedirect property to false is very important for requests in which we try to log in. If this property is set to true, the HttpWebRequest object follows the redirection responses, and during the redirections you may lose access to the cookies the server sent in response to the login request.
- Now let's send the login info to the server.
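// The post data is already URL-encoded, so ASCII encoding is sufficient.
byte[] byteArray = Encoding.ASCII.GetBytes(postData);
getRequest.ContentLength = byteArray.Length;
// Write the form body to the request stream; leaving the using block closes the stream.
using (Stream newStream = getRequest.GetRequestStream())
{
    newStream.Write(byteArray, 0, byteArray.Length);
}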
- The data is written to the stream; now let's get the response and see which cookies we receive:
- We successfully logged in and received 9 cookies. The snapshot above shows very little information about the received cookies; you can get more by accessing the properties of each cookie. Add the received cookies to a globally defined CookieCollection so that they can be used in subsequent requests. How do you check whether the login was successful? The cookie count is normally an easy indicator. To be more certain, try fetching the HTML of the home page: if you are not redirected to the login page, you are logged in.
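The step of reading the response and keeping its cookies could be sketched roughly as follows. This is an illustration under assumptions, not the code of the attached class: the names LoginCheckSketch, SendAndCollect, and RedirectsToLogin are hypothetical, and treating any redirect whose target contains "login" as a failed login is a simple heuristic chosen for this example.

```csharp
using System;
using System.Net;

public static class LoginCheckSketch
{
    // Cookies gathered so far; reused for every later request.
    public static readonly CookieCollection Cookies = new CookieCollection();

    // Sends an already prepared request and stores the cookies it returns.
    // response.Cookies is only populated when the request has a CookieContainer.
    // Not called from Main because it needs a live server.
    public static HttpWebResponse SendAndCollect(HttpWebRequest request)
    {
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        Cookies.Add(response.Cookies);
        return response;
    }

    // Heuristic (an assumption of this sketch): a redirect whose Location
    // points at a login page means the session was not accepted.
    public static bool RedirectsToLogin(HttpStatusCode status, string location)
    {
        int code = (int)status;
        bool isRedirect = code >= 300 && code < 400;
        return isRedirect
            && location != null
            && location.IndexOf("login", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // A 302 pointing back at the login page: not logged in.
        Console.WriteLine(RedirectsToLogin(
            HttpStatusCode.Found, "https://www.facebook.com/login.php?login_attempt=1"));
        // A plain 200 with no Location header: the page was served directly.
        Console.WriteLine(RedirectsToLogin(HttpStatusCode.OK, null));
    }
}
```

With AllowAutoRedirect set to false, the status code and the Location header of the response can be passed to RedirectsToLogin as response.StatusCode and response.Headers["Location"].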
In the same way, you can inspect the post data for each task, write a Regex that finds those values in the HTML of the page, and generate the subsequent POST request from them. Most of the functions above use this mechanism.
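A rough sketch of that mechanism is shown here, assuming the values to post live in hidden input fields of the form. The class and method names are hypothetical, the sample HTML is made up for the example, and the pattern only handles inputs whose attributes appear in the order type, name, value; real pages may need a more tolerant pattern and HTML-decoding of the values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormScrapeSketch
{
    // Matches <input type="hidden" name="..." value="..."> with attributes in that order.
    static readonly Regex HiddenInput = new Regex(
        "<input[^>]*type=\"hidden\"[^>]*name=\"(?<name>[^\"]*)\"[^>]*value=\"(?<value>[^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Collects the name/value pairs of all hidden inputs found in the HTML.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(html))
            fields[m.Groups["name"].Value] = m.Groups["value"].Value;
        return fields;
    }

    // Builds an application/x-www-form-urlencoded body from the fields.
    public static string BuildPostData(IDictionary<string, string> fields)
    {
        var parts = new List<string>();
        foreach (KeyValuePair<string, string> f in fields)
            parts.Add(Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value));
        return string.Join("&", parts);
    }

    public static void Main()
    {
        // Made-up form HTML standing in for a downloaded page.
        string html =
            "<form><input type=\"hidden\" name=\"lsd\" value=\"AVo_jqIy\" />" +
            "<input type=\"hidden\" name=\"locale\" value=\"en_US\" /></form>";

        Dictionary<string, string> fields = ExtractHiddenFields(html);
        // Add the values the user would have typed into the visible inputs.
        fields["email"] = "YourEmailAddress";
        fields["pass"] = "YourPassword";

        Console.WriteLine(BuildPostData(fields));
    }
}
```

The resulting string can then be written to the request stream in the same way as the postData variable in the login example.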
Points of Interest
Although these are very basic functions, once you understand them the code can be enhanced to perform more advanced tasks such as joining groups, liking pages, getting stories from the timeline, posting images to groups and pages, commenting on posts, and sending friend requests.
Note: The Facebook class uses a Mozilla Firefox user agent. You should replace it with the user agent of the browser you use to browse Facebook.
Using this code for any spam activity may get your account blocked.