Efficient data entry through browser automation
I am a .NET trainer and 80 students attended my course...
Can you imagine? I had to give marks on an exam to 80 students, BY HAND AND KEYBOARD.
I have 3 fears in the world: doing nothing, doing a repetitive task, and spiders.
When I can, I prefer to create exams myself, so I just instruct my students
to do whatever they can to make the unit tests pass: 1 question, 1 unit test,
1 point. Simple, effective, pragmatic, scalable, and I teach them the real use case
of unit tests.
"Don't waste your (my) time twice (80 times) with the same bug, find it once, write
a test, let it pass and move on."
"I'm paid to make you a great developer, you will be paid to deliver value in software,
we will not be paid to fix bugs." (OK... in reality we are often paid to fix bugs)
But this time... this time someone else had created the exam... and without unit
tests! 80 exams!
So, as a conscientious developer, I made my computer work for me... with 2 Visual Studio projects:
- The first was a XAML parser/code parser to mark exams automatically. (Have you created
a Button in a Grid in your XAML file? 1 point)
- The second was a web crawler that enters all marks automatically in the school's website.
It was more fun than doing it all by hand... and guess what? The students never knew
until I told them! :)
"Why did I get this mark??"
"Let me check my notes (logs)... you forgot to use a Pivot, -1, you did not use
transitions, -2, your button has no event handler, -1..."
"OK, stop, I understand! Wow, you took notes on each of us, and all of that in one day?"
"You bet! I worked very hard that night! Exhausting!... I release it under MS-PL,
do you want the sources?"
But what interests us today is the second project... the browser automation crawler.
This crawler passed the relevance test: it already has two customers, me and another
trainer too lazy to copy marks from Excel to the school's website.
You may ask: "Why would you use browser automation? Why don't you use a simple, classic
HTTP crawler and emit HTTP requests?"
As a teacher/trainer, I need to enter marks in a website written in ASP.NET/AJAX,
so HTTP requests were very complicated to create.
Even with a tool like Fiddler and the fantastic
Request to Code plugin, I didn't manage to hack together the right HTTP requests
to send marks after 20 minutes.
So I took another approach : browser automation.
With an HTTP crawler you say:
- Make a POST to LOGIN URL, with parameter Login="Nicolas" and Password="Password"
- Save the cookie
- Generate other HTTP requests with this cookie...
With browser automation you say :
- Click on the "loginBox" enter Nicolas
- Click on the "password" enter Password
- Click on the submitForm button
- Click on next
- Click on the ddl dropdownlist and wait until I make a choice
- Click on continue...
In summary, you use the browser to make requests for you.
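For contrast, the HTTP crawler steps listed earlier (POST the login form, save the cookie, reuse it) might be sketched as follows. This is only a minimal sketch: the class name HttpCrawlerSketch and its methods are hypothetical, the field names Login and Password are taken from the made-up example above, and a real ASP.NET site would usually require extra hidden fields that you would have to discover with a tool like Fiddler.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpCrawlerSketch
{
    // "Save the cookie": the handler stores cookies set by responses in the
    // given container and sends them back on later requests from this client.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler { CookieContainer = cookies, UseCookies = true };
        return new HttpClient(handler);
    }

    // Form body of the login POST. The field names are assumptions taken from
    // the example in the text, not the names of any real site.
    public static FormUrlEncodedContent BuildLoginForm(string login, string password)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("Login", login),
            new KeyValuePair<string, string>("Password", password)
        });
    }

    // "Make a POST to LOGIN URL": the caller supplies the URL, none is assumed here.
    public static async Task<HttpResponseMessage> LoginAsync(
        HttpClient client, Uri loginUrl, string login, string password)
    {
        var response = await client.PostAsync(loginUrl, BuildLoginForm(login, password));
        response.EnsureSuccessStatusCode();
        return response;
    }
}
```

Every later request sent through the same HttpClient carries the session cookie, which is the "generate other HTTP requests with this cookie" step. The hard part on an AJAX-heavy site is not this plumbing but finding out exactly which fields each request expects.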
Disclaimer to CodeProject admins: I have never used such a shameful method to artificially
get more votes... You can check for yourself with the pitiful number of votes I get
for my articles... This is the first time!
I know that all of you are very busy BUT very curious. So this use case is about
using your curiosity to give me a 5.
So with a classic crawler you might say :
- Get request to http://www.codeproject.com/Articles/338036/BrowserAutomationCrawler
- POST request with login parameter and password parameter to action "submitLogin"
- Save the cookie
- POST vote=5, articleId=MyArticleId to action=Vote, with the cookie
It might work, except that I completely made up the parameters and action names, so
you would need Fiddler to fine-tune the requests. (And depending on
the website it can be very, very hard, especially with AJAX stuff.)
Another way to do the same thing is to say :
- Go to http://www.codeproject.com/Articles/338036/BrowserAutomationCrawler
- If "logout" is present then you are already logged in,
otherwise wait until I click on sign in (so I can manually fill in email and password)
- Then click the option vote 5
- Fill the comment textbox with "5 for me, great Nicolas, thanks for you work ! :)"
- Click on vote button
Go ahead, download the sources
and try it yourself!
Let's check the code
First I have to instantiate a WebBrowser (the control I use to automate the web browser), and set up the URL of my article, the vote you give me, the comment, and maybe your credentials for CodeProject (optional, since you will be able to enter them manually).
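[STAThread]
static void Main(string[] args)
{
    Form form = new Form();
    form.Width = 1024;
    form.Height = 780;
    WebBrowser browser = new WebBrowser();
    browser.Dock = DockStyle.Fill;
    form.Controls.Add(browser); // reconstructed: host the browser in the form
    VoteCrawler crawler = new VoteCrawler()
    {
        WebBrowser = browser,
        ArticleUrl = "http://www.codeproject.com/Articles/338036/BrowserAutomationCrawler",
        Rating = 5,
        Comment = "5 for me, great Nicolas, thanks for you work ! :)"
    };
    // Reconstructed (assumption): start crawling once the form loads, then run the message loop.
    form.Load += (s, e) => crawler.Crawl();
    Application.Run(form);
}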
Then here is the code to vote automatically:
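public class VoteCrawler : Crawler
{
    public string ArticleUrl { get; set; }
    public string Comment { get; set; }
    public int Rating { get; set; }
    public string Email { get; set; }
    public string Password { get; set; }

    protected override void Automate()
    {
        // (The calls that open ArticleUrl and wait for the page are missing from this listing.)
        Click("ctl00_RateArticle_VoteRBL_" + (Rating - 1).ToString());
        Fill(new ClassSelector("RateComment"), Comment);
        var isLogged = Actions.Ask(() => WebBrowser.Document.GetElementById("ctl00_MemberMenu_Signout") != null);
        // (The statements guarded by the tests below, and the final click on the
        // vote button, are also missing from this listing.)
        if(Email != null) { /* fill the email box */ }
        if(Password != null) { /* fill the password box */ }
        var submit = new IdSelector("subForm").SelectChildren(e => e.GetAttribute("type") == "submit");
        if(Email == null || Password == null) { /* wait for a manual sign-in */ }
    }
}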
If IE has saved your cookie, you don't need to enter login/password, it will just
vote for you.
I just inherit from the Crawler class and override Automate; the code is self-descriptive.
var isLogged = Actions.Ask(()=> WebBrowser.Document.GetElementById("ctl00_MemberMenu_Signout") != null);
You can see that I use the WebBrowser class to do my stuff, with Actions.
As I will explain in the under-the-hood part, WebBrowser is a WinForms component.
Automate does not run on the UI thread, so Actions invokes the action on the UI thread.
The question is how to easily find the ids of HTML elements like ctl00_RateArticle_VoteRBL_4,
and what to do if there is no id.
Finding ids is easy: in Chrome, right-click the element, choose Inspect element, and
it brings you where you need to go.
For more complex requests you can use custom or built-in selectors. Strings
are implicitly converted to IdSelector.
For example, to fill the comment box once you click on the 5 option, here is the
code to select and fill the textbox with class RateComment:
Fill(new ClassSelector("RateComment"), "5 for me, great Nicolas, thanks for you work ! :)");
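The selector idea can be sketched independently of WinForms. The block below is only an illustration under stated assumptions: Element is a hypothetical stand-in for an HTML element, Page.Count is a made-up helper, and the Selector, IdSelector and ClassSelector classes here reuse the article's names but are not the author's implementation. It shows how a C# implicit conversion operator lets a method that takes a selector also accept a plain id string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an HTML element, so the sketch runs without WinForms.
public class Element
{
    public string Id = "";
    public string CssClass = "";
    public List<Element> Children = new List<Element>();
}

public abstract class Selector
{
    // Returns every element under root (root included) matched by this selector.
    public IEnumerable<Element> Select(Element root)
    {
        return Flatten(root).Where(Matches);
    }

    protected abstract bool Matches(Element e);

    // A bare string is treated as an element id (an assumption of this sketch).
    public static implicit operator Selector(string id)
    {
        return new IdSelector(id);
    }

    // Depth-first walk over the element tree.
    private static IEnumerable<Element> Flatten(Element e)
    {
        yield return e;
        foreach (var child in e.Children)
            foreach (var descendant in Flatten(child))
                yield return descendant;
    }
}

public class IdSelector : Selector
{
    private readonly string _id;
    public IdSelector(string id) { _id = id; }
    protected override bool Matches(Element e) { return e.Id == _id; }
}

public class ClassSelector : Selector
{
    private readonly string _cssClass;
    public ClassSelector(string cssClass) { _cssClass = cssClass; }
    protected override bool Matches(Element e) { return e.CssClass == _cssClass; }
}

public static class Page
{
    // Same shape as Fill(selector, value): one signature accepts any selector.
    public static int Count(Element root, Selector selector)
    {
        return selector.Select(root).Count();
    }
}
```

With this in place, Page.Count(root, "ctl00_RateArticle_VoteRBL_4") and Page.Count(root, new ClassSelector("RateComment")) both compile against the same signature, which is what makes calls like Click("someId") and Fill(new ClassSelector(...), ...) read so naturally.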
Here is the class Crawler and selectors :
OK, it takes more words of article to describe it than words of code, since the code
itself is only approximately 300 lines.
How does it work?
System.Windows.Forms.WebBrowser is a class to embed a browser inside a WinForms application.
The interesting part is that it is a wrapper around the COM interface of IE7 (or IE6,
I don't remember).
For example, in Click, I simulate a click
in the browser.
public Crawler Click(Selector selector)
Or I modify the DOM directly :
public Crawler Fill(Selector selector, string value)
{
    selector.ForEach(WebBrowser.Document, e => e.InnerText = value);
    return this; // assumed: return the crawler so calls can be chained
}
WebBrowser is a WinForms component, so it runs on the UI thread. Since
Automate should run sequentially but without blocking the UI thread,
I run it on a separate thread.
public void Crawl()
{
    if(WebBrowser == null)
        throw new InvalidOperationException("WebBrowser should be assigned");
    // (The rest of the method, which starts Automate on a separate thread, is not shown here.)
}
AutomationActions is just a wrapper around the SynchronizationContext
of the UI thread.
public class AutomationActions
AutoResetEvent _AutoReset = new AutoResetEvent(false);
_UiContext = SynchronizationContext.Current;
public T Ask<T>(Func<T> request)
T result = default(T);
result = request();
public void Do(Action action)
public void Do(Action<Action> action)
if(_UiContext == SynchronizationContext.Current)
throw new InvalidCastException("Cannot call AutomationActions in the UI thread");
Do(Action<Action> action) lets the crawler move on to the next action only when
the UI thread decides so, by calling the Action parameter.
I use it for the
WhenLoaded action, for example.
public Crawler WhenLoaded()
WebBrowserNavigatedEventHandler onNavigated = null;
onNavigated = (s, e) =>
WebBrowser.Navigated -= onNavigated;
WebBrowser.Document.Window.Load += (s1, e2) =>
WebBrowser.Navigated += onNavigated;
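The thread hand-off behind Ask and Do can be sketched on its own, without a browser. This is a minimal sketch under assumptions, not the article's code: SingleThreadContext is a hypothetical stand-in for the WinForms UI thread, ActionsSketch is a made-up name, and the context is passed to the constructor instead of being read from SynchronizationContext.Current. The idea is the one described above: the automation thread posts work to the UI context, then blocks on an AutoResetEvent until the UI-side code signals that it is finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Stand-in for the UI thread: a single thread that runs posted callbacks in order.
public sealed class SingleThreadContext : SynchronizationContext, IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    public SingleThreadContext()
    {
        var thread = new Thread(() =>
        {
            SetSynchronizationContext(this);
            foreach (var work in _queue.GetConsumingEnumerable()) work();
        });
        thread.IsBackground = true;
        thread.Start();
    }

    public override void Post(SendOrPostCallback d, object state)
    {
        _queue.Add(() => d(state));
    }

    public void Dispose() { _queue.CompleteAdding(); }
}

public class ActionsSketch
{
    private readonly AutoResetEvent _done = new AutoResetEvent(false);
    private readonly SynchronizationContext _uiContext;

    public ActionsSketch(SynchronizationContext uiContext) { _uiContext = uiContext; }

    // Runs action on the UI context. The caller stays blocked until the
    // UI-side code invokes the callback it was handed.
    public void Do(Action<Action> action)
    {
        if (SynchronizationContext.Current == _uiContext)
            throw new InvalidOperationException("Calling this from the UI thread would deadlock");
        _uiContext.Post(_ => action(() => _done.Set()), null);
        _done.WaitOne();
    }

    // Simple case: the action is finished as soon as it returns.
    public void Do(Action action)
    {
        Do(done => { action(); done(); });
    }

    // Evaluates request on the UI context and hands the result back to the caller.
    public T Ask<T>(Func<T> request)
    {
        T result = default(T);
        Do(() => { result = request(); });
        return result;
    }
}
```

The callback overload is what makes something like WhenLoaded possible: the UI-side code can subscribe to an event and call the callback only when that event fires, while the automation thread simply waits in between.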
The obvious limitation of this method of crawling is that it is not suited to web
spiders or text mining applications that need to browse large amounts of data.
It is very well suited to coding semi-automated data entry.
The other limitation is that
WebBrowser uses IE6 or 7...
Do you know
how I can use another browser instead? It could make for interesting interoperability...
Selenium is overfitted to the testing use case, which explains why it has some friction as a tool for assisted/semi-automated data entry.