CodeProject Statistics Calculator Using WebDriver

Anton Angelov

Jan 18, 2017

Ms-PL

A tool for creating a report on an author's articles for a specific year or from the beginning of time. It also calculates the total number of views.

Download Application

Download full source code

Introduction

I decided to create a new series of articles dedicated to automation tools, the Automation Tools Series. The first tool that I am going to share with you is one I built for my last article, Jump-start Your 2017 with the Best of Automate The Planet 2016. There I had to calculate the total views of all of my CodeProject articles. I had done that exercise a couple of times the old-fashioned way, with a calculator. This time I told myself that I was too lazy to do it a third time, and a new tool was born.

What Is the Problem?

I am a man who always has a plan and loves numbers and statistics. So, to measure my success as a writer, I wanted to know how many times my articles have been read in total. However, CodeProject doesn't provide such information. You can see only the current number of views per article.

The page has an RSS feed. Unfortunately, the views are not included; they are always equal to 0.

Initially, I planned to base my application on the RSS, but because of the mentioned drawback, I couldn't do it. So I needed to figure out another way to get the numbers.

Another requirement I had for the tool was to be able to extract this information either for a particular year or from the beginning of time. This way I could observe the popularity of my newest articles.

How to Use the Application?

I designed the app to be called from the Console with arguments.

Based on your inputs, the tool generates a sheet with the following columns: Views, Title, PublishDate and Url. You can extract the articles' information for a particular year or from the beginning of time.

CodeProjectStatisticsCalculator.exe --y -1 --p "allTime.csv" --i 11449574

The above line will generate a report containing the information about all of your articles from the beginning of time.

Arguments

--y -1 Use this argument to specify the year for which you want to generate the report. If you replace -1 with 2016, the tool will extract only the data for that particular year.

--p "allTime.csv" After --p, specify the path and the name of the file where the data will be stored. It should be in CSV format so that you can open it as a spreadsheet.

--i 11449574 After --i, specify the ID of your public CodeProject account. You can get it from the URL of your profile page.

How Does It Work?

The tool is based on UI test automation using WebDriver. It utilizes the Page Object design pattern. The main logic is placed in the ArticlesPage class.

public partial class ArticlesPage : BasePage
{
    private string viewsRegex = @".*Views: (?<Views>[0-9,]{1,})";
    private string publishDateRegex = @".*Posted: (?<PublishDate>[0-9,A-Za-z ]{1,})";
    private readonly int profileId;
    public ArticlesPage(IWebDriver driver, int profileId) : base(driver)
    {
        this.profileId = profileId;
    }
    public override string Url
    {
        get
        {
            return string.Format("https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid={0}", this.profileId);
        }
    }
    public void Navigate(string part)
    {
        base.Open(part);
    }
    public List<Article> GetArticlesByUrl(string sectionPart)
    {
        this.Navigate(sectionPart);
        var articlesInfos = new List<Article>();
        foreach (var articleRow in this.ArticlesRows.ToList())
        {
            if (!articleRow.Displayed)
            {
                continue;
            }
            var article = new Article();
            var articleTitleElement = this.GetArticleTitleElement(articleRow);
            article.Title = articleTitleElement.GetAttribute("innerHTML");
            article.Url = articleTitleElement.GetAttribute("href");
            var articleStatisticsElement = this.GetArticleStatisticsElement(articleRow);
            string articleStatisticsElementSource = articleStatisticsElement.GetAttribute("innerHTML");
            if (!string.IsNullOrEmpty(articleStatisticsElementSource))
            {
                article.Views = this.GetViewsCount(articleStatisticsElementSource);
                article.PublishDate = this.GetPublishDate(articleStatisticsElementSource);
            }
            articlesInfos.Add(article);
        }
        return articlesInfos;
    }
    private double GetViewsCount(string articleStatisticsElementSource)
    {
        var regexViews = new Regex(viewsRegex, RegexOptions.Singleline);
        Match currentMatch = regexViews.Match(articleStatisticsElementSource);
        if (!currentMatch.Success)
        {
            throw new ArgumentException("No content for the current statistics element.");
        }
        return double.Parse(currentMatch.Groups["Views"].Value);
    }
    private DateTime GetPublishDate(string articleStatisticsElementSource)
    {
        var regexPublishDate = new Regex(publishDateRegex, RegexOptions.IgnorePatternWhitespace);
        Match currentMatch = regexPublishDate.Match(articleStatisticsElementSource);
        if (!currentMatch.Success)
        {
            throw new ArgumentException("No content for the current statistics element.");
        }
        return DateTime.Parse(currentMatch.Groups["PublishDate"].Value);
    }
}

The main workflow is placed in the public method GetArticlesByUrl. You need to specify the part of the URL that you want to load: #Articles, #TechnicalBlog or #Tip. The page's constructor requires the profile's ID to build the whole URL.
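The BasePage class is not listed in this article, so the following is only a minimal sketch of how the full address could be composed. It assumes that Open simply appends the section part to the page's Url; the SectionUrlSketch class and the BuildSectionUrl helper are hypothetical names used for illustration.

```csharp
using System;

public static class SectionUrlSketch
{
    // Same format string as the Url property of ArticlesPage.
    private const string MemberArticlesUrlFormat =
        "https://www.codeproject.com/script/Articles/MemberArticles.aspx?amid={0}";

    // Hypothetical helper: assumes the section part (for example "#Articles")
    // is appended to the profile URL as a fragment.
    public static string BuildSectionUrl(int profileId, string sectionPart)
    {
        return string.Format(MemberArticlesUrlFormat, profileId) + sectionPart;
    }

    public static void Main()
    {
        // Prints the address for each of the three sections used by the tool.
        foreach (string section in new[] { "#Articles", "#TechnicalBlog", "#Tip" })
        {
            Console.WriteLine(BuildSectionUrl(11449574, section));
        }
    }
}
```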

Extract Title and URL

We locate all of the article rows and iterate through them. We find them with the XPath expression below.

public ReadOnlyCollection<IWebElement> ArticlesRows
{
    get
    {
        return this.driver.FindElements(By.XPath("//tr[contains(@id,'CAR_MainArticleRow')]"));
    }
}

For every row, we extract the title and the URL. To do that, we find the anchor element inside each row, then read the URL from its href attribute and the title from its inner HTML.

Extract Views and PublishDate

To get the PublishDate and the view count, we need to locate the statistics DIV.

public IWebElement GetArticleStatisticsElement(IWebElement articleRow)
{
    return articleRow.FindElement(By.CssSelector("div[id$='CAR_SbD']"));
}

This time we use a CSS selector to find the element inside the current row. It matches the DIV whose id ends with CAR_SbD.

private string viewsRegex = @".*Views: (?<Views>[0-9,]{1,})";
private string publishDateRegex = @".*Posted: (?<PublishDate>[0-9,A-Za-z ]{1,})";
private double GetViewsCount(string articleStatisticsElementSource)
{
    var regexViews = new Regex(viewsRegex, RegexOptions.Singleline);
    Match currentMatch = regexViews.Match(articleStatisticsElementSource);
    if (!currentMatch.Success)
    {
        throw new ArgumentException("No content for the current statistics element.");
    }
    return double.Parse(currentMatch.Groups["Views"].Value);
}
private DateTime GetPublishDate(string articleStatisticsElementSource)
{
    var regexPublishDate = new Regex(publishDateRegex, RegexOptions.IgnorePatternWhitespace);
    Match currentMatch = regexPublishDate.Match(articleStatisticsElementSource);
    if (!currentMatch.Success)
    {
        throw new ArgumentException("No content for the current statistics element.");
    }
    return DateTime.Parse(currentMatch.Groups["PublishDate"].Value);
}

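I decided that the easiest way to extract the information is to use regular expressions. The named groups Views and PublishDate capture the text that follows the "Views: " and "Posted: " labels.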
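To see what the two patterns capture, here is a small standalone sketch that runs them against a sample string. The sample markup is an assumption (the real inner HTML of the statistics DIV may look different), and the StatisticsParsingSketch class is a hypothetical name. Unlike the page object above, the sketch uses RegexOptions.Singleline for both patterns and parses with the invariant culture, so that the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class StatisticsParsingSketch
{
    // Same patterns as in ArticlesPage.
    private const string ViewsPattern = @".*Views: (?<Views>[0-9,]{1,})";
    private const string PublishDatePattern = @".*Posted: (?<PublishDate>[0-9,A-Za-z ]{1,})";

    public static double ParseViews(string source)
    {
        Match match = Regex.Match(source, ViewsPattern, RegexOptions.Singleline);
        if (!match.Success)
        {
            throw new ArgumentException("No views found in the statistics text.");
        }
        // AllowThousands lets a value such as "6,320" parse as 6320.
        return double.Parse(match.Groups["Views"].Value, NumberStyles.AllowThousands, CultureInfo.InvariantCulture);
    }

    public static DateTime ParsePublishDate(string source)
    {
        Match match = Regex.Match(source, PublishDatePattern, RegexOptions.Singleline);
        if (!match.Success)
        {
            throw new ArgumentException("No publish date found in the statistics text.");
        }
        // Trim guards against trailing spaces captured by the character class.
        return DateTime.Parse(match.Groups["PublishDate"].Value.Trim(), CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Assumed sample markup, for illustration only.
        string sample = "Posted: 18 Jan 2017<br>Views: 6,320";
        Console.WriteLine(ParseViews(sample));
        Console.WriteLine(ParsePublishDate(sample).ToString("yyyy-MM-dd"));
    }
}
```

Because the date group only accepts digits, letters, commas and spaces, the capture stops at the first markup character that follows the date.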

Program's Main Body

class Program
{
    private static string filePath = string.Empty;
    private static string yearInput = string.Empty;
    private static int year = -1;
    private static int profileId = 0;
    private static string profileIdInput = string.Empty;
    private static List<Article> articlesInfos;
    static void Main(string[] args)
    {
        var commandLineParser = new FluentCommandLineParser();
        commandLineParser.Setup<string>('p', "path").Callback(s => filePath = s);
        commandLineParser.Setup<string>('y', "year").Callback(y => yearInput = y);
        commandLineParser.Setup<string>('i', "profileId").Callback(p => profileIdInput = p);
        commandLineParser.Parse(args);
        bool isProfileIdCorrect = int.TryParse(profileIdInput, out profileId);
        if (string.IsNullOrEmpty(profileIdInput) || !isProfileIdCorrect)
        {
            Console.WriteLine("Please specify a correct profileId.");
            return;
        }
        if (string.IsNullOrEmpty(filePath))
        {
            Console.WriteLine("Please specify a correct file path.");
            return;
        }
        if (!string.IsNullOrEmpty(yearInput))
        {
            bool isYearCorrect = int.TryParse(yearInput, out year);
            if (!isYearCorrect)
            {
                Console.WriteLine("Please specify a correct year!");
                return;
            }
        }
        articlesInfos = GetAllArticlesInfos();
        if (year == -1)
        {
            CreateReportAllTime();
        }
        else
        {
            CreateReportYear();
        }
        Console.WriteLine("Total VIEWS: {0}", articlesInfos.Sum(x => x.Views));
        Console.ReadLine();
    }
    private static void CreateReportAllTime()
    {
        // Dispose the writers so that the buffered rows are flushed to the file.
        using (TextWriter textWriter = new StreamWriter(filePath))
        using (var csv = new CsvWriter(textWriter))
        {
            csv.WriteRecords(articlesInfos.OrderByDescending(x => x.Views));
        }
    }
    private static void CreateReportYear()
    {
        // Keep only the articles published in the requested year.
        using (TextWriter currentYearTextWriter = new StreamWriter(filePath))
        using (var csv = new CsvWriter(currentYearTextWriter))
        {
            csv.WriteRecords(articlesInfos.Where(x => x.PublishDate.Year.Equals(year)).OrderByDescending(x => x.Views));
        }
    }
    private static List<Article> GetAllArticlesInfos()
    {
        var articlesInfos = new List<Article>();
        using (var driver = new ChromeDriver())
        {
            var articlePage = new ArticlesPage(driver, profileId);
            articlesInfos.AddRange(articlePage.GetArticlesByUrl("#Articles"));
        }
        using (var driver = new ChromeDriver())
        {
            var articlePage = new ArticlesPage(driver, profileId);
            articlesInfos.AddRange(articlePage.GetArticlesByUrl("#TechnicalBlog"));
        }
        using (var driver = new ChromeDriver())
        {
            var articlePage = new ArticlesPage(driver, profileId);
            articlesInfos.AddRange(articlePage.GetArticlesByUrl("#Tip"));
        }
        return articlesInfos;
    }
}

There are a few important parts in the above code.

Arguments Parser

For the arguments parser, I used the FluentCommandLineParser NuGet package. The code below defines which logic should be executed when a particular argument is specified.

var commandLineParser = new FluentCommandLineParser();
commandLineParser.Setup<string>('p', "path").Callback(s => filePath = s);
commandLineParser.Setup<string>('y', "year").Callback(y => yearInput = y);
commandLineParser.Setup<string>('i', "profileId").Callback(p => profileIdInput = p);
commandLineParser.Parse(args);

CSV Utility

I decided that the easiest way to create a CSV file from the list of articles' data is to use the CsvHelper NuGet package. We first create a TextWriter pointing to the CSV file path and then create the CsvWriter object. Finally, we call the WriteRecords method.

private static void CreateReportAllTime()
{
    // Dispose the writers so that the buffered rows are flushed to the file.
    using (TextWriter textWriter = new StreamWriter(filePath))
    using (var csv = new CsvWriter(textWriter))
    {
        csv.WriteRecords(articlesInfos.OrderByDescending(x => x.Views));
    }
}
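The selection and ordering of the rows can be tried out without a browser or a CSV file. The sketch below is an illustration under assumptions: the Article class is not listed in this article, so its shape is inferred from the column names and from the types used in ArticlesPage, ReportSketch and SelectForReport are hypothetical names, and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the model; the property names follow the report columns.
public class Article
{
    public double Views { get; set; }
    public string Title { get; set; }
    public DateTime PublishDate { get; set; }
    public string Url { get; set; }
}

public static class ReportSketch
{
    // year == -1 means "from the beginning of time", mirroring the --y argument.
    public static List<Article> SelectForReport(IEnumerable<Article> articles, int year)
    {
        return articles
            .Where(a => year == -1 || a.PublishDate.Year == year)
            .OrderByDescending(a => a.Views)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data, for illustration only.
        var articles = new List<Article>
        {
            new Article { Title = "A", Views = 100, PublishDate = new DateTime(2015, 5, 1), Url = "http://example.com/a" },
            new Article { Title = "B", Views = 300, PublishDate = new DateTime(2016, 2, 1), Url = "http://example.com/b" },
            new Article { Title = "C", Views = 900, PublishDate = new DateTime(2016, 9, 1), Url = "http://example.com/c" }
        };

        List<Article> report = SelectForReport(articles, 2016);
        foreach (Article article in report)
        {
            Console.WriteLine("{0},{1}", article.Views, article.Title);
        }
        // In this sketch the total covers only the selected rows.
        Console.WriteLine("Total VIEWS: {0}", report.Sum(a => a.Views));
    }
}
```

Passing -1 instead of 2016 returns all three sample articles, ordered by views.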
