
Extract/Save Article Headers from Newsgroups

Jacky S, 9 Sep 2006, CPOL
A simple C# program that can extract/save article headers from Newsgroups
Sample Image - newsgroup.png

Introduction

This article introduces a program that extracts message headers from newsgroups and exports them to a text file or a Microsoft Excel file.

Background

For a long time, I looked for a tool that could extract information from a newsgroup server to help analyze which topics people focus on, who is interested in what content, where posters are from, who the most active people in a newsgroup are, and so on. Unfortunately, I could not find a free one on the Internet; the usable ones I found required payment. So I decided to write the program myself.

NNTP Wrapper Class

There are already some articles on The Code Project that describe the NNTP commands. The one I referred to most is the article written by TY Lee. I extended a few of its classes to make them better suited to driving a user interface.

Basically, the classes follow the NNTP definition and use a network stream to communicate with the newsgroup server. The NewsgroupClient class wraps the methods that send commands to the server and retrieve the corresponding data. For example, the ListGroup() method lists the newsgroups available on the server, the SelectGroup() method selects a group as the active one and retrieves its article range, and the DownloadHeaders() method gets article headers from the currently active group.
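As a rough illustration of the kind of reply SelectGroup() has to handle: under the NNTP specification (RFC 977, later RFC 3977), a server answers a successful GROUP command with a status line of the form "211 count low high name". The sketch below parses such a line. The GroupInfo and GroupResponseParser names are hypothetical and are not taken from the article's source; only LowID and HighID mirror the property names used in the download code in this article.

```csharp
using System;

// Hypothetical holder for the article range reported by the server.
public sealed class GroupInfo
{
    public string Name;
    public long Count;
    public long LowID;
    public long HighID;
}

public static class GroupResponseParser
{
    // Parses a GROUP status line such as "211 1234 3000234 3002322 misc.test".
    // Returns null when the line is not a well-formed 211 response.
    public static GroupInfo Parse(string line)
    {
        if (string.IsNullOrEmpty(line)) return null;

        string[] parts = line.Trim().Split(
            new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 5 || parts[0] != "211") return null;

        long count, low, high;
        if (!long.TryParse(parts[1], out count)) return null;
        if (!long.TryParse(parts[2], out low)) return null;
        if (!long.TryParse(parts[3], out high)) return null;

        GroupInfo info = new GroupInfo();
        info.Count = count;
        info.LowID = low;
        info.HighID = high;
        info.Name = parts[4];
        return info;
    }
}
```

Any other status code (for example 411, which means no such newsgroup) makes the parser return null, so a caller can treat a null result as a failed selection.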

The connections are created on the UI thread, so Application.DoEvents() is called at several points in the code to keep the application responsive while data is being transferred.

When each article header is retrieved, an event is fired so that the information can be displayed immediately.
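The per-header notification can be pictured with a small sketch. It is not the article's NewsgroupClient: the HeaderDownloader and HeaderEventArgs names are hypothetical, and the network loop is replaced by a walk over an in-memory list so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event payload carrying one downloaded header field.
public sealed class HeaderEventArgs : EventArgs
{
    public readonly string Subject;
    public HeaderEventArgs(string subject) { Subject = subject; }
}

public sealed class HeaderDownloader
{
    // Raised once per header, before the whole batch has finished.
    public event EventHandler<HeaderEventArgs> HeaderDownloaded;

    // Stand-in for the network loop: walks pre-fetched subjects
    // instead of reading them from a server.
    public List<string> Download(IEnumerable<string> subjects)
    {
        List<string> all = new List<string>();
        foreach (string subject in subjects)
        {
            all.Add(subject);

            // Copy the delegate first so a concurrent unsubscribe
            // cannot null it between the check and the call.
            EventHandler<HeaderEventArgs> handler = HeaderDownloaded;
            if (handler != null) handler(this, new HeaderEventArgs(subject));
        }
        return all;
    }
}
```

A form would subscribe to the event and append a grid row in the handler, which is why rows can appear while the download is still running.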

User Interface

The user interface of this program is divided into two parts: the newsgroups tree on the left side of the form and the article list on the right. I use SpringSys OrchidGrid to build the main parts of the UI because it can work in tree mode and supports exporting data to Excel.

The tree view on the left has two levels: the top level holds the server nodes, and the second level displays the newsgroups on each server.

When a newsgroup server is added, a NewsgroupClient object is created and stored in the server node, and all the newsgroups on the server are listed by calling the ListGroup() method. Each NewsgroupClient corresponds to one newsgroup server and is responsible for all later network communication with it.

The code below retrieves the Newsgroups from a server and adds them to the server node:

private bool _updating = false;
private void UpdateGroups(GridTreeNode node)
{
    // gets NewsgroupClient object from the tree node
    NewsgroupClient ngClient = node.Tag as NewsgroupClient;
    if (ngClient == null) return;
    
    _updating = true;
    this.StartProgress();

    // connects the server if necessary
    if (!ngClient.Connected)
    {
        string server = node.Data as string;
        this.lblMsg.Text = string.Format("Connecting to server {0} ......", server);
        if (!ngClient.Connect(server))
        {
            _updating = false;
            this.StopProgress();
            MessageBox.Show("Error connecting to server " + server);
            return;
        }
    }

    try
    {
        // resets the children of the server node
        node.ClearChildren();
        int index = node.Row.Index + 1;

        // retrieves the newsgroups from the server
        this.lblMsg.Text = "Retrieving groups from server......";
        string[] newsgroups = ngClient.ListGroup();

        // adds the newsgroups as child nodes, with redrawing suspended
        ogServer.Redraw = false;
        for (int i = 0; i < newsgroups.Length; i++)
        {
            Row row = this.ogServer.Rows.Insert(index + i, newsgroups[i]);
            row.TreeImage = this.imageList1.Images[1];
        }
        ogServer.Redraw = true;

        this.lblMsg.Text = "Completed!";

        // adjusts the column width
        this.ogServer.AutoSizeColWidth();
    }
    finally
    {
        // always resets the busy state, even if listing the groups throws;
        // otherwise _updating would stay true and block later downloads
        this.StopProgress();
        _updating = false;
    }
}

The tree data is persisted to a text file when the application is closed. The next time the program runs, the data is restored from that file. This is done by the PersistGroups() and LoadGroups() methods.
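The article does not describe the file layout used by PersistGroups() and LoadGroups(), so the following is only a sketch of one possible approach, under the assumption of a simple two-level text format: each server on its own line, followed by its groups on lines that start with a tab. The GroupStore class and its Save and Load methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class GroupStore
{
    // Writes each server on its own line, followed by its groups,
    // each prefixed with a tab (assumed format, not the article's).
    public static void Save(TextWriter writer,
                            IDictionary<string, List<string>> servers)
    {
        foreach (KeyValuePair<string, List<string>> entry in servers)
        {
            writer.WriteLine(entry.Key);
            foreach (string group in entry.Value)
                writer.WriteLine("\t" + group);
        }
    }

    // Rebuilds the server-to-groups map from the format written by Save.
    public static Dictionary<string, List<string>> Load(TextReader reader)
    {
        Dictionary<string, List<string>> servers =
            new Dictionary<string, List<string>>();
        List<string> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;
            if (line[0] == '\t')
            {
                // group line: belongs to the most recent server
                if (current != null) current.Add(line.Substring(1));
            }
            else
            {
                // server line: starts a new group list
                current = new List<string>();
                servers[line] = current;
            }
        }
        return servers;
    }
}
```

Passing a StreamWriter or StreamReader opened on a file gives the save-on-close and load-on-start behavior; working against TextWriter and TextReader keeps the format easy to test in memory.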

Before downloading article headers, we can specify which groups to explore by checking their nodes, using the check box node feature of the grid. The line below makes sure that checking a server node also checks all the group nodes that belong to that server.

this.ogServer.Tree.CheckAction = TreeCheckAction.Children;

The following code browses each row of the tree grid and downloads only for the nodes that are checked.

private bool _downloading = false;
private void btnDownload_Click(object sender, EventArgs e)
{
    if (_downloading || _updating) return;

    try
    {
        _downloading = true;
        this.StartProgress();
        foreach (Row row in ogServer.Rows)
        {
            if (row.IsNode) continue; // skips rows flagged as nodes
            // skips rows that have already been visited
            if (row.UserData != null) continue;

            if (row.TreeChecked == CheckState.Checked)
                DownLoadHeaders(row); // downloads headers for this newsgroup
        }
    }
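    catch (Exception ex)
    {
        // reports the failure instead of silently swallowing it
        this.lblMsg.Text = "Download failed: " + ex.Message;
    }
    finally
    {
        this.StopProgress();
        _downloading = false;

        if (_currentClient != null)
            _currentClient._forceStop = false;
    }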
}

NewsgroupClient _currentClient;
private void DownLoadHeaders(Row row)
{
    GridTreeNode node = row.Node;

    // gets NewsgroupClient from the node
    NewsgroupClient ngClient = node.Tag as NewsgroupClient;
    if (ngClient == null) return;

    // connects if necessary
    _currentClient = ngClient;
    _currentClient._forceStop = false;
    if (!ngClient.Connected)
    {
        string server = node.Data as string;
        this.lblMsg.Text = string.Format("Connecting to server {0} ......", server);
        if (!ngClient.Connect(server))
        {
            this.StopProgress();
            MessageBox.Show("Error connecting to server " + server);
            return;
        }
    }

    // gets the group name
    string group = row[0] as string;
    if (group == null) return;
 
    this.lblMsg.Text = string.Format("Downloading from group {0} ......", group);
    try
    {
        // selects the group
        ngClient.SelectGroup(group);
 
        // downloads article headers
        // note: headers are passed back to the UI through the OnDownloadHeader event
        ArrayList headers = ngClient.DownloadHeaders(
            ngClient.CurrentGroup.LowID, ngClient.CurrentGroup.HighID);
        if (headers == null)
        {
            this.lblMsg.Text = "Download message header failed";
        }
        else
        {
            this.lblMsg.Text = "Success!";
            row.Style = _visitedStyle;
            row.UserData = "Visited";
        }
    }
    catch
    {
        this.lblMsg.Text = string.Format(
            "An error occurred while downloading from group {0}", group);
    }
}

Our goal is not only to display the article headers in a friendly user interface but, more importantly, to export the data to a text file or an Excel file for later processing. Fortunately, OrchidGrid has built-in support for data export through its ExportToDelimitedFile() and ExportToExcel() methods, so no extra code is needed for this functionality. Please see the source code for details.

You can also write your own export code if you like. In this application, I left some commented-out code that exports only the email address of each article's author to a text file.
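For readers who want to write such an export themselves, here is a minimal sketch of the address extraction step. It is not the commented-out code from the download; the FromHeader class and ExtractEmail method are hypothetical names, and the sketch only assumes the two From layouts commonly seen in newsgroup posts: "Display Name <user@host>" and "user@host (Display Name)".

```csharp
using System;

public static class FromHeader
{
    // Returns the address part of a From value, or null if none is found.
    public static string ExtractEmail(string fromValue)
    {
        if (string.IsNullOrEmpty(fromValue)) return null;

        // Layout 1: Display Name <user@host>
        int open = fromValue.LastIndexOf('<');
        int close = fromValue.LastIndexOf('>');
        if (open >= 0 && close > open + 1)
            return fromValue.Substring(open + 1, close - open - 1).Trim();

        // Layout 2: user@host (Display Name), or a bare address
        foreach (string token in fromValue.Split(' ', '\t'))
        {
            if (token.IndexOf('@') > 0)
                return token.Trim('(', ')', '"', ',');
        }
        return null;
    }
}
```

Many posters obfuscate their addresses, so the result is whatever the header contains rather than a validated mailbox.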

During network operations, the progress bar and the status message are kept up to date to indicate progress. You can also stop or cancel an operation at any time.

If you need other header data in addition to “Subject”, “From”, and “Date”, you can modify the ArticleHeader class and adjust the code for your project.
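The article's ArticleHeader class is not shown here, so the following is only a sketch of one way to make extra fields cheap to add: keep every "Name: value" line in a case-insensitive dictionary and expose the interesting ones as properties. ArticleHeaderSketch and its members are hypothetical names, and Message-ID is used purely as an example of an additional field.

```csharp
using System;
using System.Collections.Generic;

public sealed class ArticleHeaderSketch
{
    private readonly Dictionary<string, string> _fields =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    public string Subject { get { return Get("Subject"); } }
    public string From { get { return Get("From"); } }
    public string Date { get { return Get("Date"); } }
    // Example of an extra field added on top of the three above.
    public string MessageId { get { return Get("Message-ID"); } }

    // Returns the value of any header field, or null if it is absent.
    public string Get(string name)
    {
        string value;
        return _fields.TryGetValue(name, out value) ? value : null;
    }

    // Parses raw header lines; stops at the blank line that ends the headers.
    public static ArticleHeaderSketch Parse(IEnumerable<string> lines)
    {
        ArticleHeaderSketch header = new ArticleHeaderSketch();
        string lastName = null;
        foreach (string line in lines)
        {
            if (line.Length == 0) break;
            if ((line[0] == ' ' || line[0] == '\t') && lastName != null)
            {
                // folded continuation of the previous field
                header._fields[lastName] += " " + line.Trim();
                continue;
            }
            int colon = line.IndexOf(':');
            if (colon <= 0) continue;
            lastName = line.Substring(0, colon).Trim();
            header._fields[lastName] = line.Substring(colon + 1).Trim();
        }
        return header;
    }
}
```

With this layout, supporting another field is a one-line property, and a grid column can be filled from Get() without touching the parser.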

Try a Sample Server

Let’s try a newsgroup server as an example: “msnews.microsoft.com”. It hosts many newsgroups, some of which contain thousands of articles.

Enter the server address “msnews.microsoft.com” and press the Enter key. The server and its newsgroups are then listed in the tree on the left. Check the newsgroups you are interested in and click the “Download Message Headers” button to retrieve all the headers in the selected newsgroups. You can then export the headers to a text or Excel file.

Please see the screenshot of this application at the top of this page.

I hope you like this tool and find it useful.

History

  • 9th September, 2006: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Jacky S
Web Developer
China

Last Updated 10 Sep 2006
Article Copyright 2006 by Jacky S