Extract/save article headers from Newsgroups.

By Jacky S

A simple C# program that can extract/save article headers from Newsgroups.
C#, Windows, .NET, Visual Studio, Dev

Posted: 9 Sep 2006
Updated: 9 Sep 2006

Sample Image - newsgroup.png

Introduction

  This article introduces a program that extracts message headers from newsgroups and exports them to a text file or a Microsoft Excel file.

 

Background

 

  For a long time I looked for a tool that could extract information from a newsgroup server, to help analyze which topics people focus on most, who is interested in what content, where posters come from, who the most active people in a newsgroup are, and so on.

 

  Unfortunately, I could not find one on the internet; the tools that would do the job were not free. So I decided to write a program myself.

 

 

NNTP wrapper class

 

  Several articles on Code Project already describe the NNTP commands. The one I relied on most is the article written by TY Lee. I extended its classes a little to make them better suited to driving a user interface.

 

  The classes follow the NNTP definition and use a network stream to communicate with the newsgroup server. The NewsgroupClient class wraps the methods that send commands to the server and retrieve the corresponding data. For example, the ListGroup() method lists the newsgroups available on the server, the SelectGroup() method selects a group as the active one and retrieves its article range, and the DownloadHeaders() method gets the article headers from the currently active group.
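
  To illustrate what selecting a group involves at the protocol level, here is a minimal sketch of parsing the server's reply to the NNTP GROUP command, which has the form "211 count low high name". The GroupInfo and NntpReplies names are hypothetical and are not taken from the article's classes, whose source is not shown here.

```csharp
using System;
using System.Globalization;

// Hypothetical value type for illustration; not the article's own group class.
public sealed class GroupInfo
{
    public string Name;
    public long Count;   // estimated number of articles
    public long LowId;   // lowest article number
    public long HighId;  // highest article number
}

public static class NntpReplies
{
    // Parses a status line such as "211 1234 3000234 3002322 misc.test".
    // Returns null when the line is not a well-formed 211 reply.
    public static GroupInfo ParseGroupReply(string line)
    {
        if (line == null) return null;

        string[] parts = line.Trim().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 5 || parts[0] != "211") return null;

        long count, low, high;
        CultureInfo inv = CultureInfo.InvariantCulture;
        if (!long.TryParse(parts[1], NumberStyles.None, inv, out count)) return null;
        if (!long.TryParse(parts[2], NumberStyles.None, inv, out low)) return null;
        if (!long.TryParse(parts[3], NumberStyles.None, inv, out high)) return null;

        return new GroupInfo { Name = parts[4], Count = count, LowId = low, HighId = high };
    }
}
```

  The low and high numbers returned here play the role of the article range that is later passed to a header download call.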

 

  The connections run on the same thread as the UI, so Application.DoEvents() is called at several places in the code to keep the application responsive while data is being transferred.

 

  An event is fired as each article header is retrieved, so that the information can be displayed immediately.
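
  The per-header notification can be sketched with a standard .NET event. This is only an illustration of the pattern: HeaderSource, HeaderEventArgs and HeaderDownloaded are hypothetical names (the article's own event is referred to in its code comments as OnDownloadHeader), and the network loop is replaced by a plain in-memory sequence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event payload carrying two header fields.
public sealed class HeaderEventArgs : EventArgs
{
    public HeaderEventArgs(string subject, string from)
    {
        Subject = subject;
        From = from;
    }

    public string Subject { get; private set; }
    public string From { get; private set; }
}

public sealed class HeaderSource
{
    // Raised once for every header, as soon as it is available.
    public event EventHandler<HeaderEventArgs> HeaderDownloaded;

    // Stand-in for the network loop: each pair is (subject, from).
    // Returns the number of headers processed.
    public int Download(IEnumerable<KeyValuePair<string, string>> headers)
    {
        int count = 0;
        foreach (KeyValuePair<string, string> h in headers)
        {
            EventHandler<HeaderEventArgs> handler = HeaderDownloaded;
            if (handler != null)
                handler(this, new HeaderEventArgs(h.Key, h.Value));
            count++;
        }
        return count;
    }
}
```

  A form would subscribe with something like source.HeaderDownloaded += (s, e) => AddRow(e.Subject, e.From); where AddRow is whatever method appends a row to the article list.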

 

User interface

 

  The user interface is divided into two parts: the newsgroups tree on the left side of the form and the article list on the right. I use SpringSys OrchidGrid to build the main parts of the UI, because it can work in tree mode and supports exporting data to Excel.

 

  The tree on the left has two levels: the top level holds the server nodes, and the second level displays the newsgroups on each server.

 

  When a newsgroup server is added, a NewsgroupClient object is created and stored in the server node, and all the newsgroups on the server are listed by calling the ListGroup() method. One NewsgroupClient corresponds to one newsgroup server and is responsible for all later network communication with it.

 

  The code below retrieves the newsgroups from a server and adds them to the server node:

 

        private bool _updating = false;

        private void UpdateGroups(GridTreeNode node)
        {
            // gets the NewsgroupClient object from the tree node
            NewsgroupClient ngClient = node.Tag as NewsgroupClient;
            if (ngClient == null) return;

            _updating = true;
            this.StartProgress();

            // connects to the server if necessary
            if (!ngClient.Connected)
            {
                string server = node.Data as string;
                this.lblMsg.Text = string.Format("Connecting to server {0} ......", server);
                if (!ngClient.Connect(server))
                {
                    _updating = false;
                    this.StopProgress();
                    MessageBox.Show("Error connecting to server " + server);
                    return;
                }
            }

            // resets the children of the server node
            node.ClearChildren();
            int index = node.Row.Index + 1;

            // retrieves the newsgroups from the server
            this.lblMsg.Text = "Retrieving groups from server......";
            string[] newsgroups = ngClient.ListGroup();

            // adds the newsgroups as child rows, with redrawing suspended
            ogServer.Redraw = false;
            for (int i = 0; i < newsgroups.Length; i++)
            {
                Row row = this.ogServer.Rows.Insert(index + i, newsgroups[i]);
                row.TreeImage = this.imageList1.Images[1];
            }
            ogServer.Redraw = true;

            this.lblMsg.Text = "Completed!";
            this.StopProgress();

            // adjusts the column width
            this.ogServer.AutoSizeColWidth();
            _updating = false;
        }

 

 

  The tree data is persisted to a text file when the application is closed. The next time the program runs, the data is restored from that text file. This is done by the PersistGroups() and LoadGroups() methods.
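
  The bodies of PersistGroups() and LoadGroups() are not shown in this article, so the following is only a sketch of how a two-level server/group tree could be round-tripped through a text file. The file layout (server names unindented, group names prefixed by a tab) and the GroupStore class are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; the layout is an assumption, not the article's file format.
public static class GroupStore
{
    // Writes each server on its own line, followed by its groups, one per tab-indented line.
    public static void Save(TextWriter writer, IDictionary<string, List<string>> servers)
    {
        foreach (KeyValuePair<string, List<string>> server in servers)
        {
            writer.WriteLine(server.Key);
            foreach (string group in server.Value)
                writer.WriteLine("\t" + group);
        }
    }

    // Reads the layout written by Save back into a server -> groups map.
    public static Dictionary<string, List<string>> Load(TextReader reader)
    {
        var servers = new Dictionary<string, List<string>>();
        List<string> current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue;   // ignores blank lines

            if (line[0] == '\t')
            {
                // group line: belongs to the most recent server, if any
                if (current != null) current.Add(line.Substring(1));
            }
            else
            {
                current = new List<string>();
                servers[line] = current;
            }
        }
        return servers;
    }
}
```

  With a real file, the writer and reader would be a StreamWriter and a StreamReader opened on the same path at shutdown and startup respectively.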

 

  By checking nodes, we specify which groups are to be explored before downloading the article headers. This uses the check box node feature of the grid. The line below makes sure that checking a server node also checks all the group nodes that belong to that server.

 

this.ogServer.Tree.CheckAction = TreeCheckAction.Children;
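
  The grid handles this propagation internally. For readers without the grid at hand, the behavior described above can be sketched on a plain tree model; CheckNode is a hypothetical class written for this example and is not part of OrchidGrid.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node used only to illustrate check propagation.
public sealed class CheckNode
{
    public readonly string Text;
    public readonly List<CheckNode> Children = new List<CheckNode>();
    public bool Checked;

    public CheckNode(string text)
    {
        Text = text;
    }

    // Sets the check state of this node and, recursively, of every descendant.
    public void SetChecked(bool value)
    {
        Checked = value;
        foreach (CheckNode child in Children)
            child.SetChecked(value);
    }
}
```

  Checking a server node in this model checks every group beneath it, while checking a single group leaves its siblings and its parent untouched.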

 

  The code below walks each row of the tree grid and downloads headers only for the nodes that are checked.

 

        private bool _downloading = false;

        private void btnDownload_Click(object sender, EventArgs e)
        {
            if (_downloading || _updating) return;

            try
            {
                _downloading = true;
                this.StartProgress();
                foreach (Row row in ogServer.Rows)
                {
                    if (row.IsNode) continue; // skips rows that are themselves tree nodes
                    if (row.UserData != null) continue; // skips rows that have already been visited

                    if (row.TreeChecked == CheckState.Checked)
                        DownLoadHeaders(row); // downloads headers for the specific newsgroup
                }
            }
            catch (Exception ex)
            {
                // reports the failure instead of silently swallowing it
                this.lblMsg.Text = "Download failed: " + ex.Message;
            }
            finally
            {
                this.StopProgress();
                _downloading = false;

                if (_currentClient != null)
                    _currentClient._forceStop = false;
            }
        }

        NewsgroupClient _currentClient;

        private void DownLoadHeaders(Row row)
        {
            GridTreeNode node = row.Node;

            // gets the NewsgroupClient from the node
            NewsgroupClient ngClient = node.Tag as NewsgroupClient;
            if (ngClient == null) return;

            // connects if necessary
            _currentClient = ngClient;
            _currentClient._forceStop = false;
            if (!ngClient.Connected)
            {
                string server = node.Data as string;
                this.lblMsg.Text = string.Format("Connecting to server {0} ......", server);
                if (!ngClient.Connect(server))
                {
                    // _downloading is cleared by the caller's finally block
                    this.StopProgress();
                    MessageBox.Show("Error connecting to server " + server);
                    return;
                }
            }

            // gets the group name
            string group = row[0] as string;
            if (group == null) return;

            this.lblMsg.Text = string.Format("Downloading from group {0} ......", group);
            try
            {
                // selects the group
                ngClient.SelectGroup(group);

                // downloads article headers
                // note: headers are pushed back to the UI through the OnDownloadHeader event
                ArrayList headers = ngClient.DownloadHeaders(ngClient.CurrentGroup.LowID, ngClient.CurrentGroup.HighID);
                if (headers == null)
                {
                    this.lblMsg.Text = "Download message header failed";
                }
                else
                {
                    this.lblMsg.Text = "Success!";
                    row.Style = _visitedStyle;
                    row.UserData = "Visited";
                }
            }
            catch
            {
                this.lblMsg.Text = string.Format("Error occurred while downloading from group {0}", group);
            }
        }

 

  The goal is not only to display the article headers in a friendly user interface but, more importantly, to export the data to a text file or an Excel file for later processing. Fortunately, OrchidGrid has built-in support for exporting data through its ExportToDelimitedFile() and ExportToExcel() methods, so no extra code is needed for this functionality.

  

  You can also write your own export code if you like. In this application I left in, commented out, some code that exports only the e-mail address of each article's author to a text file.
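
  The commented-out export code is not reproduced in this article. As a rough sketch of the idea, the helper below pulls the e-mail address out of a "From" header value and collects the distinct addresses. The AuthorEmails class and its regular expression are assumptions for this example; the pattern is deliberately simple and does not cover every address form that mail standards allow.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not the code shipped with the application.
public static class AuthorEmails
{
    // Simplified address pattern; good enough for typical "Name <user@host>" values.
    private static readonly Regex EmailPattern =
        new Regex(@"[A-Za-z0-9._%+\-]+@[A-Za-z0-9.\-]+\.[A-Za-z]{2,}");

    // Returns the first address found in the header value, or null if there is none.
    public static string Extract(string fromHeader)
    {
        if (string.IsNullOrEmpty(fromHeader)) return null;
        Match m = EmailPattern.Match(fromHeader);
        return m.Success ? m.Value : null;
    }

    // Returns each address once (case-insensitive), in order of first appearance.
    public static List<string> ExtractDistinct(IEnumerable<string> fromHeaders)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (string header in fromHeaders)
        {
            string email = Extract(header);
            if (email != null && seen.Add(email))
                result.Add(email);
        }
        return result;
    }
}
```

  The resulting list can be written out with System.IO.File.WriteAllLines(path, list), one address per line.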

 

  During network operations, the progress bar and the message label stay active to indicate progress. You can also stop or cancel an operation at any moment.

 

  If you need other header data in addition to “Subject”, “From” and “Date”, you can modify the ArticleHeader class and adjust the code for your project.
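
  One way to make additional fields available is to parse the raw header lines into a name/value map and read whichever fields are needed. The sketch below assumes the header block arrives as a sequence of "Name: value" lines ending at the first empty line, with continuation lines starting with a space or tab. HeaderParser is a hypothetical helper and does not reflect how the ArticleHeader class is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class HeaderParser
{
    // Builds a case-insensitive map of header names to values.
    // If a name occurs more than once, the last occurrence wins.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string lastName = null;

        foreach (string line in lines)
        {
            if (line.Length == 0) break;   // an empty line ends the header block

            // a line starting with whitespace continues the previous field
            if ((line[0] == ' ' || line[0] == '\t') && lastName != null)
            {
                fields[lastName] += " " + line.Trim();
                continue;
            }

            int colon = line.IndexOf(':');
            if (colon <= 0) continue;      // skips lines that are not "Name: value"

            lastName = line.Substring(0, colon).Trim();
            fields[lastName] = line.Substring(colon + 1).Trim();
        }
        return fields;
    }
}
```

  With such a map, extra columns like “Message-ID” or “Newsgroups” can be filled by looking up the corresponding key.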

 

Try a sample server

 

  Let’s try a newsgroup server as an example: “msnews.microsoft.com”. It hosts many newsgroups, and some contain thousands of articles.

 

  Enter the server address “msnews.microsoft.com” and press the Enter key; the server and its newsgroups are listed in the tree on the left. Check the newsgroups you like and click the “Download Message Headers” button to get all the headers in the selected newsgroups. You can then export the headers to a text or Excel file.

 

  See the screenshot of this application at the top of this page.

 

  I hope you like this tool and find it useful.

 

 

 

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


About the Author

Jacky S



Occupation: Web Developer
Location: China

Copyright 2006 by Jacky S