Solving complex parsing tasks with RegexTreeer

Sergey Stoyan, 23 Oct 2008, LGPL3
Solving complex parsing tasks by utilizing Regular Expression trees built with RegexTreeer.
//********************************************************************************************
//Author: Sergey Stoyan, CliverSoft.com
//        http://cliversoft.com
//        stoyan@cliversoft.com
//        sergey.stoyan@gmail.com
//        22 March 2008
//Copyright: (C) 2008, Sergey Stoyan
//********************************************************************************************
using System;
using System.IO;
using Cliver;

namespace Test
{
    class Test2
    {
        internal void Run()
        {
            //text to be parsed
            string page = File.ReadAllText("../../_pages/Companies.txt");

            process_company_list(page);
        }

        //create a Cliver.Parser from the regex tree stored in the file
        Cliver.Parser company_parser = new Parser("../../_config_files/Companies.rgx");

        /// <summary>
        /// Process the page by Cliver.Parser
        /// </summary>
        /// <param name="page">text to be parsed</param>
        void process_company_list(string page)
        {
            Cliver.GroupCapture gc = company_parser.Parse(page);

            foreach (Cliver.GroupCapture company in gc["Company"])
            {
                Console.WriteLine("\n\n>>>>>>>Company:>>>>>>>");

                //a group can generally have more than one capture,
                //so we always ask explicitly for the first one,
                //even though we know each company has exactly one name
                Console.WriteLine("Name: " + company.FirstValueOf("CompanyName"));
                Console.WriteLine("Address: " + company.FirstValueOf("CompanyAddress"));
                Console.WriteLine("Site: " + company.FirstValueOf("CompanySite"));

                foreach (Cliver.GroupCapture employee in company["Employee"])
                {
                    Console.WriteLine("\n-------Employee:-------");
                    Console.WriteLine("Name: " + employee.FirstValueOf("EmployeeName"));

                    //an employee can have more than one phone in our sample, so we enumerate them in a loop
                    foreach (string phone in employee.ValuesOf("EmployeePhone"))
                    {
                        Console.WriteLine("Phone: " + phone);
                    }

                    Console.WriteLine("Mobile: " + employee.FirstValueOf("EmployeeMobile"));

                    Console.WriteLine("Email: " + employee.FirstValueOf("EmployeeEmail"));
                }
            }
        }
    }
}
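
The listing above relies on Cliver.Parser and a regex tree file whose contents are not shown here. As a rough illustration of the underlying idea only, the sketch below applies nested regular expressions using the standard System.Text.RegularExpressions classes: an outer expression isolates each company block, and inner expressions are run only against the text captured by their parent. This is not how Cliver.Parser is implemented. The class name RegexTreeSketch, the three patterns, and the sample input format are all hypothetical, chosen only to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: nested regexes applied level by level.
// Not the Cliver.Parser implementation; input format is assumed.
public static class RegexTreeSketch
{
    // Root level: one match per company block, up to the next "Company:" or end of text.
    static readonly Regex CompanyRegex = new Regex(
        @"Company:\s*(?<CompanyName>[^\r\n]+)(?<Body>.*?)(?=Company:|\z)",
        RegexOptions.Singleline);

    // Child level: searched only inside the Body capture of a company.
    static readonly Regex EmployeeRegex = new Regex(
        @"Employee:\s*(?<EmployeeName>[^;\r\n]+);\s*Phones:\s*(?<Phones>[^\r\n]*)");

    // Grandchild level: searched only inside the Phones capture of an employee.
    static readonly Regex PhoneRegex = new Regex(@"[\d-]+");

    public static List<string> Parse(string text)
    {
        var lines = new List<string>();
        foreach (Match company in CompanyRegex.Matches(text))
        {
            lines.Add("Company: " + company.Groups["CompanyName"].Value.Trim());
            foreach (Match employee in EmployeeRegex.Matches(company.Groups["Body"].Value))
            {
                lines.Add("  Employee: " + employee.Groups["EmployeeName"].Value.Trim());
                foreach (Match phone in PhoneRegex.Matches(employee.Groups["Phones"].Value))
                {
                    lines.Add("    Phone: " + phone.Value);
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Made-up sample text in the assumed format.
        string sample =
            "Company: Acme\n" +
            "Employee: Ann; Phones: 555-0100, 555-0101\n" +
            "Company: Globex\n" +
            "Employee: Bob; Phones: 555-0200\n";

        foreach (string line in Parse(sample))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because each inner expression sees only its parent's captured text, an employee can never be attributed to the wrong company. The nested loops over company["Employee"] and ValuesOf("EmployeePhone") in the listing above appear to rely on the same kind of scoping.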


License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)

About the Author

Sergey Stoyan
Architect CliverSoft (www.cliversoft.com)
Ukraine
Sergey graduated as an applied mathematician. He specializes in client/server applications, backup systems, data parsing tools, web crawlers, and search engines. He works for CliverSoft Co. His favorite languages are C#, C++, and Perl.

Last Updated 23 Oct 2008
Article Copyright 2008 by Sergey Stoyan