vCard Reader with Lightweight Approach

By Zijian, 26 May 2008
 

Background

As of October 20th, 2006, I had been looking for a C# class that can read a vCard and extract its data. Surprisingly, six years after .NET and C# came out, I could not find one on CodeProject.com or GotDotNet.com, although there are plenty of classes that export data to vCard format.

So far I have found only two parsers: one in the Indy Internet Component Suite, written in Delphi, and another in the Mozilla project, written in C. The Indy one is quite good, though it contains a minor bug in handling the ISO date format (which I fixed). The Mozilla one originally came from the vCard initiative (including Apple, IBM and AT&T), and its code was generated by YACC. Both handle vCard 2.1 only. I sometimes wonder why nobody has modified the vcc.y file to support vCard 3.0.

So, I made one for myself, and for you.

Purpose

Create a .NET vCard reader.

  1. Create a class that can read vCard text and build an object model from it.
  2. Make it easy for others to modify the source code for other use cases.

Some Considerations

  1. Performance is not a concern. I will use a simple algorithm and simple structures.
  2. I will not write a conventional parser. Judging from the source code of versit.dll (C) and the Indy Internet Component Suite (Delphi), the algorithm and structures of a conventional vCard parser look too complicated, though they may be efficient.
  3. The extracted data does not necessarily mirror the strict logical structure of a vCard, nor does it cover every attribute and type in the vCard specification.

I am not a component developer, so I only make this class work well with my current and mid-term projects, and I leave room for other programmers to tailor it.

Implementation

My approach is to use regular expressions to do the dirty work of parsing. As you will see below, the logical structures and algorithms are very simple, and the code is short, though it took me quite a few hours to develop those regular expressions. It is easy to maintain and tailor this class for your use cases.

    using System;
    using System.Text.RegularExpressions;

    /// <summary>
    /// Reads vCard text and populates single-value properties and typed collections.
    /// </summary>
    public class vCardReader
    {
        #region Singular Properties

        private string formattedName;

        public string FormattedName
        {
            get { return formattedName; }
            set { formattedName = value; }
        }

        string surname;
        public string Surname
        {
            get { return surname; }
            set { surname = value; }
        }

// ................... other properties ............

        private DateTime rev;
        /// <summary>
        /// If REV in the vCard is in UTC, it is converted to local time.
        /// </summary>
        public DateTime Rev
        {
            get { return rev; }
            set { rev = value; }
        }

        private string org;

        public string Org
        {
            get { return org; }
            set { org = value; }
        }

        private string note;

        public string Note
        {
            get { return note; }
            set { note = value; }
        }

        #endregion

        #region Property Collections with attribute

        private Address[] addresses;

        public Address[] Addresses
        {
            get { return addresses; }
            set { addresses = value; }
        }

// .......... Other properties ................


        #endregion

        /// <summary>
        /// Analyze s into vCard structures.
        /// </summary>
        public void ParseLines(string s)
        {
            RegexOptions options = RegexOptions.IgnoreCase | 
                RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace;

            Regex regex;
            Match m;
            MatchCollection mc;

            regex = new Regex(@"(?<strElement>(FN))   (:(?<strFN>[^\n\r]*))", options);
            m = regex.Match(s);
            if (m.Success)
                FormattedName = m.Groups["strFN"].Value;

            regex = new Regex(@"(\n(?<strElement>(N)))   
                    (:(?<strSurname>([^;]*))) (;(?<strGivenName>([^;]*)))  
                    (;(?<strMidName>([^;]*))) (;(?<strPrefix>([^;]*))) 
                    (;(?<strSuffix>[^\n\r]*))", options);
            m = regex.Match(s);
            if (m.Success)
            {
                Surname = m.Groups["strSurname"].Value;
                GivenName = m.Groups["strGivenName"].Value;
                MiddleName = m.Groups["strMidName"].Value;
                Prefix = m.Groups["strPrefix"].Value;
                Suffix = m.Groups["strSuffix"].Value;
            }

            // Title
            regex = new Regex(@"(?<strElement>(TITLE))   
                    (:(?<strTITLE>[^\n\r]*))", options);
            m = regex.Match(s);
            if (m.Success)
                Title = m.Groups["strTITLE"].Value;

            // ORG
            regex = new Regex(@"(?<strElement>(ORG))   
                    (:(?<strORG>[^\n\r]*))", options);
            m = regex.Match(s);
            if (m.Success)
                Org = m.Groups["strORG"].Value;

            // Note
            regex = new Regex(@"((?<strElement>(NOTE)) 
                    (;*(?<strAttr>(ENCODING=QUOTED-PRINTABLE)))*  
                    ([^:]*)*  (:(?<strValue> 
                    (([^\n\r]*=[\n\r]+)*[^\n\r]*[^=][\n\r]*) )))", options);
            m = regex.Match(s);
            if (m.Success)
            {
                Note = m.Groups["strValue"].Value;
                // Remove quoted-printable soft line breaks first, then decode the
                // =0D=0A and =3D escapes. The order is significant.
                Note = Note.Replace("=" + Environment.NewLine, "");
                Note = Note.Replace("=0D=0A" , Environment.NewLine);
                Note = Note.Replace("=3D", "=");
            }

            // Birthday
            regex = new Regex(@"(?<strElement>(BDAY))   
                    (:(?<strBDAY>[^\n\r]*))", options);
            m = regex.Match(s);
            if (m.Success)
            {
                string[] expectedFormats = { "yyyyMMdd", "yyMMdd", "yyyy-MM-dd" };
                Birthday = DateTime.ParseExact
                           (m.Groups["strBDAY"].Value, expectedFormats, null, 
                           System.Globalization.DateTimeStyles.AllowWhiteSpaces);
            }

            // Rev
            regex = new Regex(@"(?<strElement>(REV))   (:(?<strREV>[^\n\r]*))", options);
            m = regex.Match(s);
            if (m.Success)
            {
                string[] expectedFormats = { "yyyyMMddHHmmss", "yyyyMMddTHHmmssZ" };
                Rev = DateTime.ParseExact
                      (m.Groups["strREV"].Value, expectedFormats, null, 
                      System.Globalization.DateTimeStyles.AllowWhiteSpaces);
            }

            // Emails
            string ss;

            regex = new Regex(@"((?<strElement>(EMAIL)) 
                    (;*(?<strAttr>(HOME|WORK)))*  (;(?<strPref>(PREF)))* 
                    (;[^:]*)*  (:(?<strValue>[^\n\r]*)))", options);
            mc = regex.Matches(s);
            if (mc.Count > 0)
            {
                Emails = new Email[mc.Count];
                for (int i = 0; i < mc.Count; i++)
                {
                    m = mc[i];
                    Emails[i].address = m.Groups["strValue"].Value;
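                    // The regex ignores case, so normalize before comparing.
                    ss = m.Groups["strAttr"].Value.ToUpperInvariant();
                    if (ss == "HOME")
                        Emails[i].homeWorkType = HomeWorkType.home;
                    else if (ss == "WORK")
                        Emails[i].homeWorkType = HomeWorkType.work;

                    // The group only matches PREF, so a successful capture is enough.
                    if (m.Groups["strPref"].Success)
                        Emails[i].pref = true;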
                }
            }

            // Phones
            regex = new Regex(@"(\n(?<strElement>(TEL)) 
                    (;*(?<strAttr>(HOME|WORK)))* 
                    (;(?<strType>(VOICE|CELL|PAGER|MSG|FAX)))*  
                    (;(?<strPref>(PREF)))* (;[^:]*)*  
                    (:(?<strValue>[^\n\r]*)))", options);
            mc = regex.Matches(s);
            if (mc.Count > 0)
            {
                Phones = new Phone[mc.Count];
                for (int i = 0; i < mc.Count; i++)
                {
                    m = mc[i];
                    Phones[i].number = m.Groups["strValue"].Value;
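                    // The regex ignores case, so normalize before comparing.
                    ss = m.Groups["strAttr"].Value.ToUpperInvariant();
                    if (ss == "HOME")
                        Phones[i].homeWorkType = HomeWorkType.home;
                    else if (ss == "WORK")
                        Phones[i].homeWorkType = HomeWorkType.work;

                    // The group only matches PREF, so a successful capture is enough.
                    if (m.Groups["strPref"].Success)
                        Phones[i].pref = true;

                    ss = m.Groups["strType"].Value.ToUpperInvariant();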
                    if (ss == "VOICE")
                        Phones[i].phoneType = PhoneType.VOICE;
                    else if (ss == "CELL")
                        Phones[i].phoneType = PhoneType.CELL;
                    else if (ss == "PAGER")
                        Phones[i].phoneType = PhoneType.PAGER;
                    else if (ss == "MSG")
                        Phones[i].phoneType = PhoneType.MSG;
                    else if (ss == "FAX")
                        Phones[i].phoneType = PhoneType.FAX;
                }
            }

            // Addresses
            regex = new Regex(@"(\n(?<strElement>(ADR))) 
                    (;*(?<strAttr>(HOME|WORK)))*  (:(?<strPo>([^;]*)))  
                    (;(?<strBlock>([^;]*)))  (;(?<strStreet>([^;]*)))  
                    (;(?<strCity>([^;]*))) (;(?<strRegion>([^;]*))) 
                    (;(?<strPostcode>([^;]*)))(;(?<strNation>[^\n\r]*)) ", options);
            mc = regex.Matches(s);
            if (mc.Count > 0)
            {
                Addresses = new Address[mc.Count];
                for (int i = 0; i < mc.Count; i++)
                {
                    m = mc[i];
                    // The regex ignores case, so normalize before comparing.
                    ss = m.Groups["strAttr"].Value.ToUpperInvariant();
                    if (ss == "HOME")
                        Addresses[i].homeWorkType = HomeWorkType.home;
                    else if (ss == "WORK")
                        Addresses[i].homeWorkType = HomeWorkType.work;

                    Addresses[i].po = m.Groups["strPo"].Value;
                    Addresses[i].ext = m.Groups["strBlock"].Value;
                    Addresses[i].street = m.Groups["strStreet"].Value;
                    Addresses[i].locality = m.Groups["strCity"].Value;
                    Addresses[i].region = m.Groups["strRegion"].Value;
                    Addresses[i].postcode = m.Groups["strPostcode"].Value;
                    Addresses[i].country = m.Groups["strNation"].Value;
                }
            }
        }
    }
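
To see how one of these patterns behaves in isolation, the following minimal sketch runs the TEL expression from ParseLines against a small made-up vCard 2.1 sample. The class name TelPatternDemo, the Describe helper and the sample text are illustrative assumptions and are not part of the attached source.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TelPatternDemo
{
    // Same TEL pattern as in ParseLines. IgnorePatternWhitespace lets the
    // pattern be laid out over several lines.
    private static readonly Regex TelRegex = new Regex(@"(\n(?<strElement>(TEL))
            (;*(?<strAttr>(HOME|WORK)))*
            (;(?<strType>(VOICE|CELL|PAGER|MSG|FAX)))*
            (;(?<strPref>(PREF)))* (;[^:]*)*
            (:(?<strValue>[^\n\r]*)))",
        RegexOptions.IgnoreCase | RegexOptions.Multiline |
        RegexOptions.IgnorePatternWhitespace);

    // Returns one "attr|type|number" string per TEL line found.
    public static List<string> Describe(string vcardText)
    {
        var result = new List<string>();
        foreach (Match m in TelRegex.Matches(vcardText))
        {
            result.Add(m.Groups["strAttr"].Value + "|" +
                       m.Groups["strType"].Value + "|" +
                       m.Groups["strValue"].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample card, for illustration only.
        string sample = "BEGIN:VCARD\r\nVERSION:2.1\r\nFN:John Doe\r\n" +
                        "TEL;WORK;VOICE:555-0100\r\nTEL;CELL:555-0101\r\nEND:VCARD\r\n";
        foreach (string line in Describe(sample))
            Console.WriteLine(line);
    }
}
```

Groups that do not take part in a match come back as empty strings, which is why ParseLines can compare their values without checking for null.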

Evolving With Your Projects

As I do not intend to evolve this class into a big fat component, it was not implemented for universal use. Very likely you will need to modify it. Here are some hints.

  1. The current algorithm scans the whole vCard text around 10 times. You can scan it far fewer times and still use regular expressions: for example, go through the whole text once, break it into lines grouped by type, then apply the respective regular expression to the lines of each type.
  2. This class was implemented for vCard 2.1. When vCard 3.0 becomes more popular, it will be better to have a vCard reader that handles both versions, with a different set of regular expressions for each. A builder pattern may be needed to talk to the two vCard parser implementations.
  3. As you will see from the attached source code, I added just a few more lines to implement vCardWriter, derived from vCardReader. It makes sense for a vCard writer to be able to read a vCard as well, since sometimes you just want to modify a vCard programmatically.
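
The first hint can be sketched as follows: walk the text once, bucket each line under its property name, and leave the detailed parsing of each bucket to its own regular expression. The names VCardLineSplitter and GroupByProperty are hypothetical. The sketch also assumes that every property sits on one physical line, so folded or quoted-printable continuation lines are not handled.

```csharp
using System;
using System.Collections.Generic;

public static class VCardLineSplitter
{
    // One pass over the text: each line is stored under its property name,
    // i.e. the upper-cased part before the first ';' or ':'.
    public static Dictionary<string, List<string>> GroupByProperty(string vcardText)
    {
        var groups = new Dictionary<string, List<string>>();
        string[] lines = vcardText.Split(
            new[] { "\r\n", "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int end = line.IndexOfAny(new[] { ';', ':' });
            if (end <= 0)
                continue; // no property name on this line; skipped in this sketch

            string name = line.Substring(0, end).Trim().ToUpperInvariant();
            List<string> bucket;
            if (!groups.TryGetValue(name, out bucket))
            {
                bucket = new List<string>();
                groups[name] = bucket;
            }
            bucket.Add(line);
        }
        return groups;
    }

    public static void Main()
    {
        // Hypothetical sample card, for illustration only.
        string sample = "BEGIN:VCARD\r\nVERSION:2.1\r\nTEL;CELL:555-0101\r\n" +
                        "TEL;HOME:555-0102\r\nEMAIL:someone@example.com\r\nEND:VCARD\r\n";
        foreach (KeyValuePair<string, List<string>> pair in GroupByProperty(sample))
            Console.WriteLine(pair.Key + ": " + pair.Value.Count + " line(s)");
    }
}
```

With the lines grouped this way, the TEL expression only needs to run over the TEL bucket, the ADR expression over the ADR bucket, and so on. Patterns that begin with \n would have to be re-anchored, for example with ^, because the line terminators are no longer part of each line.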

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Zijian
Software Developer
Australia

I started my IT career in 1992, programming embedded devices such as credit card readers, smart card readers and the Palm Pilot. Programming on the hardware was really fun; it felt like driving the hardware directly.

Besides technical work, I enjoy reading literature, playing ball games, cooking and gardening.

Comments and Discussions

 
Question: Little bug? (J.W. Albersen, 8 Dec '12 - 3:51)
First of all: very good job. Rating of 5.

I had a little problem with the addresses (vCard VERSION 3.0).
I solved it by changing a colon to a semicolon in this part of the regular expression:

 (:(?<strPo>([^;]*)))  
changed to:
 (;(?<strPo>([^;]*)))  
Hope it helps...

General: My vote of 5 (dacard, 3 Feb '12 - 1:11)
Very well

General: Thanks! (nst888, 2 Oct '08 - 5:00)
Fab article - this was just what I needed!
Nicki

General: multiple cards in a file (Craig Lebowitz, 18 Aug '08 - 5:12)
I exported Skype contacts recently and realized that the current code (which is great, thanks Zijian) does not handle multiple cards in one file. I modified the WinForm slightly with a loop to handle this.

http://vcardreaderextensions.googlecode.com/files/testVCardReader.zip
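
The linked archive is not reproduced here. As a rough sketch of the same idea, and not the commenter's actual code, a file holding several cards could first be cut into one string per card. The names VCardSplitter and SplitCards are hypothetical, and nested cards (for example inside AGENT) are assumed not to occur.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VCardSplitter
{
    // Lazily match from each BEGIN:VCARD to the nearest END:VCARD.
    // Singleline makes '.' match line terminators as well.
    private static readonly Regex CardRegex = new Regex(
        @"BEGIN:VCARD.*?END:VCARD",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> SplitCards(string fileText)
    {
        var cards = new List<string>();
        foreach (Match m in CardRegex.Matches(fileText))
            cards.Add(m.Value);
        return cards;
    }

    public static void Main()
    {
        // Hypothetical two-card file, for illustration only.
        string file = "BEGIN:VCARD\r\nFN:A\r\nEND:VCARD\r\n" +
                      "BEGIN:VCARD\r\nFN:B\r\nEND:VCARD\r\n";
        Console.WriteLine(SplitCards(file).Count); // prints 2
    }
}
```

Each returned string could then be passed to ParseLines on a fresh vCardReader instance.
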
Question: Why would anyone rate this a 1? (Rajib Ahmed, 3 Jun '08 - 15:27)
Some people are real idiots. Anyhow, great article!

Rajib Ahmed
Prog Talk

Answer: Re: Why would anyone rate this a 1? [modified] (Zijian, 4 Jun '08 - 2:49)
Thanks, Rajib. I just found out why: likely these people were using IE 7 to view the article. IE 7 cannot properly handle PRE in HTML (lines do not wrap if PRE contains long lines) or CDATA in XML (data disappears if it is longer than a few hundred characters).

And it happened that the source code sitting in PRE contained some very long regular expressions, until the CodeProject editor broke the lines a few days ago.

Anyone who has used XML in programming would choose IE6, Firefox or another alternative rather than IE 7, as a web browser is handy for viewing XML. Who loves IE7? ;-P

Zijian
modified on Wednesday, June 4, 2008 9:02 AM

General: vCard Parser with Lightweight Approach II (Zijian, 2 Jun '08 - 16:46)
A companion article has been published: vCard Parser with Lightweight Approach II

Zijian

General: did not handle PHOTO (Unruled Boy, 12 Mar '08 - 23:46)
It could not read PHOTO.

Regards,
unruledboy (at) gmail (dot) com

Question: Problem with Name (PoweRoy, 5 Nov '07 - 2:52)
Wonderful vCard reader, first of all :)

But I am experiencing a problem with the names.

My vCard from my mobile:
BEGIN:VCARD\r\nVERSION:2.1\r\nN;ENCODING=QUOTED-PRINTABLE:Van=20de=20Korput;Roy;;;\r\nTEL;CELL:0611111\r\nEMAIL:blaat@blaat.com\r\nEND:VCARD\r\n

My colleague's:
BEGIN:VCARD\r\nVERSION:2.1\r\nN:name;sname\r\nTITLE:Programmer\r\nORG:companyB.v.\r\nTEL;CELL:0611111\r\nEMAIL;INTERNET;PREF:blaat@blaat.com\r\nURL:http://www.google.com\r\nADR;HOME:;;;plaats;;;Netherlands \r\nBDAY:19830103\r\nX-IRMC-LUID:000200000000\r\nEND:VCARD\r\n

As you can see, there is an encoding in front of my name, so I caught this by adding (;*(?<strAttr>(ENCODING=QUOTED-PRINTABLE))) between N and surname.
Now I have to catch the missing ; in my colleague's vCard:
\nN:name;sname\r\n
\nN;ENCODING=QUOTED-PRINTABLE:Van=20de=20Korput;Roy;;;\r\n

Any way to catch this?

Answer: Re: Problem with Name (Zijian, 5 Nov '07 - 12:24)
Yes, as you can see, the regular expression is not yet able to handle quoted-printable. I will refactor this project sometime this month, though no date is confirmed yet, to make it work with quoted-printable, vCard 2.1 and 3.0.

If you can't wait, please just redo some of the regular expressions and translate quoted-printable into Unicode; you are free to use the source code.
 
Zijian
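
For the quoted-printable translation mentioned in this thread, a minimal decoder might look like the sketch below. It is an illustration, not part of the article's source: the names QuotedPrintable and Decode are hypothetical, and the character set is passed in by the caller (UTF-8 in the example is an assumption, since a card may declare a different CHARSET).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class QuotedPrintable
{
    // Decodes "=XX" escapes and drops "=" soft line breaks. Decoded bytes are
    // buffered so that multi-byte characters are converted as a whole.
    public static string Decode(string input, Encoding encoding)
    {
        var result = new StringBuilder();
        var pending = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            bool hasNext = i + 1 < input.Length;
            if (c == '=' && hasNext && (input[i + 1] == '\r' || input[i + 1] == '\n'))
            {
                // Soft line break: skip '=' and the line terminator.
                i++;
                if (i < input.Length && input[i] == '\r') i++;
                if (i < input.Length && input[i] == '\n') i++;
            }
            else if (c == '=' && i + 2 < input.Length &&
                     Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                pending.Add(byte.Parse(input.Substring(i + 1, 2),
                    NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                i += 3;
            }
            else
            {
                Flush(result, pending, encoding);
                result.Append(c); // literal character, copied as-is
                i++;
            }
        }
        Flush(result, pending, encoding);
        return result.ToString();
    }

    private static void Flush(StringBuilder result, List<byte> pending, Encoding encoding)
    {
        if (pending.Count == 0) return;
        result.Append(encoding.GetString(pending.ToArray()));
        pending.Clear();
    }

    public static void Main()
    {
        // Prints "Van de Korput"
        Console.WriteLine(Decode("Van=20de=20Korput", Encoding.UTF8));
    }
}
```

For a structured property such as N, it is probably safer to split on ';' first and decode each component afterwards, so that an encoded semicolon (=3B) is not mistaken for a separator.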

Last Updated 27 May 2008
Article Copyright 2007 by Zijian