
Name genderization

22 Apr 2002 | CPOL | 3 min read
Extrapolate the gender of a person based on their first name

Sample app

Introduction

Over time I have compiled a database of roughly six thousand unique first names, each with the gender usually associated with it. The names are primarily English, but the database also contains names of other origins such as German, French, and Russian. I have used this database on a number of occasions for tasks such as data entry validation and data extrapolation. I believe other people could benefit from this database and its worker class, so I decided to post it to Code Project.

Why would anyone need this?

Gender is a very common dimension in data marts. A database project I worked on some time ago had about 1.2 million names, addresses and phone numbers. Had the client wanted gender for the names on this list, it would have been unavailable because that data was not collected with the list. The only ways to get gender were to contact these people directly (unreasonable) or to infer the approximate gender from each individual's first name.

Another example is a data entry application where the operator did a poor job of entering the data. I used this database to cross-match the gender entered by the operator against the approximate gender determined from the first name. Roughly 15% of the entered records required review, and nearly 10% were actually incorrect.
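The cross-matching step described above can be sketched roughly as follows. This is a minimal illustration in portable C++, not the code used in that project: the `EntryRecord` struct, the `FindRecordsToReview` function, and the in-memory `std::unordered_map` name table are all hypothetical names introduced here, and the table keys are assumed to be stored in lower case.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record type for one data-entry row; not part of the article's code.
struct EntryRecord {
    std::string firstName;
    char enteredGender;  // 'M', 'F', or 'U' as typed by the operator
};

// Lower-cases a name so that lookups ignore case.
std::string NormalizeName(const std::string& name) {
    std::string out;
    out.reserve(name.size());
    for (unsigned char c : name) {
        out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Returns the indexes of records whose entered gender disagrees with the
// gender the name table suggests. Names missing from the table, or mapped
// to 'U', are never flagged because the table offers no opinion on them.
// Assumption: nameTable keys are already lower case.
std::vector<std::size_t> FindRecordsToReview(
    const std::vector<EntryRecord>& records,
    const std::unordered_map<std::string, char>& nameTable) {
    std::vector<std::size_t> flagged;
    for (std::size_t i = 0; i < records.size(); ++i) {
        const auto it = nameTable.find(NormalizeName(records[i].firstName));
        if (it == nameTable.end() || it->second == 'U') {
            continue;
        }
        if (it->second != records[i].enteredGender) {
            flagged.push_back(i);
        }
    }
    return flagged;
}
```

A flagged record is only a candidate for human review, not a confirmed error, which matches the gap between the review rate and the error rate reported above.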

What is this?

This project includes three items:

  1. An MS Access database (NameDB.mdb) with a single table (FRST_NM_GNDR) which contains approximately 6,000 names and associated genders.
  2. The source code for a class (CFPSGenderizer) which loads this database and provides a simple API for looking up a name and returning the associated gender.
  3. A demo project which demonstrates how to use the CFPSGenderizer class.

Where did these names come from?

The names in this database were collected from three primary sources: 1) a customer's database, 2) freely available web site downloads, and 3) the Social Security Administration's web site (ssa.gov). There are no license requirements for using these names, nor are there any warranties as to the accuracy of the name/gender associations.

How accurate is the list?

Who knows! The few times I have used the list in verifiable scenarios, it appeared to be at least 85% accurate for the names it contains. This means, of course, that it could be as much as 15% inaccurate. I do not use this data when high precision is needed, only for cross-verification and data extrapolation.

How to use this class?

  1. Add the FPSGenderizer.cpp and FPSGenderizer.h files to your project.
  2. Create an instance of the CFPSGenderizer class at an appropriate location in your program.  The class must be initialized through the Load function, so plan on performing that step only once if possible.
  3. Call one of the overloaded Load functions to load the list from a database or serialized file.
  4. Call the CFPSGenderizer::Genderize function and pass in an LPCTSTR containing a first name you want to genderize.  It will return a char which will either be 'M' (Male), 'F' (Female) or 'U' (Unknown).  This function will return 'U' for names not on the list as well as for names on the list explicitly associated with 'U'.
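
The load-once, look-up-many pattern in the steps above can be sketched without MFC. The class below is a hypothetical stand-in, not CFPSGenderizer itself: the name `NameGenderTable`, the `std::istream` overload of `Load`, and the "name,gender" text format are assumptions made so the example compiles on its own. Only the 'M'/'F'/'U' return convention is taken from the article.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for CFPSGenderizer, written in portable C++ so the
// sketch builds without MFC or the article's source files.
class NameGenderTable {
public:
    // Reads "name,gender" lines (an assumed text format, not the article's
    // .mdb or serialized file). Malformed lines are skipped.
    // Returns the number of entries loaded.
    std::size_t Load(std::istream& in) {
        table_.clear();
        std::string line;
        while (std::getline(in, line)) {
            const std::size_t comma = line.find(',');
            if (comma == std::string::npos || comma == 0 ||
                comma + 1 >= line.size()) {
                continue;
            }
            const char g = static_cast<char>(
                std::toupper(static_cast<unsigned char>(line[comma + 1])));
            if (g != 'M' && g != 'F' && g != 'U') {
                continue;
            }
            table_[Normalize(line.substr(0, comma))] = g;
        }
        return table_.size();
    }

    // Returns 'M', 'F', or 'U'. 'U' covers both names missing from the
    // table and names explicitly stored as unknown.
    char Genderize(const std::string& firstName) const {
        const auto it = table_.find(Normalize(firstName));
        return it == table_.end() ? 'U' : it->second;
    }

private:
    // Lower-cases a name so that lookups ignore case.
    static std::string Normalize(const std::string& s) {
        std::string out;
        out.reserve(s.size());
        for (unsigned char c : s) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
        return out;
    }

    std::unordered_map<std::string, char> table_;
};
```

Because `Load` rebuilds the whole table, a caller would typically construct one instance at startup, load it once, and then share it for every `Genderize` call.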

Future Development?

As my job requires, I will update the database by adding names and changing the associations of names already on the list. I also plan to incorporate edit-distance and metaphone algorithms (see my earlier Spell Checker app) to find suggestions for a name and then suggest a gender based on how often the suggestions are male, female, or unknown. Before I release this enhancement, though, I need to test the results to see if they are even remotely reliable.
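
One way the edit-distance half of that idea might look is sketched below. It is an illustration only, not the planned implementation: it uses plain Levenshtein distance, leaves metaphone out entirely, and the function names, the in-memory name table, and the strict-plurality voting rule are all assumptions. Names are assumed to be lower case already.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Classic Levenshtein distance computed with two rolling rows.
std::size_t EditDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Suggests a gender by counting the genders of all table names within
// maxDistance edits of the input. 'M' or 'F' is returned only when it is
// strictly the most frequent; ties, an 'unknown' plurality, and an empty
// neighbourhood all yield 'U'. (Hypothetical voting rule.)
char SuggestGender(const std::string& name,
                   const std::unordered_map<std::string, char>& nameTable,
                   std::size_t maxDistance) {
    int male = 0, female = 0, unknown = 0;
    for (const auto& entry : nameTable) {
        if (EditDistance(name, entry.first) > maxDistance) {
            continue;
        }
        if (entry.second == 'M') {
            ++male;
        } else if (entry.second == 'F') {
            ++female;
        } else {
            ++unknown;
        }
    }
    if (male > female && male > unknown) return 'M';
    if (female > male && female > unknown) return 'F';
    return 'U';
}
```

This scans the whole table on every call, which is tolerable for a few thousand names but would need an index for larger lists. Whether such suggestions are reliable is exactly the open question raised above.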

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


