Hey,
I have a set of files that contain email IDs and phone numbers, and the files do not follow a fixed content format. The application should read each file line by line and check whether the line contains an email ID or a phone number. If it does, the value should be replaced with '_HIDE_'.
For example:
abcdef sjifdjkf sjdkAaasa 122.345 11/10/2016 [u'hello@xyz.com'] 8989878787
This should be replaced with:
abcdef sjifdjkf sjdkAaasa 122.345 11/10/2016 [u'_HIDE_@xyz.com'] _HIDE_

The regex should be applied to the whole line rather than to each word in the line.
Please help me solve this. Thanks in advance.

What I have tried:

Regex for email id : \\w+([-+.']\\w+)*@\\w+([-.]\\w+)*\\.\\w+([-.]\\w+)*
Regex for phone number : ^\\(?([0-9]{3})\\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$
Posted
Updated 18-Oct-16 4:15am

0) I think your email regex is rather weak. Try this one:

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?


1) Why not just split the string on spaces, regex the email address, and then run the phone number through this extension method:

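C#
// Extension methods must live in a static class.
public static class StringExtensions
{
    // True when the text, with (, ) and - removed, is a 7- or 10-digit number.
    public static bool IsPhone(this string text)
    {
        text = text.Replace("(", "").Replace(")", "").Replace("-", "");
        long value;
        return (text.Length == 7 || text.Length == 10) && long.TryParse(text, out value);
    }
}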


It's unfortunate that RegEx is probably the best parser for email address validation, but for phone numbers you don't need it. If you're parsing thousands of records, you'll save considerable time by using regex only once instead of twice.
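The split-on-spaces approach described above could be sketched roughly as follows. This is only an illustration: the class name TokenMasker, the method MaskLine, and the simplified email pattern are assumptions made for the example, not code from the answer. The phone check follows the same 7-or-10-digit rule as IsPhone.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenMasker
{
    // Simplified email pattern (an assumption for this sketch);
    // group 1 captures "@domain" so it can be kept in the output.
    private static readonly Regex Email =
        new Regex(@"[A-Za-z0-9._+\-]+(@[A-Za-z0-9.\-]+)");

    // Same rule as IsPhone: 7 or 10 digits once (, ) and - are removed.
    private static bool IsPhone(string text)
    {
        string digits = text.Replace("(", "").Replace(")", "").Replace("-", "");
        return (digits.Length == 7 || digits.Length == 10) && digits.All(char.IsDigit);
    }

    // Splits the line on spaces, masks phone tokens entirely and
    // masks only the local part of any email address found in a token.
    public static string MaskLine(string line)
    {
        string[] tokens = line.Split(' ');
        for (int i = 0; i < tokens.Length; i++)
        {
            tokens[i] = IsPhone(tokens[i])
                ? "_HIDE_"
                : Email.Replace(tokens[i], "_HIDE_$1");
        }
        return string.Join(" ", tokens);
    }
}
```

With the sample line from the question, MaskLine leaves 122.345 and 11/10/2016 alone because they contain non-digit characters, and only the email local part and the trailing 10-digit number are replaced.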
 
 
Comments
User1454 20-Oct-16 0:04am    
Hey John,
Actually, I am already splitting each line into words and validating each word, and it works fine. But when a larger number of files comes into the picture, the process takes a long time to complete. So, to reduce the processing time, I think it's better to use regex.
Try this:

C#
// Requires: using System.Text.RegularExpressions;
string strFormat = "abcdef sjifdjkf sjdkAaasa 122.345 11/10/2016 [u'hello@xyz.com'] 8989878787";
// Mask the part of the email address before '@'.
Regex rg = new Regex(@"[A-Za-z0-9_\-\+]+@");
string strval = rg.Replace(strFormat, "_HIDE_@");
// Mask any run of 10 digits. Replace on strval, not strFormat,
// so the email masking from the previous step is not lost.
Regex r1 = new Regex(@"\d{10}");
strval = r1.Replace(strval, "_HIDE_");
 
 
Comments
User1454 19-Oct-16 23:58pm    
Hi Surendra,
This works fine, but for an email ID like [u'amith.hello@xyz.com'] it only produces [u'amith._HIDE_@xyz.com']. I need it to be [u'_HIDE_@xyz.com']. How can I replace the whole name before the domain? Can you please suggest a solution?
Surendra Reddy V 20-Oct-16 0:34am    
Hi,

Just change the regex for the email so that it also allows a dot:
Regex rg = new Regex(@"[A-Za-z0-9._\-\+]+@"); then try it again and check the result.
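Putting the corrected email pattern together with the 10-digit phone pattern, a whole-line helper might look like the sketch below. The class name LineMasker and the method name Mask are made up for this illustration; the two patterns are the ones already posted in this thread.

```csharp
using System.Text.RegularExpressions;

public static class LineMasker
{
    // Email local part including dots, followed by '@' (pattern from this thread).
    private static readonly Regex EmailLocalPart = new Regex(@"[A-Za-z0-9._\-\+]+@");

    // Any run of exactly 10 consecutive digits matched at a time (pattern from this thread).
    private static readonly Regex TenDigits = new Regex(@"\d{10}");

    // Applies both replacements to the whole line, feeding the
    // result of the email step into the phone step.
    public static string Mask(string line)
    {
        string masked = EmailLocalPart.Replace(line, "_HIDE_@");
        return TenDigits.Replace(masked, "_HIDE_");
    }
}
```

Because both Regex objects are created once and reused, the helper can be called for every line of every file, for example over the lines returned by File.ReadLines, without rebuilding the patterns each time.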
User1454 21-Oct-16 8:03am    
Hey Surendra,
Thanks a ton! Works like a charm.
Surendra Reddy V 21-Oct-16 8:19am    
Welcome :-)
User1454 23-Oct-16 4:06am    
Hey Surendra,
If I am correct, the phone number pattern @"\d{10}" matches a 10-digit number, right? So how can I make it work for other countries where the phone number length varies, and for numbers that start with "+", e.g. +919580767876, and also short ones like 333?
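One possible way to handle variable-length numbers is sketched below; it is an assumption-laden illustration, not an answer given in the thread. It accepts an optional "+" followed by 7 to 15 digits. The upper bound of 15 comes from the E.164 numbering plan; the lower bound of 7 is an arbitrary choice for this example, and the names PhoneMasker and Mask are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PhoneMasker
{
    // Optional '+', then 7 to 15 digits. The lookarounds make sure the
    // match is not part of a longer digit run. Both length bounds are
    // assumptions for this sketch; adjust them to the data at hand.
    private static readonly Regex Phone = new Regex(@"(?<!\d)\+?\d{7,15}(?!\d)");

    public static string Mask(string line)
    {
        return Phone.Replace(line, "_HIDE_");
    }
}
```

A length rule alone cannot tell a very short number such as 333 apart from ordinary numeric data like the 122 and 345 in 122.345, so short codes would need some extra context (a label, a fixed column, a known prefix) to be masked safely.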
Just a few links about RegEx.
Here is a link to RegEx documentation:
perlre - perldoc.perl.org
Here are links to tools that help build and debug RegEx:
.NET Regex Tester - Regex Storm
Expresso Regular Expression Tool
This one shows the RegEx as a graph, which is really helpful for understanding what a RegEx does:
Debuggex: Online visual regex tester. JavaScript, Python, and PCRE.
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


