Enhanced String Handling II

By Avi Farah, 16 Dec 2010
 

Introduction

In my previous article, Enhanced String Handling (EnhancedStringHandling.aspx), I ended with a section titled "Where do we go from here", where I pointed out a limitation: “The specifications as they stand now will not allow for a delimiter as part of the value. So for example: {identifier::value containing an open or close brace} will not pass the brace matching check. This is not of theoretical interest only. For example, if we would like to write a {Decrypt::encrypted value} decryption construct and cannot guarantee that the encrypted value has neither open nor close braces, then we cannot write such a ProcessDecrypt class. This problem is not necessarily confined to the case of a single-character delimiter, because multi-character delimiters are transformed into single-character delimiters.”

A few days after I published the article, it occurred to me that the limitation was in my mind only.

The Perceived Problem

When we attempt to write a regular expression that matches the form {Crypt::encrypted-text}, we are in trouble if the encrypted-text contains an open delimiter (“{”), a close delimiter (“}”) or a separator string (“::”). I therefore concluded, prematurely, that we needed more powerful machinery to handle such situations.
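
To make the difficulty concrete, here is a minimal sketch. The pattern and the names BraceProblemDemo and ExtractValue are hypothetical and are not taken from the library; the pattern simply accepts any value that contains no brace, which is one common way of expressing "up to the nearest closing delimiter".

```csharp
using System;
using System.Text.RegularExpressions;

public static class BraceProblemDemo
{
    // Hypothetical pattern for illustration only: the value may hold any
    // character except the delimiters themselves.
    private static readonly Regex ReCrypt =
        new Regex(@"\{\s*Crypt\s*::(?<value>[^{}]*)\}", RegexOptions.IgnoreCase);

    // Returns the captured value, or null when the construct does not match.
    public static string ExtractValue(string text)
    {
        Match m = ReCrypt.Match(text);
        return m.Success ? m.Groups["value"].Value : null;
    }

    public static void Main()
    {
        // A value without delimiters is captured whole.
        Console.WriteLine(ExtractValue("{Crypt::abc123}"));   // abc123

        // A close brace inside the value ends the match early.
        Console.WriteLine(ExtractValue("{Crypt::ab}cd}"));    // ab

        // An open brace inside the value prevents the match altogether.
        Console.WriteLine(ExtractValue("{Crypt::ab{cd}") ?? "(no match)");
    }
}
```

In the second call the value is silently truncated, and in the third the construct is not recognized at all, which is exactly the trouble described above.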

Fixing the Limitation

Throughout the previous article, Enhanced String Handling, we handled problems of this kind by substituting the offending string with a new string. We can use the very same technique here by providing the encrypted text as Unicode character numbers written out as decimal strings.

So if we are to encrypt the string “01234 -- The quick brown fox jumped over the lazy dogs -- 56789!”, we first encrypt it into a string of characters. But instead of including the encrypted characters as is (in the {Crypt::encrypted string} construct), we include the numeric Unicode representation of each encrypted character, comma separated. So the string:

“01234 -- The quick brown fox jumped over the lazy dogs -- 56789!” 

encrypted will be represented as:

“225,14,132,43,189,68,220,227,84,28,69,216,140,97,101,85,254,11,229,238,
 148,191,73,177,235,233,193,176,45,187,218,44,92,107,175,168,56,90,14,24,201,219,
 251,161,82,146,221,133,249,49,111,196,239,55,164,209,93,126,144,158,212,39,101,29,
 197,221,62,174,210,137,124,134”
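
The comma-separated representation itself can be sketched as follows. This is only an illustration of the substitution idea, with no encryption involved: the class NumberListCodec and its methods are hypothetical, and the sketch assumes one decimal number per UTF-16 code unit, which may differ from what the article's own Encrypt() emits.

```csharp
using System;
using System.Linq;

public static class NumberListCodec
{
    // Writes each UTF-16 code unit of the text as a decimal number, comma separated.
    public static string Encode(string text)
    {
        return string.Join(",", text.Select(c => ((int)c).ToString()));
    }

    // Reverses Encode: parses each number back into a character.
    // Empty entries are skipped, so an empty input yields an empty string.
    public static string Decode(string numbers)
    {
        char[] chars = numbers
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(n => (char)int.Parse(n))
            .ToArray();
        return new string(chars);
    }

    public static void Main()
    {
        string awkward = "a{b}::c";   // contains both delimiters and the separator
        string encoded = Encode(awkward);
        Console.WriteLine(encoded);                      // 97,123,98,125,58,58,99
        Console.WriteLine(Decode(encoded) == awkward);   // True
    }
}
```

The encoded form contains only digits and commas, so it can sit inside a {Crypt::...} construct without colliding with “{”, “}” or “::”.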

ProcessCrypt, therefore, processes a construct that carries the encrypted string above in this numeric form, like so: {Crypt::225,14,132,..}. Because the value now consists only of digits and commas, the encrypted string can contain any character, including a delimiter or a separator, without upsetting the brace matching. See the CypherTest() test method. (The sample code in ProcessCrypt is not meant to be the most secure encryption/decryption code; it is meant to illustrate the point that we can overcome a delimiter or a separator in the string value to be processed.)

  [TestMethod]
  public void CypherTest()
  {
      string org = "01234 -- The quick brown fox jumped over the lazy dogs -- 56789!";

      // Wrap the comma-separated cipher in a {Crypt::...} construct.
      // "{{" and "}}" are string.Format escapes for literal braces.
      string encrypted = string.Format("Text: {{Crypt::{0}}}",
                                       ProcessCrypt.Encrypt(org));

      // Register ProcessCrypt as the only handler and evaluate the string.
      var context = new List<IProcessEvaluate>();
      context.Add(new ProcessCrypt());
      var eval = new EnhancedStringEval(context);
      string decrypted = eval.EvaluateString(encrypted);

      Assert.AreEqual("Text: " + org, decrypted);
  }

ProcessCrypt itself is as follows:

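 public sealed class ProcessCrypt : IProcessEvaluate
 {
   static ProcessCrypt()
   {
       var reo = RegexOptions.Singleline | RegexOptions.IgnoreCase;

       // The cipher group accepts only digits and commas, so it can never
       // contain a delimiter or a separator.
       _reCrypt = new Regex(@"{\s*Crypt\s*::(?<cipher>[0-9,]*)}", reo);
   }

   public ProcessCrypt() { }

   #region IProcessEvaluate Members

   /// <summary>
   /// Replaces every {Crypt::n1,n2,...} construct in the element's value
   /// with its decrypted text.
   /// </summary>
   /// <param name="src">The sender of the event.</param>
   /// <param name="ea">Carries the string being evaluated and the IsHandled flag.</param>
   public void Evaluate(object src, EnhancedStringEventArgs ea)
   {
       // Task 1: find and replace the constructs this class is responsible for.
       string encrypted = ea.EhancedPairElem.Value;

       // ea.IsHandled == false, by default.
       Match m = _reCrypt.Match(encrypted);
       if (!m.Success) return;

       string deciphered = _reCrypt.Replace(encrypted, CipherReplace);
       if (deciphered == encrypted) return;

       // Task 2: report that this handler changed the string.
       ea.IsHandled = true;

       // Task 3: store the result.
       ea.EhancedPairElem.Value = deciphered;
   }

   #endregion

   /// <summary>
   /// Match evaluator: decrypts the comma-separated numbers captured in the cipher group.
   /// </summary>
   /// <param name="m">A match of the {Crypt::...} pattern.</param>
   /// <returns>The decrypted text that replaces the whole construct.</returns>
   private string CipherReplace(Match m)
   {
       string encrypted = m.Groups["cipher"].Value;
       return Decrypt(encrypted);
   }

   private const string _criptSplitter = ",";
   private static Regex _reCrypt;

   ...

 }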
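
For readers who want to try the replace step without the library, here is a self-contained sketch. It reuses the shape of the pattern above (with the braces escaped) and the Regex.Replace overload that takes a MatchEvaluator. FakeDecrypt is a hypothetical stand-in for the elided Decrypt(): it performs no decryption and merely turns each number back into a character. CryptReplaceDemo and Expand are likewise illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CryptReplaceDemo
{
    // Same shape as the article's pattern; braces are escaped here for clarity.
    private static readonly Regex ReCrypt = new Regex(
        @"\{\s*Crypt\s*::(?<cipher>[0-9,]*)\}",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Hypothetical stand-in for Decrypt(): maps each number to one character.
    private static string FakeDecrypt(string cipher)
    {
        char[] chars = cipher
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(n => (char)int.Parse(n))
            .ToArray();
        return new string(chars);
    }

    // Replaces every {Crypt::...} construct with its decoded text.
    public static string Expand(string text)
    {
        return ReCrypt.Replace(text, m => FakeDecrypt(m.Groups["cipher"].Value));
    }

    public static void Main()
    {
        // 72,105,123,125 are the character numbers of "Hi{}".
        Console.WriteLine(Expand("Text: {Crypt::72,105,123,125}"));   // Text: Hi{}

        // Constructs with another identifier are left untouched.
        Console.WriteLine(Expand("Text: {Other::1,2}"));              // Text: {Other::1,2}
    }
}
```

Note that the decoded text may contain braces; they appear only after the construct has been replaced, so they never take part in matching that construct.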

Enjoy!

Avi

History

  • 15th December, 2010: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Avi Farah
United States
avifarah@gmail.com


Comments and Discussions

General: My vote of 5 | member: Michael Haephrati מיכאל האפרתי | 3 Dec '12 - 5:18
Thanks for a great article. Please also look at my own article about Strings Obfuscation: http://www.codeproject.com/Articles/502283/Strings-Obfuscation-System
General: Re: My vote of 5 | member: Avi Farah | 3 Dec '12 - 17:10
Many thx
--Avi
General: My vote of 5 | member: manoj kumar choubey | 8 Feb '12 - 20:02
Nice
General: Re: My vote of 5 | member: Avi Farah | 3 Dec '12 - 17:09
Many thanks
--Avi
General: nice one | member: Pranay Rana | 18 Dec '10 - 10:07
will use in my development
For any question : http://pranayamr.blogspot.com/
vote my article : Learn SQL to LINQ ( Visual Representation ), Calling WCF Services using jQuery

General: Re: nice one | member: Avi Farah | 18 Dec '10 - 17:00
Thx buddy,
--Avi
General: My vote of 5 | member: Paul Selormey | 16 Dec '10 - 11:18
This is really useful. It does all that I was looking for, and needed currently for a project - Sandcastle Assist.
 
Thank you Avi, you have saved me time and energy.
 
Best regards,
Paul.
General: Re: My vote of 5 | member: Avi Farah | 16 Dec '10 - 13:26
Paul,
 
Many thanks to you too, your encouraging word means way more to me than would appear on the surface.
 
God speed to you buddy,
--Avi
General: My vote of 2 | member: John Simmons / outlaw programmer | 15 Dec '10 - 23:56
Formatting sucks. I don't know what you used to actually write the content, but it added all kinds of junk tags.
 
This is also too short to be a "part 2", and there's really no reason you couldn't have just added it to part 1.

modified on Friday, December 17, 2010 8:20 AM

General: Re: My vote of 2 | subeditor: Indivara | 16 Dec '10 - 3:12
You mean the huge font at the end?
I think your sarcasm here will be beyond most people, since you've removed it :)
General: Re: My vote of 2 | member: John Simmons / outlaw programmer | 17 Dec '10 - 0:14
Well, that too, but in general, I mean the formatting sucks. I actually edited part of it so that it didn't send the highlighted string of numbers off the right end of the screen, but there were too many junk tags to correct the rest of it.
.45 ACP - because shooting twice is just silly
-----
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"The staggering layers of obscenity in your statement make it a work of art on so many levels." - J. Jystad, 2001

General: Re: My vote of 2 | member: Paul Selormey | 16 Dec '10 - 11:43
John Simmons / outlaw programmer wrote:
Formatting sucks.

As someone pointed out, if all you saw in this article is his signature ending, while disregarding all the effort that went into the design and implementation of this code, then you need to examine your approach to people.
 
Many here have posted many things in their profile; he did not, but decided to sign his article in his own way. At most you could have made a suggestion, like putting the signature in his profile.
 
John Simmons / outlaw programmer wrote:
I don't know what you used to actually write the content, but it added all kinds of junk tags.

You are picking the wrong target. For all the effort that goes into the look of CodeProject, the site's article editor is hardly useful. If you want to help, please talk to the right people.
 
Best regards,
Paul.
Jesus Christ is LOVE! Please tell somebody.

General: Re: My vote of 2 | member: John Simmons / outlaw programmer | 17 Dec '10 - 2:20
It's also not long enough to be a part 2. He should have just added this to part 1.

General: Re: My vote of 2 | member: Paul Selormey | 17 Dec '10 - 3:52
Thanks for the understanding.
 
John Simmons / outlaw programmer wrote:
It's also not long enough to be a part 2. He should have just added this to part 1.

That could be a good suggestion. The first is rather long: 12 pages if you take away the comments/discussions and signature sections. So another option could be splitting the articles into better parts.
 
Best regards,
Paul.

Question: Few Questions... | member: Paul Selormey | 15 Dec '10 - 22:32
Hello Avi,
Thanks for the contribution.
I am really looking for something similar to this library to use in my documentation library.
 
I do have some questions...
 
1. What changed in this updated version?
2. What makes this approach better than a simple string search? say find '{' and then loop till next '}'
3. Will this handle nested items? say {Code{Guid}}?
4. Finally, this is odd but let me still ask (as the last time I was reading this article, the story seemed different): is the "::" a requirement?
 
Best regards,
Paul.
 
modified on Thursday, December 16, 2010 5:20 PM

General: Re: Few Questions... | member: Avi Farah | 16 Dec '10 - 7:00
Hey Paul,
 
I will start by saying that your questions indicate that you are on your way to understanding.
 
1. Changes: No change. I will elaborate after I answer point 4.
 
2a. Is this approach better: This is not a new approach. It only addresses the issue: “If you have (or may have) a delimiter/separator in the value part of the {id, value} construct then how will the approach taken in the EnhancedStringHandling.aspx work?” The approach provided is a way for it to work.
 
2b. Better than a simple string search? say find '{' and then loop till next '}': This is exactly what we have done, except that we did not loop in a C# loop but expressed “nearest closing delimiter” in regular expression language. Our regular expression had more than just “nearest closing delimiter”: it insisted on adhering to the expected expression as much as possible. Recall the example:
string pattern = @"({)\s*Counter\s*::(?<name>[^{}:]+)" +
@"(::\s*(?<extras>(init)|(next)|(previous)|" +
@"((?<op>[=+-])\s*" +
@"(?<direction>[+-])?\s*(?<val>[0-9]+)))?)?\s*(})";

3. Will this handle nested items? Absolutely, nothing changed.
 
4. Is the “::” separator a requirement? No. It never was. Neither the delimiters nor the separator is/are a requirement. You have the DelimitersAndSeparator class where you can change the default value of the delimiters and/or the separator.
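
As an aside on answers 2b and 3, the "nearest closing delimiter" idea and its effect on nested constructs can be sketched as below. This is not the library's code: InnermostFirstDemo and Collapse are hypothetical names, and upper-casing the body is only a placeholder for whatever processing a real handler would do.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InnermostFirstDemo
{
    // "[^{}]*" forbids braces inside the match, so the closing brace found is
    // always the nearest one and the construct matched is always an innermost one.
    private static readonly Regex ReInnermost = new Regex(@"\{(?<body>[^{}]*)\}");

    // Repeatedly replaces innermost constructs until nothing changes.
    public static string Collapse(string text)
    {
        string previous;
        do
        {
            previous = text;
            text = ReInnermost.Replace(
                text, m => m.Groups["body"].Value.ToUpperInvariant());
        } while (text != previous);
        return text;
    }

    public static void Main()
    {
        // First pass handles {guid}, second pass handles the enclosing construct.
        Console.WriteLine(Collapse("x {code{guid}} y"));   // x CODEGUID y
    }
}
```

Because each pass can only see constructs that contain no further braces, nested items are resolved from the inside out without any explicit brace counting.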
 
Back to a complete answer to Q1: I published this second article, in a way an addendum or a correction to the first article, because after coming up with the solution I looked into all the ways I could think of in which the approach could fail.
 
In the first article, EnhancedStringHandling.aspx, I believe that we handled all the points of failure except for one. We could handle nested constructs, for which we needed to handle “how to express in a regular expression language: not this string”. This really was one of the major points of the first article. However there was one hole in things that the approach (in the first article) could not handle: if the construct {id, value} needs to contain a delimiter or separator in the value part, then I deemed the approach (in the first article) lacking.
 
In this second article, EnhancedStringHandlingII, I point out that the limitation was in my head only. If there is a need for the value part to have a delimiter and/or separator, then one solution is to represent the value as a string of comma-delimited Unicode numbers. A real-life example where such a thing can happen is the handling of encryption/decryption (example provided as code).
 
Now, in all fairness, I expect the need for such handling to be rare. But if the value part does need to contain a delimiter or a separator then I provided a solution.
 
Paul, I hope this helps, if not please do not hesitate to point out what I am not explaining.
All the best,
Avi

General: Re: Few Questions... | member: Paul Selormey | 16 Dec '10 - 11:20
Avi Farah wrote:
Paul, I hope this helps, if not please do not hesitate to point out what I am not explaining.

 
Yes, it helps. Thank you for taking the time to provide this insight. I really appreciate it.
 
Best regards,
Paul.

Last Updated 16 Dec 2010
Article Copyright 2010 by Avi Farah
Everything else Copyright © CodeProject, 1999-2013