Need help to capture SSDeep string with regex

Question

My subject line pretty much states the question. I have a regular expression that works in some cases, but at times it returns other, undesired values. I hope there is a better or more precise expression I can use to capture the SSDEEP string in question.

Here is the HTML I wish to capture the string from:

HTML

<div class="floated-field-value">768:pHC0p5mwel+twV39TD8mRF5rKJZsF6No2:o0p5mwelJ9TD8mv5ImGo</div>

The regular expression I am working on looks like this:

VB

Dim SSDEEP As New Regex("(?<=<div class=""floated-field-value"">)([^\""]+)(</div>)", RegexOptions.IgnoreCase)

The closest I can get still leaves

HTML

</div>

on the end of the string, so I tried to strip the "div" off the string with some code:

VB

For X = 0 To RichTextBox3.Lines.Length - 2
    Dim MyString As String = RichTextBox3.Lines(X).ToString
    Label28.Text = MyString
Next

I hope this is enough for someone to help me.
Thank you in advance!

Posted 8-Aug-13 0:13am

Draco2013

Comments

ledtech3 8-Aug-13 21:37pm

Are you trying to get the whole line or just the value?
I forgot about this:
http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
It has a listing in it for HTML tags.

Draco2013 9-Aug-13 14:29pm

I am trying to get only the SSDEEP string from the HTML tag.

ledtech3 9-Aug-13 14:38pm

I was trying last night to get one to work and the best I got so far was to return the entire string.
The sample provided that is supposed to only return what is between the tags is not returning anything when dropped into a sample application.
I'm still trying to see what works.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

ledtech3 · Accepted Answer · 2013-08-09T13:35:00

Ok got it.
Instead of trying to get the tags, get the pattern of the data.

Input string:

<div class="floated-field-value">768:pHC0p5mwel+twV39TD8mRF5rKJZsF6No2:o0p5mwelJ9TD8mv5ImGo</div>

Regx1 used:

((\d{3}):(\w*)\+(\w*):(\w*))

Regx2 used:

((\d{3}):(\w*):(\w*)|(\d{3}):(\w*)\+(\w*):(\w*))

Regx3 used:

((\d*):(\w*):(\w*)|(\d*):(\w*)\+(\w*):(\w*))

Output:

768:pHC0p5mwel+twV39TD8mRF5rKJZsF6No2:o0p5mwelJ9TD8mv5ImGo

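The two outer "()" wrap the whole pattern, so the full string is captured as group 1. They should be optional when parsing a site, since the overall match (Match.Value in .NET) already holds the same text.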

"(\d{3})" looks for three digits
":" matches that character next
"(\w*)" a run of word characters (letters, digits, underscore) of any length
"\+" escapes the plus, so it looks for a literal plus sign next
"(\w*)" a run of word characters of any length
":" matches that character next
"(\w*)" the last word to extract
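
The breakdown above can be tried in a small console program. This is only a sketch using Regx1 and the sample input from the question; the module name is made up for illustration, and the group numbers follow the order of the opening parentheses.

```vb
Imports System
Imports System.Text.RegularExpressions

Module Regx1Demo
    Sub Main()
        ' Sample input copied from the question.
        Dim html As String = "<div class=""floated-field-value"">768:pHC0p5mwel+twV39TD8mRF5rKJZsF6No2:o0p5mwelJ9TD8mv5ImGo</div>"

        ' Regx1 from this answer; the outer parentheses become group 1.
        Dim pattern As String = "((\d{3}):(\w*)\+(\w*):(\w*))"

        Dim m As Match = Regex.Match(html, pattern)
        If m.Success Then
            Console.WriteLine(m.Value)            ' the whole SSDEEP string, without the tags
            Console.WriteLine(m.Groups(2).Value)  ' 768
            Console.WriteLine(m.Groups(3).Value)  ' pHC0p5mwel
            Console.WriteLine(m.Groups(5).Value)  ' o0p5mwelJ9TD8mv5ImGo
        End If
    End Sub
End Module
```

m.Value and m.Groups(1).Value hold the same text here, which is why the outer parentheses are a convenience rather than a requirement.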

That's it. Like I said, I'm not sure how it would work on a real site.
It should work as long as all data values contain a "+"; otherwise it would need to be modified for that type, for example with an "or" alternative that doesn't use the "+" but keeps most everything else the same.
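
The point about the "+" can be checked with a quick test. This is a sketch only: the value without a "+" is made up for illustration and is not a real hash, and Regx2 is the "or" version listed near the top of this answer.

```vb
Imports System
Imports System.Text.RegularExpressions

Module PlusCheck
    Sub Main()
        Dim regx1 As String = "((\d{3}):(\w*)\+(\w*):(\w*))"
        Dim regx2 As String = "((\d{3}):(\w*):(\w*)|(\d{3}):(\w*)\+(\w*):(\w*))"

        ' Made-up value with no "+", for illustration only (not a real hash).
        Dim noPlus As String = "768:abcDEF123:ghiJKL456"

        Console.WriteLine(Regex.IsMatch(noPlus, regx1)) ' False: Regx1 requires a "+"
        Console.WriteLine(Regex.IsMatch(noPlus, regx2)) ' True: first alternative has no "+"
    End Sub
End Module
```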

It does work in a small test app.
I hope this is not your homework :)
EDIT:
After looking up what SSDEEP is, I added and tested the other two Regx.
The second one matches whether the "+" is there or not.
The third one: after a review of SSDEEP, the first section could be longer than 3 characters, so I changed it to accept any length of digits.
As best I can tell, the two outside "()" just capture the entire pattern as group 1; the overall match holds the same text without them.
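
For completeness, here is a sketch of a variant that stays anchored to the tag from the question. It rests on two assumptions that are not from the original posts: that the two hash sections use the base64 alphabet (letters, digits, "+" and "/"), which "\w*" does not fully cover, and that the value always sits directly between the opening div and "</div>". ExtractSsdeep is a made-up helper name. The lookbehind and lookahead check for the tags without including them in the match, so no "</div>" is left on the end.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SsdeepSketch
    ' Hypothetical helper, not part of the original posts.
    Function ExtractSsdeep(ByVal html As String) As String
        ' Lookbehind for the opening tag, digits, then two base64-style sections,
        ' and a lookahead for the closing tag (assumed layout, see the note above).
        Dim pattern As String = _
            "(?<=<div class=""floated-field-value"">)" & _
            "\d+:[A-Za-z0-9+/]*:[A-Za-z0-9+/]*" & _
            "(?=</div>)"

        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Value, String.Empty)
    End Function

    Sub Main()
        ' Sample input copied from the question.
        Dim html As String = "<div class=""floated-field-value"">768:pHC0p5mwel+twV39TD8mRF5rKJZsF6No2:o0p5mwelJ9TD8mv5ImGo</div>"
        Console.WriteLine(ExtractSsdeep(html))
    End Sub
End Module
```

If the page has other "floated-field-value" fields, only the ones shaped like digits:text:text would match, which is the same idea as matching the pattern of the data rather than the tags.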