Click here to Skip to main content
14,176,979 members
Rate this:
 
Please Sign up or sign in to vote.
See more:
Yes, the classic LCS problem. Given two or more strings find the longest common substring in each string.

One caveat: No C#, C, C++, Javascript or any C-based programming language[^], nor any VB-based language (VB, VB.NET, VBScript). That cuts out most of the easy languages. Spread your wings!

Your solution should take into account null strings and strings whose length is limited only by available storage.

What I have tried:

Melatonin for jetlag but it doesn't work.

Graeme_Grant again gets the guernsey for his outstanding and, I'm assuming, time consuming treatment of last week's problem. Noice.
Posted
Updated 5-Feb-17 16:41pm
Comments
Graeme_Grant 27-Jan-17 10:36am
   
Thanks Chris. But not really ... tinkering while watching a movie...

So looking at your Link, R is out but F# & PowerShell are okay then ;)
Patrice T 27-Jan-17 15:02pm
   
What is supposed to be the result ? Just the length ? The substring ?
Graeme_Grant 27-Jan-17 15:53pm
   
"find the longest common substring" .. my guess is the common substring...
Chris Maunder 27-Jan-17 16:58pm
   
Yes.
Jon McKee 27-Jan-17 16:43pm
   
Perl is considered C-based but F# isn't? That's a little odd tbh. If anything Perl has less in common with C than F#. Also that link includes B which was the precursor to C which I find baffling.
Graeme_Grant 27-Jan-17 18:07pm
   
I'd blow the dust off my C64 but basic was not allowed...
Jon McKee 27-Jan-17 18:40pm
   
He specified VB. Didn't say anything about just BASIC ;)
Graeme_Grant 27-Jan-17 18:57pm
   
I planted my flag with F# (second time I've every used it apart from 3 weeks ago - was/is a steep learning curve)... Busy week ahead though, will have to see how it pans out and if I can find the old C64...
PIEBALDconsult 27-Jan-17 18:42pm
   
VB is not allowed, real BASIC is.
I have VAX BASIC, HP BASIC, and Turbo BASIC...
PIEBALDconsult 27-Jan-17 18:45pm
   
Well, how are BCPL and B considered C-like? They came first! (And second.)
PIEBALDconsult 27-Jan-17 18:19pm
   
I think I have Pascal on one of my AlphaServers. I have a license. I also have a license for COBOL...
Graeme_Grant 28-Jan-17 5:28am
   
This may help: Free Turbo Pascal for Windows 10[^] - have not tried it thought...
PIEBALDconsult 28-Jan-17 10:04am
   
:cough: Win 7 :cough:
Graeme_Grant 28-Jan-17 11:27am
   
It is amazing what you can find with Google when one uses it... ;)
* Turbo Pascal for Windows 7[^] and
* How to Run Turbo Pascal v7.0 on Windows 7/8[^]
PIEBALDconsult 28-Jan-17 12:59pm
   
If I wanted it I'd have it already. I used to use Turbo Pascal (and Turbo C), but I don't anymore.
Graeme_Grant 28-Jan-17 15:19pm
   
fair enuf...
Graeme_Grant 28-Jan-17 18:51pm
   
I found the FreePascal online IDE a couple of hours ago, so as I have never done Pascal before, I gave it a go and you can see what I did below. Oh, how I missed my favorite IDE & the power of the .Net frameworks!
PIEBALDconsult 28-Jan-17 20:29pm
   
Yeah, I loved Pascal back in the day, particularly Turbo, but once I became comfortable with C I stopped using it. I can't even read my Pascal programs from college anymore.
Graeme_Grant 29-Jan-17 21:24pm
   
[wrong place, so moved]
PIEBALDconsult 29-Jan-17 20:52pm
   
On the other hand...
Turbo Pascal (5.5+) has OOP, and HP Pascal probably doesn't -- I learned "data structures" on VAX/DEC/HP Pascal back in the 80s.
I have a working implementation in C# to prove an algorithm I devised.
But I would have to implement HashSet and Dictionary myself and I don't think I'm up to that at this time.
Graeme_Grant 29-Jan-17 21:25pm
   
Pascal's 'type StrDict: Array of Array of String;'... Then, to use: 'var DictName: StrDict;' ? But then again, I'm only new to the language...
Mine [Pascal implementation] was based off the HashSet as a concept as it was based off the F# implementation (that was prototyped in a few minutes in C#) that in fact uses a HashSet. ;)
Graeme_Grant 29-Jan-17 21:29pm
   
Also, Pascal lacks many of the language tools found in the DotNet Framework, so you will also need to implement them too .... things like sorting, manually removing null entries from split strings, HashSets, etc...
PIEBALDconsult 29-Jan-17 23:55pm
   
Well, if I knew how to access .net from Pascal... :cool:
Graeme_Grant 30-Jan-17 2:18am
   
yes, modern-day devs are spoiled...
Graeme_Grant 29-Jan-17 21:35pm
   
A quick hint... while doing the Pascal version, I found that a HashSet is not required.
PIEBALDconsult 29-Jan-17 23:54pm
   
My algorithm uses it. If I could be assured of 64 or fewer strings, I'd use bitwise operators on an Int64 instead.
PIEBALDconsult 27-Jan-17 22:44pm
   
How about the wrongest substring...
Graeme_Grant 28-Jan-17 5:24am
   
Wouldn't that just be the longest input string?
PIEBALDconsult 28-Jan-17 10:04am
   
I suppose you're correct.
Rate this: bad
 
good
Please Sign up or sign in to vote.

Solution 2

As if F# was not as far away from normal for me (second attempt ever), here is a FreePascal version for PIEBALDConsultant. This is my first attempt at programming with this language, so please be gentle as I am sure there is plenty of room for improvement...

Program LongestSubstringContained(output);
type
    dynStrings = array of string;

procedure QuickSort(input : dynStrings; lowPos, highPos : integer);
var
  movablePointer, immovablePointer, temporaryPointer : integer;
  temporaryItem : string;

begin
  immovablePointer := lowPos; 
  movablePointer := highPos;

  while (movablePointer <> immovablePointer) do
  begin
    if(movablePointer > immovablePointer) then
    begin
      if(input[movablePointer] < input[immovablePointer]) then
      begin
        temporaryItem := input[movablePointer];
        input[movablePointer] := input[immovablePointer];
        input[immovablePointer] := temporaryItem;

        temporaryPointer := movablePointer; 
        movablePointer := immovablePointer;
        immovablePointer := temporaryPointer;
      end
      else
      begin
        dec(movablePointer); 
      end;
    end
    else
    begin
      if(input[movablePointer] > input[immovablePointer]) then
      begin
        temporaryItem := input[movablePointer];
        input[movablePointer] := input[immovablePointer];
        input[immovablePointer] := temporaryItem;

        temporaryPointer := movablePointer;
        movablePointer := immovablePointer;
        immovablePointer := temporaryPointer;
      end
      else
      begin
        inc(movablePointer);
      end;
    end;
  end;

  if(movablePointer > lowPos) then
    QuickSort(input,lowPos,movablePointer-1);

  if(movablePointer < highPos) then
    QuickSort(input,movablePointer+1,highPos);
end;

//remove blank array entries...
function TrimArray(input: dynStrings) : dynStrings;
var
    i, len, count: integer;
    ret: dynStrings;

begin
    count := 0;
    len := Length(input);
    for i := 0 to len do
    begin
        if (Length(input[i]) > 0) then
        begin
            count := count + 1;
            setlength(ret, count);
            ret[count - 1] := input[i];
        end;
    end;
    TrimArray := ret;
end;

function RetrieveBestResult(strings :dynStrings) : string;
var
    str, tmp: string;
    strLen, strCount, i, len, tmpCount: integer;

begin
    tmpCount := 0;
    strCount := 0;
    strLen := 0;
    tmp := '';
    str := '';

    // order the array
    QuickSort(strings, low(strings), high(strings));
    // find the most popular longest string
    for i := 0 to high(strings) do
    begin
        len := length(strings[i]);
        if (len >= strLen) then
        begin
            strCount := 1;
            str := strings[i];
            strLen := len;
        end
        else if (str = strings[i]) then
            strCount := strCount + 1

        else if (tmp = strings[i]) then
            tmpCount := tmpCount + 1
        else
        begin
            tmp := strings[i];
            tmpCount := 1;
        end;
    end;
    RetrieveBestResult := str;
end;

// check for a match
function StrInArray(value : string; strings :dynStrings) : Boolean;
var
    str : String;

begin
    for str in strings do
    begin
        if length(value) = 0 then
            exit(false);
       //if (lowercase(value) = lowercase(str)) then
       if (value = str) then
        begin
            exit(true);
        end;
    end;
  StrInArray := false;
end;

function Process(input: dynStrings) : string;
var
    matches: dynStrings;
    allMatches: dynStrings;
    i, c, cc, count, len: integer;
    str, sstr: string;

begin
    setlength(allMatches, 0);
    setlength(matches, 0);
    for i := 0 to high(input) do
    begin
        str := input[i];
        count := 0;
        len := length(str);
        for c := 0 to len do
        begin
            for cc := 1 to len - c do
            begin
                sstr := copy(str, c, cc);
                if (high(allmatches) = -1) or (StrInArray(sstr, allMatches)) then
                begin
                    count := count + 1;
                    setlength(matches, count);
                    matches[count - 1] := sstr;
                end;
            end;
        end;
        // bounce early if no matches
        if (high(matches) < 1) then
            exit('no match');
        allMatches := copy(matches, 0, length(matches));
        writeln('Matches found: ', high(allMatches));
        setlength(matches, 0);
    end;
    Process := RetrieveBestResult(allMAtches);
end;

function GetLCS(input: dynStrings) : string;
var
    count: integer;
    work: dynStrings;

begin
    // check and clean
    count := Length(input);
    if (count = 0) then
        exit('empty')
    else
    begin
        work := TrimArray(input);
        count := Length(work);
        if (count = 0) then
            exit('empty');
    end;
    // process if work to be done...
    writeln('processing...');
    GetLCS := Process(work);
end;

var
    tests: array[0..4] of string;
    result: string;

begin
    tests[0] := 'Yes, the classic LCS problem. Given two or more strings find the longest common substring in each string.';
    tests[1] := '';
    tests[2] := 'One caveat: No C#, C, C++, Javascript or any C-based programming language[^], nor any VB-based language (VB, VB.NET, VBScript). That cuts out most of the easy languages. Spread your wings!';
    tests[3] := '';
    tests[4] := 'Your solution should take into account null strings and strings whose length is limited only by available storage.';

    result := GetLCS(tests);
    writeln('Longest Substring: ', result);
end.

And here is the output...
sh-4.3$ fpc -vw main.pas                                                                                                                                              
Free Pascal Compiler version 2.6.4 [2015/06/17] for x86_64                                                                                                            
Copyright (c) 1993-2014 by Florian Klaempfl and others                                                                                                                
Target OS: Linux for x86-64                                                                                                                                           
Compiling main.pas                                                                                                                                                    
Linking main                                                                                                                                                                                                                                                                                                                                
/usr/bin/ld: warning: link.res contains output sections; did you forget -T?                                                                                           209 lines compiled, 0.1 sec                                                                                                                                           
sh-4.3$ main                                                                                                                                                          
processing...                                                                                                                                                         
Matches found: 5564                                                                                                                                                   
Matches found: 253                                                                                                                                                    
Matches found: 147                                                                                                                                                    
Longest Substring: ings                                                                                                                                               
sh-4.3$

This solution was built using the Coding Ground[^] online IDE. Here is a link to the solution[^] where you can compile and run it.

Enjoy! :)
   
v2
Comments
Bryian Tan 28-Jan-17 18:51pm
   
Two solution from Graeme_Grant!!! Look like you are the only participant !!!! :) anyway congrats on your previous two competitions. You done a great job. I think you getting this one too.
Graeme_Grant 28-Jan-17 18:53pm
   
Thanks Bryian!

The second one was only after a chat with PIEBALDConsultant... Never done Pascal before so I thought that I'd try it.

Jump in and have a go! [edit:] I've learned 3 new languages doing these challenges! :)
Bryian Tan 28-Jan-17 18:59pm
   
Good job anyway.
Rate this: bad
 
good
Please Sign up or sign in to vote.

Solution 1

Okay, as F# is acceptable, here is an F# version that will accept multiple inputs to be compared and return the "Longest common substring". I am using the challenge's text (including nulls) as the input for testing... ;)

open System
open System.Collections.Generic
open System.Linq

let msg = "Yes, the classic LCS problem. Given two or more strings find the longest common substring in each string.\n\
           \n\
           One caveat: No C#, C, C++, Javascript or any C-based programming language[^], nor any VB-based language (VB, VB.NET, VBScript). That cuts out most of the easy languages. Spread your wings!\n\
           \n\
           Your solution should take into account null strings and strings whose length is limited only by available storage."

let tests = msg.Split [|'\n'|]

let GetSubstrings (s : string)  = 
    [ for c1 in 0 .. s.Length - 1 do
        for c2 in c1 .. s.Length - 1 do
                yield s.[c1..c2]
        ]

let addRange input (targetSet : ICollection<_>) =
        input |> Seq.iter(fun x -> targetSet.Add(x))

let GetLongestCommonSubstring (strings : string[]) =
    match (Seq.toList (strings.Where( fun x -> not(x.Equals(""))))) with
    | [] -> "empty"
    | x::xs -> 
        let subStrings = new HashSet<string>()
        addRange (GetSubstrings(x)) subStrings
        xs |> List.map(fun y -> 
            let yy = new HashSet<string>() 
            addRange (GetSubstrings(y)) yy 
            subStrings.IntersectWith(yy)) |> ignore
        subStrings.OrderByDescending(fun s -> s.Length).First()
        
[<EntryPoint>]
let main argv = 
    printfn "-----------------------------------------------------------------------\r\n"
    printfn "Input strings (including nulls / blank lines):\r\n"
    printfn "%A\r\n" msg
    printfn "-----------------------------------------------------------------------\r\n"
    printfn "** Longest common substring found is: %A\r\n" (GetLongestCommonSubstring(tests))
    printfn "\r\n-- Press any key to exit --";
    Console.ReadKey() |> ignore;
    0

... and the output:
-----------------------------------------------------------------------

Input strings (including nulls / blank lines):

"Yes, the classic LCS problem. Given two or more strings find the longest common substring in each string.

One caveat: No C#, C, C++, Javascript or any C-based programming language[^], nor any VB-based language (VB, VB.NET, VBScript). That cuts out most of the easy languages. Spread your wings!

Your solution should take into account null strings and strings whose length is limited only by available storage."

-----------------------------------------------------------------------

** Longest common substring found is: "ings"


-- Press any key to exit --
   
Comments
Patrice T 27-Jan-17 17:37pm
   
May be I am tired, but in output, What is first string and what is second ?
Graeme_Grant 27-Jan-17 17:40pm
   
My input is in the quotes "Yes, the classic ... available storage." - that's 5 strings, 2 are nulls. The output LCS after running the code over the input is "ings"
Patrice T 27-Jan-17 17:47pm
   
Ok I see.
A couple of " around each string would have been clearer to me.
Graeme_Grant 27-Jan-17 17:53pm
   
We were not given anything to work with, so yes, it was a block of text, not just a bunch of strings in an array. The challenge's own text appeared to be the best candidate. This allows for any number of strings to be used, not just two (2). The first thing I do is convert it into a string array.
Rate this: bad
 
good
Please Sign up or sign in to vote.

Solution 3

Here's my C# implementation, just as a counter-example.

The PIEBALD.Type.Substring class is similar to the StringSlice class I used for the Bad Word Filter challenge, but with more bells and whistles.

public System.Collections.Generic.IList<string>
LCS
(
  params string[] Subject
)
{
  System.Collections.Generic.List<string> result = 
    new System.Collections.Generic.List<string>() ;

  System.Collections.Generic.Dictionary<PIEBALD.Type.Substring,System.Collections.Generic.HashSet<int>> temp =
    new System.Collections.Generic.Dictionary<PIEBALD.Type.Substring,System.Collections.Generic.HashSet<int>>() ;

  int shortest = System.Int32.MaxValue ;

  /* Find the shortest input length */
  for ( int i = 0 ; i < Subject.Length ; i++ )
  {
    if ( shortest > Subject [ i ].Length )
    {
      shortest = Subject [ i ].Length ;
    }
  }

  int max = 0 ;

  for ( ; ( max < Subject.Length ) && ( shortest > 0 ) ; shortest-- )
  {
    max = 0 ;

    temp.Clear() ;

    foreach 
    ( 
      PIEBALD.Type.Substring s 
    in 
      PIEBALD.Type.Substring.EnumSubstrings ( Subject [ 0 ] , shortest ) 
    )
    {
      temp [ s ] = new System.Collections.Generic.HashSet<int>() ;

      temp [ s ].Add ( 0 ) ;
    }

    for ( int i = 1 ; i < Subject.Length ; i++ )
    {
      foreach 
      ( 
        PIEBALD.Type.Substring s 
      in 
        PIEBALD.Type.Substring.EnumSubstrings ( Subject [ i ] , shortest ) 
      )
      {
        System.Collections.Generic.HashSet<int> v ;
                
        if ( temp.TryGetValue ( s , out v ) )
        {
          v.Add ( i ) ;

          if ( max < v.Count )
          {
            max = v.Count ;
          }
        }
      }

      if ( max < i )
      {
        break ;
      }
      else
      {
        // Remove dead entries maybe
      }
    }
  }

  foreach
  (
    System.Collections.Generic.KeyValuePair<PIEBALD.Type.Substring,System.Collections.Generic.HashSet<int>> kvp
  in 
    temp
  )
  {
    if ( kvp.Value.Count == Subject.Length )
    {
      result.Add ( kvp.Key.ToString() ) ;
    }
  }

  return ( result.AsReadOnly() ) ;
}
   
v2

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

  Print Answers RSS
Top Experts
Last 24hrsThis month


Advertise | Privacy | Cookies | Terms of Service
Web06 | 2.8.190526.1 | Last Updated 5 Feb 2017
Copyright © CodeProject, 1999-2019
All Rights Reserved.
Layout: fixed | fluid

CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100