Hello,
Please assist. I have more than 100 text files in a folder. Each of the text files contains more than a million texts. For each file I want to read the data and pull out a string every time the character count reaches 1200, that is, split the contents into consecutive 1200-character strings. I have almost completed this project (I'm about 80% through), but I can't get the hang of this part. Please help with the code for this.

For example, to clarify:
Say I have the following text in the file, and I want to take a string each time a count/length of 10 is reached:
Peter192019_Test_File1088-202000001009289271
130000001230WINALITE TST-TEHA S GHAGHSDGAHT
HGHAGSHGHAGSH HAGSHJAGDJGAHFDTFH EFAFGDFAG

From the above, my system should pull out the following strings:
Peter19201
9_Test_Fil
e1088-2020
0000100928
9271130000
etc..
Note that in the fifth string above, the last 6 characters come from the 2nd line. It will not always be 6; it depends on how many characters are needed to complete the next 10-character string.
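A minimal VB.NET sketch of the behaviour described above, assuming the goal is simply to cut the file's characters into consecutive fixed-length strings and to ignore the line breaks. The names `FixedLengthChunks` and `ReadChunks` are illustrative, not from the thread. Run on the sample lines, it prints the five strings listed above first, followed by the remaining 10-character pieces.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module FixedLengthChunks
    ' Reads characters from any TextReader and cuts them into strings of
    ' chunkLength characters. CR and LF are skipped, so a string may start
    ' on one line and finish on the next. The last string may be shorter.
    Function ReadChunks(reader As TextReader, chunkLength As Integer) As List(Of String)
        Dim chunks As New List(Of String)()
        Dim sb As New StringBuilder(chunkLength)
        Dim code As Integer = reader.Read() ' -1 means end of input
        Do While code <> -1
            If code <> 13 AndAlso code <> 10 Then ' ignore CR and LF
                sb.Append(ChrW(code))
                If sb.Length = chunkLength Then
                    chunks.Add(sb.ToString())
                    sb.Clear()
                End If
            End If
            code = reader.Read()
        Loop
        If sb.Length > 0 Then chunks.Add(sb.ToString()) ' leftover, shorter than chunkLength
        Return chunks
    End Function

    Sub Main()
        ' The sample lines from the question, joined with Windows line breaks.
        Dim sample As String = "Peter192019_Test_File1088-202000001009289271" & vbCrLf &
            "130000001230WINALITE TST-TEHA S GHAGHSDGAHT" & vbCrLf &
            "HGHAGSHGHAGSH HAGSHJAGDJGAHFDTFH EFAFGDFAG"
        Using reader As New StringReader(sample)
            For Each piece As String In ReadChunks(reader, 10)
                Console.WriteLine(piece)
            Next
        End Using
    End Sub
End Module
```

Because `ReadChunks` accepts any `TextReader`, a `StreamReader` opened on a real file can be passed in place of the `StringReader`.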

Please advise on how to do this.
Thanks in advance for any assistance.
Comments
[no name] 26-May-13 20:51pm    
Sounds to me like you are looking for String.Substring: http://msdn.microsoft.com/en-us/library/aka44szs.aspx
ekipongi 26-May-13 21:14pm    
Thanks, ThePhantomUpvoter. Yes, I have used String.Substring before, but I'm not sure how to combine it with a loop to get the strings I want. Note that the amount of data in the files varies.
Dave Kreskowiak 26-May-13 21:30pm    
From what you posted, it appears that each line in the text file has no delimiters, which means that each line must be in a fixed format: every column is exactly x characters wide, no matter what the column contains.

Substring would work, because every column MUST be the same width on every single line in the file.
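A sketch of the Substring-and-loop approach discussed here, assuming the line breaks are removed first so that a piece can continue onto the next line, and that each file is small enough to load whole (about a million characters is usually fine). `SplitFixed` and `SplitFile` are hypothetical helper names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SubstringChunks
    ' Cuts text into consecutive pieces of width characters using
    ' String.Substring. The final piece is shorter if the text length
    ' is not a multiple of width.
    Function SplitFixed(text As String, width As Integer) As List(Of String)
        Dim pieces As New List(Of String)()
        For start As Integer = 0 To text.Length - 1 Step width
            Dim count As Integer = Math.Min(width, text.Length - start)
            pieces.Add(text.Substring(start, count))
        Next
        Return pieces
    End Function

    ' Hypothetical helper: loads a whole file and joins its lines without
    ' line breaks so that pieces can run across lines. Assumes the file
    ' fits comfortably in memory.
    Function SplitFile(filePath As String, width As Integer) As List(Of String)
        Dim joined As String = String.Concat(File.ReadAllLines(filePath))
        Return SplitFixed(joined, width)
    End Function

    Sub Main()
        Dim lines As String() = {"Peter192019_Test_File1088-202000001009289271",
                                 "130000001230WINALITE TST-TEHA S GHAGHSDGAHT"}
        For Each piece As String In SplitFixed(String.Concat(lines), 10)
            Console.WriteLine(piece)
        Next
    End Sub
End Module
```

The `For ... Step width` loop is the "looping" part: each pass moves the start index forward by one piece, and `Math.Min` keeps the last `Substring` call from running past the end of the text.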
ekipongi 26-May-13 21:39pm    
Yes, that's correct, Dave. But I'm lost as to how to work this out using Substring and a loop to get the strings I want. I have been working on this project and it's really tedious, so please help with the code if possible. Thanks.
Sergey Alexandrovich Kryukov 26-May-13 22:41pm    
Millions of "texts" in each text file? What are "texts in a text file"? :-)
—SA

1 solution

I am not sure that I fully understand your goal, but assuming it is to parse a text file into strings of a fixed length, something like this may work for you.

This code creates a MemoryStream containing text that simulates reading from a text file. ReadLine drops the line breaks, so a string can start on one line and finish on the next.

Private ms As New IO.MemoryStream
Private strings As New List(Of String)
Sub test()
   makedata() ' writes sample text to a MemoryStream to simulate a text file
   MakeStrings() ' extract strings of a set length
   For Each s As String In strings
      Console.WriteLine(s)
   Next
End Sub

Private Sub makedata()
   Dim sw As New IO.StreamWriter(ms)
   With sw
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .WriteLine("0123")
      .Write("456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("01234567")
      .WriteLine("89")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .Write("0123456789")
      .WriteLine("01")
      .Flush() ' force writing out to stream
   End With
End Sub

Private Sub MakeStrings()

   Const strlen As Int32 = 10 ' length of the strings to create
   ms.Position = 0

   Dim sr As New IO.StreamReader(ms)

   Dim sb As New System.Text.StringBuilder(strlen) ' temporary string storage
   Dim lineposition As Int32

   Do While sr.Peek <> -1
      lineposition = 0
      Dim line() As Char = sr.ReadLine.ToCharArray()
      Do While lineposition < line.Length
         If sb.Length < strlen Then
            sb.Append(line(lineposition))
            lineposition += 1
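         Else
            ' sb is full: store it and start a new string; the current
            ' character is appended on the next pass through the loop
            strings.Add(sb.ToString())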
            Debug.WriteLine(sb.ToString)

            sb.Length = 0
         End If
      Loop
   Loop

   If sb.Length > 0 Then strings.Add(sb.ToString) ' the last string may be shorter than strlen
   sr.Close()

End Sub
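Adapting the same idea to the original question (more than 100 files in one folder, 1200-character strings), here is a sketch that streams each file in blocks instead of collecting everything in a List, so only one buffer and one partial string are in memory at a time. Assumptions: the files match `*.txt`, the folder path is passed on the command line, and counting the strings stands in for whatever processing is actually needed. `FolderChunks` and `ProcessReader` are illustrative names.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module FolderChunks
    ' Reads the text in blocks of up to 4096 characters and calls onChunk
    ' for every complete chunkLength-character string (CR/LF skipped), then
    ' once more for any shorter leftover at the end of the reader.
    Sub ProcessReader(reader As TextReader, chunkLength As Integer, onChunk As Action(Of String))
        Dim buffer(4095) As Char
        Dim sb As New StringBuilder(chunkLength)
        Dim charsRead As Integer = reader.Read(buffer, 0, buffer.Length)
        Do While charsRead > 0
            For i As Integer = 0 To charsRead - 1
                Dim ch As Char = buffer(i)
                If ch = ControlChars.Cr OrElse ch = ControlChars.Lf Then Continue For
                sb.Append(ch)
                If sb.Length = chunkLength Then
                    onChunk(sb.ToString())
                    sb.Clear()
                End If
            Next
            charsRead = reader.Read(buffer, 0, buffer.Length)
        Loop
        If sb.Length > 0 Then onChunk(sb.ToString())
    End Sub

    Sub Main(args As String())
        ' Assumed usage: the folder is the first command-line argument,
        ' defaulting to the current directory.
        Dim folder As String = If(args.Length > 0, args(0), ".")
        For Each filePath As String In Directory.EnumerateFiles(folder, "*.txt")
            Dim count As Integer = 0
            Using reader As New StreamReader(filePath)
                ' Placeholder processing: just count the 1200-character strings.
                ProcessReader(reader, 1200, Sub(piece) count += 1)
            End Using
            Console.WriteLine("{0}: {1} strings", filePath, count)
        Next
    End Sub
End Module
```

The callback keeps the splitting logic separate from what is done with each string, so writing the strings to an output file or a database only changes the lambda passed to `ProcessReader`.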
Comments
ekipongi 27-May-13 1:20am    
Thank you, TnTinMn. You have been a great help to me. I will try to modify your code to see whether it works as I need. Thanks once again.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900