I have a very large text file whose size I want to reduce, so I want to write a program that does this by deleting data I don't need from the file.
Here is a small sample of that file.

START
POINT 1000 5356 4720.589395 33044.474616 111.699997 10005356
2.197266 1554.908813
2.278646 1309.400635
5.615234 572.443115
5.696615 572.070190
6.510417 616.282471
6.591797 611.210938
6.673177 615.655396
POINT 1050 5360 4770.576031 33044.253728 112.699997 10005360
2.197266 883.810486
2.278646 1237.972656
2.360026 1187.120972
2.522787 922.997620
2.604167 868.807739
2.685547 810.683044
2.766927 794.258240
2.929688 706.232666

The program should ask for the location of the file. The first line, "START", appears only once and should be left alone (ignored, not deleted). The program should then group the lines: all data between two "POINT" lines should form a group of its own.
So for example, this would be considered as a group:
POINT 1000 5356 4720.589395 33044.474616 111.699997 10005356
2.197266 1554.908813
2.278646 1309.400635
5.615234 572.443115
5.696615 572.070190
6.510417 616.282471
6.591797 611.210938
6.673177 615.655396
And so on....

It should then delete groups according to their heading, "POINT ......".
I want the program to ask me a few questions, such as:
Start point (for example, from POINT 1000 ......)
End point (up to POINT 3521 ......)
Increment (delete every 5th point; for example, delete POINT 10, then POINT 15, then POINT 20, and so on up to the end point)
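
The behaviour requested above can be sketched as a single streaming pass that never holds more than one line in memory, so the 9 million lines are not an obstacle in any language. This is only a minimal sketch in Python under stated assumptions: the point number is taken to be the first field after "POINT", and "increment" is read as "delete the first group in range, then every `step`-th group after it". The names `filter_points` and `filter_file` are made up for illustration.

```python
def filter_points(lines, start, end, step):
    """Yield the lines to keep, dropping every `step`-th POINT group
    whose point number lies in [start, end] (assumed interpretation)."""
    in_range_count = 0   # POINT groups seen so far inside [start, end]
    dropping = False     # True while inside a group that is being deleted
    for line in lines:
        if line.startswith("POINT"):
            point_id = int(line.split()[1])   # assumed: first field is the point number
            dropping = False
            if start <= point_id <= end:
                dropping = (in_range_count % step == 0)
                in_range_count += 1
        if not dropping:
            # START and everything outside a deleted group passes through unchanged
            yield line


def filter_file(src_path, dst_path, start, end, step):
    """Stream src_path to dst_path line by line; the input is never loaded whole."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_points(src, start, end, step))
```

Writing to a second file rather than editing in place keeps the original intact until the result has been checked.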

I hope this is clear. I would prefer to do it in VB, but I guess that won't work because the file is 9 million lines long. If not, please tell me whether it could be done in C++ or C#, and point me to a method or tutorial that could help.
Thanks in advance
Comments
ZurdoDev 12-Dec-14 11:44am    
Where are you stuck?
Member 9472140 12-Dec-14 12:00pm    
I am stuck from the beginning.
I mean, I don't know how to start.
OriginalGriff 12-Dec-14 12:01pm    
And what have you done so far?
Where are you stuck?
What help do you need?
And why do you think "it can't be done in VB"? If it can be done in C# (and it can) it can be done in VB: they both compile to the same IL...
Member 9472140 12-Dec-14 12:03pm    
I heard that vb can't import huge files like this.
Sergey Alexandrovich Kryukov 12-Dec-14 12:28pm    
Perhaps you are posing the problem in a wrong, counterproductive way. Instead of trying to stick huge files somewhere, why not think about getting rid of them? Big text files make little sense, whatever the purpose. But we can help if you explain your ultimate goals.
—SA

1 solution

Please see my comments on the question. Again, you are approaching it the wrong way.

When I replied that using big text files is a bad thing and asked about your goals, you did not really explain them, but you mentioned "some sort of software", which probably doesn't give you a choice. But then, why ask about "decreasing the size"? Who is going to decrease it? That isn't logical.

And still, we don't know the essential information: the structure and semantics of the file. Okay, here is one possible approach:

You can index the file so that it can be read in smaller chunks. Let's assume the file has some shallow structure; in particular, this means it can be decomposed into smaller logical chunks, which we shall call "records". A record can be a line, but it could also be a group of lines, like the group you've shown in your example. The only problem is that each group has a different size; to begin with, the lines themselves have different lengths, so you don't know the location of each record until you have read the whole file.
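
Treating a group of lines as one record can be sketched as a generator that yields one group at a time. This is a rough illustration in Python, assuming every record starts with a line beginning with "POINT" and that anything before the first such line (the START line) is not part of a record; `iter_records` is a hypothetical name.

```python
def iter_records(lines):
    """Yield (header, data_lines) for each POINT group, one group at a time."""
    header, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("POINT"):
            if header is not None:
                yield header, data        # the previous group is complete
            header, data = line, []
        elif header is not None and line.strip():
            data.append(line)             # a data row belonging to the current group
        # lines before the first POINT (such as START) are skipped
    if header is not None:
        yield header, data                # the last group in the file
```

Because an open text file is itself an iterable of lines, `iter_records(open(path))` would walk the whole file while keeping only one group in memory.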

So, on the first run, you can read the whole file line by line and create another, smaller file: the index file. In the index file, you write the location of each record as a file position. It is better to make the index file binary, so that it can be navigated faster. You can have more than one index file, each sorted by a different criterion (for example, one sorted by record number in the order of the original file, another sorted by some kind of keyword). Then you can hold the index file in memory or, if even the index files are too big, keep in memory only an index of the index file and read the index files on request.
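
The first indexing pass might look something like the sketch below. The entry layout (point number and byte offset, each a signed 64-bit integer) and the name `build_index` are assumptions made for illustration, as is the idea that the first field after "POINT" identifies the record.

```python
import struct

# Assumed fixed-size index entry: (point number, byte offset in the data file)
ENTRY = struct.Struct("<qq")


def build_index(data_file, index_file):
    """Scan data_file (opened in binary mode) once and write one entry
    per POINT header to index_file. Returns the number of records found."""
    count = 0
    while True:
        offset = data_file.tell()         # position of the line about to be read
        line = data_file.readline()
        if not line:                      # end of file
            break
        if line.startswith(b"POINT"):
            point_id = int(line.split()[1])
            index_file.write(ENTRY.pack(point_id, offset))
            count += 1
    return count
```

With real files, both would be opened in binary mode (`"rb"` for the data, `"wb"` for the index), so that the recorded offsets are exact byte positions. Fixed-size entries mean the n-th entry sits at byte `n * ENTRY.size` of the index file.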

Now, on each request or query, you look up the record in one of the index files (taking the entry from memory or reading it from the index file). From the index entry, you get the record's position in the original big file and seek to that position in the file stream (open the file once and keep it open for the whole lifetime of the application). Then you read the record from the original file.
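
A lookup along these lines could be sketched as follows, assuming an index made of fixed-size (point number, byte offset) entries, a layout invented for this sketch, and a data file opened once in binary mode. `load_index` and `read_record` are illustrative names.

```python
import struct

# Assumed index entry layout: (point number, byte offset), both signed 64-bit
ENTRY = struct.Struct("<qq")


def load_index(index_file):
    """Read the whole binary index into a dict: point number -> byte offset."""
    return {pid: off for pid, off in ENTRY.iter_unpack(index_file.read())}


def read_record(data_file, offset):
    """Seek to a record's header and read lines up to the next POINT or EOF."""
    data_file.seek(offset)
    header = data_file.readline()
    rows = []
    while True:
        line = data_file.readline()
        if not line or line.startswith(b"POINT"):
            break                         # next record or end of file reached
        if line.strip():
            rows.append(line)
    return header, rows
```

Only the requested record is read from disk; the rest of the large file is never touched.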

One slightly different alternative: do everything as described above, but, on the first run, completely rewrite the original text file into something more convenient for navigation, which could be a much shorter binary file. In that binary file, don't store numbers as strings; this will save you a lot of space and, more importantly, greatly improve performance.
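
As a rough illustration of the binary alternative, the sketch below packs one record into bytes and reads it back. The field types are guessed from the sample alone (integer, integer, three floating-point values, integer in the header; two floating-point values per data row), and the layout and the names `pack_record` and `unpack_record` are made up; the real meaning of the fields is not known here.

```python
import struct

# Assumed layout: six header fields plus a 32-bit count of data rows
HEADER = struct.Struct("<qqdddqI")
SAMPLE = struct.Struct("<dd")     # one data row stored as two 64-bit floats


def pack_record(header_line, data_lines):
    """Convert one text record into its binary form."""
    f = header_line.split()[1:]   # drop the literal "POINT"
    parts = [HEADER.pack(int(f[0]), int(f[1]), float(f[2]),
                         float(f[3]), float(f[4]), int(f[5]),
                         len(data_lines))]
    for row in data_lines:
        x, y = map(float, row.split())
        parts.append(SAMPLE.pack(x, y))
    return b"".join(parts)


def unpack_record(buf, pos=0):
    """Read one record starting at pos; return (header fields, rows, next pos)."""
    *head, n = HEADER.unpack_from(buf, pos)
    pos += HEADER.size
    rows = [SAMPLE.unpack_from(buf, pos + i * SAMPLE.size) for i in range(n)]
    return head, rows, pos + n * SAMPLE.size
```

Storing the row count in the header means a reader can skip a whole record without scanning for the next "POINT". Using 32-bit floats would halve the size of the data rows, at the cost of keeping only about seven significant digits.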

—SA
 
Comments
Maciej Los 12-Dec-14 15:52pm    
There is no chance of helping someone who doesn't want to explain his problem.
+5!
Sergey Alexandrovich Kryukov 12-Dec-14 17:18pm    
At least I tried, and thus eased my soul. :-)
Thank you, Maciej.
—SA
Maciej Los 12-Dec-14 17:21pm    
Trying is not a sin, unless... ;)
Sergey Alexandrovich Kryukov 12-Dec-14 17:22pm    
...Unless this is a try to commit a sin? :-)
—SA
Maciej Los 12-Dec-14 17:26pm    
:laugh:
It's very dangerous. You're reading in my mind ;)

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


