Tags: C# .NET LINQ Parsing
I am parsing a text file using LINQ but got stuck: it throws an index-out-of-range exception.
string[] lines = File.ReadAllLines(input);
var t1 = lines
    .Where(l => !l.StartsWith("#"))        // skip lines that start with a comment
    .Select(l => l.Split(' '))             // split each line on single spaces
    .Select(items => String.Format("{0}{1}{2}",
        items[1].PadRight(32),
        //items[1].PadRight(16)
        items[2].PadRight(32),
        items[3].PadRight(32)));           // IndexOutOfRangeException if a line has fewer than 4 items
var t2 = t1
    .Select(l => l.ToUpper());
foreach (var t in t2)
    Console.WriteLine(t);
The file is about 200 to 500 lines long. I want to extract specific information from it, so I need to split that information into different structures. How can I do this?
Posted 29-Sep-12 10:05am
Edited 29-Sep-12 10:24am
Comments
Zoltán Zörgő at 29-Sep-12 15:30pm
   
What exactly does a row look like, and how do you want to transform it?
saniaali at 29-Sep-12 15:34pm
   
The data is in this format:
#comments start with this
input = 12
output = 4
pin class direction no
io 1 up 0
io 3 rught 1
cb 6 up 2
io 1 up 0
 
etc.
So I need to extract its pin information.
Zoltán Zörgő at 29-Sep-12 16:17pm
   
The "etc." part is also interesting: after the input, output and table header, is there only table data in the file, or can there be something else too? Can comments begin at any position in a row? Does the table have a fixed structure in every file?
Greysontyrus at 10-Oct-12 5:08am
   
Could we have a few rows (or the whole) of the input file, please?
Whenever I see items[#] where # is hard-coded, I always shudder. The exception means that there is a line that produces fewer than 4 elements in 'items'.
You can confirm this by running your code on the file but replacing the t1 LINQ query with:
var t1 = lines
    .Where(l => !l.StartsWith("#"))
    .Where(l => l.Split(' ').Length < 4)
    .Select(l => "'" + l.Replace(" ", "' '") + "'");
The Console.WriteLine(t); will then print every row that causes an error, with each item in quotes, so you can see each line and item as the lambda sees it. You may find that a row is missing some spaces, so .Split(' ') fails to find 4 separate elements.
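To illustrate the point about hard-coded indexes, here is a minimal sketch (not the poster's code) that guards against short rows. It assumes the sample lines from the comments; the local function name Fields is hypothetical. It splits on runs of spaces or tabs with StringSplitOptions.RemoveEmptyEntries, because a plain Split(' ') returns empty strings between consecutive spaces, and it skips rows such as "input = 12" that only have 3 fields instead of indexing past the end of the array.

```csharp
using System;
using System.Linq;

// Hypothetical sample lines, copied from the format posted in the comments.
string[] lines =
{
    "#comments start with this",
    "input = 12",
    "output = 4",
    "pin class direction no",
    "io 1 up 0",
    "io    3      rught     1",   // several spaces between the fields
};

// Split on any run of spaces/tabs and drop the empty strings that
// Split(' ') would otherwise produce between consecutive spaces.
string[] Fields(string line) =>
    line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

foreach (var line in lines.Where(l => !l.StartsWith("#")))
{
    var items = Fields(line);

    // "input = 12" has only 3 fields, so items[3] would throw
    // IndexOutOfRangeException without this length check.
    if (items.Length < 4)
    {
        Console.WriteLine("skipped ({0} fields): '{1}'", items.Length, line);
        continue;
    }

    Console.WriteLine("{0,-8}{1,-8}{2,-8}{3,-8}", items[0], items[1], items[2], items[3]);
}
```

Note that the header row "pin class direction no" still has 4 fields, so it passes the length check; telling it apart from data needs a content test (for example, a number in the second column).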
saniaali at 29-Sep-12 16:36pm
   
Yes, comments can be anywhere. More interesting is that there will also be other information in this space-separated format: I need to find the number of inputs, outputs and other pins, and their type, position and rotation details.
Zoltán Zörgő at 29-Sep-12 17:19pm
   
Well, without a proper format specification it would be impossible, even for us :) But since we cannot see any of those files and don't know what kind of file this is, we can only guide you, which is the purpose of this site anyway.
So:
1) Find, or work out, the most general format of the file.
2) Figure out how you can identify the interesting and the uninteresting parts, so you can separate them.
3) Use a regular expression to strip comments from the interesting part.
4) Use Split or a regular expression to parse the interesting rows.
...oh, and by the way, try to use the "Reply" feature so that people can see that you replied to a comment.
Good luck!
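Steps 3 and 4 above could be sketched as follows. This is only an illustration under the assumption that '#' never occurs inside real data (true for the sample posted in the comments); the helper names StripComment and Tokens are hypothetical. A comment is removed wherever it starts in the row, and the rest is split on runs of whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

// Step 3: remove a '#' comment wherever it starts in the row.
// Assumption: '#' never appears inside real data.
string StripComment(string line) => Regex.Replace(line, "#.*$", "").Trim();

// Step 4: split what is left on runs of whitespace.
// A blank or comment-only row yields an empty array.
string[] Tokens(string line)
{
    string body = StripComment(line);
    return body.Length == 0 ? Array.Empty<string>() : Regex.Split(body, @"\s+");
}

string[] sample =
{
    "# Header",
    "input = 12",
    "pin class direction no     # Data Title",
    "io    1      up        0",
};

foreach (var line in sample)
{
    string[] tokens = Tokens(line);
    if (tokens.Length == 0) continue;   // nothing left after removing the comment
    Console.WriteLine(string.Join("|", tokens));
}
```

Because the text is trimmed before splitting, no empty tokens appear at either end, so the token count can then be used (step 2) to tell "key = value" rows from 4-column table rows.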

Solution 1

I think it is better to go for a regex, if possible, rather than LINQ. After all, this is more about strings/characters than data/tables/lists, so why not use a text-oriented solution?
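As a rough sketch of this regex-first approach (an illustration, not the poster's code), a single pattern can accept only table rows and reject everything else. It assumes the row shape from the question's sample, `<class> <number> <direction> <number>` with an optional trailing `#` comment; the group names cls, num, dir and no are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed data-row shape, taken from the sample in the question:
//   <class> <number> <direction> <number>   [# optional comment]
var dataRow = new Regex(
    @"^\s*(?<cls>\w+)\s+(?<num>\d+)\s+(?<dir>\w+)\s+(?<no>\d+)\s*(?:#.*)?$");

string[] lines =
{
    "input = 12",
    "pin class direction no",
    "io 1 up 0",
    "cb    6      up        2   # trailing comment",
};

foreach (var line in lines)
{
    Match match = dataRow.Match(line);
    if (!match.Success) continue;   // header, "key = value", comment, ...

    // Numeric columns can be converted once the row is known to be valid.
    int number = int.Parse(match.Groups["num"].Value);
    Console.WriteLine("{0} {1} {2} {3}",
        match.Groups["cls"].Value, number,
        match.Groups["dir"].Value, match.Groups["no"].Value);
}
```

The header row is rejected because "class" is not a number, and "input = 12" is rejected because "=" is not a number, so no index checks are needed.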

Solution 2

You need to parse a file, so the starting point is not LINQ (that is an implementation detail), but rather defining the grammar of the file in enough detail to allow implementing a good-enough parser.
 
The grammar in your case looks to me something like this:
( {...} = 0-n repetitions, [...] = optional, a | b = a or b, '...' = literal text)
file     : { line } .
line     : { ws } [ input | output | pin | data ] rest .
rest     : [ comment ] EOL .
comment  : { ws } '#' { NOT_EOL } .
ws       : SPACE_NOT_EOL .
input    : 'input' { ws } '=' { ws } number .
output   : 'output' { ws } '=' { ws } number .
pin      : 'pin' ws { ws } 'class' ws { ws } 'direction' ws { ws } 'no' .
data     : word ws { ws } number ws { ws } word ws { ws } number .
number   : DIGIT { DIGIT } .
word     : WORDCHAR { WORDCHAR } .
Implementing:
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Group 1:   input|output|pin|data...
// Group 2-4: class, direction, no if pin
// Group 2-4: number, dir, number if data...
Regex scan = new Regex(
              @"^\s*(?:(\w+)\s*(?:=\s*\w+|(\w+)\s+(\w+)\s+(\w+)))?\s*(?:[#].*)?$",
              RegexOptions.Multiline);
 
var lines = scan.Matches(File.ReadAllText(@"..\..\data.txt"))
                .Cast<Match>()
                .Where(m => m.Groups[1].Success);
foreach (var m in lines)
{
    switch (m.Groups[1].Value)
    {
        case "input": case "output": case "pin": // ignored
            break;
        default:
            Console.WriteLine("{0,-16} {1,4} {2,-6} {3,4}",
                              m.Groups[1].Value,
                              m.Groups[2].Value,
                              m.Groups[3].Value,
                              m.Groups[4].Value);
            break;
    }
}
With the input file
# Header
input = 12
output = 4
# Data
pin class direction no     # Data Title
io    1      up        0
io    3      rught     1
cb    6      up        2
io    1      up        0
# End of data
Results in
io                  1 up        0
io                  3 rught     1
cb                  6 up        2
io                  1 up        0
Cheers
Andi
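To collect the information into structures (the header values and the pin rows) instead of printing it, one possible sketch uses a slightly modified version of the scan regex from Solution 2: it additionally captures the value after '=' and uses named groups. The names settings, pins, key, value and c2 to c4 are hypothetical, and it assumes that header values and the numeric table columns are integers and that the row starting with "pin" is the table header.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Modified scan regex: 'value' captures the number after '=',
// c2..c4 capture the remaining columns of a table row.
var scan = new Regex(
    @"^\s*(?:(?<key>\w+)\s*(?:=\s*(?<value>\w+)|(?<c2>\w+)\s+(?<c3>\w+)\s+(?<c4>\w+)))?\s*(?:#.*)?$",
    RegexOptions.Multiline);

string text = string.Join("\n", new[]
{
    "# Header",
    "input = 12",
    "output = 4",
    "pin class direction no     # Data Title",
    "io    1      up        0",
    "cb    6      up        2",
});

var settings = new Dictionary<string, int>();
var pins = new List<(string Kind, int Class, string Direction, int No)>();

foreach (Match m in scan.Matches(text))
{
    if (!m.Groups["key"].Success) continue;      // blank or comment-only line

    string key = m.Groups["key"].Value;
    if (m.Groups["value"].Success)               // "input = 12", "output = 4"
        settings[key] = int.Parse(m.Groups["value"].Value);
    else if (key != "pin")                       // skip the table header row
        pins.Add((key, int.Parse(m.Groups["c2"].Value),
                  m.Groups["c3"].Value, int.Parse(m.Groups["c4"].Value)));
}

Console.WriteLine("inputs={0} outputs={1} pins={2}",
    settings["input"], settings["output"], pins.Count);
```

A named tuple keeps the sketch short; for the extra details mentioned in the comments (type, position, rotation) a dedicated class with one property per column would probably scale better.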

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 9 Mar 2013
Copyright © CodeProject, 1999-2014. All Rights Reserved.