Hi developers,

This is my code to copy the contents of multiple files into a single file:

public void generateMonthReport()
{
    string[] inputFilePaths = Directory.GetFiles("//Path to directory for multiple files", "*.txt.2018-" + DateTime.Now.ToString("MM") + "-*");

    using (var outputStream = File.Create("//Path to save the new file" + DateTime.Now.ToString("MM-yyyy") + "-Statistics.txt"))
    {
        foreach (var inputFilePath in inputFilePaths)
        {
            using (var inputStream = File.OpenRead(inputFilePath))
            {
                inputStream.CopyTo(outputStream);
            }
        }
    }
}



Photo of the output (please see):
https://image.ibb.co/etL5T9/Capture.jpg

In that photo there is a highlighted date. That line is where the content of a new file starts.

The problem is that the highlighted date is written differently from the others, and when I read it with my regex I get an error.

I pasted the highlighted line into regex101 as the test string, and the start of the line was shown as a bullet.
Image: https://image.ibb.co/hF7YgU/Capture1.jpg


UPDATE AS REQUESTED :
This is my regex (multiline) :
^(?<date>[^ ]+) (?<time>[^A-Z]+) (?<errorMessage>[^[]+) \[1\] (?<programName>[^.]+)[.](?<formName>[^.]+)[.](?<event>[^ ]+)[^a-z]+(?<username>[^:]+):(?<message>[^.]+).+$


In a hex editor I found that the line begins with these bytes (maybe this helps):
 -> EF BB BF

This is the code to read the data from the file :
var MyTextFileDataSet = new TextFileDataSet.TextFileDataSet();
          using (var filestream = new FileStream("//path of file to read.", FileMode.Open, FileAccess.Read,FileShare.ReadWrite))
          {
              MyTextFileDataSet.ContentExpression = new Regex(@"^(?<date>[^ ]+) (?<time>[^A-Z]+) (?<errorMessage>[^[]+) \[1\] (?<programName>[^.]+)[.](?<formName>[^.]+)[.](?<event>[^ ]+)[^a-z]+(?<username>[^:]+):(?<message>[^.]+).+$", RegexOptions.Multiline);

              MyTextFileDataSet.Fill(filestream);

          }


          int counterError = 0, counterFatal = 0, counterWarning = 0;



          var rows = MyTextFileDataSet.Tables[0].AsEnumerable();
          string errorMessage = "";
          string transactionsMessage = "";
          int counter = 0;
          foreach (var row in rows)
          {
              errorMessage = row.Field<string>("errorMessage");
              var date = DateTime.Parse(row.Field<string>("date"));
              var name = row.Field<string>("username").Trim();
              transactionsMessage = row.Field<string>("message");
              string transactionsString = "";
              if (transactionsMessage.Contains("generated"))
              {
                  transactionsString = getBetween(transactionsMessage, "generated", "transactions");
              }
              if (transactionsString != "")
              {
                  var transactions = Convert.ToInt32(transactionsString);
                  var logItem = logItems
                                            //.Where(n => n.Date == date)
                                            .Where(n => n.Name == name)
                                            .FirstOrDefault();
                  if (logItem == null)
                  {
                      logItems.Add(new LogItemGeneration
                      {
                          Date = date,
                          Name = name,
                          Transactions = transactions
                      });
                  }
                  else
                  {
                      logItem.Transactions += transactions;
                  }
              }


              switch (errorMessage)
              {
                  case "ERROR":
                      counterError++;
                      break;
                  case "FATAL":
                      counterFatal++;
                      break;
                  case "WARN":
                      counterWarning++;
                      break;

                  default:
                      break;
              }
              //GENERATING CHART

              counter++;
          }

This line:
var date = DateTime.Parse(row.Field<string>("date"));
throws a FormatException when it reaches the line where the date is written differently.



If anything is unclear, do not hesitate to comment and I will answer :)

What I have tried:

When I manually edited that line to look like the others, everything worked fine.

I searched the internet for other methods but did not find any.
Posted 10-Aug-18 0:15am
Updated 10-Aug-18 2:22am
v2
Comments
Richard MacCutchan 10-Aug-18 6:33am
   
The difference you see is only present in the application that displays the data. The actual file content is the same for all the text. Maybe if you edit your question and show the actual content and explain what regex you are using and what error you receive, people will be able to help.
Sigmond Gatt 10-Aug-18 8:06am
   
Question updated, maybe it helps more. I'm really sorry, but English is not my first language.
Richard MacCutchan 10-Aug-18 8:20am
   
No need to apologise for your English, my Maltese is terrible.

Quote: "In the hexEditor i found that the line begins with (maybe it can helps): -> EF BB BF"
Those bytes are the UTF-8 byte order mark: they just identify the content as encoded in UTF-8 and do not affect the format. At the start of a file, BOM-aware editors and text-handling routines skip over those bytes. However, if they are in the middle of the file they will not be interpreted correctly. You should also check the file that contains the source of that text to see if the extra character(s) come from the original file.
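To illustrate the point about a BOM in the middle of a file, here is a minimal sketch. The class and method names are hypothetical and not from the question. It assumes the bytes are decoded as UTF-8: the three bytes EF BB BF then become the single character U+FEFF, which would sit in front of the date text and is a plausible reason for the parse failure described in the question.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class BomDemo
{
    // Decodes the bytes as UTF-8 and reports whether the resulting text
    // starts with the BOM character U+FEFF. Encoding.GetString does not
    // strip the BOM, so it survives as a real character in the string.
    public static bool StartsWithBomChar(byte[] lineBytes)
    {
        string text = Encoding.UTF8.GetString(lineBytes);
        return text.Length > 0 && text[0] == '\uFEFF';
    }

    // Removes any leading U+FEFF characters so the rest of the line
    // can be matched or parsed as usual.
    public static string StripBomChar(string text)
    {
        return text.TrimStart('\uFEFF');
    }
}
```

Stripping the character like this is only a workaround on the reading side; not writing the BOM into the merged file in the first place is probably the cleaner fix.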
Richard MacCutchan 10-Aug-18 8:28am
   
I suspect the problem occurs because you are using a FileStream to read a text file. But the FileStream treats the content as binary, so it will copy control characters as well as ordinary characters. You should use a TextReader to read the input files and a TextWriter to write the updated file.
Richard MacCutchan 10-Aug-18 8:29am
   
See my latest comment above.
Sigmond Gatt 10-Aug-18 8:26am
   
First of all, thanks for understanding me. I think it ends up in the middle of my file because I am combining multiple files into one and copying each whole file, including these characters. I will look into StreamReader and StreamWriter; maybe I can work with them.
Richard MacCutchan 10-Aug-18 8:43am
   
That's OK, Google will still find it, for anyone who is interested.
Sigmond Gatt 10-Aug-18 8:31am
   
Make this comment as a solution.

Solution 2

I think the problem is that you're trying to merge lines of text files, which are composed of characters, without worrying about character encoding.

The hex that you mention looks like a Byte Order Mark (BOM) for UTF-8, see:

Byte order mark - Wikipedia

Because you are treating the files as byte streams, you are copying the byte order marks verbatim. This works fine for the first file, since it's OK to have a BOM at the beginning of a file, where it belongs.

However, for the second and subsequent files, it does not work. This is because you are copying each of their BOMs into the middle of the combined file, where it does not belong.

You need to look into using a stream reader / stream writer, which handle the character encoding for you. See:

How to: Read Text from a File | Microsoft Docs
How to: Write Text to a File | Microsoft Docs

Alternatively, the following can simplify reading the lines:

File.ReadLines Method (System.IO) | Microsoft Docs

There is an equivalent for writing lines, but it is not well suited to merging multiple files:

File.WriteAllLines Method (System.IO) | Microsoft Docs
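As a rough sketch of the stream reader / stream writer approach, the merge could look something like the code below. `ReportMerger` and `MergeTextFiles` are hypothetical names, and the sketch assumes the input files are UTF-8 text. It also assumes the merged file should carry no BOM at all, which is a choice made here, not something stated in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class ReportMerger
{
    public static void MergeTextFiles(IEnumerable<string> inputFilePaths, string outputFilePath)
    {
        // new UTF8Encoding(false) makes the writer emit UTF-8 without a BOM.
        using (var writer = new StreamWriter(outputFilePath, false, new UTF8Encoding(false)))
        {
            foreach (var inputFilePath in inputFilePaths)
            {
                // StreamReader consumes a leading BOM instead of returning it
                // as text, so it never reaches the output file.
                using (var reader = new StreamReader(inputFilePath, Encoding.UTF8, true))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        // WriteLine ends every line with Environment.NewLine,
                        // including the last line of each input file.
                        writer.WriteLine(line);
                    }
                }
            }
        }
    }
}
```

Copying line by line also means a file that lacks a trailing newline does not get its last line glued to the first line of the next file. The trade-off is that line endings are normalized to `Environment.NewLine`.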
Comments
Richard MacCutchan 10-Aug-18 8:44am
   
That definitely appears to be the issue. See my comments and OP's replies above.

Solution 1

Start by looking at the files you are opening: use a hex editor to look at the end of the first one, and the start of the next. It's important to use a hex editor, because a text editor (like Notepad) will make assumptions about the file content that will "hide" what you need to look for.
What would the "combined" file look like? Is there anything in there other than straight letters, numbers, punctuation, and newlines (0x0D '\r', 0x0A '\n', or a combination)? If so, what?
Then look at the "combined" file using the same hex editor - what does the "join" look like? Is everything exactly what you would expect from your observations of the inputs?

You need to gather info on exactly what is happening - and we can't do that for you, we have no access to your file system!
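If a hex editor is not at hand, a small helper along these lines can do a similar check from code. `BomInspector` and its methods are hypothetical names used only for this sketch. It looks only at the first bytes of a file, so it would be run on each input file; inspecting the "join" inside the combined file still needs a hex editor or a scan of the whole file.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class BomInspector
{
    // Returns true when the file starts with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        var buffer = new byte[3];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(buffer, 0, buffer.Length);
            return read == 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF;
        }
    }

    // Returns up to 'count' leading bytes of the file as a hex string,
    // formatted by BitConverter with dashes between the bytes.
    public static string FirstBytesAsHex(string path, int count)
    {
        var buffer = new byte[count];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(buffer, 0, count);
            return BitConverter.ToString(buffer, 0, read);
        }
    }
}
```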

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Copyright © CodeProject, 1999-2018
All Rights Reserved.