I need to extract the lines that contain <title> and <Header> from 500 HTML files stored in a folder.
I tried the code below, but when I run it, it reports an error in the StreamWriter.

What I have tried:

foreach (string arg in Directory.GetFiles(@"C:\Users\htmlfiles"))
{
    string line;
    StreamReader file = new StreamReader(arg, Encoding.GetEncoding(1252));
    StreamWriter file2 = new StreamWriter(@"C:\Users\outputfile.txt");
    while ((line = file.ReadLine()) != null)
    {
        if (line.Contains("<TITLE>"))
        {
            file2.WriteLine(line);
        }
        if (line.Contains("<HEADER>"))
        {
            file2.WriteLine(line);
        }
    }
    file2.Close();
}
           }
Updated 8-Feb-19 8:56am
Comments
F-ES Sitecore 8-Feb-19 11:07am    
If your code is throwing an error, always say what the error message is; it's there for a reason.

Secondly, you might want to look at the HTML Agility Pack to do this for you. The issue with your code is that it won't work when the tag is split across lines, for example:

<title>
My Title
</title>
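
To illustrate the split-tag case without pulling in a library, here is a minimal sketch that reads the whole document as one string and uses a regular expression instead of a line-by-line check. The class name TitleDemo and the helper ExtractTitle are made up for this example. A regex is not a real HTML parser, so for messy markup the HTML Agility Pack is still the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleDemo
{
    // Hypothetical helper: returns the trimmed text between <title> and
    // </title>, or null when the document has no title element.
    public static string ExtractTitle(string html)
    {
        // Singleline lets '.' match newlines, so a title spread over several
        // lines is still found; IgnoreCase accepts <TITLE> as well.
        Match m = Regex.Match(
            html,
            @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }

    public static void Main()
    {
        string html = "<html><head><title>\nMy Title\n</title></head></html>";
        Console.WriteLine(ExtractTitle(html)); // prints "My Title"
    }
}
```

For real files, the string would come from File.ReadAllText(path, encoding) rather than a literal.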

C#
StreamWriter file2 = new StreamWriter(@"C:\Users\outputfile.txt");

A normal (non-elevated) process is not allowed to create files directly in C:\Users. Use a folder in one of the appropriate locations instead, e.g. Documents or AppData\Local.
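
As a rough sketch of that suggestion, the code below writes to the current user's Documents folder, which Environment.GetFolderPath resolves at run time. It also opens the writer once, before the loop: new StreamWriter(path) overwrites an existing file, so creating it inside the foreach would leave only the lines from the last HTML file. The class name TagLineExtractor and the method ExtractLines are invented for illustration, and the input folder is the one from the question. The question used Encoding.GetEncoding(1252); on .NET Core and later that needs CodePagesEncodingProvider registered first, so the encoding is a parameter here and the demo passes UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

public static class TagLineExtractor
{
    // Hypothetical helper: copies every line that contains one of the given
    // tags (case-insensitive) from each file in inputDir to outputPath.
    // Returns the number of lines written.
    public static int ExtractLines(string inputDir, string outputPath,
                                   Encoding encoding, params string[] tags)
    {
        int count = 0;
        // One writer for the whole run; 'using' closes it even on errors.
        using (var writer = new StreamWriter(outputPath, false))
        {
            foreach (string path in Directory.GetFiles(inputDir))
            {
                using (var reader = new StreamReader(path, encoding))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        foreach (string tag in tags)
                        {
                            if (line.IndexOf(tag, StringComparison.OrdinalIgnoreCase) >= 0)
                            {
                                writer.WriteLine(line);
                                count++;
                                break; // write a matching line only once
                            }
                        }
                    }
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        string inputDir = @"C:\Users\htmlfiles"; // folder from the question
        string documents = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
        string outputPath = Path.Combine(documents, "outputfile.txt");

        int written = ExtractLines(inputDir, outputPath, Encoding.UTF8, "<TITLE>", "<HEADER>");
        Console.WriteLine("{0} lines written to {1}", written, outputPath);
    }
}
```

Keeping the output file outside the input folder also means Directory.GetFiles never picks it up as an input.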
 
 
Comments
Priya Karthish 8-Feb-19 13:41pm    
I need to extract the content between the <title> and <H1> tags. At first I thought I would just extract those two lines in full and save them to a file; that is why I wrote this C# code.
Error: Couldn't write in StreamWriter.
I changed the directory and tried again, but the same error still occurs.
Richard MacCutchan 9-Feb-19 4:18am    
What directory, and what error?
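using System;
using System.IO;

int counter = 0;
string line;

// Start with an empty output file so results from earlier runs are discarded.
File.WriteAllText("TestOut.txt", string.Empty);

// The using block closes the reader even if an exception is thrown.
using (StreamReader file = new StreamReader(@"Test.html"))
{
    while ((line = file.ReadLine()) != null)
    {
        // Case-insensitive match, so <title> and <TITLE> are both found.
        if (line.IndexOf("<TITLE>", StringComparison.OrdinalIgnoreCase) >= 0 ||
            line.IndexOf("<H1>", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            Console.WriteLine(line);
            File.AppendAllText("TestOut.txt", line + Environment.NewLine);
            counter++;
        }
    }
}

Console.WriteLine("{0} lines found.", counter);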
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


