Hello everyone. I am new to C#.
I would like to try the code from the link below in my app.
This class, found on GitHub, automatically detects the code page of a text file.
But I don't know how to call this class from my application or how to use it.
For example, if my file is located at C:\TestFolder\Test.txt, how can I detect the code page of Test.txt?
I am asking for your help if that is possible; if not, I apologize in advance. Thanks in advance.

Link to the GitHub class:
https://gist.github.com/TaoK/945127

What I have tried:

Something like
C#
private void Button1_Click(object sender, EventArgs e)
{
    string path = @"C:\TestFolder\Test.txt";
    TextFileEncodingDetector.DetectTextFileEncoding(path);
}
Updated 23-Oct-22 5:52am
Comments
PIEBALDconsult 23-Oct-22 10:00am    
You may be confusing two different things.
As far as I know, you can't detect the code page, and you shouldn't need to.
You may be able to detect the Unicode encoding (UTF-8, UTF-16, etc.), but you still shouldn't need to.

What is it you are actually trying to do? And why? Very likely you are simply causing yourself trouble for no reason.
Dave Kreskowiak 23-Oct-22 12:52pm    
Why would you do this? There is no way to reliably detect the encoding of a text file. If you think this is a sure-fire way of detecting the encoding, you would be wrong. This point is even covered in the code comments:

This class does NOT try to detect arbitrary codepages/charsets, it really only
aims to differentiate between some of the most common variants of Unicode
encoding, and a "default" (western / ascii-based) encoding alternative provided
by the caller.

As there is no "Reliable" way to distinguish between UTF-8 (without BOM) and
Windows-1252 (in .Net, also incorrectly called "ASCII") encodings, we use a
heuristic - so the more of the file we can sample the better the guess. If you
are going to read the whole file into memory at some point, then best to pass
in the whole byte array directly. Otherwise, decide how to trade off
reliability against performance / memory usage.

The UTF-8 detection heuristic only works for western text, as it relies on
the presence of UTF-8 encoded accented and other characters found in the upper
ranges of the Latin-1 and (particularly) Windows-1252 codepages.

If you don't know what "heuristic" means, it's basically an algorithm that makes a "best guess".
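
To make the "best guess" idea concrete, here is a minimal sketch of one such heuristic. It is not the gist's code: the gist inspects byte patterns in a sample, whereas this version only checks for a byte order mark and then tests whether the whole file is strictly valid UTF-8, falling back to an encoding supplied by the caller. The class name EncodingGuess and its methods are made up for illustration; only the System.Text and System.IO calls are real framework APIs.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of the linked gist.
public static class EncodingGuess
{
    // Returns the encoding implied by a byte order mark, or null if none is present.
    public static Encoding FromBom(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        // UTF-32 LE must be tested before UTF-16 LE: both start with FF FE.
        if (data.Length >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0 && data[3] == 0)
            return Encoding.UTF32;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 little-endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 big-endian
        return null;
    }

    // Best guess: BOM first, then strict UTF-8 validation, else the caller's fallback.
    public static Encoding Guess(byte[] data, Encoding fallback)
    {
        Encoding fromBom = FromBom(data);
        if (fromBom != null)
            return fromBom;

        try
        {
            // The second argument (throwOnInvalidBytes) makes decoding fail on malformed UTF-8.
            new UTF8Encoding(false, true).GetString(data);
            return Encoding.UTF8; // note: plain ASCII also ends up here
        }
        catch (DecoderFallbackException)
        {
            return fallback;
        }
    }

    // Convenience wrapper that reads the whole file into memory before guessing.
    public static Encoding GuessFile(string path, Encoding fallback)
    {
        return Guess(File.ReadAllBytes(path), fallback);
    }
}
```

For the file in the question, a call such as EncodingGuess.GuessFile(@"C:\TestFolder\Test.txt", Encoding.GetEncoding(1252)) returns an Encoding object, which has to be stored in a variable and then passed on, for example to File.ReadAllText(path, encoding). On .NET Core and later, code page 1252 is only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance). A Windows-1252 file can occasionally happen to be valid UTF-8, so the result remains a guess and not a guarantee.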

Go back to where you got the code from, and ask there: we aren't a tech support service for random packages!
 
Share this answer
 
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900