Hi,
I am trying to detect the encoding of a file: Unicode, UTF-8, UTF-8 with BOM, ANSI, etc. I can detect every encoding except ANSI (Encoding.Default / Windows-1252); I cannot differentiate ANSI from UTF-8. I tried several custom classes (Ude, TextFileEncodingDetector, etc.) that guess the encoding, but they are not always right. Is there any way to do it?
Posted 18-Sep-12 3:02am

Solution 1

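Unless the document contains bytes >= 0x80, Windows-1252 and UTF-8 are indistinguishable (unless a BOM is present), because both encode the ASCII range identically.

If it does contain bytes >= 0x80, it is a matter of checking the document for tell-tale indicators: in UTF-8 every such byte belongs to a well-formed multi-byte sequence (a lead byte 0xC2-0xF4 followed by the right number of continuation bytes 0x80-0xBF), which Windows-1252 text rarely produces by accident. See: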
 
http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout
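
The check described above can be sketched in C# as follows. This is a minimal sketch, not a definitive detector: the class name EncodingSniffer, the method name Guess, and the returned labels are made up for illustration. It relies on the UTF8Encoding constructor's throwOnInvalidBytes flag, so that malformed sequences raise a DecoderFallbackException instead of being silently replaced.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; names and labels are assumptions.
public static class EncodingSniffer
{
    // Returns "utf-8-bom", "ascii", "utf-8", or "windows-1252 (guess)".
    public static string Guess(byte[] bytes)
    {
        // The UTF-8 BOM is the byte sequence EF BB BF.
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return "utf-8-bom";

        // No byte >= 0x80: the content is identical in both encodings.
        if (Array.TrueForAll(bytes, b => b < 0x80))
            return "ascii";

        try
        {
            // throwOnInvalidBytes = true: malformed UTF-8 raises an exception
            // instead of being replaced with U+FFFD.
            new UTF8Encoding(false, true).GetString(bytes);
            return "utf-8";
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8, so assume a single-byte code page.
            return "windows-1252 (guess)";
        }
    }
}
```

It could be called as EncodingSniffer.Guess(File.ReadAllBytes(path)). The last branch is only a guess: any other single-byte code page, or a UTF-8 file truncated in the middle of a character, would also end up there, and a Windows-1252 file whose high bytes happen to form valid UTF-8 sequences would be reported as UTF-8.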
Comments
jebin Cherian at 18-Sep-12 10:22am
   
Thanks for the reply, Yvan. Will that differ according to the language?
Yvan Rodrigues at 18-Sep-12 10:30am
   
This would be true of all languages.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 18 Sep 2012
Copyright © CodeProject, 1999-2014