See more: encoding
Hi,
I am trying to detect the encoding of a file: Unicode, UTF-8, UTF-8 with BOM, ANSI, etc. I can detect every encoding except ANSI (Encoding.Default / Windows-1252), because I cannot differentiate ANSI from UTF-8. I tried several custom classes (Ude, TextFileEncodingDetector, etc.) that guess the encoding, but their guesses are not always right. Is there any way to do it?
Posted 18-Sep-12 4:02am

Solution 1

If the document contains no bytes >= 0x80, Windows-1252 and UTF-8 are indistinguishable (unless a BOM is present), because both encode the ASCII range identically.
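
As a rough illustration of that first check, here is a minimal sketch in Python (chosen for brevity; the function name `quick_check` is hypothetical and not from any library mentioned in this thread). It looks for a UTF-8 BOM and otherwise reports whether the data is pure ASCII, in which case no test can tell the two encodings apart.

```python
import codecs


def quick_check(data: bytes) -> str:
    """Return 'utf-8-bom', 'ascii', or 'unknown' for raw file bytes.

    Hypothetical helper for illustration only.
    """
    # A UTF-8 BOM is the three bytes EF BB BF at the start of the file.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    # With no byte >= 0x80, UTF-8 and Windows-1252 produce identical bytes.
    if all(b < 0x80 for b in data):
        return "ascii"
    # At least one high byte is present; further inspection is needed.
    return "unknown"
```

Only the "unknown" case carries any information that could separate Windows-1252 from UTF-8.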

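If it does contain bytes >= 0x80, it is a matter of checking the document for tell-tale indicators: in UTF-8, every such byte must belong to a well-formed multi-byte sequence (a lead byte followed by continuation bytes in the range 0x80-0xBF), which ordinary Windows-1252 text rarely produces by accident. See: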

http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout
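
One way to look for those indicators is to attempt a strict UTF-8 decode and fall back to Windows-1252 when it fails. The sketch below is in Python for brevity and is a heuristic under that assumption, not a definitive detector; the function name `guess_encoding` is hypothetical.

```python
def guess_encoding(data: bytes) -> str:
    """Guess 'utf-8' or 'windows-1252' for raw file bytes (heuristic).

    Hypothetical helper for illustration only.
    """
    try:
        # bytes.decode is strict by default: any malformed UTF-8
        # sequence raises UnicodeDecodeError.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes >= 0x80 that do not form valid UTF-8 sequences
        # suggest a single-byte code page such as Windows-1252.
        return "windows-1252"
    # Valid UTF-8 (this also covers pure ASCII, which is ambiguous).
    return "utf-8"
```

The guess can still be wrong: the Windows-1252 text "Ã©" is stored as the bytes C3 A9, which is also the valid UTF-8 encoding of "é", so such a file would be reported as UTF-8. In .NET, a comparable strict decode should be possible with `new UTF8Encoding(false, true)`, whose second constructor argument makes decoding throw on invalid bytes.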
Comments
jebin Cherian 18-Sep-12 10:22am
   
Thanks for the reply, Yvan. Will that differ according to the language?
Yvan Rodrigues 18-Sep-12 10:30am
   
This would be true of all languages.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 18 Sep 2012
Copyright © CodeProject, 1999-2016