I am trying to detect the encoding of a file: Unicode, UTF-8, UTF-8 with BOM, ANSI, etc. I can detect every encoding except ANSI (Encoding.Default/Windows-1252); I am not able to differentiate ANSI from UTF-8. I tried several custom classes (Ude, TextFileEncodingDetector, etc.) that guess the encoding, but they do not always get it right. Is there any way to do it?
Posted 18-Sep-12 3:02am

Solution 1

If the document contains no bytes >= 0x80, Windows-1252 and UTF-8 are indistinguishable (unless a BOM is present), because both encode the ASCII range identically.

If it does contain bytes >= 0x80, it is a matter of checking the document for tell-tale indicators, see:
jebin Cherian at 18-Sep-12 10:22am
Thanks for the reply, Yvan. Will that differ according to the language?
Yvan Rodrigues at 18-Sep-12 10:30am
This would be true of all languages.
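
The check described in Solution 1 could be sketched roughly as follows. This is an illustration in Python rather than .NET, and the function name `guess_encoding` and its return labels are hypothetical, not part of any library mentioned above. It assumes the only candidates are UTF-8 (with or without BOM) and Windows-1252, and it uses strict UTF-8 validation as the "tell-tale indicator": bytes >= 0x80 that do not form valid UTF-8 sequences are taken as a sign of a single-byte code page.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Heuristic guess between UTF-8 and Windows-1252 (hypothetical helper)."""
    # A UTF-8 BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    # Pure ASCII: UTF-8 and Windows-1252 produce identical bytes,
    # so the two cannot be told apart.
    if all(b < 0x80 for b in data):
        return "ascii"

    # Bytes >= 0x80 are present. Valid UTF-8 multi-byte sequences follow
    # strict rules, so a successful strict decode is strong evidence.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Windows-1252.
        return "cp1252"
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))
    print(guess_encoding("café".encode("cp1252")))
```

This remains a guess: a Windows-1252 file whose high bytes happen to form valid UTF-8 sequences (for example the two characters "Ã©") would be reported as UTF-8, although that is uncommon in real text.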

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Last Updated 18 Sep 2012