See more: C++ .NET Visual-Studio
Hi folks - I need your help. I have been trying and searching for several days to get rid of this problem, but somehow I have not managed it. I would like to use some .NET code to read and interpret my e-mails automatically.

Basically it works fine; only some UTF characters are disturbing my work. This is what happens: the e-mail header says the mail is encoded with UTF-8. To read my mails I use ReadLine() from the StreamReader class and store the return values in a String class object.
 
As far as I know, StreamReader is set to UTF-8 by default. I have also read that String class objects are Unicode. Since UTF-8 is also Unicode, I do not understand why I get return values such as "=C3=A4" or "=E2=80=9C" within the normal text.
 
Besides:
StreamReader^ reader = gcnew StreamReader(sslstream);
I have tried:
StreamReader^ reader = gcnew StreamReader(sslstream, Encoding::UTF8, false);
and
Encoding ^enc = Encoding::GetEncoding("utf-8");
StreamReader^ reader = gcnew StreamReader(sslstream, enc, false);
(where false disables automatic detection of the encoding from byte order marks at the start of the stream)


Nothing changes and I don't know why...

What I find strange is (when debugging the StreamReader object) that I find StreamReader's "CurrentEncoding"-Value set to
CurrentEncoding = 0x00c6bfa4 { CodePageASCII=20127 ISO_8859_1=28591 ...}
 
I think the encoding mode is the problem. When StreamReader tries to read the mail in ASCII mode, it must have a problem with special characters. The only question is how I can force it to switch to Unicode/UTF-8. Whatever I do when creating the StreamReader object seems to have no effect.

Can you help? Thanks a lot!
Posted 20-Feb-11 11:42am
Edited 23-Feb-11 8:53am

1 solution


Solution 1

First of all, you usually should not assume the encoding in StreamReader. Use the other constructors, those accepting the Boolean parameter detectEncodingFromByteOrderMarks. With this parameter set, the API recognizes a BOM at the beginning of the stream.
 
For more information, see http://en.wikipedia.org/wiki/Byte-order_mark and http://www.unicode.org/faq/utf_bom.html#BOM
 
—SA
Comments
Widder29 at 21-Feb-11 13:23pm
   
Hi, thank you very much for your answer!
 
First, let me tell you that in the meantime I have tried the other constructor:
StreamReader^ reader = gcnew StreamReader(sslstream, true);
(where true enables detectEncodingFromByteOrderMarks)
 
I am very sorry to tell you that this change had no effect.
 
I still get special characters.
 
Second, I already know about the BOM - and your suggestion to use the constructor above surely might be a good idea. But I wonder why StreamReader does not like to be forced into UTF-8 mode.
 
Anyhow - do you have another idea I could try?
SAKryukov at 21-Feb-11 16:42pm
   
Look, you know about the BOM, but are you sure you really have it in your text? Just answer to close this part. You could check this in, say, Notepad under "Save As".
 
What do you mean by "StreamReader does not like to be forced into UTF-8 mode"? It will use the mode you tell it to use. Half of the problem is reading, the other half is writing: what is really in the text, and why. Unicode will ultimately be read one-to-one; you can test this by reading the text, writing back a copy, and comparing the two.
 
So, as soon as we sort out encoding and reading, the remaining part is: what's in your file? You can always read the file as binary and compare what you expect with what you read. Essentially, in Unicode there are no "special characters" except BOMs and surrogates. What you see is probably something else. How do I know whether it's wrong or not? It is something in the file, that's it. Why? OK, do you have the code which wrote the file?
--SA
 
Widder29 at 22-Feb-11 16:40pm
   
Hi again. No, I cannot say if there is a BOM in the emails. I also have no "SAVE/SAVE AS" functionality in my webmail, so I can not save it to file and look for it.

With "StreamReader does not like to be forced into UTF-8 mode" I mean that (while debugging the StreamReader object) I always find StreamReader's "CurrentEncoding" value set to: CurrentEncoding = 0x00c6bfa4 { CodePageASCII=20127 ISO_8859_1=28591 ...}

This happens every time, no matter which constructor and encoding mode I use to create the StreamReader object. I expected to see something else there, such as "CodePageUTF8=65001" or similar. It seems clear to me that with an ASCII code page there will never be a correct decoding.

OK, "special characters" was maybe the wrong phrase in this context. I change it to "characters that are specific to certain languages". EXAMPLE: in this case, instead of the German letter 'ä' I get "=C3=A4", and instead of the German letter 'ö' I get "=C3=B6" within the mail body. I see the same behaviour with specific French and Norwegian letters.
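
Sequences such as "=C3=A4" look like quoted-printable transfer encoding (the sample mail posted later in this thread carries a "Content-Transfer-Encoding: quoted-printable" header). That is a layer on top of the character encoding, and StreamReader only handles the character encoding, so the "=XX" sequences pass through unchanged whichever Encoding is chosen. Below is a minimal sketch in portable C++ of undoing that layer. The helper names hexValue and decodeQuotedPrintable are hypothetical, and the sketch assumes the body text has already been read as a string of bytes:

```cpp
#include <string>

// Returns the numeric value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Hypothetical helper: turns quoted-printable text into the raw bytes it stands for.
// The result is still in the charset named by the mail part (UTF-8, windows-1252, ...).
std::string decodeQuotedPrintable(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '=') { out += in[i]; continue; }

        // "=" at the very end of a line (as returned by ReadLine) is a soft line break.
        if (i + 1 == in.size()) continue;
        // "=" followed by LF or CRLF is a soft line break as well.
        if (in[i + 1] == '\n') { i += 1; continue; }
        if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n') { i += 2; continue; }

        // "=XX" with two hex digits stands for one byte.
        if (i + 2 < in.size()) {
            int hi = hexValue(in[i + 1]);
            int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
                continue;
            }
        }
        out += '=';  // malformed sequence: keep the "=" literally
    }
    return out;
}
```

With this sketch, decodeQuotedPrintable("=C3=A4") yields the two bytes 0xC3 0xA4, which is the UTF-8 form of 'ä'. The bytes still have to be interpreted with the charset the mail part declares.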
 
No, sorry - there is no code of mine that has written any of the mails. In total there are several thousand (more or less) regular e-mails from all over the world, and the number is increasing every day. I do not know which mailing software generated them.

I hope this all helps you understand what I wanted to say initially. Thanks a lot for trying to help!
SAKryukov at 22-Feb-11 18:26pm
   
What do you mean by "I cannot say if there is a BOM in the emails"? You can use a binary editor to see, or check the StreamReader under the debugger.
E-mails usually (or maybe by standard) go without BOMs. In this case, you should enforce the encoding given in the "charset" parameter of the e-mail, if any.
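
As an illustration of picking up that "charset" parameter, here is a minimal sketch in portable C++. The function name extractCharset is hypothetical, and the sketch assumes the complete Content-Type header line is already available as one string. It is not a full MIME header parser (no folded lines, no comments):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the value of the charset parameter of a
// Content-Type header line, or an empty string if there is none.
std::string extractCharset(const std::string& headerLine) {
    // Search case-insensitively, but cut the value out of the original line.
    std::string lower(headerLine);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::size_t pos = lower.find(key);
    if (pos == std::string::npos) return "";
    pos += key.size();

    // The value may be written with or without double quotes.
    bool quoted = pos < headerLine.size() && headerLine[pos] == '"';
    if (quoted) ++pos;

    std::size_t end = pos;
    while (end < headerLine.size()) {
        char c = headerLine[end];
        bool stop = quoted ? (c == '"')
                           : (c == ';' || c == ' ' || c == '\t' || c == '\r' || c == '\n');
        if (stop) break;
        ++end;
    }
    return headerLine.substr(pos, end - pos);
}
```

The returned name could then be handed to Encoding::GetEncoding, the call already used in the question, instead of hard-coding "utf-8".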
It may so happen that the e-mail is sub-standard. Your "codes" look familiar. Can you post or reference a sample of your e-mail with an explanation of what is supposed to be there? Also, please tell me: can you open your e-mail with "Microsoft Outlook Express" (important: Express) and see whether it shows what's expected? This application also has a View/Encoding option; you can select what looks right and so figure out the encoding.
 
--SA
Widder29 at 23-Feb-11 5:56am
   
"You can use binary editor to see, or check up StreamReader under debugger."

While debugging I did not see anything looking like a BOM. I am afraid I can not use a binary editor because I can not save the mails to disk and look into them with the editor. Or do you know a binary editor that is able to connect to a webmail service? Then I would check for it, of course.

No, I have no Outlook/Outlook Express, and if I can avoid it, I won't install any client-based mail software.

I will post an e-mail here. It is a common PayPal e-mail and (therefore, I believe) it is not sub-standard. But who knows for sure...
 
###########################################################
E-MAIL FOLLOWS - All "should be"-Text is marked with ***
All blocks with "should be"-Text are repeated by me in ()-brackets
###########################################################
 
* OK Gimap ready for requests from ##.###.###.### i12if1339990bkh.43
* CAPABILITY IMAP4rev1 UNSELECT LITERAL+ IDLE NAMESPACE QUOTA ID XLIST CHILDREN X-GM-EXT-1 UIDPLUS COMPRESS=DEFLATE
* FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
* OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted \Seen \*)]
* OK [UIDVALIDITY 598686366]
* 24310 EXISTS
* 0 RECENT
* OK [UIDNEXT 37895]
A0002 OK [READ-WRITE] INBOX selected. (Success)
* 22916 FETCH (BODY[] {14521}
Delivered-To: USERNAME@googlemail.com
Received: by ##.###.###.### with SMTP id ##############;
Wed, 15 Dec 2010 08:22:03 -0800 (PST)
Received: by ##.###.###.### with SMTP id ###############.############;
Wed, 15 Dec 2010 08:22:01 -0800 (PST)
Return-Path: <payment@paypal.com>
Received: from someserver.com (someserver.com [##.##.###.###])
by mx.google.com with ESMTP id ######################;
Wed, 15 Dec 2010 08:22:00 -0800 (PST)
Received-SPF: softfail (google.com: domain of transitioning payment@paypal.com does not designate ##.##.###.### as permitted sender) client-ip=##.##.###.###;
DomainKey-Status: good
Authentication-Results: mx.google.com; spf=softfail (google.com: domain of transitioning payment@paypal.com does not designate ##.##.###.### as permitted sender) smtp.mail=payment@paypal.com; domainkeys=pass header.From=sendmail@paypal.com
Received: from mx0.phx.paypal.com (mx0.phx.paypal.com [##.###.###.###])
by someserver.com (Postfix) with ESMTP id ###############
for ; Wed, 15 Dec 2010 17:21:58 +0100 (CET)
DomainKey-Signature: s=dkim; d=paypal.com; c=nofws; q=dns;
h=Received:Date:Message-Id:Subject:X-MaxCode-Template:To:
From:Sender:X-Email-Type-Id:X-XPT-XSL-Name:Content-Type:
MIME-Version;
b=iIo9Uhm+7eu7KDz6w1S/YSRLwjpr0x///rdj18ZudQDh8B7CGzpyzRFR
pnr+5ct6/T4gw/un81kwRohizSwj7PFhxfRcbNjF1zY691gbUarkSHsX8
cOt0e07llFWdKD73+Xmvsk6qCYbAqJ2I92YQ5/fJ97D19tuj3OCMpIwnZ
c=;
Received: (qmail 4368 invoked by uid 993); 15 Dec 2010 16:21:57 -0000
Date: Wed, 15 Dec 2010 08:21:57 -0800
Message-Id: <#########.####@paypal.com>
Subject: PayPal-Zahlungsanforderung von ### & ###
X-MaxCode-Template: email-transaction-counterparty
To: ### & ###
From: "######@web.de" <###@web.de>
Sender: sendmail@paypal.com
X-Email-Type-Id: PP274
X-XPT-XSL-Name:
email_pimp/default/de_DE/transaction/seller/TransactionCounterparty.xsl
Content-Type: multipart/alternative;
boundary=--NextPart_048F8BC8A2197DE2036A
MIME-Version: 1.0
 
----NextPart_048F8BC8A2197DE2036A
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=windows-1252
 
Guten Tag, ### & ###=21
 
#### hat Ihnen eine Zahlung gesendet.
 

----------------------------------------------------------------
 

-----------------------------------
Zahlungsdetails
-----------------------------------
 
Betrag: =##,## EUR
=20
Transaktionsdatum: 15. Dezember 2010
=20
Transaktionscode: ##################
=20
Betreff: PayPal-Zahlungsanforderung von ### & ###
 
Loggen Sie sich in Ihr Konto ein, und =F6ffnen Sie die Registerkarte =
=22Kontoauszug=22, um die Details zu dieser Transaktion ei
SAKryukov at 23-Feb-11 14:59pm
   
First of all: you can save your e-mail file as is. You should read it as a binary stream (maybe not for your application, but for researching the problem). I did not know that you had not done this.
Use the ".EML" extension. Do that and open the file with Outlook Express. You will also have a valid binary sample for off-line tests.
 
(Then, most likely there are no BOMs. You could also detect a BOM from the resulting encoding, under the debugger.)
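
For checking the raw bytes directly, a minimal sketch in portable C++ follows. The names Bom and detectBom are hypothetical, and the sketch assumes the first bytes of the message have been captured unmodified. It only distinguishes the UTF-8 and UTF-16 marks, so a UTF-32 little-endian mark, which also starts with FF FE, would be reported as UTF-16LE:

```cpp
#include <string>

// Hypothetical result type for the sketch below.
enum class Bom { None, Utf8, Utf16LE, Utf16BE };

// Hypothetical helper: looks at the first bytes of a raw buffer and reports
// which byte order mark, if any, it starts with.
Bom detectBom(const std::string& bytes) {
    // Read bytes as unsigned values so comparisons with 0xEF etc. work.
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };

    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) return Bom::Utf8;
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE) return Bom::Utf16LE;
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF) return Bom::Utf16BE;
    return Bom::None;
}
```

A message that begins with plain header text such as "Delivered-To:" gives Bom::None here.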
 
--SA
Widder29 at 23-Feb-11 6:03am
   
#################################################
The e-mail was too long for a comment - it seems.
Here is the rest of it:
#################################################

Loggen Sie sich in Ihr Konto ein, und =F6ffnen Sie die Registerkarte =
=22Kontoauszug=22, um die Details zu dieser Transaktion einzublenden.
 
(Loggen Sie sich in Ihr Konto ein, und *** öffnen Sie die Registerkarte =
*** "Kontoauszug", um die Details zu dieser Transaktion einzublenden.)
 

 

https://www.paypal.com/de/vst/id=##################
 

 
############# ist ein verifizierter K=E4ufer.
(############# ist ein verifizierter ***Käufer.)
 
############# ist verifizierter PayPal-Kunde und verf=FCgt =FCber ein =
best=E4tigtes Bankkonto oder hat eine Genehmigung f=FCr eine PayPal Extras =
MasterCard=AE erhalten.
 
(############# ist verifizierter PayPal-Kunde und *** verfügt *** über ein =
*** bestätigtes Bankkonto oder hat eine Genehmigung *** für eine PayPal Extras =
MasterCard *** ® erhalten.)
 
... some more text ...
 

... some more text ...
 

----------------------------------------------------------------
Copyright =A9 1999-2010 PayPal. Alle Rechte vorbehalten.
 
PayPal (Europe) S.=E0 r.l. & Cie, S.C.A.
Soci=E9t=E9 en Commandite par Actions
(*** Société en Commandite par Actions)
 
... some more text ...
 
----NextPart_048F8BC8A2197DE2036A--
 
)
A0003 OK Success
* BYE LOGOUT Requested
A0004 OK 73 good day (Success)
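
In the sample above the text part is declared as "charset=windows-1252" with quoted-printable transfer encoding, so "=F6" stands for the single byte 0xF6 ('ö' in that code page), not for a UTF-8 sequence. Once the quoted-printable layer is removed, the bytes still need to be converted from windows-1252. A minimal sketch in portable C++ follows, with the hypothetical helper name cp1252ToUtf8. It is deliberately simplified: bytes 0x80 to 0x9F, where windows-1252 differs from ISO-8859-1 (for example the euro sign), are replaced by '?' instead of being mapped through a table:

```cpp
#include <string>

// Hypothetical helper: converts windows-1252 bytes to UTF-8.
// Simplification (an assumption of this sketch): only 0x00-0x7F and 0xA0-0xFF
// are converted, because there the code point equals the byte value.
// Bytes 0x80-0x9F would need a lookup table and are emitted as '?'.
std::string cp1252ToUtf8(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                    // ASCII stays as is
        } else if (c < 0xA0) {
            out += '?';                                     // not handled in this sketch
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));      // first byte of a 2-byte UTF-8 sequence
            out += static_cast<char>(0x80 | (c & 0x3F));    // continuation byte
        }
    }
    return out;
}
```

For the byte 0xF6 this produces 0xC3 0xB6, the UTF-8 form of 'ö'. In .NET the same conversion can presumably be left to Encoding::GetEncoding("windows-1252") applied to the decoded bytes, which would also cover the 0x80 to 0x9F range.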

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
