Tags: C++, Win32, Unicode
I have tried to display the response of a URL (containing Unicode characters) in a message box.

But garbage characters are displayed. I tried different conversions, but only the Latin letters are displayed; the remaining Unicode characters are shown as ? or as garbage values.

I have been trying this for 7 days. Please help me; I am very new to coding. Please write the complete code.
HINTERNET hInternet = InternetOpen(_T(""), INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);

HINTERNET hConnect = InternetConnect(hInternet, L"http://xxxxxxxxxx.com", INTERNET_DEFAULT_HTTP_PORT, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 1);

DWORD options = INTERNET_FLAG_NEED_FILE | INTERNET_FLAG_HYPERLINK | INTERNET_FLAG_RESYNCHRONIZE | INTERNET_FLAG_RELOAD;

HINTERNET hRequest = InternetOpenUrl(hInternet, L"http://xxxxxxxxxxx.com/c.php?varname=artxt&text=يبتىلمينبىمنيبغعاanand123", NULL, 0, options, 0);

TCHAR buffer[100];
DWORD bytesRead;
InternetReadFile(hRequest, buffer, 100, &bytesRead);
MessageBoxW(NULL, ATL::CA2W(buffer), L"Check", MB_OK);

InternetCloseHandle(hRequest);
InternetCloseHandle(hConnect);
 
When I added the code below:
 
HttpQueryInfo (hRequest,HTTP_QUERY_RAW_HEADERS_CRLF, (LPVOID) lpHeadersA,
&dwSize, NULL);
MessageBoxW(NULL,(LPWSTR)lpHeadersA,L"Type",MB_OK);
 
It gives this output:
 
HTTP/1.1 200 OK
Date: Mon, 05 Nov 2012 03:55:10 GMT
Server: Apache
Keep-Alive: timeout=5, max=74
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
Posted 1-Nov-12 21:23pm, edited 4-Nov-12 19:18pm
Comments
Jochen Arndt at 2-Nov-12 4:33am
   
You are using CA2W() to convert the received data. This will only work if the data is ANSI encoded using the same code page as your application. But many web pages use UTF-8 encoding nowadays. So you must check which encoding is used by the requested web page (usually given by the charset parameter of the Content-Type HTTP response header, or by a <meta> tag in the HTML head) and convert from that encoding to Unicode or to the encoding of your application.
 
To avoid mixing ANSI/Multi-Byte and Unicode strings in your application, it should be a Unicode application (I think it already is, because otherwise you would get no connection when passing wide strings to the ANSI versions of the InternetOpen() functions).
venkat.yva at 2-Nov-12 23:52pm
   
This is my URL:
http://convert.wajihah.com/c.php?varname=artxt&text=يبتىلمينبىمنيبغعاanand123
I think it uses chunked encoding, but I can't say for sure. Please, can you write the code?
venkat.yva at 2-Nov-12 6:59am
   
Can you please tell me which method is used to check the encoding used by the requested web page?
Mohibur Rashid at 3-Nov-12 0:58am
   
By reading the header. The header should tell you the encoding type.
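As a follow-up to reading the header: when the server sends the encoding, it is the charset parameter of the Content-Type header, which you can fetch with HttpQueryInfoA(hRequest, HTTP_QUERY_CONTENT_TYPE, buf, &len, NULL). A minimal portable sketch of the parsing step, assuming the header value is already in a std::string; charsetFromContentType is a hypothetical name, not a WinINet function. An empty result means no charset was sent, as in the headers posted in the question.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns the lower-cased value of the charset
// parameter in a Content-Type header value, or "" if there is none.
// Example input: "text/html; charset=UTF-8"
std::string charsetFromContentType(const std::string& contentType)
{
    // Work on a lower-case copy; parameter names are case-insensitive.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return "";

    pos += key.size();
    // The value may be quoted: charset="utf-8"
    if (pos < lower.size() && lower[pos] == '"')
        ++pos;

    // The value ends at a quote, a semicolon or whitespace (or at the end).
    std::string::size_type end = lower.find_first_of("\"; \t", pos);
    return lower.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}
```

For example, "text/html; charset=UTF-8" yields "utf-8", while a plain "text/html" yields an empty string.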
venkat.yva at 3-Nov-12 1:49am
   
This is my URL:
http://convert.wajihah.com/c.php?varname=artxt&text=يبتىلمينبىمنيبغعاanand123
From the headers I found that it uses chunked encoding, and I am still not able to convert the response and display it in a MessageBox. Please, can you fix the code?

Solution 2

If the text from the requested web site uses UTF-8 encoding, it must be converted before it can be shown in a message box:
 
// Use char here. InternetReadFile() reads bytes.
char buffer[100];
DWORD bytesRead = 0;
if (InternetReadFile(hRequest, buffer, sizeof(buffer), &bytesRead) && bytesRead > 0)
{
    // Get size of converted string. buffer is not NULL terminated, so pass
    //  bytesRead as its length; the result then excludes a terminating NULL.
    // With UTF-8 we may omit this call and use bytesRead + 1 as size,
    //  because the resulting string would not have more than bytesRead wide
    //  characters.
    int nSize = ::MultiByteToWideChar(CP_UTF8, 0, buffer, (int)bytesRead, NULL, 0);
    if (nSize)
    {
        // One extra character for the terminating NULL.
        LPWSTR lpszText = new WCHAR[nSize + 1];
        // Convert UTF-8 string to wide string
        ::MultiByteToWideChar(CP_UTF8, 0, buffer, (int)bytesRead, lpszText, nSize);
        // Terminate string.
        lpszText[nSize] = L'\0';
        MessageBoxW(NULL, lpszText, L"Check", MB_OK);
        delete [] lpszText;
    }
}
If the page uses another encoding, pass its code page identifier instead of CP_UTF8 (e.g. 28596 for ISO-8859-6; see Code-Page Identifiers in the MSDN).
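A sketch of picking the code page from the charset name, assuming the name has already been taken from the Content-Type header and lower-cased. codePageFromCharset is a hypothetical helper, and the table holds only a few entries from the Code-Page Identifiers list; extend it as needed.

```cpp
#include <map>
#include <string>

// Hypothetical helper: maps a (lower-case) charset name to a Windows
// code page identifier that can be passed to MultiByteToWideChar().
// Only a few encodings are listed; returns 0 for unknown names.
unsigned int codePageFromCharset(const std::string& charset)
{
    static const std::map<std::string, unsigned int> table = {
        { "utf-8",        65001 }, // CP_UTF8
        { "iso-8859-1",   28591 }, // ISO 8859-1 Latin 1
        { "iso-8859-6",   28596 }, // ISO 8859-6 Arabic
        { "windows-1256",  1256 }, // ANSI Arabic
        { "windows-1252",  1252 }, // ANSI Latin 1
    };
    std::map<std::string, unsigned int>::const_iterator it = table.find(charset);
    return it == table.end() ? 0 : it->second;
}
```

A return value of 0 means the name is not in the table, so you have to choose a fallback yourself.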
Comments
venkat.yva at 3-Nov-12 7:57am
   
Using the code pages from Code-Page Identifiers in the MSDN does not display the Arabic characters.
Is there any other way?
Sergey Chepurin at 3-Nov-12 9:00am
   
Do you use VS2010 to compile?
venkat.yva at 4-Nov-12 23:09pm
   
Yes, in Visual Studio 2010.
Sergey Chepurin at 9-Nov-12 6:46am
   
venkat.yva: Sorry for the late answer, but it took me some time to understand what you really want. I just checked the code from Barakat S. and it works fine with your site. It prints proper text in Arabic. I guess the answer given by Jochen Arndt is also coded correctly (I simply didn't check it). I could add an almost universal C++11 solution, but I don't see any need for it, since the solutions provided work fine.
Jochen Arndt at 3-Nov-12 9:34am
   
The link posted in your comments does not contain any headers. It is just some PHP generated output using UTF-8 encoding. So using my code, you should see the content.
venkat.yva at 4-Nov-12 23:38pm
   
I am not getting the exact output when I use the Arabic-related identifiers from Code-Page Identifiers in the MSDN. I get characters of a different type (nearly like Arabic), but not exactly the text I gave in my URL.
Please check the above code using my URL.
Jochen Arndt at 5-Nov-12 3:24am
   
You are right. It is not the same. But it is UTF-8. When you save the output to a file and open it with a hex editor, you will see that the codes are in the UTF-8 ranges EF BA xx and EF BB xx (Unicode code points U+FE80 to U+FEFF). These codes are from the 'Arabic Presentation Forms-B' range.
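The step from those bytes to the code points follows from the UTF-8 bit layout: a three-byte sequence 1110xxxx 10yyyyyy 10zzzzzz carries the 16-bit value xxxxyyyyyyzzzzzz. A small sketch to check the arithmetic (decodeUtf8ThreeBytes is a hypothetical name):

```cpp
#include <cstdint>

// Hypothetical helper: decodes one three-byte UTF-8 sequence (lead byte
// 1110xxxx followed by two continuation bytes 10xxxxxx) into its code point.
// Returns 0xFFFFFFFF if the bytes do not have that shape.
std::uint32_t decodeUtf8ThreeBytes(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2)
{
    if ((b0 & 0xF0) != 0xE0 || (b1 & 0xC0) != 0x80 || (b2 & 0xC0) != 0x80)
        return 0xFFFFFFFFu;
    return (static_cast<std::uint32_t>(b0 & 0x0F) << 12) |  // 4 bits from the lead byte
           (static_cast<std::uint32_t>(b1 & 0x3F) << 6)  |  // 6 bits from the 2nd byte
            static_cast<std::uint32_t>(b2 & 0x3F);          // 6 bits from the 3rd byte
}
```

So EF BA 80 through EF BB BF cover U+FE80 to U+FEFF, which lies inside Arabic Presentation Forms-B (U+FE70 to U+FEFF).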
 
So the PHP script performs some sort of conversion.
venkat.yva at 5-Nov-12 6:19am
   
I want to do this using only C++ and Win32, without PHP. Is it possible? Can you tell me a way to do that?
Jochen Arndt at 5-Nov-12 6:39am
   
You can't do anything else on the C++ side.
 
My code is showing the same text as shown by a web browser. It is the text produced by the PHP script running on the web server.
 
The web site you are using is a converter: It is intended that the output is not the same as the input.
 
At the very least, you should make clear what you want to do. But that would probably be a new question.
 
The original question is answered:
You no longer have garbage displayed but the returned text.

Solution 4

I made some modifications to the code you posted. I assumed that your web page uses UTF-8:
 
Test web application:
 
#!python
# -*- coding: utf-8 -*-
from bottle import *
 
content_type = 'text/html; charset=utf-8'
 
@route("/ar/")
def ar():
    response.content_type = content_type
    return "تجربة"
 
@route("/en/")
def en():
    response.content_type = content_type
    return "Test"
 
@route("/zn/")
def zn():
    response.content_type = content_type
    return "测试"
 
run(port=8080)
 
The code:
 
#include <windows.h>
#include <wininet.h>

#pragma comment(lib, "wininet.lib")
 
INT WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nShowCmd)
{
    HINTERNET hInternet = InternetOpenW(L"foo", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
    if (hInternet == NULL)
        return 1;
 
    // InternetConnect() takes a server name and a port, not a URL.
    // (InternetOpenUrl() below opens its own connection and does not use this handle.)
    HINTERNET hConnect = InternetConnectW(hInternet, L"localhost", 8080, NULL, NULL, INTERNET_SERVICE_HTTP, 0, 1);
 
    // INTERNET_OPTION_HTTP_DECODING is an InternetSetOption() option, not a request flag.
    DWORD options = INTERNET_FLAG_NEED_FILE | INTERNET_FLAG_HYPERLINK | INTERNET_FLAG_RESYNCHRONIZE | INTERNET_FLAG_RELOAD;
 
    HINTERNET hRequest = InternetOpenUrlW(hInternet, L"http://localhost:8080/ar/", NULL, 0, options, 0);
 
    BYTE buffer[100] = {0};
    // UTF-8 never yields more UTF-16 units than bytes, so one extra slot
    // for the terminating NULL is enough.
    WCHAR szResp[sizeof(buffer) + 1] = {0};
    DWORD bytesRead = 0;
 
    if (hRequest != NULL && InternetReadFile(hRequest, buffer, sizeof(buffer), &bytesRead) && bytesRead > 0)
    {
        // Convert only the bytes actually read; the output size is given in characters.
        MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)buffer, (int)bytesRead, szResp, (int)sizeof(buffer));
        MessageBoxW(NULL, szResp, L"Check", MB_OK);
    }
 
    if (hRequest != NULL)
        InternetCloseHandle(hRequest);
    if (hConnect != NULL)
        InternetCloseHandle(hConnect);
    InternetCloseHandle(hInternet);
 
    return 0;
}
 
 
Note: make sure you handle the errors.
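Both solutions read only the first 100 bytes. For a longer response, one would call InternetReadFile() in a loop until it succeeds with 0 bytes read, append each chunk to one byte string, and convert once at the end. If you convert chunk by chunk instead, a read can end in the middle of a multibyte UTF-8 sequence, and converting that chunk produces replacement characters (or, with MB_ERR_INVALID_CHARS, makes the call fail). A portable sketch of the check that finds such an incomplete tail; incompleteUtf8Tail is a hypothetical helper:

```cpp
#include <cstddef>

// Hypothetical helper: returns how many bytes at the end of data[0..len)
// belong to a UTF-8 sequence that is not yet complete (0 to 3).
// Those bytes should be kept and prepended to the next chunk read.
std::size_t incompleteUtf8Tail(const unsigned char* data, std::size_t len)
{
    // Look back at most 3 bytes for the lead byte of the last sequence.
    for (std::size_t back = 1; back <= 3 && back <= len; ++back)
    {
        unsigned char c = data[len - back];
        if ((c & 0xC0) == 0x80)
            continue;                  // continuation byte, keep looking back
        std::size_t need = 1;          // total length announced by the lead byte
        if ((c & 0xE0) == 0xC0) need = 2;
        else if ((c & 0xF0) == 0xE0) need = 3;
        else if ((c & 0xF8) == 0xF0) need = 4;
        return need > back ? back : 0;
    }
    // Three continuation bytes: a complete 4-byte sequence or invalid data.
    return 0;
}
```

In such a loop you would convert len minus the tail with MultiByteToWideChar(CP_UTF8, ...) and move the tail bytes to the front of the buffer before the next InternetReadFile() call.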
Comments
venkat.yva at 4-Nov-12 23:41pm
   
I am not getting the exact output; it shows garbage values.
Please check the above code using my URL.
venkat.yva at 4-Nov-12 23:48pm
   
// When I added the code below to my code:
 
HttpQueryInfo (hRequest,HTTP_QUERY_RAW_HEADERS_CRLF, (LPVOID) lpHeadersA,
&dwSize, NULL);
 
// it shows this:
 
HTTP/1.1 200 OK
Date: Mon, 05 Nov 2012 03:55:10 GMT
Server: Apache
Keep-Alive: timeout=5, max=74
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
venkat.yva at 4-Nov-12 23:51pm
   
And please note that there is NO "charset=utf-8" there.
When I use other URLs, charset=utf-8 is shown.
Barakat S. at 5-Nov-12 7:09am
   
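Your page's encoding is "ISO-8859-1" because you didn't specify any encoding: for text/* content sent without a charset parameter, HTTP/1.1 (RFC 2616) assumes ISO-8859-1. Check ISO-8859-1 - Latin 1: http://www.terena.org/activities/multiling/ml-docs/iso-8859.html#ISO-8859-1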
 
To make it UTF-8, add the following at the top of your PHP file:
 
header("Content-type: text/html; charset=utf-8");
 
For example:
 
<?php
header("Content-type: text/html; charset=utf-8");
 
if( isset($_GET["name"]) ) {
echo "name = " . htmlspecialchars($_GET["name"]);
} else {
echo "name = None";
}
 
?>
 
It should work fine.
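This is most likely the kind of garbage seen in the question: when UTF-8 bytes are decoded with a single-byte code page such as ISO-8859-1, each byte is simply taken as the code point of the same value, so every byte of a multibyte UTF-8 character becomes a separate Latin character. A portable sketch of an ISO-8859-1 decode (decodeLatin1 is a hypothetical helper):

```cpp
#include <string>

// Hypothetical helper: decodes ISO-8859-1 (Latin 1) bytes. Each byte
// 0x00-0xFF becomes the Unicode code point with the same numeric value.
std::u32string decodeLatin1(const std::string& bytes)
{
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char c : bytes)
        out.push_back(static_cast<char32_t>(c));
    return out;
}
```

For example, the Arabic letter ت (U+062A) is D8 AA in UTF-8 and comes out as the two characters Øª.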
venkat.yva at 14-Nov-12 2:18am
   
I don't have any PHP file to add your code to, so I want the complete code in Win32.
venkat.yva at 14-Nov-12 2:16am
   
Barakat S. gave PHP code for the Arabic conversion. That may work, but I want the complete code in Win32 only.
Sergey Chepurin at 14-Nov-12 13:03pm
   
If you hardcode the URL of your site (http://convert.wajihah.com/c.php?varname=artxt&text=يبتىلمينبىمنيبغعاanand123 ) in the given code instead of localhost (the server name in InternetConnect() and the full URL in InternetOpenUrl()), you will get the message "artxt=٣٢١dnanaﺎﻌﻐﺒﻴﻨﻣﻰﺒﻨﻴﻤﻟﻰﺘﺒﻳ&done=2" printed in the message box. The PHP code from Barakat S. is a nice (but not necessary) addition. Create a sample Win32 application in VC++ 2010 and add this code in the proper place.
venkat.yva at 20-Nov-12 4:58am
   
No, I am not getting the result that you posted. Can you please send me the code you used?
Sergey Chepurin at 20-Nov-12 16:01pm
   
It does not work this way. I checked the solution and it works; then I told you how it can be done, but you should code it yourself.

Solution 1

Just add the statement below to the head of your HTML page:
 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
 
If you are reading this page with your C++ code, first read the HTTP header; it will tell you which encoding to follow. Then simply convert the entire page to Unicode (wide characters), which makes the text easy to manipulate. Otherwise you will have to follow special methods to manipulate it.
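For the "convert the entire page to Unicode" step, on Windows you would call MultiByteToWideChar(CP_UTF8, ...) once on the complete byte string. To show what that conversion does, here is a portable stand-in that decodes UTF-8 by hand into UTF-16 (the wide-character form used by Win32). utf8ToUtf16 is a hypothetical helper, and it checks less than a production decoder should.

```cpp
#include <cstddef>
#include <string>

// Hypothetical portable stand-in, for illustration only, for
// MultiByteToWideChar(CP_UTF8, ...): converts a whole UTF-8 byte string to
// UTF-16. Malformed sequences become U+FFFD. Overlong forms and encoded
// surrogates are not rejected here.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte announces the sequence length and its first payload bits.
        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { out.push_back(u'\uFFFD'); ++i; continue; }  // stray continuation or invalid byte

        if (i + len > in.size()) { out.push_back(u'\uFFFD'); break; }  // truncated at end of input

        // Each continuation byte (10xxxxxx) adds 6 more bits.
        bool ok = true;
        for (std::size_t k = 1; k < len && ok; ++k)
        {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok) { out.push_back(u'\uFFFD'); ++i; continue; }
        i += len;

        if (cp > 0x10FFFF)
            out.push_back(u'\uFFFD');                      // beyond the Unicode range
        else if (cp < 0x10000)
            out.push_back(static_cast<char16_t>(cp));
        else
        {
            cp -= 0x10000;                                 // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```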
  Permalink  
Comments
venkat.yva at 3-Nov-12 6:05am
   
I am using Visual Studio 2010, so it does not accept the code you have given.
And I have just one month of experience in the software field, so I can't understand it clearly.
So can you please send me the complete code? I already found that the URL gives a chunked-encoding response.
Mohibur Rashid at 3-Nov-12 6:07am
   
This code is not for Visual Studio. This code is for HTML.
 
Follow Jochen Arndt's answer.
venkat.yva at 3-Nov-12 6:11am
   
I am doing this in Visual Studio 2010 only, and I am struggling to get the output. Please, I request you to write the code in Visual Studio 2010.
venkat.yva at 3-Nov-12 6:14am
   
Now I just want the code to convert the chunked-encoding response to normal form so I can display it in a MessageBox.
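WinINet removes the chunked transfer coding itself, so what InternetReadFile() returns is already the plain body; the garbage comes from the character encoding, not from chunking. Only if you read raw HTTP from a socket would you need to undo it yourself. For illustration, a minimal sketch of decoding a chunked body (a hex size line, CRLF, the data, CRLF, repeated until a zero-size chunk); decodeChunked is a hypothetical helper, and chunk extensions and trailers are skipped.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes an HTTP/1.1 "Transfer-Encoding: chunked" body.
// Chunk extensions (";name=value") are skipped; trailers are ignored.
// Throws std::runtime_error (or std::invalid_argument from std::stoul)
// on malformed input.
std::string decodeChunked(const std::string& body)
{
    std::string out;
    std::size_t pos = 0;
    for (;;)
    {
        std::size_t lineEnd = body.find("\r\n", pos);
        if (lineEnd == std::string::npos)
            throw std::runtime_error("missing chunk size line");
        // The size is hexadecimal; stoul stops at ';' or other non-hex characters.
        std::size_t size = std::stoul(body.substr(pos, lineEnd - pos), nullptr, 16);
        pos = lineEnd + 2;
        if (size == 0)
            break;                                   // last chunk
        if (pos + size + 2 > body.size() || body.compare(pos + size, 2, "\r\n") != 0)
            throw std::runtime_error("truncated chunk");
        out.append(body, pos, size);
        pos += size + 2;                             // skip data and its CRLF
    }
    return out;
}
```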

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 2 May 2013
Copyright © CodeProject, 1999-2014
All Rights Reserved.