
Fun with a Google Text-to-Speech eBook reader using a minimalistic approach

By Ladislav Nevery, 9 Feb 2013
 

[Screenshot: GoogleTTS-Ebook-Reader/GoogleTTS2.jpg]

Introduction

It's interesting to see how many free tools Google has started to provide thanks to its massive cloud computing capability.

I was so blown away by the sound quality that I created this simple program to read my favorite eBooks aloud. So far there are 8 language voices of exceptional quality:

English, French, Italian, Spanish, German, Czech, Haitian Creole, Hindi.

Unfortunately, there are also 27 voices of sub-par quality that were recently integrated via the third-party open-source eSpeak engine:

Afrikaans, Albanian, Catalan, Chinese (Mandarin), Croatian, Danish, Dutch, Finnish, Greek, Hungarian, Icelandic, Indonesian, Latvian, Macedonian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Swahili, Swedish, Turkish, Vietnamese, Welsh.

Google keeps replacing them with higher-quality versions over time; Czech, for example, was added to the high-quality list only recently.

Let Google TTS speak your text in a chosen language via a simple URL

http://translate.google.com/translate_tts?tl=en&q=hello+world  

Yes, it's the same service that is integrated into Google's Android and powers pronunciation in Google Translate.

Even though it's a web-based service, it's free, and it sends you back an MP3 whose speech is, for some languages, light years ahead of most paid TTS engines.

Let Google Translate detect the language of your text

http://translate.google.com/translate_a/t?client=t&sl=auto&text=hello+world 

What we receive is the detected language, which we in turn pass to TTS to choose the voice we want to hear. Notice sl=auto: sl is the "source language" parameter, and auto requests autodetection.

Google's language detection is often unreliable, as you can see on the official Google Translate page, so you'd better set the language in your app manually. Still, it's an interesting feature to test.

Code 

The code is slightly larger because we need to detect the language per sentence, split the text into chunks of at most 100 characters, and send each chunk as a URL-encoded HTTP GET request. Google sends back an MP3 file, which we play as it is received thanks to DirectShow's streaming nature and an installed MP3 codec. This is a minimal sample so you can focus on how it works. Unimportant code such as the hook is written in condensed form, but feel free to expand and reformat it the way you like. Replace all static buffers if you plan to use the code safely; cleanup and more robust error handling were also left out so you can focus on the important parts. It's still a lot of fun.

So enjoy ;)

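#include <windows.h>
#include <shlwapi.h>
#include <Richedit.h>
#include <dshow.h>
#include <winsock.h>
#include <stdio.h>   // sprintf
#include <stdlib.h>  // calloc, free
#include <string.h>  // strlen, strstr, memcpy
#include <wchar.h>   // wcslen, wcschr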

#pragma comment(lib,"Strmiids.lib")
#pragma comment(lib,"Shlwapi.lib") 
#pragma comment(lib,"wsock32.lib")

// DsHook(obj,slot,fn): once, save vtable entry `slot` of COM object `obj` into fn_ and overwrite it with fn
#define DsHook(a,b,c) if (!c##_) { DWORD old_;                                  \
               INT_PTR* p=b+*(INT_PTR**)a;  *(INT_PTR*)&c##_=*p;                \
               VirtualProtect(p,sizeof(INT_PTR),PAGE_EXECUTE_READWRITE,&old_);  \
               *p=(INT_PTR)c; }

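// Replacement for IAsyncReader::SyncReadAligned: call the original (saved in SyncReadAlligned_), then append the bytes it read to the mp3 file
HRESULT ( __stdcall * SyncReadAlligned_ ) ( void* inst, IMediaSample *smp ) ; HANDLE out;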
HRESULT   __stdcall   SyncReadAlligned    ( void* inst, IMediaSample *smp ) {	
    HRESULT ret =     SyncReadAlligned_   ( inst, smp );
    BYTE*   buf;      smp->GetPointer(&buf); 
    DWORD   len =     smp->GetActualDataLength(),no;	WriteFile(out,buf,len,&no,0);
    return  ret;  
}

int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show) {
    MSG msg={0}; WSADATA wsa; DWORD no; HRESULT hr; 

    CoInitialize(0);   WSAStartup(MAKEWORD(1,1),&wsa);   LoadLibraryA("RichEd20"); 

    // connect to google translate for text language autodetection
    SOCKET s=socket(AF_INET,SOCK_STREAM,IPPROTO_TCP);  sockaddr_in addr={AF_INET,htons(80)};
    HOSTENT* dns=gethostbyname("translate.google.com");  if(!dns) return 0;
    memcpy(&addr.sin_addr.s_addr,dns->h_addr,4);
 
    if(connect(s,(sockaddr*)&addr,sizeof(addr)) != 0)  return 0;

    HWND hwnd = CreateWindowA("RICHEDIT20W",0,WS_SIZEBOX|ES_MULTILINE|WS_VISIBLE|ES_AUTOVSCROLL|ES_AUTOHSCROLL|WS_SYSMENU|WS_CAPTION|WS_MINIMIZE|WS_HSCROLL|WS_VSCROLL,500,500,500,300,0,0,0,0);

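    while ( IsWindowVisible(hwnd) ) {
        // react only to a message retrieved in this iteration, and sleep when idle instead of spinning
        if( !PeekMessage(&msg,0,0,0,PM_REMOVE) ) { Sleep(1); continue; }
        TranslateMessage( &msg ); DispatchMessage( &msg );
        if( msg.wParam==VK_RETURN && msg.message == WM_KEYDOWN ) {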

            DWORD  len =  2+GetWindowTextLength(hwnd)*2; CHARRANGE ch={0,-1}; SendMessage(hwnd,EM_EXSETSEL,0,(LPARAM)&ch);
            WCHAR* Txt =  (WCHAR*)calloc(len,1),*e,*txt=Txt; SendMessage(hwnd,EM_GETSELTEXT,0,(LPARAM)Txt); ch.cpMin=-1; 
                                                             SendMessage(hwnd,EM_EXSETSEL,0,(LPARAM)&ch); 
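                   out =  CreateFileA("c:/out.mp3",GENERIC_WRITE,FILE_SHARE_READ,0,CREATE_ALWAYS,0,0);
            WCHAR* end =  Txt+wcslen(Txt);   // where the text ends, so the loop never reads past the buffer

            while( txt<end && IsWindowVisible(hwnd) ) { 
                // since the sent text can not be longer than 100 chars we break it at periods, then commas, then spaces
                if((e=wcschr(txt,L'.')))                              *e=0; 
                if(wcslen(txt)>100 &&(e=wcschr(txt,L',')))            *e=0;
                if(wcslen(txt)>100) { e=txt+100; while(e>txt && *e!=L' ') e--; if(e==txt) e=txt+100; *e=0; }  // no space: hard cut, drops one char
                WCHAR* chunk=txt;  txt+=wcslen(txt)+1;               // step past this chunk and its terminator
                if(!*chunk) continue;                                // skip empty chunks, e.g. from ".."
                
                // detect language by asking google translate service so we can switch voice language per sentence as needed
                char utf[1000],esc[1000]={0},*a,*b=utf; WideCharToMultiByte(CP_UTF8,0,chunk,-1,utf,sizeof(utf),0,0); 
                while(*b) sprintf(esc+strlen(esc),"%%%02X",*(BYTE*)b++);   // escape every utf-8 byte as %XX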

                char buf[2000]; sprintf(buf,"GET /translate_a/t?client=t&sl=auto&text=%s HTTP/1.1\r\nHost: translate.google.com\r\nUser-Agent: Mozilla/5.0\r\n\r\n",esc);
                send(s,buf,strlen(buf),0);                          // we send the sentence to the google translate server
                int n=recv(s,buf,sizeof(buf)-1,0); buf[n>0?n:0]=0;  // and receive the detected language (NUL-terminated so strstr is safe)
                char lng[3]={"en"}; if((a=strstr(buf,"]],,\""))) memcpy(lng,a+5,2);

                // This triplet with RenderFile is all you need to play anything with an appropriate codec on Windows.
                IGraphBuilder* graph= 0; CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
                IMediaControl* ctrl = 0; graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );
                IMediaEvent*   event= 0; graph->QueryInterface( IID_IMediaEventEx, (void **)&event ); 

                // This sends the text (sentence) encoded in a GET request and progressively plays the mp3 stream from google as it is received.
                // So all TTS is done on the server and this is the only work the client does
                WCHAR url[1000];     wsprintfW(url,L"http://translate.google.com/translate_tts?tl=%S&q=%S",lng,esc); 
                if(FAILED(hr=graph->RenderFile(url,0))) { ctrl->Release(); event->Release(); graph->Release(); continue; }

                // we hook the source filter and append to the global mp3 file on disk
                IBaseFilter*  filter=0;  graph->FindFilterByName(url,&filter);
                IPin*         pin=0;     if(filter) filter->FindPin(L"Output",&pin); 
                IAsyncReader* reader=0;  if(pin)    pin->QueryInterface(IID_IAsyncReader,(void**)&reader);
                
                //  redirect the 7th member func of IAsyncReader (SyncReadAligned, vtable index 6) to grab mp3 data from the output pin of the source filter
                if(reader) { DsHook(reader,6,SyncReadAlligned); }

                // we run and wait for the mp3 to finish before we ask for another sentence
                hr=ctrl->Run(); long code=0; LONG_PTR p1,p2; 
                while( code != EC_COMPLETE && code != EC_ERRORABORT && IsWindowVisible(hwnd) ) { 
                    if( PeekMessage(&msg,0,0,0,PM_REMOVE) ) { TranslateMessage( &msg ); DispatchMessage( &msg ); }
                    if( event->GetEvent(&code,&p1,&p2,0) == S_OK ) event->FreeEventParams(code,p1,p2);
                    Sleep(1); 
                } 

                if(reader) reader->Release(); if(pin) pin->Release(); if(filter) filter->Release();
                ctrl->Release(); event->Release(); graph->Release();
            } 
            free(Txt); 
            CloseHandle(out);
        }
    }
    closesocket(s); WSACleanup(); CoUninitialize();   // tidy up once the window is closed
    return 0;
} 

Points of Interest

Notice that we are passing a web address directly to DirectShow. The RenderFile() call actually builds the whole graph, including the stream splitter, the mp3 decoder, and the output to the sound device.

[Figure: the DirectShow filter graph built by RenderFile() (GoogleTTS-Ebook-Reader/graph.JPG)]

This simple trick lets us, for example, listen to internet radio without much work. It requires at least some mp3 codec to be installed, which most of you probably have. If not, install ffdshow, a free multi-codec package that plays pretty much everything you throw at it.

Another thing: the correct way to grab the received data would be to implement a sample grabber filter and connect it between the source and splitter filters. But that would require the DirectShow SDK, which is pretty complicated to get compiling and hardly beats implementing just one procedure.

Unknown languages are not played, but you can add plenty of substitutions, for example mapping "nl" to "de". Mix sentences in different languages just for fun ;)

History

  • 28.3 first version
  • 31.3 combo box for manual language selection replaced by automatic language detection (per sentence)
  • 3.4 added capturing of received stream to mp3 file on disk
  • 15.5 added info that 29 new languages are synthesized now. Poor quality, though.
  • 16.5 changed code to Unicode and uploaded a fixed exe, so Chinese, Hindi, etc. work now
  • 2.6.2011 uses DNS instead of a hard-coded IP; updated language detection to reflect Google changes; added source to zip

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Ladislav Nevery
Software Developer (Senior)
Slovakia
Past Projects:
[Siemens.sk] Mobile network software: HLR-Inovation for telering.at (Corba)
Medical software: CorRea module for CT scanner
[cauldron.sk] Computer Games: XboxLive/net code for Conan, Knights of the temple II, GeneTroopers, CivilWar, Soldier of fortune II
[www.elveon.com] Computer Games: XboxLive/net code for Elveon game based on Unreal Engine 3
ESET Research.
Looking for a job


Comments and Discussions

 
General: My vote of 5 (Fidel Pérez, 22 May '13 - 3:14)
Very convenient, light and efficient script to add to any other program!!
Suggestion: Amazing article, Thanks!! (Fidel Pérez, 21 May '13 - 23:07)
Thank you very much for this article and code, I find it incredibly useful.
 
I don't know C, but I would love to be able to use it from the DOS command line like this:
 
googletts.exe -f TXTfileName.txt -o Mp3OutputName.mp3 -Reading_Speed
 
So it would read a .txt file and output the mp3 (and play it, as it does now) at the speed the user wants. This would make it an incredible add-on for programs written in any other language!! Could you please make the change? If not, I will do it and upload it, but it will take me a *bit* longer since I'm not familiar with the code or with C programming :) Just tell me if you don't have time to make that change and I will start studying how to do it.
 
Very much appreciated!!


General: Re: Amazing article, Thanks!! (Ladislav Nevery, 23 May '13 - 0:55)
Hi Fidel.
I am glad the article was useful for you.
Unfortunately I have no time for changes such as changing the speed.
But thanks to the plugin-like architecture of DirectShow and the plethora of free DirectShow processing filters floating around, such an idea should not be hard. Just searching Google for "free directshow speed" returns http://reclock.free.fr/ as the first hit; connect it, grab from its output, and you are set. As for how to do it, search Google for "msdn connect directshow filter". Google is your friend ;)
"There is always a better way"

General: My vote of 5 (plagwitz, 24 Feb '13 - 15:44)
This is great, thanks. To produce reusable language learning materials, I managed to combine and save the streamed mp3 (http://thomasplagwitz.com/2013/02/24/automating-language-learning-material-creation-with-google-translate-text-to-speech/).
 
But now Google's error-prone language detection trips me up: some non-English chunks always get treated (pronounced) as if they were English (here is an example with partially misrecognized French: http://goo.gl/Yw8V3).
 
So, as I assumed, I could just hard-code the language selection into the source and recompile. A pretty bad hack, but the results are much better this way: http://goo.gl/RcLUd
 
Thanks again,

-- modified 25 Feb '13 - 21:42.
General: My vote of 5 (eslipak, 12 Feb '13 - 10:43)
Excellent article. But Avast has flagged the exe as a virus container. I want to compile the source. What compiler was used? Anyway, excellent work.
General: Re: My vote of 5 (Ladislav Nevery, 12 Feb '13 - 22:37)
Thanks for the vote :) As for AV flagging small code as a virus, maybe this article will give some insight into what is going on within the AV industry. As for the compiler, I don't remember, but I guess VS2008? Right now I am on VS2012.
"There is always a better way"

General: Re: My vote of 5 (eslipak, 13 Feb '13 - 2:56)
OK. Some time ago, AVG marked as a virus container an EXE I had just compiled from C with the LCC32 compiler from Jacob Navia. This was the reason for switching to Avast.
So, you are not alone.
Best wishes.
General: My vote of 4 (Phat (Phillip) H. VU, 29 Dec '12 - 14:16)
Good job. Keep going on.
Question: My vote 5 (serega467, 3 Apr '12 - 23:54)
Great mini approach.
http://www.haxx.lv/

General: any service for OCR by Google? (Member 1508452, 9 Jun '11 - 0:59)
Thank you for this software, which works fine. A complementary function could be OCR (Optical Character Recognition), allowing you to read the text of a scanned document. I know there is a Google service that does OCR for generating Google Docs, but is there a service where one could provide an image and get back an unformatted text file?
General: Re: any service for OCR by Google? (Ladislav Nevery, 15 Jun '11 - 8:10)
Check this out: http://code.google.com/intl/sk-SK/apis/documents/docs/3.0/developers_guide_protocol.html#DownloadingDocuments
As you can see, you upload an image/PDF and can choose raw text as output. Hope it helps :)
"There is always a better way"

General: Article fixed (Ladislav Nevery, 2 Jun '11 - 10:29)
Sorry, guys, that I didn't have time to fix the minor changes on Google's side sooner (IP and language detection).
New exe + cpp also attached.
Take care.
"There is always a better way"

General: My vote of 4 (Sergey Chepurin, 17 May '11 - 23:13)
4 just because the service is now closed and the program does not work anymore.
General: Program doesn't work with Windows XP!!! (TheEvilGerman, 23 Jan '11 - 2:55)
The program starts and shows up in the task manager, but doesn't do anything.
General: Re: Program doesn't work with Windows XP!!! (Sergey Chepurin, 17 May '11 - 23:12)
Because Google closed (changed) the access to translation services.
Use the official Google page.
Sergey Chepurin.
General: Language detection broken (xtract, 1 Jul '10 - 4:55)
a=strstr(buf,"src\":") always fails.
The language id is there but like this:
...
[[["a small","eine kleine",""]],,"de"]
...
 
This works for me:
 
if(recv(s, buf, sizeof(buf), 0) > 0 && (a = strrchr(buf, ']')) && a - buf >= 3) strncpy(lng, &a[-3], 2);
General: Code in C# (Nitin Sawant, 16 May '10 - 23:15)
Is there any similar library available in C#?
============================================
The grass is always greener on the other side of the fence

Question: Reverse way, STT possible? (TSchind, 12 Apr '10 - 22:17)
I need the reverse: speech to text (I am DEAF).
How can I achieve this with Google's (YouTube's) real-time speech-to-subtitle service in a minimalist approach?
Answer: Re: Reverse way, STT possible? (Ladislav Nevery, 13 Apr '10 - 1:39)
Hmm interesting idea.
"There is always a better way"

General: Re: Reverse way, STT possible? (phildal, 22 Jun '10 - 4:36)
This is known as Voice Recognition Software, like "Dragon Naturally Speaking".
Answer: Re: Reverse way, STT possible? (hobnob, 5 Jun '11 - 4:35)
Microsoft Speech API can do that.
General: Norton Fixed their False Positive (Ladislav Nevery, 9 Apr '10 - 20:49)
They do not label it as suspicious anymore. I guess they did read the rant article I created for this special purpose ;D Moreover, they seem to have scrapped the whole suspicious detection thing globally.
"There is always a better way"

General: Norton is blocking this (Mark C. Malburg, 6 Apr '10 - 1:51)
My Norton security software is blocking this app. Why is that the case? Should I ignore Norton and run the app anyway?
General: Re: Norton is blocking this (Ladislav Nevery, 6 Apr '10 - 3:07)
I see. Norton's cloud-based scan marks this as "suspicious-insight".
The problem is so serious that I even made an article about it where developers can discuss solutions to false AV positives:

The Case of Evil WinMain
 
But here is the short form of it. In an attempt to find and remove the part of the code that Norton finds, ehm... suspicious, I started removing parts of the code and scanning it with 41 antiviruses on VirusTotal.
And guess what! I ended up with this code still being blocked by Symantec as "suspicious-insight":
#include <windows.h>

int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show) {
    return 0;
}
 
YES, an empty WinMain. Go ahead: make an empty C++ project in Visual Studio, add a .cpp with the empty WinMain code above, build with static libc, and check it on VirusTotal.
You will be surprised. Nearly a THIRD of antiviruses will go berserk, running around and screaming all kinds of scary malware names in your direction.
This is kind of sad, isn't it?
I guess small and efficient code without 10 MB of runtime junk linked to it is nowadays considered not only suspicious but downright dangerous. ;D
On one hand the quality of those programs is questionable; on the other hand I wasn't expecting such an obvious false positive from Symantec.
Since this is kind of a serious allegation, here is the exe that I posted to VirusTotal,
http://sites.google.com/site/minimalistscorner/EmptyWinmainFalsePos.zip?attredirects=0&d=1 along with its source, and the antivirus scan results:
http://www.virustotal.com/analisis/fad21b2be19f5619bbe538736ea720ff91ecc1f991087caa9d2cecb34d29c8ce-1270576549
"There is always a better way"
modified on Friday, April 9, 2010 9:15 AM


Last Updated 10 Feb 2013
Article Copyright 2010 by Ladislav Nevery
Everything else Copyright © CodeProject, 1999-2013