Text to Speech (tts) for the Web

11 Aug 2012
This article discusses text to speech conversion for web applications.

Introduction

This article shows you how to perform text-to-speech (tts) conversion from your web application. It discusses the different approaches to this task and provides practical implementations with explanations. The code presented here focuses on converting text to speech using these approaches, and on playing the resulting speech (audio) on the web page and/or saving that audio on your server (local/production).

This article does not discuss the technology behind text-to-speech conversion itself; rather, it discusses how to use existing tts technologies in the open web environment.

Demonstration

Motivation 

I am posting this article in response to a question that Arun Bala Rai posted on my article:

http://www.codeproject.com/Articles/249239/SPEECH-RECOGNITION-FOR-THE-WEB

Question: Is it possible to convert text to voice and save it with the .mp3 extension in HTML5?

Answer: You can do this with two approaches:

1) Server side - You send your text to the server, which forwards it to an application that performs the tts and saves the resulting mp3 file. The server can then wrap the result in an HTML page and send it back to the user.
(This is an abstract description; it will usually require a local web server or a dedicated hosting server on which you can configure a custom application to handle the text and the result.)

2) Client side - One way is to take the text through JavaScript and create audio for it on the fly using HTML5 audio synthesis (this is still in the development phase; Mozilla and Chrome support it to some extent).

2-a) Alternatively, you can write a client-side browser extension to do the job.

State of TTS in HTML Specification 

The draft of HTML Text to Speech (TTS) API Specification is located here.

The way to use tts in a web page as proposed in the draft specification is : 

<tts autoplay value="hello world">

Here "hello world" is spoken when the page has loaded.

The way to interact with the tts element using javascript is proposed as:

<tts id="say" lang="es">
<script>
var tts = document.getElementById('say');
tts.value = "hello world";
tts.play(); 
</script>  

Let's Get Started!

What is TTS? 

Text To Speech (abbreviation: TTS), also called "speech synthesis", is the artificial production of human speech. TTS software renders text as digitized audio. The computer system used to produce TTS is called a speech engine. A text-to-speech engine transforms any text into speech in real time. It literally reads out loud any written information with a smooth and natural sounding voice. The automatic intonation reflects the meaning of the text, with respect to pauses, breath groups, punctuation and context.

A sample visual showing how a typical TTS engine works: 

Approaches 

I will categorize the approaches used for implementing tts for the web in two categories based on the location of the tts engine:

Server Side Approach: In this approach, the tts engine (the system that converts text to speech) is located on a remote server. The client (in our case the web browser) sends the text entered by the user to the tts engine on the remote server. The engine performs the text-to-speech conversion and returns the output speech/audio to the client, where it can be saved or played for the user.

Client Side Approach: In this approach, the tts engine is located on the client itself. The text entered by the user is converted to speech by the tts engine on the client, and the output speech/audio can be saved or played on the client for the user.

Solutions based on server side approach 

1. Google Translate :

Google Translate has a new feature that lets users listen to the text they have written through text-to-speech conversion. Multiple language sets are available for speech, depending on the translated language. The text is limited to a maximum of 100 characters. To see a demonstration of this feature, visit Google Translate.

The screenshot above shows the Google translate tts feature. Clicking on the "LISTEN" button converts the text "Hello World" into speech and outputs that audio through the speakers of the client. 

Let's have a look at what happens when we click the "LISTEN" button.

The screenshot above shows the underlying HTTP request and response that happens when the "LISTEN" button is clicked. When the "LISTEN" button is clicked an AJAX call is made that causes the HTTP request and response shown above.

Let's see what happens in the HTTP request. The HTTP request headers show that it is an HTTP GET request. The request URL used is http://translate.google.com/translate_tts?ie=UTF-8&q=HELLO%20WORLD&tl=en&total=1&idx=0&textlen=11. This URL shows that http://translate.google.com/translate_tts is the service that performs the text-to-speech conversion for us, and that the required parameters sent to it are q=HELLO%20WORLD (the text) and tl=en (the language). This can easily be verified by entering the URL below directly in your address bar:

http://translate.google.com/translate_tts?q=TEXTTOSPEECH&tl=en 
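The request URL described above can also be assembled in code. The following is a minimal sketch, assuming only the two required parameters (q and tl) and the 100-character limit mentioned earlier; buildGoogleTtsUrl is a hypothetical helper name introduced here for illustration, not part of any Google API.

```javascript
// Sketch only: buildGoogleTtsUrl is a hypothetical helper, not a Google API.
var GOOGLE_TTS_ENDPOINT = "http://translate.google.com/translate_tts";
var MAX_CHARS = 100; // limit stated in this article

function buildGoogleTtsUrl(text, lang) {
  if (typeof text !== "string" || text.length === 0) {
    throw new Error("text must be a non-empty string");
  }
  if (text.length > MAX_CHARS) {
    throw new Error("text exceeds " + MAX_CHARS + " characters");
  }
  // encodeURIComponent turns a space into %20, as in the captured request
  return GOOGLE_TTS_ENDPOINT +
    "?q=" + encodeURIComponent(text) +
    "&tl=" + encodeURIComponent(lang || "en");
}

console.log(buildGoogleTtsUrl("HELLO WORLD", "en"));
// prints http://translate.google.com/translate_tts?q=HELLO%20WORLD&tl=en
```

Encoding each value separately keeps characters such as & or ? inside the text from being read as part of the query string.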

Now let's see what HTTP response we receive from this service. As is evident from the screenshot above, the Response Headers section shows that content of MIME type audio/mpeg is returned from the server, which in simple terms means that an mp3 audio file is returned from the tts service.

Now let us see how we can use Google's tts service in our web pages.

<!DOCTYPE HTML>
<html>
<body>
<h1>In this demonstration:<br />
>tts is done on server side (i.e by using Google-translate server)<br/>
>then the audio received from Google-translate server is saved on your server (local/production)<br />
>and then that saved audio is played through that saved file on this webpage.</h1>
<h3>
Tested with:
Chrome v21 [Working],
Firefox v14 [Not Working, firefox does not support mp3 audio format playback],
IE v9[Working]
</h3>
<hr />
<form method="POST" style="font-size:25px">
Text to convert : <input name="txt" type="text" /><br />
Filename to save (without the extension) : <input name="filename" type="text" /><br />
Convert text to speech : <input name="submit" type="submit" value="Convert" />
</form>

<?php
if (isset($_POST['txt']) && isset($_POST['filename']))
{
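	// send the raw text: http_build_query() URL-encodes it, and htmlentities() would turn "&" into "&amp;" before it is spoken
	$text=$_POST['txt'];
	// keep only letters, digits, "_" and "-" so the name cannot contain path separators or HTML
	$filename=preg_replace('/[^A-Za-z0-9_-]/','',$_POST['filename']).'.mp3';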
	
	$querystring = http_build_query(array(
		//you can try other language codes here like en-english,hi-hindi,es-spanish etc
		"tl" => "en",
		"q" => $text
	));
	
	if ($soundfile = file_get_contents("http://translate.google.com/translate_tts?".$querystring))
	{
		file_put_contents($filename,$soundfile);
		echo ('
			<audio autoplay="autoplay" controls="controls">
			<source src="'.$filename.'" type="audio/mp3" />
			</audio>
			<br />
			Saved mp3 location : '.dirname(__FILE__).'\\'.$filename.'
			<br />
			Saved mp3 uri : <a href="'.$filename.'">'.$_SERVER['SERVER_NAME'].'/webtts/'.$filename.'</a>'
		);
	}
	else echo("<br />Audio could not be saved");
}
?>

The code above is from the file named "tts_serverside_google.php", which can be downloaded from the Downloads section at the top of this page. Note that this code uses a server-side proxy (a PHP proxy); this is one way of doing it.

Let us now discuss how this code works. As we saw earlier, to have our text converted to speech we send the text as a parameter to Google's Translate service like this: http://translate.google.com/translate_tts?q=TEXTTOSPEECH&tl=en, and we get speech/audio back from Google. The most important line in the code, the one that gets our mp3 audio from Google, is: $soundfile = file_get_contents("http://translate.google.com/translate_tts?".$querystring). The audio content returned is stored in the $soundfile variable, and the contents of this variable are then written to a file: file_put_contents($filename,$soundfile);. Now that the mp3 file is saved on our server, we create an HTML5 audio tag, point its src attribute to the file we just saved, and set its type to audio/mp3.
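Because the service is limited to 100 characters per request, longer text would have to be split before it is sent. The sketch below shows one possible way to do that at word boundaries; splitForTts is a hypothetical helper introduced for illustration. The article does not explain the total and idx parameters seen in the captured request, so the sketch does not use them.

```javascript
// Sketch only: splitForTts is a hypothetical helper name.
// Splits text into chunks of at most maxLen characters, breaking at whitespace.
function splitForTts(text, maxLen) {
  maxLen = maxLen || 100; // limit stated in this article
  var words = text.trim().split(/\s+/);
  var chunks = [];
  var current = "";
  words.forEach(function (word) {
    if (word === "") return; // happens only for empty input
    // a single word longer than the limit is cut into pieces
    while (word.length > maxLen) {
      if (current) { chunks.push(current); current = ""; }
      chunks.push(word.slice(0, maxLen));
      word = word.slice(maxLen);
    }
    if (current === "") {
      current = word;
    } else if (current.length + 1 + word.length <= maxLen) {
      current += " " + word; // the word still fits in the current chunk
    } else {
      chunks.push(current);  // start a new chunk
      current = word;
    }
  });
  if (current) chunks.push(current);
  return chunks;
}

console.log(splitForTts("aaa bbb ccc", 7)); // [ 'aaa bbb', 'ccc' ]
```

Each chunk could then be requested separately and the resulting audio files played one after another.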

2.  SPEECHUTIL.COM

Speechutil.com offers a text-to-speech conversion service at no charge. Its solution is similar to the one described above: it provides a URL for the tts service to which we pass parameters using the HTTP GET method, and it returns our speech/audio. It can return the wav and ogg audio file formats. For more information, visit http://speechutil.com/

The TTS request URL for speechutil is http://speechutil.com/convert/ogg?text='Hello World'. The tts service URL used here is http://speechutil.com/convert/ogg and the parameter passed to it is text='Hello World'. This service URL returns audio in the ogg file format; to get the wav file format, use http://speechutil.com/convert/wav

Now let us see how we can implement this in our web pages.

<!DOCTYPE HTML>
<html>
<body>
<h1>In this demonstration:<br />
>tts is done on server side (i.e by using SpeechUtil server)<br/>
>then the audio received from speechutil.com is saved on your server (local/production)<br />
>and then that saved audio is played through that saved file on this webpage.</h1>
<h3>
Tested with:
Chrome v21 [Working],
Firefox v14 [Working],
IE v9[Not Working, IE does not support ogg audio format playback]
</h3>
<hr />
<form method="POST" style="font-size:25px">
Text to convert : <input name="txt" type="text" /><br />
Filename to save (without the extension) : <input name="filename" type="text" /><br />
Convert text to speech : <input name="submit" type="submit" value="Convert" />
</form>

<?php
if (isset($_POST['txt']) && isset($_POST['filename']))
{
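	// send the raw text: http_build_query() URL-encodes it, and htmlentities() would turn "&" into "&amp;" before it is spoken
	$text=$_POST['txt'];
	// keep only letters, digits, "_" and "-" so the name cannot contain path separators or HTML
	$filename=preg_replace('/[^A-Za-z0-9_-]/','',$_POST['filename']).'.ogg';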
	
	$querystring = http_build_query(array(
		"text" => $text
	));
	
	//for wav file format use http://speechutil.com/convert/wav? below
	if ($soundfile = file_get_contents("http://speechutil.com/convert/ogg?".$querystring))
	{
		file_put_contents($filename,$soundfile);
		echo ('
			<audio autoplay="autoplay" controls="controls">
			<source src="'.$filename.'" type="audio/ogg" />
			</audio>
			<br />
			Saved ogg location : '.dirname(__FILE__).'\\'.$filename.'
			<br />
			Saved ogg uri : <a href="'.$filename.'">'.$_SERVER['SERVER_NAME'].'/webtts/'.$filename.'</a>'
		);
	}
	else echo("<br />Audio could not be saved");
}
?>

The code above is from the file named "tts_serverside_speechutilcom.php", which can be downloaded from the Downloads section at the top of this page. It is mostly similar to the code discussed earlier. The most important line, the one that gets our ogg audio, is: $soundfile = file_get_contents("http://speechutil.com/convert/ogg?".$querystring). The audio content returned is stored in the $soundfile variable, and the contents of this variable are then written to a file: file_put_contents($filename,$soundfile);. Now that the ogg file is saved on our server, we create an HTML5 audio tag, point its src attribute to the file we just saved, and set its type to audio/ogg.
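As the "Tested with" notes in the listing show, not every browser plays ogg. One possible way to deal with this is to ask the browser which format it can play and pick the ogg or wav endpoint accordingly. The sketch below is an illustration under that assumption; chooseSpeechutilUrl is a hypothetical helper, and the canPlayType argument stands in for the canPlayType method of an HTML5 audio element so that the function can also be run outside a browser.

```javascript
// Sketch only: chooseSpeechutilUrl is a hypothetical helper name.
// canPlayType is a function with the same contract as HTMLMediaElement.canPlayType:
// it returns "" when the type is unsupported, otherwise "maybe" or "probably".
function chooseSpeechutilUrl(text, canPlayType) {
  var format = canPlayType("audio/ogg") !== "" ? "ogg" : "wav";
  return "http://speechutil.com/convert/" + format +
    "?text=" + encodeURIComponent(text);
}

// Stand-ins for two browsers, one with and one without ogg support
var supportsOgg = function (type) { return type === "audio/ogg" ? "maybe" : ""; };
var supportsNothing = function () { return ""; };

console.log(chooseSpeechutilUrl("Hello World", supportsOgg));
// prints http://speechutil.com/convert/ogg?text=Hello%20World
console.log(chooseSpeechutilUrl("Hello World", supportsNothing));
// prints http://speechutil.com/convert/wav?text=Hello%20World
```

In a page, the second argument could be function (type) { return document.createElement('audio').canPlayType(type); }.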

3. TEXT2SPEECH.ORG

Text2speech.org offers a text-to-speech conversion service at no charge. It differs from the solutions discussed above because it does not return the audio content directly; instead, it saves the converted audio on its server at a temporary location and provides the URL of that saved audio. It uses the HTTP POST method to send the request and parameters to the tts service. For more information, visit http://www.text2speech.org

The TTS request URL for text2speech.org is http://www.text2speech.org. The parameters are passed to it using the HTTP POST method: speech => text, voice => nitech_us_rms_arctic_hts, volume_scale => 5, make_audio => Convert Text To Speech. All four parameters are necessary for it to work. The voice parameter accepts these options: "nitech_us_rms_arctic_hts", "nitech_us_bdl_arctic_hts", "nitech_us_slt_arctic_hts", "nitech_us_awb_arctic_hts". The volume_scale parameter can have values ranging from 1 to 10.
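To make the four parameters concrete, here is a minimal sketch that builds the form-encoded POST body and rejects values outside the options listed above. buildText2SpeechBody is a hypothetical helper introduced for illustration; the parameter names and allowed values are the ones given in this article.

```javascript
// Sketch only: buildText2SpeechBody is a hypothetical helper name.
var VOICES = [
  "nitech_us_rms_arctic_hts",
  "nitech_us_bdl_arctic_hts",
  "nitech_us_slt_arctic_hts",
  "nitech_us_awb_arctic_hts"
];

function buildText2SpeechBody(text, voice, volumeScale) {
  voice = voice || VOICES[0];
  volumeScale = volumeScale === undefined ? 5 : volumeScale;
  if (VOICES.indexOf(voice) === -1) {
    throw new Error("unknown voice: " + voice);
  }
  if (!(volumeScale >= 1 && volumeScale <= 10)) {
    throw new Error("volume_scale must be between 1 and 10");
  }
  // URLSearchParams produces application/x-www-form-urlencoded output
  var params = new URLSearchParams();
  params.append("speech", text);
  params.append("voice", voice);
  params.append("volume_scale", String(volumeScale));
  params.append("make_audio", "Convert Text To Speech");
  return params.toString();
}

console.log(buildText2SpeechBody("Hello World"));
// prints speech=Hello+World&voice=nitech_us_rms_arctic_hts&volume_scale=5&make_audio=Convert+Text+To+Speech
```

The resulting string is what would be sent as the request body with the header Content-type: application/x-www-form-urlencoded.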

Text2speech responds to the request with an HTML document which contains, among other things, the download link for the speech/audio produced by the tts conversion.

Now let us see how we can implement this in our web pages.

<!DOCTYPE HTML>
<html>
<body>
<h1>In this demonstration:<br />
>tts is done on server side (i.e by using text2speech.org server)<br/>
>then the audio received from text2speech.org is saved on your server (local/production)<br />
>and then that saved audio is played through that saved file on this webpage.</h1>
<h3>
Tested with:
Chrome v21 [Working],
Firefox v14 [Not Working, firefox does not support mp3 audio format playback],
IE v9[Working]
</h3>
<hr />
<form method="POST" style="font-size:25px">
Text to convert : <input name="txt" type="text" /><br />
Filename to save (without the extension) : <input name="filename" type="text" /><br />
Convert text to speech : <input name="submit" type="submit" value="Convert" />
</form>

<?php
if (isset($_POST['txt']) && isset($_POST['filename']))
{
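	// send the raw text: http_build_query() URL-encodes it, and htmlentities() would turn "&" into "&amp;" before it is spoken
	$text=$_POST['txt'];
	// keep only letters, digits, "_" and "-" so the name cannot contain path separators or HTML
	$filename=preg_replace('/[^A-Za-z0-9_-]/','',$_POST['filename']).'.mp3';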
		
	$postdata = http_build_query(
		array(
			"speech" => $text,
			//options for voice are:"nitech_us_rms_arctic_hts","nitech_us_bdl_arctic_hts","nitech_us_slt_arctic_hts","nitech_us_awb_arctic_hts"
			"voice"=>"nitech_us_rms_arctic_hts",
			//options for volume_scale are: 1 to 10
			"volume_scale"=>5,
			"make_audio"=>"Convert Text To Speech"
		)
	);
	
	$opts = array('http' =>
		array(
			'method'  => 'POST',
			'header'  => 'Content-type: application/x-www-form-urlencoded',
			'content' => $postdata
		)
	);
	
	$context  = stream_context_create($opts);
	
	if ($htmldocwithlink = file_get_contents("http://www.text2speech.org", false, $context))
	{
		$htmldoc = new DOMDocument();
		$htmldoc->loadHTML($htmldocwithlink);
		$soundfilelink=$htmldoc->getElementById('downloadlink')->getElementsByTagName('a')->item(0)->getAttribute('href');
		$soundfile=file_get_contents('http://www.text2speech.org/'.$soundfilelink);
		file_put_contents($filename,$soundfile);
		echo ('
				<audio autoplay="autoplay" controls="controls">
				<source src="'.$filename.'" type="audio/mp3" />
				</audio>
				<br />
				Saved mp3 location : '.dirname(__FILE__).'\\'.$filename.'
				<br />
				Saved mp3 uri : <a href="'.$filename.'">'.$_SERVER['SERVER_NAME'].'/webtts/'.$filename.'</a>'
			);
	}
	else echo("<br />Audio could not be saved");
}
?>

The code above is from the file named "tts_serverside_text2speechorg.php" which can be downloaded from the Downloads section on the top of this page. Let us now discuss how this code works:

1. We set the data (parameters and their values) to be posted to the tts request url.

$postdata = http_build_query(
	array(
		"speech" => $text,
		//options for voice are:"nitech_us_rms_arctic_hts","nitech_us_bdl_arctic_hts","nitech_us_slt_arctic_hts","nitech_us_awb_arctic_hts"
		"voice"=>"nitech_us_rms_arctic_hts",
		//options for volume_scale are: 1 to 10
		"volume_scale"=>5,
		"make_audio"=>"Convert Text To Speech"
	)
);

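2. We set the HTTP request method, header and body, and create a stream context from these options.

$opts = array('http' =>
	array(
		'method'  => 'POST',
		'header'  => 'Content-type: application/x-www-form-urlencoded',
		'content' => $postdata
	)
);

$context  = stream_context_create($opts);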

3. We use the file_get_contents function to send the request to the URL and get the result back, which is an HTML document containing the download link.

$htmldocwithlink = file_get_contents("http://www.text2speech.org", false, $context);

4. Now we have to get the download link from the returned HTML document. For that we use PHP's DOMDocument class, which helps us traverse the DOM to reach the download link, located in a <div> whose id="downloadlink".

$htmldoc = new DOMDocument();
$htmldoc->loadHTML($htmldocwithlink);
$soundfilelink=$htmldoc->getElementById('downloadlink')->getElementsByTagName('a')->item(0)->getAttribute('href');

5. After we get the download link, we fetch the file from it using the file_get_contents function and then save the file.

$soundfile=file_get_contents('http://www.text2speech.org/'.$soundfilelink);
file_put_contents($filename,$soundfile);
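Step 5 builds the download address by joining the site address and the extracted link as plain strings, which works as long as the link is a relative path without a leading slash. As a hedged illustration of a more tolerant alternative, the sketch below resolves the link against the site address with the standard URL class. resolveDownloadLink is a hypothetical helper, and the paths used in the example are made up; the article does not show what the real links look like.

```javascript
// Sketch only: resolveDownloadLink is a hypothetical helper name.
var TEXT2SPEECH_BASE = "http://www.text2speech.org/";

function resolveDownloadLink(href, base) {
  // new URL(relative, base) handles "a/b.mp3", "/a/b.mp3" and absolute links alike
  return new URL(href, base || TEXT2SPEECH_BASE).href;
}

// The paths below are invented for illustration
console.log(resolveDownloadLink("audio/result.mp3"));
// prints http://www.text2speech.org/audio/result.mp3
console.log(resolveDownloadLink("/audio/result.mp3"));
// prints http://www.text2speech.org/audio/result.mp3
```

Plain concatenation would produce a double slash for the second input, which is the case this sketch is meant to show.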

Solutions based on client side approach

1. E-SPEAK

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows. It has been developed in C/C++. For more details about eSpeak, visit http://espeak.sourceforge.net. Here we will be using a port of the eSpeak speech synthesizer from C++ to JavaScript, made with Emscripten, which enables text-to-speech conversion on the web using only JavaScript and HTML5.

The eSpeak JavaScript port gives us a JavaScript file named "speakGenerator.js" that contains the code converted from C++ to JavaScript and does the work of text-to-speech conversion. Note that this file is around 2.18 MB in size and may take a while to download to the web browser from a remote host.

The speakGenerator.js file contains the generateSpeech(text) function, which takes the text to be converted to speech as an argument and returns the output speech/audio in wav format as a Uint8Array JavaScript object.

The JavaScript object containing the wav data can be used to play the audio locally, or it can be sent to your local/production server to be saved as a wav audio file.
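Before the Uint8Array is played or uploaded, it can be sanity-checked. The sketch below assumes that the returned data is a standard RIFF/WAVE file, in which the first four bytes spell "RIFF" and bytes 8 to 11 spell "WAVE"; looksLikeWav is a hypothetical helper introduced for illustration and is not part of the eSpeak port.

```javascript
// Sketch only: looksLikeWav is a hypothetical helper name.
// Checks the RIFF/WAVE signature at the start of a byte array.
function looksLikeWav(bytes) {
  if (!(bytes instanceof Uint8Array) || bytes.length < 12) {
    return false;
  }
  function tag(offset) {
    return String.fromCharCode(
      bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]
    );
  }
  return tag(0) === "RIFF" && tag(8) === "WAVE";
}

// A hand-made 12-byte header: "RIFF", four size bytes, "WAVE"
var header = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0, 0, 0, 0, 0x57, 0x41, 0x56, 0x45
]);
console.log(looksLikeWav(header));                // true
console.log(looksLikeWav(new Uint8Array(12)));    // false
```

In the page, the check could be applied to the value returned by generateSpeech(text) before it is used.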

Now let us see how we can implement this in our web pages.

<!DOCTYPE HTML>
<html>
<head>
<script type="text/javascript" src="speakGenerator.js"></script>
</head>
<body>
<?php
if (isset($_GET['filename']))
{
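	// keep only letters, digits, "_" and "-" so the name cannot contain path separators or HTML
	$filename=preg_replace('/[^A-Za-z0-9_-]/','',$_GET['filename']).'.wav';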
	
	if ($soundfile = file_get_contents('php://input'))
	{
		file_put_contents($filename,$soundfile);
		echo ('
		  <audio autoplay="autoplay" controls="controls">
		  <source src="'.$filename.'" type="audio/x-wav" />
		  </audio>
		  <br />
		  Saved wav location : '.dirname(__FILE__).'\\'.$filename.'
		  <br />
		  Saved wav uri : <a href="'.$filename.'">'.$_SERVER['SERVER_NAME'].'/webtts/'.$filename.'</a>'
		);
	}
	else echo("<br />Audio could not be saved");
	exit();
}
?>

The first important part of code we will discuss is this : 

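function do_tts()
{
	// declare locally so the text does not leak into the global scope
	var text=document.getElementsByName('txt').item(0).value;
	if (text!="")
	{
		// generateSpeech() returns the wav data as a Uint8Array
		var bytearray = generateSpeech(text);
		if (document.getElementsByName('save').item(0).checked)
		{
		  // encode the filename so special characters survive in the query string
		  var url='tts_clientside_espeak.php?filename='+encodeURIComponent(document.getElementsByName('filename').item(0).value);
		  var xhr = new XMLHttpRequest();
		  xhr.open('POST', url, true);
		  // register the handler before sending the request
		  xhr.onload = function(e) {
			  if (this.status == 200) {
				  document.getElementById('player').innerHTML=this.response;
			  }
		  };
		  xhr.send(bytearray.buffer);
		}
		else
		{
			// play directly from a data URI without contacting the server
			document.getElementById('player').innerHTML='<audio autoplay="autoplay" controls="controls" src="data:audio/x-wav;base64,'+encode64(bytearray)+'"></audio>';
		}
	}
}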

The JavaScript function do_tts() is called when the user clicks the "Convert" button. It takes the text entered by the user and passes it as an argument to the generateSpeech(text) function, which converts the text to speech/audio in wav format and returns the data, which is stored in the bytearray variable. If the user has chosen to save the file, the data in that variable is sent by an AJAX call to the PHP script, which saves it into a .wav file. Otherwise, the data is embedded directly in an HTML5 <audio> tag using the data URI mechanism (data:xxxx).
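The data URI branch relies on an encode64 function that is not listed in this article. As a stand-in, here is a minimal sketch of a base64 encoder for a Uint8Array; it is an assumption about what such a helper does, not the implementation shipped with the download.

```javascript
// Sketch only: a stand-in for the encode64 helper used by do_tts().
var B64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function encode64(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i += 3) {
    var hasSecond = i + 1 < bytes.length;
    var hasThird = i + 2 < bytes.length;
    var b0 = bytes[i];
    var b1 = hasSecond ? bytes[i + 1] : 0;
    var b2 = hasThird ? bytes[i + 2] : 0;
    // every 3 input bytes become 4 output characters of 6 bits each
    out += B64_CHARS.charAt(b0 >> 2);
    out += B64_CHARS.charAt(((b0 & 3) << 4) | (b1 >> 4));
    // missing input bytes are marked with "=" padding
    out += hasSecond ? B64_CHARS.charAt(((b1 & 15) << 2) | (b2 >> 6)) : "=";
    out += hasThird ? B64_CHARS.charAt(b2 & 63) : "=";
  }
  return out;
}

// The bytes of the ASCII string "Hello"
console.log(encode64(new Uint8Array([72, 101, 108, 108, 111])));
// prints SGVsbG8=
```

The returned string can be appended to "data:audio/x-wav;base64," to form the src value used in the listing above.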

When the user chooses to save the speech/audio, the audio data is sent to the PHP script:

<?php
if (isset($_GET['filename']))
{
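	// keep only letters, digits, "_" and "-" so the name cannot contain path separators or HTML
	$filename=preg_replace('/[^A-Za-z0-9_-]/','',$_GET['filename']).'.wav';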
	
	if ($soundfile = file_get_contents('php://input'))
	{
		file_put_contents($filename,$soundfile);
		echo ('
		  <audio autoplay="autoplay" controls="controls">
		  <source src="'.$filename.'" type="audio/x-wav" />
		  </audio>
		  <br />
		  Saved wav location : '.dirname(__FILE__).'\\'.$filename.'
		  <br />
		  Saved wav uri : <a href="'.$filename.'">'.$_SERVER['SERVER_NAME'].'/webtts/'.$filename.'</a>'
		);
	}
	else echo("<br />Audio could not be saved");
	exit();
}
?> 

Here we get the data sent from the client by using file_get_contents('php://input'). php://input is a read-only stream that allows you to read raw data from the request body. After the data is received, it is stored in the $soundfile variable and then saved to a file by file_put_contents($filename,$soundfile).

Other viable solutions

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Robin Rizvi
Software Developer, Databorough India
India
Currently working as software developer for Databorough India - Division of Fresche Legacy.
 
Developing for the open-source community and writing articles is my way of thanking the community. I have developed commercial as well as non-commercial/open-source projects for the web and windows as my work and hobby. Just trying very hard so that someday I could contribute a little for this world. I would like to send out my regards to all for your rating and comments because these comments keep me going. Thank you all.
 
Certifications:
Microsoft Certified Professional (Programming in C#)
Microsoft Certified Professional (Programming in HTML5 with JavaScript and CSS3)
 
GET IN TOUCH:
http://robinrizvi.info
http://blog.robinrizvi.info
 

Last Updated 11 Aug 2012
Article Copyright 2012 by Robin Rizvi