Speaking ASP.NET Website
Introduction
This tip shows how to set up an ASP.NET MVC website to generate a text-to-speech MP3 and stream it to a browser client using HTML5 audio controls.
Background
Finding a good text-to-speech implementation for ASP.NET that met the requirements of my project was rather difficult. I was able to find enough forum posts and documentation to assemble a simple solution that generates text-to-speech MP3 audio for a browser client. The voice comes from the .NET Microsoft Speech Synthesizer (System.Speech). It produces a WAV audio stream, which is then passed through the NAudio.Lame framework to be converted to an MP3 stream. Why MP3 rather than the standard WAV format? MP3s are smaller in file size and play more reliably in most modern browsers.
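To put rough numbers on the size difference, the sketch below compares uncompressed PCM audio with an MP3 of the same duration. The 8 kHz, 16-bit, stereo format matches the configuration used later in this tip. The 64 kbps MP3 bit rate and the function names are assumptions chosen for illustration, and container headers are ignored.

```javascript
// Bytes needed for `seconds` of uncompressed PCM audio (WAV payload, header ignored).
function pcmBytes(sampleRate, bitsPerSample, channels, seconds) {
    return sampleRate * (bitsPerSample / 8) * channels * seconds;
}

// Bytes needed for `seconds` of constant-bit-rate MP3 audio (tags and padding ignored).
function mp3Bytes(kbps, seconds) {
    return (kbps * 1000 / 8) * seconds;
}

// Ten seconds of speech at 8 kHz, 16-bit, stereo.
var wavSize = pcmBytes(8000, 16, 2, 10);
// Assumed MP3 bit rate of 64 kbps; the real value depends on the encoder settings.
var mp3Size = mp3Bytes(64, 10);

console.log('WAV bytes:', wavSize);
console.log('MP3 bytes:', mp3Size);
console.log('Ratio:', wavSize / mp3Size);
```

Under these assumptions the MP3 is a quarter of the size of the WAV data. The actual ratio depends on the bit rate the encoder ends up using.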
Using the Code
I recommend downloading and running the project code attached to this tip to see a working example.
Prerequisites
- IIS 7.5 or newer (also tested on IIS 10 Express)
- Application pool running in integrated mode
- Application pool identity of the website set to Local System
- MVC 3 or newer
- Reference: System.Speech
- NuGet packages: NAudio, NAudio.Lame
Place the following using directives at the top of the home controller.
using NAudio.Lame;
using NAudio.Wave;
using System;
using System.Globalization;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;
using System.Threading;
using System.Web;
using System.Web.Mvc;
Place the following method, TextToMp3, in the home controller. Its only input is the text to be converted. The text is converted to a WAV stream using the Microsoft speech synthesizer, and the WAV stream is then converted to an MP3 stream using the NAudio.Lame framework. The result is returned to the browser client as a FileResult built from the MP3 bytes, where it can be played via HTML5 audio controls.
public FileResult TextToMp3(string text)
{
    //Primary memory stream for storing mp3 audio
    var mp3Stream = new MemoryStream();
    //Speech format
    var speechAudioFormatConfig = new SpeechAudioFormatInfo
        (samplesPerSecond: 8000, bitsPerSample: AudioBitsPerSample.Sixteen,
        channel: AudioChannel.Stereo);
    //NAudio's wave format used for mp3 conversion.
    //Mirror configuration of speech config.
    var waveFormat = new WaveFormat(speechAudioFormatConfig.SamplesPerSecond,
        speechAudioFormatConfig.BitsPerSample, speechAudioFormatConfig.ChannelCount);
    try
    {
        //Build a voice prompt to have the voice talk slower
        //and with reduced emphasis on words
        var prompt = new PromptBuilder
            { Culture = CultureInfo.CreateSpecificCulture("en-US") };
        prompt.StartVoice(prompt.Culture);
        prompt.StartSentence();
        prompt.StartStyle(new PromptStyle()
            { Emphasis = PromptEmphasis.Reduced, Rate = PromptRate.Slow });
        prompt.AppendText(text);
        prompt.EndStyle();
        prompt.EndSentence();
        prompt.EndVoice();
        //Wav stream output of converted text to speech
        using (var synthWavMs = new MemoryStream())
        {
            //Spin off a new thread that's safe for an ASP.NET application pool.
            var resetEvent = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(arg =>
            {
                try
                {
                    //Initialize a voice with standard settings
                    var siteSpeechSynth = new SpeechSynthesizer();
                    //Set memory stream and audio format to speech synthesizer
                    siteSpeechSynth.SetOutputToAudioStream
                        (synthWavMs, speechAudioFormatConfig);
                    //Speak the prompt into the memory stream
                    siteSpeechSynth.Speak(prompt);
                }
                catch (Exception ex)
                {
                    //This is here to diagnose any issues with the conversion process.
                    //It can be removed after testing.
                    Response.AddHeader
                        ("EXCEPTION", ex.GetBaseException().ToString());
                }
                finally
                {
                    resetEvent.Set(); //end of thread
                }
            });
            //Wait until thread catches up with us
            WaitHandle.WaitAll(new WaitHandle[] { resetEvent });
            //Estimated bitrate
            var bitRate = (speechAudioFormatConfig.AverageBytesPerSecond * 8);
            //Set at starting position
            synthWavMs.Position = 0;
            //Be sure to have a bin folder with lame dll files in there.
            //They also need to be loaded on application start up via Global.asax file
            using (var mp3FileWriter = new LameMP3FileWriter
                (outStream: mp3Stream, format: waveFormat, bitRate: bitRate))
                synthWavMs.CopyTo(mp3FileWriter);
        }
    }
    catch (Exception ex)
    {
        Response.AddHeader("EXCEPTION", ex.GetBaseException().ToString());
    }
    finally
    {
        //Set no cache on this file
        Response.Cache.SetExpires(DateTime.UtcNow.AddMinutes(-1));
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        Response.Cache.SetNoStore();
        //Required for Chrome and Safari
        Response.AppendHeader("Accept-Ranges", "bytes");
        //Write the byte length of mp3 to the client
        Response.AddHeader("Content-Length",
            mp3Stream.Length.ToString(CultureInfo.InvariantCulture));
    }
    //Return the converted mp3 stream as a byte array for a file download
    return File(mp3Stream.ToArray(), "audio/mp3");
}
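The value passed as bitRate above follows directly from the speech format. As a sanity check, the arithmetic can be sketched in JavaScript. The helper names are hypothetical. The result is in bits per second, so it is worth checking the NAudio.Lame documentation for the unit its bitRate parameter expects.

```javascript
// Mirrors how a PCM format derives its average byte rate:
// bytes per sample frame (block align) times sample frames per second.
function averageBytesPerSecond(samplesPerSecond, bitsPerSample, channelCount) {
    var blockAlign = channelCount * (bitsPerSample / 8);
    return samplesPerSecond * blockAlign;
}

// Same expression as in TextToMp3: average bytes per second * 8.
function estimatedBitRate(samplesPerSecond, bitsPerSample, channelCount) {
    return averageBytesPerSecond(samplesPerSecond, bitsPerSample, channelCount) * 8;
}

// Format used in TextToMp3: 8000 Hz, 16-bit, stereo.
var bitsPerSecond = estimatedBitRate(8000, 16, 2);
// The same figure expressed in kilobits per second.
var kilobitsPerSecond = bitsPerSecond / 1000;

console.log(bitsPerSecond, kilobitsPerSecond);
```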
The LAME DLL files used by NAudio.Lame need to be found when the application starts. Add the code below to the Global.asax.cs file. It appends the site's bin folder to the PATH environment variable, and it needs using directives for System, System.IO, System.Linq (for Contains), and System.Globalization.
public static void CheckAddBinPath()
{
    // find path to 'bin' folder
    var binPath = Path.Combine(new string[]
        { AppDomain.CurrentDomain.BaseDirectory, "bin" });
    // get current search path from environment
    var path = Environment.GetEnvironmentVariable("PATH") ?? "";
    // add 'bin' folder to search path if not already present
    if (!path.Split(Path.PathSeparator).Contains(binPath, StringComparer.CurrentCultureIgnoreCase))
    {
        path = string.Join(Path.PathSeparator.ToString
            (CultureInfo.InvariantCulture), new string[] { path, binPath });
        Environment.SetEnvironmentVariable("PATH", path);
    }
}
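The check-then-append logic in CheckAddBinPath is easy to verify in isolation. The following is a JavaScript sketch of the same idea as a pure function. The function name and the sample paths are made up for illustration, and the separator is passed in rather than read from the platform.

```javascript
// Returns `pathValue` with `dir` appended unless it is already listed.
// Entries are compared case-insensitively, as in the C# version.
function addToSearchPath(pathValue, dir, separator) {
    var entries = (pathValue || '').split(separator);
    var wanted = dir.toLowerCase();
    var present = entries.some(function (entry) {
        return entry.toLowerCase() === wanted;
    });
    return present ? pathValue : [pathValue, dir].join(separator);
}

// Hypothetical starting value of PATH.
var before = 'C:\\Windows;C:\\Tools';
var after = addToSearchPath(before, 'C:\\Site\\bin', ';');
// A second call with different casing leaves the value unchanged.
var again = addToSearchPath(after, 'c:\\site\\BIN', ';');

console.log(after);
console.log(again === after);
```

Because the function only appends when the folder is missing, calling it on every application start does not make PATH grow.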
In the same file, the Application_Start method should look like the code below, with the call to CheckAddBinPath added at the bottom of the method.
protected void Application_Start()
{
    AreaRegistration.RegisterAllAreas();
    FilterConfig.RegisterGlobalFilters(GlobalFilters.Filters);
    RouteConfig.RegisterRoutes(RouteTable.Routes);
    BundleConfig.RegisterBundles(BundleTable.Bundles);
    //check for bin files to be loaded
    CheckAddBinPath();
}
Example of Use
On the home view, add the following HTML, JavaScript, and jQuery code.
<label for="inputText">Type it!</label><br />
<textarea id="inputText" class="form-control" rows="5" style="width:100%;"></textarea><br />
<button id="playAudio" type="button" class="btn btn-primary btn-lg btn-block">Say it!</button>
<div id="divAudio_Player" class="hidden">
    <audio id="audio_player">
        <source id="audio_player_wav"
                src="@Url.Action("PlayTextArea", "Home", new { text = "type something in first" })"
                type="audio/mp3" />
        <embed height="50" width="100"
               src="@Url.Action("PlayTextArea", "Home", new { text = "type something in first" })">
    </audio>
</div>
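$(function () {
    $('#playAudio').click(function () {
        //Use val() to read what was typed into the textarea
        var newUrl = '@Url.Action("PlayTextArea", "Home")?text=' +
            encodeURIComponent($('#inputText').val()) + '&timestamp=' + new Date().getTime();
        //Rebuild the player markup so that it points at the new URL
        var source = '<audio id="audio_player">' +
            '<source id="audio_player_wav" src="' + newUrl + '" type="audio/mp3" />' +
            '</audio>';
        //play it
        setTimeout(function () {
            $('#divAudio_Player').html(source);
            var aud = $('#audio_player').get(0);
            aud.play();
        }, 500);
    });
});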
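The URL construction in the click handler is worth testing on its own, since the text must be encoded and each request needs a unique query string to avoid a cached response. A minimal sketch follows. The buildAudioUrl helper and the /Home/PlayTextArea path are illustrative assumptions; in the view, the base path comes from @Url.Action.

```javascript
// Builds the request URL for the audio action.
// `now` is passed in so that the result is predictable in a test;
// in the page it would be new Date().getTime().
function buildAudioUrl(baseUrl, text, now) {
    return baseUrl +
        '?text=' + encodeURIComponent(text) +
        '&timestamp=' + now;
}

var url = buildAudioUrl('/Home/PlayTextArea', 'Hello & welcome!', 1700000000000);
console.log(url);
```

Encoding matters here because characters such as & would otherwise be read as the start of a new query parameter and cut the spoken text short.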
Add the following ActionResult to the home controller. It is the action used in this example:
public ActionResult PlayTextArea(string text)
{
    if (String.IsNullOrEmpty(text))
    {
        text = "Type something in first";
    }
    return TextToMp3(text);
}
Run the project, type something in, and click "Say it!".
Points of Interest
Making an application speak has always been of interest to me. It makes the application more useful.
Known Issues
- High CPU usage occurs while a WAV memory stream is being converted to an MP3 stream.
- A dedicated, lower-privileged user would be preferable as the application pool identity in IIS, but the speech synthesizer needs a user profile, which is why Local System is used.