Posted 28 Mar 2018
Licenced Apache

Data Scraping from Speech to Text

Eric M. H. Goh, 28 Mar 2018
Speech to Text Recognition for Data Scraping and Collection in Data Mining


Data science is a growing field. According to the CRISP-DM model and other data mining process models, we need to collect data before we can mine knowledge from it and conduct predictive analysis. Data collection can involve data scraping, which includes web scraping (HTML to text), image-to-text conversion, and video-to-text conversion. Once the data is in text format, we usually apply text mining techniques to extract knowledge.

In this article, I am going to introduce you to speech to text recognition. I developed Just Another Voice Transformer (JAVT) to convert videos into text files, and consolidate them into a set of text data for text mining and natural language processing.

JAVT converts video into an audio file using ffmpeg, and then converts the audio into a text file using Microsoft SAPI or CMU Sphinx. I have included the source code for both the video-to-audio and the audio-to-text conversion. In this article, I explain only speech recognition and speech synthesis using Microsoft SAPI, and how to interface with ffmpeg.

Speech Recognition in C# using Microsoft SAPI

To use speech recognition in C#, add a reference to the System.Speech assembly and import the following namespaces at the top of the code:

using System.Speech.Recognition;
using System.Speech.AudioFormat;

Then create the dictation grammar and Speech Recognition Engine:

// Declared as fields of the form class so that the event handlers can reach them.
private DictationGrammar dictation = new DictationGrammar();
private SpeechRecognitionEngine sr = new SpeechRecognitionEngine();

We then need to load the dictation grammar into the speech recognition engine:
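A minimal sketch of this step, assuming the standard System.Speech call LoadGrammar is what is intended here. The class name LoadGrammarSketch is hypothetical and only makes the snippet compile on its own.

```csharp
using System.Speech.Recognition;

class LoadGrammarSketch // hypothetical wrapper class, not part of JAVT
{
    static void Main()
    {
        // Free-text dictation grammar, as created earlier in the article.
        DictationGrammar dictation = new DictationGrammar();

        using (SpeechRecognitionEngine sr = new SpeechRecognitionEngine())
        {
            // Register the grammar with the engine before starting recognition.
            sr.LoadGrammar(dictation);
        }
    }
}
```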


If you are using a .wav file as input, set the speech recognition engine's input to that file:
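A minimal sketch, assuming SetInputToWaveFile is the call meant here. The file name input.wav and the class name are placeholders, not taken from JAVT.

```csharp
using System.Speech.Recognition;

class WaveInputSketch // hypothetical wrapper class
{
    static void Main()
    {
        using (SpeechRecognitionEngine sr = new SpeechRecognitionEngine())
        {
            sr.LoadGrammar(new DictationGrammar());

            // "input.wav" is a placeholder path; in JAVT this would be the
            // audio file produced from the video.
            sr.SetInputToWaveFile("input.wav");
        }
    }
}
```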


If you are using an audio device such as a microphone as input, set the speech recognition engine's input to the default audio device:
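A minimal sketch, assuming SetInputToDefaultAudioDevice is the call meant here. The class name is hypothetical.

```csharp
using System.Speech.Recognition;

class MicrophoneInputSketch // hypothetical wrapper class
{
    static void Main()
    {
        using (SpeechRecognitionEngine sr = new SpeechRecognitionEngine())
        {
            sr.LoadGrammar(new DictationGrammar());

            // Read audio from the system's default recording device.
            sr.SetInputToDefaultAudioDevice();
        }
    }
}
```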


To perform asynchronous speech recognition:
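A minimal console sketch of asynchronous recognition, assuming RecognizeAsync with RecognizeMode.Multiple is what is intended. The path input.wav, the class name, and the use of a ManualResetEvent to wait are illustrative assumptions. This sketch subscribes to RecognizeCompleted, the completion event associated with RecognizeAsync, which is expected to fire once the input stream ends.

```csharp
using System;
using System.Speech.Recognition;
using System.Threading;

class AsyncRecognitionSketch // hypothetical wrapper class
{
    static void Main()
    {
        using (SpeechRecognitionEngine sr = new SpeechRecognitionEngine())
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            sr.LoadGrammar(new DictationGrammar());
            sr.SetInputToWaveFile("input.wav"); // placeholder path

            // Print each recognized phrase as it arrives.
            sr.SpeechRecognized += (sender, e) => Console.WriteLine(e.Result.Text);

            // Signal the main thread when the asynchronous operation finishes.
            sr.RecognizeCompleted += (sender, e) => done.Set();

            // Multiple: keep recognizing phrase after phrase instead of
            // stopping after the first result. The call returns immediately.
            sr.RecognizeAsync(RecognizeMode.Multiple);

            // Block until recognition has completed.
            done.WaitOne();
        }
    }
}
```

In a Windows Forms program such as JAVT the wait is unnecessary, because the form keeps running while the events arrive.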


Then add these event handlers. Each handler is removed before it is added, so that it is never subscribed twice:

sr.SpeechRecognized -= new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
sr.EmulateRecognizeCompleted -= 
new EventHandler<EmulateRecognizeCompletedEventArgs>(EmulateRecognizeCompletedHandler);

sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
sr.EmulateRecognizeCompleted += 
new EventHandler<EmulateRecognizeCompletedEventArgs>(EmulateRecognizeCompletedHandler);

When speech is recognized, the SpeechRecognized() method is called. The following is the SpeechRecognized() method used in JAVT. The recognized text is available in e.Result.Text.

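string finalResult;
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e) {
    try {
        // Append the newly recognized phrase to the output text box.
        finalResult = e.Result.Text;
        richTextBox3.Text += " " + finalResult;
    }
    catch (Exception ex) {
        // Report or log ex here.
    }
}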

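When the speech recognition is completed, the EmulateRecognizeCompletedHandler() method is called. Note that EmulateRecognizeCompleted is raised for emulated input (EmulateRecognizeAsync); for RecognizeAsync on audio input, the corresponding event is RecognizeCompleted. The following is the EmulateRecognizeCompletedHandler() method in the program: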

bool isCompleted = false;
private void EmulateRecognizeCompletedHandler(object sender, EmulateRecognizeCompletedEventArgs e) {
    try {
        // Mark the run as finished and tell the user.
        isCompleted = true;
        richTextBox3.Text += "\n\nCompleted. \n";
        MessageBox.Show("Completed. ");
    }
    catch (Exception ex) {
        // Report or log ex here.
    }
}

Text to Speech

Having covered speech recognition, we now turn to the reverse direction: text to speech, also called speech synthesis.

First, import the System.Speech.Synthesis namespace and create a SpeechSynthesizer:

using System.Speech.Synthesis;
SpeechSynthesizer speaker;
speaker = new SpeechSynthesizer();

Then set the Rate (from -10 to 10) and the Volume (from 0 to 100):

speaker.Rate = int.Parse(rateTextBox.Text);
speaker.Volume = int.Parse(volTextBox.Text);
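Because int.Parse throws on non-numeric text and the synthesizer rejects out-of-range values, it may be worth validating the text box contents first. The following is a sketch only; the helper ParseClamped and the class SynthSettings are hypothetical names and are not part of JAVT or System.Speech.

```csharp
using System;

static class SynthSettings // hypothetical helper class
{
    // Parses text as an integer and forces it into [min, max].
    // Returns fallback when the text is not a valid integer.
    public static int ParseClamped(string text, int min, int max, int fallback)
    {
        int value;
        if (!int.TryParse(text, out value))
        {
            return fallback;
        }
        return Math.Max(min, Math.Min(max, value));
    }

    static void Main()
    {
        Console.WriteLine(ParseClamped("25", -10, 10, 0));   // prints 10
        Console.WriteLine(ParseClamped("abc", 0, 100, 100)); // prints 100
    }
}
```

With such a helper, the assignments could read speaker.Rate = SynthSettings.ParseClamped(rateTextBox.Text, -10, 10, 0); and speaker.Volume = SynthSettings.ParseClamped(volTextBox.Text, 0, 100, 100);.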

To use a female speaker:
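A minimal sketch, assuming SelectVoiceByHints with VoiceGender.Female is the call meant here. Which voice is actually selected depends on the voices installed on the machine. The class name is hypothetical.

```csharp
using System;
using System.Speech.Synthesis;

class FemaleVoiceSketch // hypothetical wrapper class
{
    static void Main()
    {
        using (SpeechSynthesizer speaker = new SpeechSynthesizer())
        {
            // Ask the synthesizer for an installed voice matching the hint.
            speaker.SelectVoiceByHints(VoiceGender.Female);

            // Show which voice was actually selected.
            Console.WriteLine(speaker.Voice.Name);
        }
    }
}
```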


Then run the Speech Synthesizer:
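A minimal sketch of running the synthesizer, assuming output to the default audio device. It uses the blocking Speak call so that a console program does not exit before the audio ends; a Windows Forms program might prefer SpeakAsync to keep the UI responsive. The sample sentence and class name are placeholders.

```csharp
using System.Speech.Synthesis;

class SpeakSketch // hypothetical wrapper class
{
    static void Main()
    {
        using (SpeechSynthesizer speaker = new SpeechSynthesizer())
        {
            // Send the audio to the default playback device.
            speaker.SetOutputToDefaultAudioDevice();

            // Speak blocks until the whole text has been spoken.
            speaker.Speak("Hello from the speech synthesizer.");
        }
    }
}
```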


Video to Audio Conversion

I use ffmpeg to convert video into audio. To interface with ffmpeg, first import the System.Diagnostics namespace:

using System.Diagnostics;

Then create a new process:

Process process = new Process();

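Build the ffmpeg argument string. Here f is the path of the input video, -vn drops the video stream, -ac 2 sets two audio channels, -ar 44100 sets the sample rate to 44.1 kHz, -ab 160k requests an audio bitrate of 160 kbit/s, and the output is written to f with ".wav" appended. The paths are quoted so that file names containing spaces still work:

string arg = "-i \"" + f + "\" -ab 160k -ac 2 -ar 44100 -vn \"" + f + ".wav\"";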

Set the process settings (Directory is in the System.IO namespace):

process.StartInfo.FileName = Directory.GetCurrentDirectory() + "\\ffmpeg\\bin\\ffmpeg.exe";
process.StartInfo.Arguments = arg;
process.StartInfo.ErrorDialog = true;
process.StartInfo.WindowStyle = ProcessWindowStyle.Normal;

Start the process:
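A minimal end-to-end sketch, assuming Process.Start is the call meant here. The input path input.mp4 and the class name are placeholders; waiting for the exit code is an addition for illustration and may not match what JAVT does.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class FfmpegSketch // hypothetical wrapper class
{
    static void Main()
    {
        string f = "input.mp4"; // placeholder input video path

        using (Process process = new Process())
        {
            // ffmpeg.exe is expected under .\ffmpeg\bin, as in the article.
            process.StartInfo.FileName = Path.Combine(
                Directory.GetCurrentDirectory(), "ffmpeg", "bin", "ffmpeg.exe");
            process.StartInfo.Arguments =
                "-i \"" + f + "\" -ab 160k -ac 2 -ar 44100 -vn \"" + f + ".wav\"";
            process.StartInfo.ErrorDialog = true;
            process.StartInfo.WindowStyle = ProcessWindowStyle.Normal;

            // Launch ffmpeg and wait until the conversion has finished.
            // If the output file already exists, ffmpeg asks before overwriting.
            process.Start();
            process.WaitForExit();

            // A non-zero exit code usually means the conversion failed.
            Console.WriteLine("ffmpeg exit code: " + process.ExitCode);
        }
    }
}
```

Waiting for the process to exit matters when the next step reads the .wav file, since recognition should not start on a file that is still being written.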



This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0


About the Author

Eric M. H. Goh
Founder, SVBook
Singapore
Eric Goh is a data scientist, software engineer, adjunct faculty member and entrepreneur with years of experience in multiple industries. His varied career includes data science, data and text mining, natural language processing, machine learning, intelligent system development, and engineering product design. He founded SVBook and extended it with DSTK.Tech and EMHAcademy. DSTK.Tech is where Eric develops his own DSTK data science software. Eric has also published 5 books at LeanPub and SVBook, and teaches the content at Udemy and EMHAcademy. In his free time, Eric is also an adjunct faculty member at University of the People.

Eric Goh has led teams on various industrial projects, including the advanced product code classification system project, which automates Singapore Customs' trade facilitation process, and Nanyang Technological University's data science projects, where he developed his own DSTK data science software. He has years of experience in C#, Java, C/C++, SPSS Statistics and Modeler, SAS Enterprise Miner, R, Python, Excel and Excel VBA. He won the Tan Kah Kee Young Inventors' Merit Award and was a shortlisted entry for the TelR Data Mining Challenge.

He holds a Master of Technology degree from the National University of Singapore, an Executive MBA degree from U21Global (currently GlobalNxt) and IGNOU, a Graduate Diploma in Mechatronics from A*STAR SIMTech (a national research institute located in Nanyang Technological University), and a Coursera Specialization Certificate in Business Statistics and Analysis from Rice University. He earned a Bachelor of Science degree in Computing from the University of Portsmouth after National Service. He is also an AIIM Certified Business Process Management Master (BPMM), a GSTF Certified Big Data Science Analyst (CBDSA), and an IES Certified Lecturer.

Specialties: Data Science, Text Mining, Social Network Analysis, Natural Language Processing, Machine Learning, Software Engineering, Mechatronics, Business.

Article Copyright 2018 by Eric M. H. Goh