How can I develop my own text-to-speech software from scratch? I want to write it in C++ (.NET) or C#. I have programming skills in both C# and C++ (.NET), but I don't want to use 3rd party libraries; I want an implementation that is entirely my own.

The code below is only a sample from my project, and it is not enough for what I need.
I have several years of experience with musical audio and a few months of audio programming, but I have some problems with the Fast Fourier Transform (FFT) and the Short-Time Fourier Transform (STFT).
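
For the FFT and STFT part mentioned above, a minimal sketch in plain C# with no third-party libraries might look like the following. The names `Stft`, `Fft` and `Transform` are illustrative only and are not part of the original project. The sketch assumes a power-of-two frame size and a periodic Hann window, and it uses only `System.Numerics.Complex` from the framework.

```csharp
using System;
using System.Numerics;

public static class Stft
{
    // In-place iterative radix-2 Cooley-Tukey FFT.
    // Assumption: buffer.Length is a power of two.
    public static void Fft(Complex[] buffer)
    {
        int n = buffer.Length;
        if (n == 0 || (n & (n - 1)) != 0)
            throw new ArgumentException("Length must be a power of two.");

        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1)
                j ^= bit;
            j ^= bit;
            if (i < j)
            {
                Complex tmp = buffer[i];
                buffer[i] = buffer[j];
                buffer[j] = tmp;
            }
        }

        // Butterfly passes over sub-transforms of length 2, 4, 8, ...
        for (int len = 2; len <= n; len <<= 1)
        {
            double angle = -2.0 * Math.PI / len;
            Complex wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
            for (int start = 0; start < n; start += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex even = buffer[start + k];
                    Complex odd = buffer[start + k + len / 2] * w;
                    buffer[start + k] = even + odd;
                    buffer[start + k + len / 2] = even - odd;
                    w *= wLen;
                }
            }
        }
    }

    // Short-time Fourier transform: cut the signal into overlapping frames,
    // apply a Hann window to each frame, and run the FFT on it.
    public static Complex[][] Transform(double[] samples, int frameSize, int hopSize)
    {
        if (hopSize <= 0)
            throw new ArgumentException("hopSize must be positive.");

        int frameCount = samples.Length < frameSize
            ? 0
            : 1 + (samples.Length - frameSize) / hopSize;
        var frames = new Complex[frameCount][];

        for (int f = 0; f < frameCount; f++)
        {
            var frame = new Complex[frameSize];
            for (int i = 0; i < frameSize; i++)
            {
                // Periodic Hann window to reduce spectral leakage.
                double window = 0.5 - 0.5 * Math.Cos(2.0 * Math.PI * i / frameSize);
                frame[i] = new Complex(samples[f * hopSize + i] * window, 0.0);
            }
            Fft(frame);
            frames[f] = frame;
        }
        return frames;
    }
}
```

With frame size N and sample rate fs, bin k of each frame corresponds to the frequency k * fs / N, so `Stft.Transform(samples, 256, 128)` on 8000 Hz audio gives a resolution of 31.25 Hz per bin with 50% overlap between frames.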

What I have tried:

C#
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

using System.Collections.ObjectModel; // needed for ReadOnlyCollection<T>
using System.Speech.Synthesis;
using System.Speech.AudioFormat;

    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        List<string> all_lines_List = new List<string>();

        private void Read_Click(object sender, EventArgs e)
        {
            if(comboBox1.SelectedItem!=null)
            {
                button1.Enabled = false;

                
                SpeechSynthesizer speech_synthesizer = new SpeechSynthesizer();

                ReadOnlyCollection<InstalledVoice> InstalledVoices = speech_synthesizer.GetInstalledVoices();
                


                speech_synthesizer.SelectVoice(comboBox1.SelectedItem.ToString());
                //speech_synthesizer.SelectVoice("MSMary");  
               
                speech_synthesizer.SetOutputToDefaultAudioDevice();
                //PromptBuilder builder = new PromptBuilder();
                //builder.AppendSsmlMarkup("<say-as interpret-as = \"chs\"> chair </say-as>");

                //speech_synthesizer.AddLexicon(new Uri("C:\\W7\\Spelling.pls"), "application/pls");
                
                speech_synthesizer.Volume = 100;
                speech_synthesizer.Rate = 0;

                PromptBuilder builder = new PromptBuilder();
                builder.AppendText("This is sample output to a WAVE file.", PromptEmphasis.Strong);
                builder.AppendSsmlMarkup("<say-as interpret-as = \"WAVE\"> chair </say-as>");

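                speech_synthesizer.Speak(builder);

                speech_synthesizer.Speak(richTextBox2.Text);

                //speech_synthesizer.RemoveLexicon(new Uri("C:\\W7\\Spelling.pls"));

                // Release the synthesizer's audio resources once speaking has finished.
                speech_synthesizer.Dispose();
                button1.Enabled = true;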

                System.Media.SystemSounds.Asterisk.Play();

                Application.DoEvents();
            
            }
            else
            {
                System.Media.SystemSounds.Hand.Play();
                MessageBox.Show("Please, Select a Voice.");
            }


           
        }

        private void Save_Without_Reading_Click(object sender, EventArgs e)
        {
                      
            if (comboBox1.SelectedItem != null)
            {
                // Dispose the synthesizer when done so the WAVE file is flushed and closed.
                using (SpeechSynthesizer speech_synthesizer = new SpeechSynthesizer())
                {
                    speech_synthesizer.SelectVoice(comboBox1.SelectedItem.ToString());

                    speech_synthesizer.Volume = 100;
                    speech_synthesizer.Rate = 0;

                    // Send the output to a WAVE file (44.1 kHz, 16-bit, mono) instead of the audio device.
                    speech_synthesizer.SetOutputToWaveFile(comboBox1.SelectedItem.ToString() + " - Speech.wav", new SpeechAudioFormatInfo(44100, AudioBitsPerSample.Sixteen, AudioChannel.Mono));

                    speech_synthesizer.Speak(richTextBox2.Text);
                }
                System.Media.SystemSounds.Asterisk.Play();

            }
            else
            {
                System.Media.SystemSounds.Hand.Play();
                MessageBox.Show("Please, Select a Voice.");
            }

            
        }
    }
Updated 27-Jun-16 5:23am
Comments
El_Codero 27-Jun-16 11:06am    
I think it would be a huge task to really build your OWN TTS project. You're currently using the SpeechSynthesizer class from the .NET Framework. What is not working with your code? In addition, you may be interested in CMUSphinx, which is open source and extensible.
Sergey Alexandrovich Kryukov 27-Jun-16 11:17am    
Very interesting link; thank you for sharing. Did you test the recognition itself, its quality?
—SA
El_Codero 27-Jun-16 11:25am    
Hi Sergey, oh yes, I spent quite a lot of time with CMUSphinx two years ago. The existing language model for English is pretty well trained (unfortunately, it clearly can't compete at the moment with "cloud-trained" recognition systems like Google speech recognition), but there are many tools within the CMUSphinx framework for building and training a custom language model, and the documentation is good for an open source project.
For example, here is a recommendation from the devs on how much data you need to train your model: from "1 hour of recording for command and control for single speaker" up to "50 hours of recordings of 200 speakers for many speakers dictation". So we need recordings for training :D ...By the way, I waited years to get this; it's out now for 3rd party developers like me :)
Sergey Alexandrovich Kryukov 27-Jun-16 11:44am    
Just a note for others: we are now discussing speech recognition (not text to speech, which is the topic of the original question), but this is what I'm interested in.

I have only tried the recognition I installed on Android, in the form of a virtual keyboard in two different languages (I don't know the origin of the source code), and was pleased by the quality even in dictation; it can even make "speech typing of text" practical.

Before that, I tried the recognition supplied by Microsoft for Windows; it does work, but overall I would characterize it as a "spectacular failure". It can recognize things based on a tiny grammar, but dictation, though it formally exists, is better forgotten.

So, my other question is: does the Windows build work? Did you try it, too?

Perhaps your experience would be very useful if you develop good demo applications with complete build instructions and one-click batch build, tests, discuss results, etc. — in a fairly detailed article.

—SA
El_Codero 27-Jun-16 12:07pm    
Yes, the speech recognition on Android relies on the Google speech recognition API and, unfortunately, yes, working with the Windows speech recognition API from the .NET Framework is the first thing I had to give up because of the quality, detection rate and more (as you said, it's unfortunately really a big failure). Far better is the Bing Speech Recognition API, but Google's speech recognition API is (in my opinion, and after many tests) the best of the speech recognition APIs. Yes, I would really like to write an article about that; I hope I have some time for it in the near future :) What demo application are you interested in (most)? With CMUSphinx? Björn Ranft

1 solution

Comments
Roland-HE-C# 27-Jun-16 11:31am    
It is usable, but it is not sufficient for me: for example, its maximum frequency is 22025Hz and I cannot customize the reader's intonation/pitch.
Can you give a timbre-preserving pitch algorithm?
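
On the timbre-preserving pitch question: time-domain approaches such as TD-PSOLA keep the timbre by repositioning short, pitch-synchronous grains of the signal instead of resampling it, so their first step is to estimate the pitch period of each voiced frame. A minimal sketch of that first step only, using plain autocorrelation, is shown below. `PitchEstimator` and `EstimateHz` are hypothetical names, and the search range passed in is an assumption the caller has to choose for the voice in question. This is not a complete pitch shifter.

```csharp
using System;

public static class PitchEstimator
{
    // Estimates the fundamental frequency of one frame in Hz by finding the
    // lag with the strongest autocorrelation inside [minHz, maxHz].
    // Returns 0 when the frame is silent or no positive peak is found.
    public static double EstimateHz(double[] frame, int sampleRate, double minHz, double maxHz)
    {
        int minLag = (int)(sampleRate / maxHz);
        int maxLag = Math.Min((int)(sampleRate / minHz), frame.Length - 1);

        double energy = 0.0;
        foreach (double s in frame)
            energy += s * s;
        if (energy == 0.0 || minLag < 1 || minLag > maxLag)
            return 0.0;

        int bestLag = 0;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++)
        {
            double sum = 0.0;
            for (int i = 0; i + lag < frame.Length; i++)
                sum += frame[i] * frame[i + lag];

            // Normalize by the number of overlapping samples so that
            // short lags are not favoured just because they overlap more.
            double normalized = sum / (frame.Length - lag);
            if (normalized > best)
            {
                best = normalized;
                bestLag = lag;
            }
        }
        return bestLag == 0 ? 0.0 : (double)sampleRate / bestLag;
    }
}
```

Plain autocorrelation is prone to octave errors when the search range spans more than one multiple of the true period, so keeping `minHz` and `maxHz` close to the expected range of the voice helps. The estimated period is what a pitch-synchronous method would then use to place its grain boundaries.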

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)