I have been trying to convert speech to text using the Sphinx package in Java, but I cannot understand why it is not producing the correct tokens.
Below is the code.

.java file
Java
package speechtotext;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import java.awt.*;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.*;


public class HelloWorld extends JApplet  implements ActionListener{
 
  private JButton b1 = new JButton("SPEAK"), b2 = new JButton("STOP");
  JTextArea textArea = new JTextArea(7,30);
  Result result;
  ConfigurationManager cm;
  Recognizer recognizer;
  Microphone microphone;
  String resultText ;
  
  public void init() {
    Container cp = getContentPane();
    cp.setLayout(new FlowLayout());
    Image image = Toolkit.getDefaultToolkit().createImage("C:\\Users\\arsa\\Desktop\\1.png");
    Image scaled = image.getScaledInstance(300, 550, Image.SCALE_SMOOTH);
    JLabel label = new JLabel(new ImageIcon(scaled));
    cp.add(label,BorderLayout.CENTER);
    textArea.setText("");
    textArea.setLineWrap(true);
    textArea.setEditable(false);
    add(textArea,"Center");
    cp.add(b2,FlowLayout.LEFT);
    cp.add(b1,FlowLayout.LEFT);
    cp.add(textArea);
    b1.addActionListener(this);
    b2.addActionListener(this);
    cm = new ConfigurationManager(HelloWorld.class.getResource("helloworld.config.xml"));
    recognizer = (Recognizer) cm.lookup("recognizer");
    System.out.println("Successful1 allocation");
    recognizer.allocate();
    System.out.println("Successful1 allocation1");
    microphone = (Microphone) cm.lookup("microphone");
     if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }
  }
  @Override
  public void actionPerformed(ActionEvent e) {
    if (e.getSource() == b1) {
      // SPEAK: blocks until the recognizer returns a result
      result = recognizer.recognize();
    } else if (e.getSource() == b2) {
      // STOP: show the last recognition result, if there is one
      if (result != null) {
        resultText = result.getBestPronunciationResult();
        if (resultText != null)
          textArea.setText("You said: " + resultText + '\n');
        else
          textArea.setText("I couldn't hear what you said.\n");
      } else {
        textArea.setText("Cheater!! Cheater!! you didn't say anything....\n");
      }
    }
  }
}


.gram file
JSGF
#JSGF V1.0;

/**
 * JSGF Grammar for Hello World example
 */

grammar hello;

public <greet> = (Good morning | Hello) ( Bhiksha | Evandro | Paul | Philip | Rita | Will );
Updated 29-Apr-14 21:03pm
Comments
NeverJustHere 29-Apr-14 19:54pm    
Try speaking in an American accent :)

I'm only half joking. The only experience I have with Speech to Text was a system installed in Australia in the late 90's using Dialogic Speech to Text recognition boards. The accuracy improved significantly when we got them to provide an Australian accented pattern.

We were only attempting to distinguish Yes/No answers over a telephone line.

1 solution

The problem might be your accent. You can work around this by editing the pronunciation dictionary (the list of phonemes for each word) that ships with the default acoustic model.
In Sphinx this dictionary is a plain text file. It contains lines such as the following:
HELLO	HH AH L OW
HELLO(2)	HH EH L OW
THANKS	TH AE NG K S
YOUR	Y AO R
YOUR(2)	Y UH R


TH AE NG K S is the set of phonemes for the word "THANKS". You can modify these phonemes to suit your pronunciation.
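
As a rough illustration of what such an edit amounts to, the sketch below adds one more pronunciation variant for a word to a list of dictionary lines, following the WORD(n) numbering seen in the excerpt above. The class name DictionaryEdit, the method addVariant, and the phonemes "HH AA L OW" are made up for the example; it also assumes every entry is a word, then whitespace, then the phonemes, and it writes a tab as the separator.

```java
import java.util.ArrayList;
import java.util.List;

public class DictionaryEdit {

    /**
     * Returns a copy of the dictionary lines with one more pronunciation
     * variant for the given word, placed right after its existing entries.
     * If the word is not present yet, the entry is appended at the end.
     */
    public static List<String> addVariant(List<String> lines, String word, String phonemes) {
        List<String> out = new ArrayList<>(lines);
        int lastIndex = -1; // position of the last existing entry for the word
        int count = 0;      // number of entries the word already has
        for (int i = 0; i < out.size(); i++) {
            String line = out.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String head = line.split("\\s+", 2)[0];             // e.g. "HELLO" or "HELLO(2)"
            String base = head.replaceAll("\\(\\d+\\)$", "");   // strip a trailing "(n)"
            if (base.equals(word)) {
                lastIndex = i;
                count++;
            }
        }
        // The first entry is written bare, later ones as WORD(2), WORD(3), ...
        String head = (count == 0) ? word : word + "(" + (count + 1) + ")";
        String entry = head + "\t" + phonemes;
        if (lastIndex >= 0) {
            out.add(lastIndex + 1, entry);
        } else {
            out.add(entry);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dict = new ArrayList<>();
        dict.add("HELLO\tHH AH L OW");
        dict.add("HELLO(2)\tHH EH L OW");
        dict.add("THANKS\tTH AE NG K S");
        // "HH AA L OW" is only a placeholder; use phonemes that match how you speak.
        for (String line : addVariant(dict, "HELLO", "HH AA L OW")) {
            System.out.println(line);
        }
    }
}
```

Adding a variant rather than overwriting the existing entry keeps the original pronunciations available, which may help if more than one person uses the recognizer.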

1. First find the WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar file and extract it.
2. Go to the edu\cmu\sphinx\model\acoustic\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz\dict folder and open the “cmudict.0.6d” file in that folder.
3. Modify the entries to suit your pronunciation and save the file.
4. Zip the extracted hierarchy back up exactly as it was, giving the ZIP file the same base name as the JAR file.
5. Remove the “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar” file from the project’s CLASSPATH and add “WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.zip” in its place.
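
If you would rather not unzip and re-zip by hand, the steps above could be scripted along these lines. This is only a sketch using the standard java.util.zip classes: the class name ModelRepack and the method replaceEntry are invented for the example, and it assumes the dictionary sits inside the archive under the path from step 2 written with forward slashes (edu/cmu/sphinx/model/acoustic/WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/dict/cmudict.0.6d). The source and target must be different files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ModelRepack {

    /**
     * Copies every entry of the source archive into the target archive,
     * writing newContent in place of the entry named entryName.
     * Returns true if that entry was found and replaced.
     */
    public static boolean replaceEntry(Path source, Path target,
                                       String entryName, byte[] newContent) throws IOException {
        boolean replaced = false;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // A fresh ZipEntry lets the output stream compute sizes and CRC itself.
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (!entry.isDirectory()) {
                    if (entry.getName().equals(entryName)) {
                        out.write(newContent);   // the edited dictionary
                        replaced = true;
                    } else {
                        int n;
                        while ((n = in.read(buffer)) != -1) {
                            out.write(buffer, 0, n);   // unchanged entry, copied as-is
                        }
                    }
                }
                out.closeEntry();
            }
        }
        return replaced;
    }
}
```

You would call replaceEntry with the original JAR as the source, the new ZIP as the target, and the bytes of your edited cmudict.0.6d as newContent, then swap the archive on the CLASSPATH as in step 5.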

You can add more words using the following tool:
http://www.speech.cs.cmu.edu/tools/lmtool-new.html
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


