
Dictaphone

1 Aug 2013

Turning your Android into a note-taking device

Editorial Note

This article is in the Product Showcase section for our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.

Scotty: "Computer... Computer..."
(McCoy hands Scotty the mouse)
Scotty: "Aye. Hello computer."
Nichols: "Just use the keyboard."
Scotty: "Keyboard. How quaint."
-- Star Trek IV: The Voyage Home

With the advent of "Siri", Apple’s voice-recognition system for controlling (more or less) the iPhone via voice, the idea of being able to control a device or phone via voice commands leapfrogged into the stratosphere of "coolness". Not that voice interaction is a new idea, mind you: What developer hasn’t imagined having conversations with her computer the same way Tony Stark does with JARVIS in the movies or comic books; or, if you’re more of the old school science fiction fan, your favorite Captain conversing with his Enterprise?

Fortunately, the Android OS provides some baked-in speech recognition/speech-to-text capability, and accessing it is pretty straightforward. Unfortunately, figuring out how and when to use that functionality is still something of an art. For example, knowing when to start listening to the user’s voice and processing the associated commands can be something of a user-interface nightmare. If you force the user to push a button on the screen first, you lose the "hands free" capability that makes voice-driven commands so attractive; if you simply start listening to the audio input from the moment your application begins execution, not only do you run into problems with the Android activity lifecycle (do you keep reading the input channel if the user switches away from your app?), but it can be nigh-impossible to distinguish between commands intended for the application and remarks intended for the other people in the room (or in the other cars on the highway).

In truth, there’s a pretty simple signal to know when the user wants to start using the application: when they pick up their phone, or for a lot of users, when they pick up their headset and put it on. In fact, that latter event is an even stronger signal than the former—many users pick up their phone for a variety of tasks that have nothing to do with voice input, while the headset really has only one use, and all of it has to do with audio input or output. So, if we could somehow get the signal that the user was starting to use the headset, the voice-controlled application could begin its audio stream analysis. Or, should the user already be wearing the headset, the user could push a button on the headset to "begin".

While it is likely to be only the first of many such devices, the Voyager Legend UC® headset from Plantronics provides exactly this kind of information. Plantronics sent me a unit to explore.

Bluetooth, Android, and You

As we talk about programming the headset on an Android device, I’m going to assume that readers are familiar with the basics of Android programming, meaning that Activities, Handlers and Intents shouldn’t be foreign concepts. If you’re not comfortable yet with Android, it’s a fairly easy OS for a developer to pick up—tutorials on the Web abound, and only a rudimentary knowledge of the Java language is required.

The short version of the story is that once paired with an Android device (and bear in mind, this can be either a phone or a tablet, which opens up some interesting possibilities for non-phone-related applications), the Voyager Legend sends several kinds of information over Bluetooth to the paired Android device. These events range from "don" and "doff" events (meaning the user has "donned" the device, putting it on, or "doffed" the device, taking it off), to button-press events, to different sensor readings that the device picks up.

Thus, the first step in any sort of headset-aware application is to recognize the paired headset in your application, and ask Android to begin sending you the events that the device receives over Bluetooth. In Android, receiving events from other parts of the device is done using a BroadcastReceiver; this is a base class whose onReceive() method you override to receive all the messages destined for that BroadcastReceiver (filtered through an IntentFilter). Once that’s established, it’s a trivial matter to start watching the event stream, looking for particular events (notifications coming over Bluetooth from the device, usually device-specific unless we’re talking about generic "paired" and "not-paired" events) and reacting according to the application’s needs.

The Voyager Legend headset includes two buttons, one called the "Voice" button and the other the "Call" button. In normal use, they’re used to talk to the headset (press the "Voice" button, and it will tell you how much talk time is left on the headset, which is a nifty little feature, if you ask me) and to answer an incoming phone call, respectively. However, from the point of view of the developer, each is just a button, and we can repurpose them as we need to.

The exact nature of the data sent by the device to Android, and from Android to the application, is described in Plantronics documentation. It’s a set of codes and strings that are particular to the Plantronics device, and sorting through all of them to figure out exactly what’s being sent can be tricky. Fortunately, Cary Bran, a Plantronics evangelist, has written up some example code on the Plantronics developer forum: an "event-streamer" application that demonstrates the different events that are sent, and provides two classes, the PlantronicsReceiver (the BroadcastReceiver-inheriting class) and a simple wrapper for the messages it fires, PlantronicsXEventMessage. See http://developer.plantronics.com/blogs/Cary/2012/11/26/plugging-into-plantronics-headset-sensor-events-via-android for details.

Events, please

Receiving events from the Android OS, such as Bluetooth events, involves a BroadcastReceiver-derived class, which must be registered with the Android OS so that the OS knows to send events (Intent objects) to it. This registration can come in two forms. In the first, where the BroadcastReceiver lives on outside of your Android application process, the BroadcastReceiver must be declared in the AndroidManifest.xml file; in the second, where the BroadcastReceiver is passed to a registerReceiver() call, the BroadcastReceiver only receives events as long as your Android process is running, and requires no manifest entry. Thus, the first thing the application will need to do is create one of these custom BroadcastReceivers (the PlantronicsReceiver, below) and register it for incoming Bluetooth events:

  private void initBluetooth() {
    handler = new BluetoothHandler();
    receiver = new PlantronicsReceiver(handler);
 
    intentFilter = new IntentFilter();
    intentFilter.addCategory(
      BluetoothHeadset.VENDOR_SPECIFIC_HEADSET_EVENT_COMPANY_ID_CATEGORY +
      "." +
      BluetoothAssignedNumbers.PLANTRONICS);
    intentFilter.addAction(BluetoothDevice.ACTION_ACL_CONNECTED);
    intentFilter.addAction(BluetoothDevice.ACTION_ACL_DISCONNECT_REQUESTED);
    intentFilter.addAction(BluetoothDevice.ACTION_ACL_DISCONNECTED);
    intentFilter.addAction(
      BluetoothHeadset.ACTION_VENDOR_SPECIFIC_HEADSET_EVENT);
    intentFilter.addAction(BluetoothHeadset.ACTION_AUDIO_STATE_CHANGED);
    intentFilter.addAction(BluetoothHeadset.ACTION_CONNECTION_STATE_CHANGED);
 
    registerReceiver(receiver, intentFilter);
  }

Aside from the Bluetooth-specific parts of the IntentFilter, this is a pretty normal BroadcastReceiver implementation. The actions in the IntentFilter specify which actions the BroadcastReceiver will be open to receiving (we don’t want to be flooded with every message sent to anywhere on the device), and then it’s passed to registerReceiver(), opening the PlantronicsReceiver for business.
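For comparison, the longer-lived, manifest-declared form mentioned earlier would look something like the following. This fragment is a sketch, not taken from the sample application: the action and category strings mirror the constants used in initBluetooth() (BluetoothAssignedNumbers.PLANTRONICS is the Bluetooth SIG company identifier 85), and the receiver name assumes PlantronicsReceiver lives in the application's root package.

```xml
<!-- Hypothetical AndroidManifest.xml entry; the runtime registerReceiver()
     form shown above is what the sample application actually uses. -->
<receiver android:name=".PlantronicsReceiver">
    <intent-filter>
        <action android:name="android.bluetooth.headset.action.VENDOR_SPECIFIC_HEADSET_EVENT" />
        <category android:name="android.bluetooth.headset.intent.category.companyid.85" />
    </intent-filter>
</receiver>
```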

(The details of how the PlantronicsReceiver unpacks the packet of information sent across Bluetooth are really beyond the scope of this article, but they’re not too difficult to reverse-engineer from the code that Cary posted. Said simply, it’s all packaged up in the Intent: the additional data arrives as "event extras" on that Intent, and the PlantronicsReceiver unpacks those extras to discover the additional information, such as which button was pressed and how, setting them as properties on the PlantronicsXEventMessage instance that it creates.)

Just Handle it

Notice that the PlantronicsReceiver takes a Handler as its constructor argument—this is a Handler-extending class that handles the messages being sent from the BroadcastReceiver. This Handler is the ultimate recipient of the message, and will be supplied by the application. It’s in here that the application receives the PlantronicsXEventMessage, determines the type of event (DON, DOFF, BUTTON, or whatever else), and extracts any additional information that comes with the event. For example, BUTTON events will come with three additional properties on the message: "buttonId", describing which button was pressed, "buttonName", the name of said button, and "pressType", indicating the kind of press (short or long) registered. BATTERY events, on the other hand, will come with properties like "level", describing the charge level of the headset, "charging", a "true"/"false" value indicating whether the headset is plugged in and charging, and "minutesOfTalkTime", which is pretty self-explanatory.

The PlantronicsReceiver class is the final arbiter on what data is sent in the PlantronicsXEventMessage, so check that code for details.
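To make the shape of those messages concrete, here is a small stand-in for the wrapper class, for illustration only; the real PlantronicsXEventMessage lives in Cary's sample code, and the property-map shape shown here is an assumption of this sketch, not the actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PlantronicsXEventMessage, invented for this
// sketch: an event type plus a bag of String-keyed properties, which is
// roughly how the real wrapper behaves.
public class EventMessageSketch {
    private final String eventType;
    private final Map<String, Object> properties = new HashMap<>();

    public EventMessageSketch(String eventType) {
        this.eventType = eventType;
    }

    public String getEventType() {
        return eventType;
    }

    public void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    public Object getProperty(String key) {
        return properties.get(key);
    }

    // Example of how a Handler might summarize a BUTTON event's properties
    public static String describeButton(EventMessageSketch msg) {
        return msg.getProperty("buttonName")
            + " (" + msg.getProperty("pressType") + " press)";
    }
}
```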

A Handler, then, will receive these message objects in its handleMessage() method, and "unpack" them like so:

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case PlantronicsReceiver.HEADSET_EVENT:
                PlantronicsXEventMessage message =
                  (PlantronicsXEventMessage) msg.obj;

                // What is the type of this event?
                String type = message.getEventType();
                if (type.equals(PlantronicsXEventMessage.BUTTON_EVENT)) {
                  // react to the button press here
                }
                if (type.equals(PlantronicsXEventMessage.BATTERY_EVENT)) {
                  // react to the battery status here
                }
                break;
            default:
                break;
        }
    }

Note that the "static import" facility (introduced back in Java 5) can be used to reduce the verbosity of those typechecks.
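As a quick illustration of the mechanism (demonstrated with a JDK class, since PlantronicsXEventMessage isn't on the classpath for this sketch), a static import lets the owning class drop out of the call site; with the real message class one could similarly write import static ...PlantronicsXEventMessage.BUTTON_EVENT and then just type.equals(BUTTON_EVENT):

```java
// The same static-import trick, shown with a JDK class so the example
// stands alone; PlantronicsXEventMessage constants would shorten the
// same way.
import static java.util.concurrent.TimeUnit.SECONDS;

public class StaticImportDemo {
    public static long talkTimeMillis(long seconds) {
        // SECONDS rather than TimeUnit.SECONDS, thanks to the static import
        return SECONDS.toMillis(seconds);
    }
}
```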

Dictaphone

Say, for example, we want to create a Dictaphone kind of application. (Readers under the age of 40 may not know this, but back in ancient days of yore, back when dinosaurs roamed the Earth, and devices had to be tethered via wires in order to operate, a "dictaphone" was a device designed to record the human voice onto some kind of storage medium—early ones actually used wax cylinders.) The application flow would be something like this: if the application is running and if a Voyager Legend is paired to the device, then we wait for a "button" event.

Once the button has been pressed, we can immediately kick the device into speech-to-text mode, listening to the incoming audio stream:

    /**
     * Handler for BluetoothReceiver events
     */
    public class BluetoothHandler extends Handler {
        @Override
        public void handleMessage(Message msg) {
            switch (msg.what) {
                case PlantronicsReceiver.HEADSET_EVENT:
                    PlantronicsXEventMessage message =
                        (PlantronicsXEventMessage) msg.obj;
 
                    // What is the type of this event?
                    String type = message.getEventType();
 
                    // If this is a "BUTTON" event, start the recorder
                    if (type.equals(PlantronicsXEventMessage.BUTTON_EVENT)) {
                      // Pop Toast
                      Toast.makeText(getApplicationContext(),
                          "Listening....",
                          Toast.LENGTH_SHORT).show();
 
                      Intent intent = new Intent(
                          RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
                      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                          RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
                      intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US");
 
                      try {
                        startActivityForResult(intent, RESULT_SPEECH);
                        textPane.setText("");
                      } catch (ActivityNotFoundException a) {
                        Toast.makeText(getApplicationContext(),
                            "Oops! Your device doesn't support Speech to Text",
                            Toast.LENGTH_SHORT).show();
                      }
                    }
                    break;
                default:
                    break;
            }
        }
    }

When that Activity stops, Android will have listened to the audio stream and made its best-guess interpretation of the spoken words, returning an ArrayList<String> of possibilities (the most likely one being the first) that we can then extract and append to the text display in the middle of the main Activity. Doing so, however, isn’t just a matter of capturing the return value from a method call. In Android, when one Activity wants the result from another, the requesting Activity launches the second using startActivityForResult(), passing in the Intent used to launch the second Activity (which in this case is the Intent designed to launch the built-in speech recognition Activity, as obtained from RecognizerIntent, above) and a "request code" that will be used to identify this particular exchange, which in this case is a simple constant called RESULT_SPEECH (with a value of 1). When the speech-recognition Activity finishes, it will fire an activity result back at this Activity, which will trigger the onActivityResult() method, and will include the request code (RESULT_SPEECH), a result code (defined by Android, and usually RESULT_OK, unless something went pear-shaped), and an Intent containing the Activity’s results (in this case, our collection of Strings representing the user’s speech):

    @Override
    protected void onActivityResult(int requestCode,
                                    int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
 
        switch (requestCode) {
            case RESULT_SPEECH: {
                if (resultCode == RESULT_OK && null != data) {
                  ArrayList<String> text =
                      data.getStringArrayListExtra(
                          RecognizerIntent.EXTRA_RESULTS);
 
                  textPane.setText(text.get(0));
                }
                break;
            }
        }
    }

Once the data has been retrieved from the Intent, it’s an easy matter to set its contents on the TextView (textPane) in the application.

From here, it’s fairly easy to imagine how this application could grow: saving the note, loading other notes, and so on, including voice commands to do all of the above. But those are the basics.
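For example, a minimal sketch of persisting a dictated note might look like the following. NoteStore is a hypothetical helper invented for this sketch; it uses plain java.nio so it stands alone, whereas an Android application would more likely write under a directory obtained from Context.getFilesDir():

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the sample application: persists a
// dictated note under a directory and reads it back by name.
public class NoteStore {
    private final Path dir;

    public NoteStore(Path dir) {
        this.dir = dir;
    }

    // Save the recognized text (e.g., text.get(0) from onActivityResult)
    public void save(String name, String text) throws IOException {
        Files.writeString(dir.resolve(name + ".txt"), text);
    }

    // Load a previously saved note
    public String load(String name) throws IOException {
        return Files.readString(dir.resolve(name + ".txt"));
    }
}
```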

Summary

Readers may be a little surprised at how little code there actually is here: between the PlantronicsReceiver interpreting the Bluetooth data and handing us easy-to-consume message objects, and the Android OS doing the "heavy lifting" of translating speech into text, we have a pretty functional application in just three classes and (not counting the code Cary wrote) about 150 lines of code. That’s a hefty amount of functionality for fairly small engineering effort, and makes me, at least, smile in anticipation of the ways this can be used in modern mobile applications. Enjoy!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Ted Neward
Web Developer
United States
Ted Neward is an independent consultant specializing in high-scale enterprise systems, working with clients ranging in size from Fortune 500 corporations to small 10-person shops. He is an authority in Java and .NET technologies, particularly in the areas of Java/.NET integration (both in-process and via integration tools like Web services), back-end enterprise software systems, and virtual machine/execution engine plumbing.
 
He is the author or co-author of several books, including Effective Enterprise Java, C# In a Nutshell, SSCLI Essentials, Server-Based Java Programming, and a contributor to several technology journals. Ted is also a Microsoft MVP Architect, BEA Technical Director, INETA speaker, former DevelopMentor instructor, frequent worldwide conference speaker, and a member of various Java JSRs. He lives in the Pacific Northwest with his wife, two sons, and eight PCs.

Article Copyright 2013 by Ted Neward