Google Prediction API - Hello Prediction!

Google Cloud Platform

Dec 4, 2014

CC (Attr 3U)

This page gives a quick example of using the Prediction API that you can set up and run in 15 minutes.

Prerequisites
The Problem
The Solution
Next Steps

Prerequisites

You must have a Google Account, with a Google name and password.
You must have a Google Developers Console project with Google Prediction and Google Cloud Storage activated.
To activate an API for your project, do the following:
1. Go to the Google Developers Console.
2. Select a project, or create a new one.
3. In the sidebar on the left, expand APIs & auth.
4. Click APIs.
5. In the displayed list of available APIs, find the one you want to activate, and set its status to ON.
Some APIs also prompt you to accept their Terms of Service before you can activate them.

Google Cloud Storage is required by Google Prediction if you want to train from a CSV file, which is the use case covered here. However, if you wish to train from instances passed in the request or by updating an empty model, it is sufficient to only have Google Prediction enabled.

The Problem

Imagine that your company receives emails requesting help in several different languages, and you want to route the email to someone with the appropriate language skills. The problem here is to detect whether a given phrase is English, Spanish, or French.

To do this, you must create some training data to train the prediction engine. This training data consists of several text entries, each labeled "English," "Spanish," or "French." After training the system on this data, you will be able to submit arbitrary words or phrases in any of those languages, and the prediction engine will categorize your data as being closest to one of them.

The Solution

Here's how to run Hello Prediction to determine the language of an arbitrary text snippet:

Upload training data. A sample training data file is provided that includes English, Spanish, and French language examples. You must upload this file to your Google Cloud Storage account.
Train the system. Tell the Prediction API to load your training data from Google Cloud Storage and analyze it. This is an asynchronous process, so you'll have to query the server periodically to check the status of the training session. Training must be complete before you can start to send queries.
Send queries. After training is done, you can send queries containing phrases in English, Spanish, or French, and Google Prediction will respond with the language of that text. You can run this step as many times as you want.

1. Upload Training Data

In this step you will upload a file of training data to your Google Cloud Storage account.

Download this training file (language_id.txt), which contains English, French, and Spanish training data entries. The training data is a comma-separated values file with many entries and two columns: the first column is the string name of the snippet language, and the second column is a long text snippet in that language. Open the file to see what the training entries look like.
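To make the two-column layout concrete, here is a minimal sketch that reads rows in that shape and counts the entries per label. The three sample rows are invented for illustration and are not taken from language_id.txt; the function name read_training_rows is likewise hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the layout described above: label first, then text.
SAMPLE = '''"English","The quick brown fox jumps over the lazy dog."
"Spanish","El zorro salta sobre el perro perezoso."
"French","Le renard saute par-dessus le chien paresseux."
'''

def read_training_rows(text):
    """Return (label, snippet) pairs from two-column CSV training text."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        if len(record) != 2:
            raise ValueError(f"expected 2 columns, got {len(record)}: {record!r}")
        label, snippet = record
        rows.append((label, snippet))
    return rows

rows = read_training_rows(SAMPLE)
label_counts = Counter(label for label, _ in rows)
print(label_counts)
```

Running a check like this on your own training file before uploading can catch rows with a missing label or an unquoted comma in the text column.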

Upload the file to Google Cloud Storage:
1. Go to the Google Developers Console.
2. Select the project under which to store the data.
3. Select the "Cloud Storage" tab.
4. Click "New Bucket" to create a bucket, or select an existing bucket.
5. Click the bucket, click "Upload", and upload the language_id.txt file from your computer.
6. Copy the bucket/path name of your file from the path column in the Google Cloud Storage Manager. For example: mybucket/language_id.txt

2. Train the System

The next step is to train the system against the training data that you uploaded. To do this, call trainedmodels.insert(), specifying the following parameters:

project: The Project Number listed in the Overview tab in Google Developers Console.
id: The string id that will be used to reference the model.
storageDataLocation: The Google Cloud Storage path where you uploaded your training data.

This creates a trained model that you can send your queries to.

For this exercise, you will use the Google APIs Explorer to make API calls. When programming your own applications, you would use one of the Google client libraries.
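For readers curious what the call looks like outside the APIs Explorer, the following is a rough sketch of the same trainedmodels.insert request built with the Python standard library rather than a Google client library. The base URL is inferred from the selfLink values in this article's example replies, the ACCESS_TOKEN string is a placeholder for a real OAuth 2.0 token, and build_insert_request is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Base URL inferred from the selfLink fields in the example replies (assumption).
BASE_URL = "https://www.googleapis.com/prediction/v1.6/projects"

def build_insert_request(project, model_id, storage_data_location, access_token):
    """Build (but do not send) a trainedmodels.insert request."""
    body = {"id": model_id, "storageDataLocation": storage_data_location}
    return urllib.request.Request(
        url=f"{BASE_URL}/{project}/trainedmodels",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",  # placeholder token
        },
        method="POST",
    )

request = build_insert_request(
    "12345678910", "languageidentifier", "mybucket/language_id.txt", "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The project number goes in the URL path, while id and storageDataLocation travel in the JSON body, which mirrors how the three parameters are entered in the APIs Explorer.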

Open the APIs Explorer with the Google Prediction API selected.
Enable "Authorize requests using OAuth 2.0."
Select the trainedmodels.insert method.
In the Structured Editor tab of the dialog, add values for the following properties:
1. project - The Project Number associated with your project. You can find it in Google Developers Console listed in the Overview tab.
2. id - Assign an ID to your model. You will use this ID to refer to the model in training and query requests. The ID must be from 1 to 255 characters long, any mix of lowercase letters (a-z), digits (0-9), and dashes and underscores (_-). For example: languageidentifier
3. storageDataLocation - Enter the Google Cloud Storage path to the training file you uploaded. For example: mybucket/language_id.txt
Click Execute to call the method and start training your model. You can see the request and response in the History pane on the page.
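The ID rule in step 2 can be checked locally before you call the method. This is a small sketch of that rule as a regular expression; the helper name is_valid_model_id is hypothetical, and the check covers only the constraints stated above.

```python
import re

# 1 to 255 characters drawn from a-z, 0-9, dash, and underscore.
MODEL_ID_PATTERN = re.compile(r"[a-z0-9_-]{1,255}")

def is_valid_model_id(model_id):
    """Return True if model_id satisfies the ID rule described above."""
    return MODEL_ID_PATTERN.fullmatch(model_id) is not None

for candidate in ["languageidentifier", "my_model-1", "LanguageIdentifier", ""]:
    print(repr(candidate), is_valid_model_id(candidate))
```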

Training is asynchronous; the training method returns immediately, and you must query Google Prediction to learn the status of the training session. For a training file this small, training should take less than a minute.

To check training status:

Select the trainedmodels.get method.
In the project textbox, enter the Project Number that was used in the Insert call that created the model.
In the id textbox, enter the model ID that you assigned to the id property.
Click Execute to call the method.
In the History pane, examine the response for the trainingStatus property. The call will return an HTTP 200 while training is in progress, with trainingStatus="RUNNING". When the call returns a trainedmodels resource with trainingStatus="DONE", training is finished, and you can start sending queries.

Here is an example reply:

{
  "kind": "prediction#training",
  "id": "languageidentifier",
  "storageDataLocation": "mybucket/language_id.txt",
  "selfLink": "https://www.googleapis.com/prediction/v1.6/projects/12345678910/trainedmodels/languageidentifier",
  "created": "2013-04-10T21:54:08.840Z",
  "trainingComplete": "2013-04-10T21:54:11.504Z",
  "modelInfo": {
    "numberInstances": "420",
    "modelType": "classification",
    "numberLabels": "3",
    "classificationAccuracy": "0.95"
  },
  "trainingStatus": "DONE"
}
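
In your own code, the status check is typically wrapped in a polling loop. The sketch below assumes a caller-supplied fetch_model function that performs trainedmodels.get and returns the parsed reply as a dict; that function, the polling interval, and the poll limit are all illustrative choices. Only the RUNNING and DONE values described above are handled, and any other status is treated as an error.

```python
import time

def wait_for_training(fetch_model, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until trainingStatus is DONE and return the final model resource.

    fetch_model is a caller-supplied function (hypothetical here) that calls
    trainedmodels.get and returns the parsed JSON reply as a dict.
    """
    for _ in range(max_polls):
        model = fetch_model()
        status = model.get("trainingStatus")
        if status == "DONE":
            return model
        if status != "RUNNING":
            raise RuntimeError(f"unexpected trainingStatus: {status!r}")
        sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")

# Stand-in for real API calls: two RUNNING replies, then DONE.
_replies = iter([
    {"trainingStatus": "RUNNING"},
    {"trainingStatus": "RUNNING"},
    {"trainingStatus": "DONE", "modelInfo": {"classificationAccuracy": "0.95"}},
])
result = wait_for_training(lambda: next(_replies), sleep=lambda seconds: None)
print(result["trainingStatus"])
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.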

3. Send Queries

Now you're ready to send queries to your model. Queries are always in the format of a single row of training data, minus the first column. Your training data had two columns: language_label, phrase_in_that_language; therefore a query against this data consists of a single column: a phrase in a language that you want to identify. Your phrase must be in one of the languages used in your training data. Google Prediction replies with its best guess at the language of your phrase.

To send a query:

Select the trainedmodels.predict method.
In the project textbox, enter the Project Number that was used in the Insert call that created the model.
In the id textbox, enter the model ID that you entered before.
In the Request body, add values for the following properties:
1. input - Select "add a property > csvInstance"
2. csvInstance - Click Add and enter a text string in English, French, or Spanish. For example: Muy Bueno. Do not quote the string. Only add one string value inside csvInstance.
Click Execute to call the method.
In the History pane, examine the response for the outputLabel property. This will be the best guess for the language of the string.
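
The request body assembled in the steps above nests a list of values under input.csvInstance. As a sketch, the equivalent JSON could be produced like this; build_predict_body is a hypothetical helper name.

```python
import json

def build_predict_body(text):
    """Return the JSON body for trainedmodels.predict with one string value."""
    return json.dumps({"input": {"csvInstance": [text]}})

body = build_predict_body("Muy Bueno")
print(body)  # {"input": {"csvInstance": ["Muy Bueno"]}}
```

The list holds a single string because the training data has one feature column after the label.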

Here is an example reply:

{
 "kind": "prediction#output",
 "id": "languageidentifier",
 "selfLink": "https://www.googleapis.com/prediction/v1.6/projects/12345678910/trainedmodels/languageidentifier/predict",
 "outputLabel": "Spanish",
 "outputMulti": [
  {
   "label": "French",
   "score": "0.334130"
  },
  {
   "label": "Spanish",
   "score": "0.418339"
  },
  {
   "label": "English",
   "score": "0.247531"
  }
 ]
}

All score values are relative to each other, and the label with the highest score is the best guess. The best guess in the example above is Spanish, which is assigned to outputLabel. You can read more about the scoring algorithm in the predict method property description for outputMulti[].score.
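
To illustrate how outputLabel relates to outputMulti, this short sketch picks the highest-scoring label from the example reply above. Note that the scores arrive as strings, so they are converted to floats before comparison. The helper name best_label is hypothetical.

```python
# Relevant fields copied from the example reply above.
reply = {
    "outputLabel": "Spanish",
    "outputMulti": [
        {"label": "French", "score": "0.334130"},
        {"label": "Spanish", "score": "0.418339"},
        {"label": "English", "score": "0.247531"},
    ],
}

def best_label(prediction):
    """Return the label whose score is highest in outputMulti."""
    top = max(prediction["outputMulti"], key=lambda entry: float(entry["score"]))
    return top["label"]

print(best_label(reply))  # Spanish
```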

Next Steps

Learn more about the Google Prediction API:

Experiment with the API in a Google Spreadsheet.
Read Use Cases to see different ways to apply the API to real-life problems.
Read the Developer's Guide to learn how to program against the API, and how to design a good model.
Check out the end-to-end samples in Java and Python.