
Python Machine Learning on Azure Part 3: Creating a TensorFlow Model with Python and Azure ML Studio

14 Jan 2022 · CPOL · 8 min read
How to train a TensorFlow model within an Azure Machine Learning workspace
This article walks through the code to create and train a TensorFlow model, shows how to run the training on Azure ML, and closes with a brief demonstration that the trained model works as expected.

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.

In this series, we work through several approaches to creating and using machine learning models in Python on Azure. In the first article, we used Visual Studio Code with ML extensions to train an XGBoost model on Azure. Then, we used GitHub Codespaces and GitHub Actions to automatically train a PyTorch model after each push of changes to a repository.

In most real-life machine learning projects, the actual training of the final model requires a long experimentation phase. This phase is highly interactive, so working with plain Python scripts alone is uncommon; initially, you'll mostly use Jupyter notebooks or JupyterLab. This article shows how to train a TensorFlow model within an Azure Machine Learning workspace.

You can find the sample code for this article on GitHub.

Prerequisites

To follow examples from this article, you need a web browser and access to an Azure subscription. If you don’t have a subscription, you can sign up for a free Azure account.

Setting Up the Azure Machine Learning Workspace

If you have followed the examples in the previous two articles, you should already have an Azure Machine Learning workspace. If not, you can quickly create it the same way as any other Azure resource. One way to do it is to log in to the Azure Portal, click Create a resource, then type "machine learning" into the search box.

Image 1

After selecting Machine Learning from the list and clicking Create, you need to enter your workspace’s details.

Image 2

You can enter any values you like, but we refer to azureml-rg as our resource group and demo-ws as the workspace name in the following examples. When satisfied, click Review + create, then click Create on the final wizard screen. After a few minutes, you have access to your new resource.

Image 3

We spend the rest of the time in Azure Machine Learning Studio. To get there, click Launch studio in the Azure portal or enter ml.azure.com manually in the browser. Then, select your workspace.

Preparing the Azure ML Notebook

The easiest way to create a notebook is by clicking the Start now button in the Azure Machine Learning Studio’s Notebooks tile:

Image 4

Before that, though, we need to add a compute resource. Without one, we can still create and edit a notebook, but not run its code.

To continue, click the Compute option on the left-side menu.

Image 5

Here, we have four options to choose from:

  • Compute instances
  • Compute clusters
  • Inference clusters
  • Attached compute

The compute instance is the best choice for working with notebooks.

Image 6

When creating a new compute instance, we need to provide its name and virtual machine (VM) parameters. You may need to be a little creative with the name, as it must be unique in your Azure region. If you expect heavy calculations, you may consider a VM with a GPU.

Unlike the compute clusters used for background processing, compute instances aren't automatically provisioned when needed and deleted when no longer used; you manage their lifecycle yourself. Besides, at this stage you spend most of your time writing code, not running it, so an expensive GPU would sit idle anyway.
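If you prefer to script this step rather than click through the portal, a compute instance can also be provisioned with the Azure ML Python SDK (v1). Below is a minimal sketch; the instance name demo-ci-sample and the VM size are our own placeholder choices, not values from this article:

Python
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget

ws = Workspace.from_config()  # assumes a config.json describing your workspace

# CPU-only size shown here; pick a GPU size if you expect heavy training.
compute_config = ComputeInstance.provisioning_configuration(
    vm_size='STANDARD_DS3_V2',
    ssh_public_access=False)

instance = ComputeTarget.create(ws, 'demo-ci-sample', compute_config)
instance.wait_for_completion(show_output=True)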

When the compute is running, you see its status on the Compute instances tab:

Image 7

From here, we can start either JupyterLab or Jupyter. Select whichever you prefer; it doesn't matter for our purposes, so we continue with Jupyter. For more serious development, though, JupyterLab provides many helpful features.

When Jupyter starts, which should take only a moment, we select New, then the Python 3.8 - Tensorflow kernel. We end up with a new, empty notebook.

Image 8

Let’s name it (for example, “aml-demo-tf-notebook”). Now we’re ready to start.

Downloading an MNIST Dataset

As in the previous two articles in this series, we start by downloading the MNIST handwritten digits dataset. We use Microsoft’s Azure Open Datasets as a source:

Python
import os
import urllib.request

DATA_FOLDER = os.path.join(os.getcwd(), 'datasets/mnist-data')
DATASET_BASE_URL = 'https://azureopendatastorage.blob.core.windows.net/mnist/'
os.makedirs(DATA_FOLDER, exist_ok=True)

# Download the four MNIST archives (training/test images and labels).
# URLs are built with plain string concatenation; os.path.join is meant
# for file system paths, not URLs.
urllib.request.urlretrieve(
    DATASET_BASE_URL + 'train-images-idx3-ubyte.gz',
    filename=os.path.join(DATA_FOLDER, 'train-images.gz'))
urllib.request.urlretrieve(
    DATASET_BASE_URL + 'train-labels-idx1-ubyte.gz',
    filename=os.path.join(DATA_FOLDER, 'train-labels.gz'))
urllib.request.urlretrieve(
    DATASET_BASE_URL + 't10k-images-idx3-ubyte.gz',
    filename=os.path.join(DATA_FOLDER, 'test-images.gz'))
urllib.request.urlretrieve(
    DATASET_BASE_URL + 't10k-labels-idx1-ubyte.gz',
    filename=os.path.join(DATA_FOLDER, 'test-labels.gz'))

After executing this cell, the dataset files will be ready to use.
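As a quick sanity check (not part of the original notebook), you can list the downloaded archives and their sizes before moving on:

Python
# Confirm all four MNIST archives arrived and are non-empty.
for fname in sorted(os.listdir(DATA_FOLDER)):
    size = os.path.getsize(os.path.join(DATA_FOLDER, fname))
    print(f'{fname}: {size:,} bytes')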

Exploring the MNIST Dataset

With the dataset downloaded to our ML workspace, we can explore it. First, we need to load it. This part of the code is dataset-specific and, in our case, is borrowed from the Azure Open Datasets example. If you'd like to use your own data, you need to update or replace it accordingly.

Python
import gzip
import struct
import numpy as np

def load_dataset(dataset_path):
    def unpack_mnist_data(filename: str, label=False):
        with gzip.open(filename) as gz:
            struct.unpack('I', gz.read(4))   # skip the magic number
            n_items = struct.unpack('>I', gz.read(4))
            if not label:
                n_rows = struct.unpack('>I', gz.read(4))[0]
                n_cols = struct.unpack('>I', gz.read(4))[0]
                res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
                # Normalize pixel values from the 0-255 range to 0-1.
                res = res.reshape(n_items[0], n_rows * n_cols) / 255.0
            else:
                res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
                res = res.reshape(-1)
        return res

    X_train = unpack_mnist_data(os.path.join(dataset_path, 'train-images.gz'), False)
    y_train = unpack_mnist_data(os.path.join(dataset_path, 'train-labels.gz'), True)
    X_test = unpack_mnist_data(os.path.join(dataset_path, 'test-images.gz'), False)
    y_test = unpack_mnist_data(os.path.join(dataset_path, 'test-labels.gz'), True)

    # Reshape images to (samples, height, width, channels) as TensorFlow expects.
    return X_train.reshape(-1, 28, 28, 1), y_train, X_test.reshape(-1, 28, 28, 1), y_test

Apart from simply loading the data, the load_dataset function normalizes pixel values (from the range 0-255 to 0-1) and reshapes the output into the dimensions the TensorFlow framework expects.

To load our dataset, we run:

Python
X_train, y_train, X_test, y_test = load_dataset(DATA_FOLDER)

We can confirm that we have the expected 60,000 monochromatic images in the training set (28 x 28 pixels each):

Image 9
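The check behind this screenshot looks roughly like the following (the exact cell contents are our assumption):

Python
# Expect (60000, 28, 28, 1) for the training images and (60000,) for labels.
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)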

Let’s check what’s in there:

Python
import matplotlib.pyplot as plt

def show_images(images, labels):
    images_cnt = len(images)
    f, axarr = plt.subplots(nrows=1, ncols=images_cnt, figsize=(16, 16))
    for idx in range(images_cnt):
        img = images[idx]
        lab = labels[idx]
        # Images have shape (28, 28, 1); squeeze the channel axis for imshow.
        axarr[idx].imshow(img.squeeze(), cmap='gray_r')
        axarr[idx].title.set_text(lab)
        axarr[idx].axis('off')

    plt.show()

show_images(X_train[:10], y_train[:10])

Running this cell will show us the first ten labeled images from the training set:

Image 10

Defining the Model

We have the data, so now we can start working on the model. To keep things fit-for-purpose, we'll create a simple convolutional neural network (CNN) that accepts a 28 x 28 matrix and returns a score for each of the 10 digits (raw logits, which we'll convert to class predictions with argmax later).

Python
import tensorflow as tf

def create_tf_model():
    model = tf.keras.models.Sequential(
        [
            tf.keras.layers.Conv2D(filters=10, kernel_size=5, input_shape=(28, 28, 1), activation='relu'),
            tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
            tf.keras.layers.Conv2D(filters=20, kernel_size=5, activation='relu'),
            tf.keras.layers.Dropout(rate=0.2),
            tf.keras.layers.MaxPool2D(pool_size=(2, 2)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(320, activation='relu'),
            tf.keras.layers.Dense(50, activation='relu'),
            tf.keras.layers.Dropout(0.2),
            # Linear output layer producing raw logits, to match the
            # from_logits=True loss configured below.
            tf.keras.layers.Dense(10)
        ]
    )
    return model

model = create_tf_model()
model.build()

Using the model.summary method, we can inspect its layers and the number of parameters:

Image 11
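For reference, the summary in the screenshot is produced by a single call:

Python
# Prints each layer with its output shape and parameter count.
model.summary()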

Training the Model

Before training, we must compile our model with a selected loss function and optimizer:

Python
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

Now, we can start the training. If we want to keep our training history in the Azure ML experiments, we need to take care of it first:

Python
from azureml.core import Workspace, Experiment
import mlflow, mlflow.tensorflow

SUBSCRIPTION = "<your-subscription-id>"
GROUP = "azureml-rg"
WORKSPACE = "demo-ws"

ws = Workspace(
    subscription_id=SUBSCRIPTION,
    resource_group=GROUP,
    workspace_name=WORKSPACE,
)
experiment = Experiment(ws, "aml-demo-tf-notebook-training")

# Point MLflow at the Azure ML workspace and log TensorFlow training
# progress automatically.
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.start_run(experiment_id=experiment.id)
mlflow.tensorflow.autolog()

Using the code above, we create a new experiment (aml-demo-tf-notebook-training), tell the MLflow framework to use the Azure ML workspace for tracking, start a new run, and finally ask MLflow to log TensorFlow training progress automatically. This auto-logging feature tracks model parameters and training progress without additional manual steps.
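Autologging covers the standard Keras parameters and metrics. If you want to record extra values of your own, you can log them explicitly; the names below are hypothetical examples, not part of the original notebook:

Python
# Optional: record custom values alongside the autologged ones.
mlflow.log_param('random_seed', 101)
mlflow.log_metric('train_set_size', len(X_train))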

With all these pieces in place, we can finally start the training. To ensure repeatable results between runs, we use a constant random seed:

Python
tf.random.set_seed(101)
model.fit(X_train, y_train, epochs=5)
mlflow.end_run()

Training our simple model on this relatively simple dataset shouldn't take long, regardless of the selected compute instance VM's size.

Image 12

Thanks to our tracking configuration, we can check its progress and metrics using the Experiments tab in Machine Learning Studio (both during and after the training):

Image 13

Image 14

If training your model takes too long, it may be the moment to reconsider the compute VM’s size. At any time, you can go back to the Compute section of Machine Learning Studio to create a more powerful VM instance, then re-run the notebook. Alternatively, you can save the training code in a separate file and run the training on a compute cluster. The latter is slightly easier to do using Visual Studio Code, as shown in the previous two articles.
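You can also submit such a cluster run straight from the notebook. Here's a minimal sketch using the v1 SDK; the script name train.py, the cluster name cpu-cluster, and the curated environment name are all assumptions you'd adjust to your own setup:

Python
from azureml.core import Environment, ScriptRunConfig

# Assumed names: substitute your own script, cluster, and environment.
env = Environment.get(ws, name='AzureML-tensorflow-2.7-ubuntu20.04-py38-cpu')
config = ScriptRunConfig(source_directory='.',
                         script='train.py',
                         compute_target='cpu-cluster',
                         environment=env)
run = experiment.submit(config)
run.wait_for_completion(show_output=True)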

Evaluating and Registering the Model

After the training, we can check how our model works on the previously unseen test data. The simplest way is by using the model’s evaluate method:

Image 15
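The cell behind this screenshot is most likely a single evaluate call, along these lines:

Python
# evaluate returns the loss plus each metric passed to model.compile.
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc:.4f}')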

We get about 98 percent accuracy. If the model is good enough for us, we can save it and register it for future use. There are multiple options for saving a TensorFlow model; we use the single-file HDF5 format for its portability. Finally, after saving the model, we can register it in the Azure Machine Learning workspace:

Python
from azureml.core.model import Model

# Save to the single-file HDF5 format, then register in the workspace.
model.save('mnist-tf-model.h5')
registered_model = Model.register(
    workspace=ws,
    model_name='mnist-tf-model',
    model_path='mnist-tf-model.h5',
    model_framework=Model.Framework.TENSORFLOW,
    model_framework_version=tf.__version__)

The trained model is now available for future use.

Image 16

Using the Saved Model for Predictions

To load the model registered in the Azure Machine Learning workspace, we use the following code:

Python
# Fetch the registered model, download the .h5 file locally, and load it.
aml_model = Model(workspace=ws, name='mnist-tf-model', version=registered_model.version)
downloaded_model_filename = aml_model.download(exist_ok=True)
downloaded_model = tf.keras.models.load_model(downloaded_model_filename)

If you run this code in a different notebook session than the one used for training, make sure to provide the correct value for the version parameter.
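If you simply want the most recent registration, you can omit the version argument; the v1 SDK then resolves to the latest registered version:

Python
# Without an explicit version, the latest registered version is returned.
aml_model = Model(workspace=ws, name='mnist-tf-model')
print(aml_model.version)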

Let’s examine what we have loaded:

Image 17
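The screenshot corresponds to inspecting the restored network's structure, for example:

Python
# Confirm the restored network matches the architecture we trained.
downloaded_model.summary()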

Now we can use the model for predictions.

Python
preds = downloaded_model.predict(X_test).argmax(axis=1)

We can quickly check how the predictions compare to our ground truth:

Image 18
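You can reproduce this comparison with something like the following:

Python
# Compare predictions with ground truth at the start and end of the test set.
print('predicted:', preds[:3], '...', preds[-3:])
print('actual:   ', y_test[:3], '...', y_test[-3:])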

As we can see, the first and the last three predicted values are correct.

If we still don't have confidence in our model, we can use the same show_images function as before to look at the labels predicted for the test images:

Python
show_images(X_test[:10], preds[:10])

Image 19

Now, let’s evaluate the predictions using the complete test dataset. We could use the downloaded_model.evaluate method again but, to avoid re-running the predictions, we can rely on the scikit-learn accuracy_score function instead:

Python
from sklearn.metrics import accuracy_score

accuracy_score(y_test, preds)

We should get the same value as we did directly after training: 0.9864.

Cleaning Up Azure Resources

As soon as you finish your work with the notebook, remember to stop and delete resources you don’t need anymore. An extreme way of doing this is to remove the whole resource group azureml-rg. If you want to keep your data, the minimum you should do is to stop all compute instances:

Image 20
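Stopping can be scripted, too. A minimal sketch with the v1 SDK, reusing the placeholder instance name from the provisioning sketch earlier:

Python
from azureml.core.compute import ComputeInstance

# 'demo-ci-sample' is our placeholder; substitute your instance's name.
instance = ComputeInstance(workspace=ws, name='demo-ci-sample')
instance.stop(show_output=True)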

Unlike the compute clusters we used in the previous articles, compute instances don’t stop automatically after a configured period of inactivity.

Summary

Throughout the series, you’ve learned how we can use Python in several different ways to create and train models using Azure Machine Learning. This final article of the three-part series has shown how to train a TensorFlow model using Azure ML notebooks.

In this article, you’ve also seen that, with little effort, you can adapt the sample code to handle an image classification task on your own images. Most likely, your images will have a different number of channels (MNIST images are monochromatic) and a different number of classes; in that case, you must change the model code and the load_dataset function accordingly.

To learn everything you need to get started with Azure Machine Learning, check out Quickstart: Create workspace resources.

This article is part of the series 'Python ML on Azure'.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Architect
Poland
Jarek has two decades of professional experience in software architecture and development, machine learning, business and system analysis, logistics, and business process optimization.
He is passionate about creating software solutions with complex logic, especially with the application of AI.
