Posted 16 Apr 2021


Building AI Language Translation with TensorFlow and Keras

In this article, we’ll build a deep learning-based model for automatic translation from English to Russian using TensorFlow and Keras.

Introduction

Google Translate works so well, it often seems like magic. But it’s not magic — it’s deep learning!

In this series of articles, we’ll show you how to use deep learning to create an automatic translation system. This series can be viewed as a step-by-step tutorial that helps you understand and build a neural machine translation (NMT) system.

This series assumes that you are familiar with the concepts of machine learning: model training, supervised learning, neural networks, as well as artificial neurons, layers, and backpropagation.

In the previous article, we installed all the tools required to develop an automatic translation system, and defined the development workflow. In this article, we’ll go ahead and build our AI language translation system.

We’ll need to write very few lines of code because Keras provides ready-made building blocks (layers, optimizers, and preprocessing utilities) for most of the logic.

If you'd like to see the final code we end up with, it's available in this Python notebook.

Importing Libraries

To start, we import the required libraries:

Python
import warnings
warnings.filterwarnings("ignore")
import tensorflow as tf
import numpy as np
import string
from numpy import array, argmax, random, take
#for processing imported data
import pandas as pd
#the RNN routines
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding, RepeatVector
#Keras word-level tokenizer: maps words to integer ids during data preparation
from keras.preprocessing.text import Tokenizer
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
from keras import optimizers

Building Model Components

Building our model with Keras is very straightforward. We'll start by creating a Sequential model, which stacks layers one after another:

Python
model = Sequential()

Next, we add a long short-term memory (LSTM) layer. Most parameters of Keras' LSTM class have default values, so the only one we need to set explicitly is units: the dimensionality of the layer's output (the size of its hidden state) in our sequence-to-sequence recurrent neural network (RNN).
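To make the role of units concrete, here is a minimal pure-Python sketch of a single LSTM time step. It is written from the standard LSTM equations rather than from Keras' implementation. The helper names (lstm_step, make_params) and the constant weights are illustrative assumptions. The point is that the hidden state h, which is what the layer outputs, has exactly units entries.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h, c, params):
    """One LSTM time step. x: input vector; h, c: hidden and cell state (length = units)."""
    gates = {}
    for name in ("i", "f", "g", "o"):  # input, forget, candidate, output
        W, U, b = params[name]
        z = [a + r + bb for a, r, bb in zip(matvec(W, x), matvec(U, h), b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(v) for v in z]
    # New cell state: keep part of the old memory and add part of the candidate
    c_new = [f * cc + i * g for f, cc, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    # New hidden state (the layer's output): output gate times the squashed cell state
    h_new = [o * math.tanh(cc) for o, cc in zip(gates["o"], c_new)]
    return h_new, c_new

def make_params(units, input_dim, value=0.1):
    # Constant weights for illustration only; Keras initializes these randomly
    W = [[value] * input_dim for _ in range(units)]  # input weights: units x input_dim
    U = [[value] * units for _ in range(units)]      # recurrent weights: units x units
    b = [0.0] * units
    return {name: (W, U, b) for name in ("i", "f", "g", "o")}

units, input_dim = 4, 3
params = make_params(units, input_dim)
h, c = [0.0] * units, [0.0] * units
for x in ([1.0, 0.5, -0.2], [0.3, -0.1, 0.8]):  # a two-step input sequence
    h, c = lstm_step(x, h, c, params)
print(len(h))
```

In our model, units is 512, so each LSTM layer passes a 512-value vector to the next layer.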

The input to the network is a sequence of tokens from the original sentence, so its length is the number of tokens in that sentence. The sentence is first tokenized, and the embedding layer then maps each token to a dense vector. If a subword tokenizer is used, a single word can be split into several subtokens, which increases the length of the input sequence.
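The following pure-Python sketch is a simplified stand-in for what Keras' Tokenizer and pad_sequences (imported above) do. A word-level tokenizer assigns integer ids by frequency and reserves 0 for padding, and each sentence is then padded with zeros to a fixed length. The function names are hypothetical. Keras' pad_sequences defaults to padding='pre', whereas this sketch pads and truncates at the end of the sequence. Because it works at the word level, it never splits a word into subtokens.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Lowercase and keep runs of word characters/apostrophes (a rough word-level tokenizer)
    return re.findall(r"[\w']+", sentence.lower())

def fit_vocab(sentences):
    """Map each word to an integer id, most frequent first; 0 is reserved for padding."""
    counts = Counter()
    order = []
    for s in sentences:
        for w in tokenize(s):
            if w not in counts:
                order.append(w)
            counts[w] += 1
    # sorted() is stable even with reverse=True, so ties keep first-seen order
    ranked = sorted(order, key=lambda w: counts[w], reverse=True)
    return {w: i + 1 for i, w in enumerate(ranked)}

def encode(sentences, vocab, max_len):
    """Convert sentences to id sequences, truncated or zero-padded at the end to max_len."""
    out = []
    for s in sentences:
        ids = [vocab[w] for w in tokenize(s) if w in vocab][:max_len]
        out.append(ids + [0] * (max_len - len(ids)))
    return out

sentences = ["I like tea.", "I like green tea!", "Tea is hot"]
vocab = fit_vocab(sentences)
print(vocab)
print(encode(sentences, vocab, max_len=5))
```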

To keep our model size manageable (and therefore ensure we can train it in a reasonable amount of time), we use 512 units in each LSTM layer. We add two LSTM layers: the first is an encoder, and the second is a decoder.

Python
model.add(LSTM(512))
model.add(RepeatVector(LEN_EN))
model.add(LSTM(512))

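Note that we've added a RepeatVector layer in the middle. The encoder LSTM outputs a single vector per sentence, and RepeatVector(LEN_EN) copies that vector LEN_EN times, producing the sequence that the decoder LSTM expects as input. This is also where we'll insert our attention mechanism shortly.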
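In plain Python, the reshaping that RepeatVector performs can be sketched as follows. repeat_vector is a hypothetical helper that mimics the layer's (batch, features) to (batch, n, features) behavior. It is not Keras code.

```python
def repeat_vector(batch, n):
    """Mimic RepeatVector(n): (batch, features) -> (batch, n, features)."""
    return [[list(vec) for _ in range(n)] for vec in batch]

encoder_out = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # batch of 2 sentences, 3 features each
decoder_in = repeat_vector(encoder_out, 4)       # each sentence vector repeated 4 times
print(decoder_in[1])
```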

Next, we add a Dense layer to our model. This fully connected layer receives every output of the previous layer, and we need it to make predictions. For each of its output units, it computes a weighted sum of all its inputs plus a bias. The softmax activation then turns these scores into a probability distribution, and the output with the highest probability is taken as the prediction for the input English sentence.

Python
model.add(Dense(LEN_RU, activation='softmax'))
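The computation behind the Dense layer can be sketched in pure Python, together with the sparse_categorical_crossentropy loss we compile the model with. This toy example uses 3 output units instead of LEN_RU, and the weights are made up. W is stored here as one row per output unit, whereas Keras stores a Dense kernel with shape (inputs, units).

```python
import math

def dense(x, W, b):
    """Fully connected layer: one weighted sum of all inputs per output unit, plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sparse_categorical_crossentropy(probs, target_index):
    """Loss for an integer label: minus the log of the probability given to the true class."""
    return -math.log(probs[target_index])

x = [1.0, -0.5]                              # output of the previous layer (2 values)
W = [[0.2, 0.1], [1.0, -1.0], [-0.3, 0.4]]   # 3 output units, 2 inputs each
b = [0.0, 0.0, 0.0]
probs = softmax(dense(x, W, b))
prediction = probs.index(max(probs))         # index of the most probable output
loss = sparse_categorical_crossentropy(probs, target_index=1)
print(prediction, round(loss, 4))
```

The loss is small when the model assigns a high probability to the correct output id, which is why the integer-label ("sparse") variant fits tokenized targets.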

LEN_RU is the size of the output vector, and LEN_EN sets how many times RepeatVector repeats the encoder output. We'll compute both parameters later on.

Here's how our model should look so far:

Python
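model = Sequential()
model.add(LSTM(512))                      # encoder
model.add(RepeatVector(LEN_EN))           # repeat the encoder output for the decoder
model.add(LSTM(512))                      # decoder
model.add(Dense(LEN_RU, activation='softmax'))
rms = optimizers.RMSprop(learning_rate=0.001)  # 'lr' is the deprecated name of this argument
model.compile(optimizer=rms, loss='sparse_categorical_crossentropy')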

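We use the Keras RMSprop optimizer. Backpropagation computes the gradients of the loss, and RMSprop uses them to update the weights. It is a variant of gradient descent that divides each weight's gradient by the square root of a running average of that weight's recent squared gradients, which gives every weight its own adaptive step size.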
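The update rule behind RMSprop can be sketched in a few lines of Python. This is the textbook form applied to a one-parameter toy problem. The values rho=0.9 and eps=1e-7 are chosen to resemble Keras' defaults, and framework implementations may place epsilon slightly differently. rmsprop_minimize is a hypothetical helper.

```python
import math

def rmsprop_minimize(grad_fn, w, learning_rate=0.001, rho=0.9, eps=1e-7, steps=1000):
    """Textbook RMSprop on a single parameter: scale each step by a running RMS of gradients."""
    avg_sq = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq = rho * avg_sq + (1 - rho) * g * g  # running average of squared gradients
        w -= learning_rate * g / (math.sqrt(avg_sq) + eps)
    return w

# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = rmsprop_minimize(lambda w: 2 * (w - 3), w=0.0, learning_rate=0.01, steps=2000)
print(round(w_final, 2))
```

Because each gradient is divided by its own running RMS, the step size stays close to learning_rate whether the raw gradient is large or small.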

We still need to add the embedding layer, as well as include an attention layer between the encoder and the decoder.

We create the embedding layer with Word2Vec, so it is, in fact, a pretrained embedding layer. We need to generate the Word2Vec weights matrix (one row of weights per vocabulary word) and load that matrix into a standard Keras Embedding layer.
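Conceptually, an Embedding layer is a lookup table. The weights matrix has one row per word id, and the layer's forward pass returns the rows for the ids in a sentence. The sketch below uses a hypothetical four-word vocabulary, with made-up 3-dimensional vectors standing in for trained Word2Vec vectors. The same row-per-word layout explains the 1,200 embedding parameters reported by model.summary() further down: 100-dimensional vectors for a 12-word vocabulary.

```python
# Hypothetical 4-word vocabulary with 3-dimensional vectors (Word2Vec would learn these)
word_vectors = {
    "cat": [0.1, 0.3, -0.2],
    "dog": [0.2, 0.25, -0.1],
    "car": [-0.5, 0.0, 0.9],
    "road": [-0.4, 0.1, 0.8],
}

def build_weights_matrix(word_vectors):
    """Stack the vectors into a (vocab_size, dim) matrix; row i belongs to word id i."""
    word_index = {w: i for i, w in enumerate(word_vectors)}
    weights = [word_vectors[w] for w in word_index]
    return word_index, weights

def embed(ids, weights):
    """An Embedding layer's forward pass is a row lookup: id -> weights[id]."""
    return [weights[i] for i in ids]

word_index, weights = build_weights_matrix(word_vectors)
sentence_ids = [word_index[w] for w in ["dog", "car"]]
vectors = embed(sentence_ids, weights)
print(vectors)
```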

We can use the gensim package to obtain the embedding layer automatically:

Python
from gensim.models import Word2Vec

Then, we train a Word2Vec model. Here we use common_texts, a tiny test corpus bundled with gensim, as a stand-in for real training sentences:

Python
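from gensim.test.utils import common_texts  # toy corpus bundled with gensim
model_w2v = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)  # gensim 3.x; 4.x renamed size to vector_size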

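The embedding layer can then be retrieved as follows. Setting train_embeddings=False freezes the pretrained weights. Note that get_keras_embedding is part of the gensim 3.x API and is not available in gensim 4:

Python
embedding_layer = model_w2v.wv.get_keras_embedding(train_embeddings=False)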

With the embedding added as the first layer of the model, we can call the model.summary() function to get an overview of our model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, None, 100)         1200
_________________________________________________________________
lstm_1 (LSTM)                (None, 512)               1255424
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 8, 512)            0
_________________________________________________________________
lstm_2 (LSTM)                (None, 512)               2099200
_________________________________________________________________
dense_1 (Dense)              (None, 512)               262656
=================================================================
Total params: 3,618,480
Trainable params: 3,617,280
Non-trainable params: 1,200
_________________________________________________________________

Adding Attention Mechanism

Now we want to add an attention mechanism. We could write it from scratch, but a simpler solution is to use an existing Keras module, such as the keras-self-attention package.

Let’s import this module:

Python
from keras_self_attention import SeqSelfAttention

Now we add a SeqSelfAttention layer between the two LSTM blocks, right after the RepeatVector:

Python
model.add(SeqSelfAttention(attention_activation='sigmoid'))
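To show what the attention layer computes, here is a pure-Python sketch of additive self-attention modeled on the formulation described in the keras-self-attention documentation. Each position t scores every position t' with a small tanh network. The scores then pass through the sigmoid attention_activation, a softmax turns them into weights, and the output is the weighted average of the inputs. This is an illustration with made-up weights, not the package's code. Wt, Wx, bh, wa and ba are named after the formula's terms.

```python
import math

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def additive_self_attention(xs, Wt, Wx, bh, wa, ba):
    """For each position t, score every position t', normalize the scores, and mix the inputs."""
    q = [matvec(Wt, x) for x in xs]  # projection of the attending position
    k = [matvec(Wx, x) for x in xs]  # projection of the attended position
    outputs, all_weights = [], []
    for t in range(len(xs)):
        scores = []
        for tp in range(len(xs)):
            hidden = [math.tanh(a + b + c) for a, b, c in zip(q[t], k[tp], bh)]
            e = sum(w * h for w, h in zip(wa, hidden)) + ba
            scores.append(sigmoid(e))  # attention_activation='sigmoid'
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [v / total for v in exps]  # softmax over positions t'
        mixed = [sum(wt * x[d] for wt, x in zip(weights, xs)) for d in range(len(xs[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 positions, 2 features each
Wt = [[0.5, -0.2], [0.1, 0.3]]             # 2 attention units
Wx = [[0.4, 0.2], [-0.3, 0.6]]
bh, wa, ba = [0.0, 0.0], [1.0, -1.0], 0.0
out, attn = additive_self_attention(xs, Wt, Wx, bh, wa, ba)
print(attn[0])
```

Because each row of weights sums to 1, every output position is a weighted average of the input vectors.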

Our model is now complete.

Putting the Model Together

Here is the final code of our neural network in Keras:

Python
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import string
from numpy import array, argmax, random, take
#for processing imported data
import tensorflow as tf
import pandas as pd
#the RNN routines
from keras.models import Sequential
from keras.layers import Dense, LSTM, Embedding, RepeatVector
from keras.preprocessing.text import Tokenizer
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.sequence import pad_sequences
from keras.models import load_model
from keras import optimizers
#optional: uncomment to draw a diagram of the model with plot_model
#import matplotlib.pyplot as plt
#from keras.utils import plot_model
#import pydot

from gensim.models import Word2Vec
from gensim.test.utils import common_texts
from keras_self_attention import SeqSelfAttention


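# Placeholder sizes matching the summary below; compute them from the dataset in practice
LEN_EN = 8     # number of times RepeatVector repeats the encoder output
LEN_RU = 512   # size of the output (Dense) layer

model = Sequential()

model_w2v = Word2Vec(common_texts, size=100, window=5, min_count=1, workers=4)  # gensim 3.x API
model.add(model_w2v.wv.get_keras_embedding(train_embeddings=False))  # frozen pretrained embedding
model.add(LSTM(512))                 # encoder
model.add(RepeatVector(LEN_EN))      # repeat the encoder output for the decoder

model.add(SeqSelfAttention(attention_activation='sigmoid'))

model.add(LSTM(512))                 # decoder
model.add(Dense(LEN_RU, activation='softmax'))
rms = optimizers.RMSprop(learning_rate=0.001)
model.compile(optimizer=rms, loss='sparse_categorical_crossentropy')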

#plot_model(model, to_file='model_plot4a.png', show_shapes=True, show_layer_names=True)

model.summary()

After we run the code, we get the following output:

[root@ids ~]# python3 NMT.py
Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, None, 100)         1200
_________________________________________________________________
lstm_1 (LSTM)                (None, 512)               1255424
_________________________________________________________________
repeat_vector_1 (RepeatVecto (None, 8, 512)            0
_________________________________________________________________
seq_self_attention_1 (SeqSel (None, 8, 512)            32833
_________________________________________________________________
lstm_2 (LSTM)                (None, 512)               2099200
_________________________________________________________________
dense_1 (Dense)              (None, 512)               262656
=================================================================
Total params: 3,651,313
Trainable params: 3,650,113
Non-trainable params: 1,200

Although our model code works as-is, enclosing the model creation code in a function would make it easier to reuse. You don't have to do this, but to get an idea of how it might look, see the final translator code in the notebook mentioned earlier.

Next Steps

Now our model is ready. In the next article, we’ll train and test this model. Stay tuned!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Martin_Rupp
Russian Federation
