
Now data scientists can see significant speedups when running R* applications on the latest Intel® Xeon Phi™ processor family x200 (code-named Knights Landing). Intel® software tools such as the Intel® Data Analytics Acceleration Library (Intel® DAAL) are helping make this happen—with minimum effort from the developer.

The latest Intel Xeon Phi processor is a specialized platform for demanding computing applications. It brings several new technological breakthroughs, including:

- Socket form factor (self-boot CPU), as well as a coprocessor version
- A high bandwidth, on-package memory called Multi-Channel DRAM (MCDRAM) in addition to DDR4 memory
- The latest and most advanced vectorization technology: the Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set
- Massive parallelism with up to 72 cores on each die, four threads per core
- 3+ TFLOPS for double-precision floating-point computation
- 6+ TFLOPS for single-precision floating-point computation

Alongside these technologies, Intel provides a set of software tools in the Intel® Parallel Studio XE suite to help developers take full advantage of Intel Xeon Phi processors. One of these tools is Intel DAAL, which offers preoptimized machine learning and data analytics algorithms. Real-world machine learning problems are typically among the most demanding workloads in terms of CPU and memory, which makes Intel Xeon Phi an ideal platform for them. Intel DAAL provides a quick way of building machine learning applications optimized for Intel Xeon Phi processors.

R is an open source project and software environment that provides the most complete set of algorithms for statistical analysis, data mining, and machine learning.^{1} R is extremely popular among data science practitioners. Without special tuning, however, R and its packages cannot take advantage of the high-performance features offered by Intel Xeon Phi processors. In fact, the out-of-the-box performance of R is expected to be poor on Intel Xeon Phi processors, because in many cases vanilla R uses only a single thread on a single Intel Xeon Phi processor core. That means it can use only 1/288th of the compute resources available on an Intel Xeon Phi processor (assuming 72 cores with four threads each). A single Intel Xeon Phi processor core is roughly equivalent to an Intel® Atom™ processor core, making it less powerful than a single Intel® Xeon® processor core. Vanilla R also cannot use other performance features, such as the advanced vectorization based on the Intel AVX-512 instruction set and the high-bandwidth MCDRAM memory.

But there is an easy way for R programmers to tap into the Intel Xeon Phi processor’s impressive data-crunching abilities: integrating a preoptimized library into the R environment. Here, we chose a classic machine learning algorithm, the Naïve Bayes classifier. We show step by step how to build and run a Naïve Bayes classifier in R using Intel DAAL on an Intel Xeon Phi processor self-boot system. Then we compare our solution with a native implementation from the venerable R e1071 package.^{2}

## Naïve Bayes Classifier

The Naïve Bayes algorithm is a classification method based on Bayes’ theorem.^{3} It assumes that all features are independent of each other given the class label. Despite its simplicity, it can often outperform more sophisticated classification methods. It is commonly used for document categorization, email spam detection, and more.

Given feature vectors \(X_{i}=(x_{i1},...,x_{ip}),\ i=1,...,n,\) where \(x_{ij}\) is the scaled frequency of the \(j\)-th feature in the \(i\)-th observation and \(p\) is the number of features, and given a set of possible labels \(C=(C_{1},...,C_{d})\), the Naïve Bayes algorithm is based on Bayes’ theorem:

\[P(C_{k} \mid X_{i}) = \frac{P(C_{k})\,P(X_{i} \mid C_{k})}{P(X_{i})}\]

That is, for a label \(C_{k}\), the **posterior probability** is directly proportional to (**prior probability × likelihood**).

In the training stage, a training data set with known labels for each observation is used to learn a model that contains parameters, such as the prior probability for each class (label) and the likelihood of all features. Then, in the prediction stage, for each new observation, the algorithm finds the maximum posterior probability for the observation and assigns the corresponding label to it.
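
To make the two stages concrete, here is a minimal C++ sketch of a multinomial Naïve Bayes classifier. It is neither the e1071 nor the Intel DAAL implementation: the names `NBModel`, `nbTrainSketch`, and `nbPredictSketch` are hypothetical, labels are assumed to be integers in the range 0 to d-1, and add-one (Laplace) smoothing is an assumption made here so that a feature never seen in a class does not produce a zero likelihood.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model: log prior per class, log likelihood per (class, feature).
struct NBModel {
    std::vector<double> logPrior;                    // size d
    std::vector<std::vector<double>> logLikelihood;  // d x p
};

// Training stage: estimate class priors and per-class feature likelihoods.
// X is n x p (non-negative feature frequencies); y holds labels in [0, d).
NBModel nbTrainSketch(const std::vector<std::vector<double>>& X,
                      const std::vector<int>& y, int d) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t n = X.size(), p = X[0].size();
    std::vector<double> classCount(d, 0.0);
    std::vector<std::vector<double>> featureSum(d, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        classCount[y[i]] += 1.0;
        for (std::size_t j = 0; j < p; ++j) featureSum[y[i]][j] += X[i][j];
    }
    NBModel m;
    m.logPrior.resize(d);
    m.logLikelihood.assign(d, std::vector<double>(p));
    for (int k = 0; k < d; ++k) {
        m.logPrior[k] = std::log(classCount[k] / n);
        double total = 0.0;
        for (std::size_t j = 0; j < p; ++j) total += featureSum[k][j];
        // Add-one smoothing (an assumption of this sketch).
        for (std::size_t j = 0; j < p; ++j)
            m.logLikelihood[k][j] = std::log((featureSum[k][j] + 1.0) / (total + p));
    }
    return m;
}

// Prediction stage: return the label with the maximum log posterior.
int nbPredictSketch(const NBModel& m, const std::vector<double>& x) {
    int best = 0;
    double bestScore = -INFINITY;
    for (std::size_t k = 0; k < m.logPrior.size(); ++k) {
        double score = m.logPrior[k];
        for (std::size_t j = 0; j < x.size(); ++j)
            score += x[j] * m.logLikelihood[k][j];
        if (score > bestScore) { bestScore = score; best = static_cast<int>(k); }
    }
    return best;
}
```

Working with log probabilities turns the product of the prior and the per-feature likelihoods into a sum, which avoids numerical underflow when the number of features is large. The denominator \(P(X_{i})\) is the same for every label, so it can be dropped when only the maximizing label is needed.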

### Naïve Bayes with R

A well-known Naïve Bayes classifier in R is provided by the package e1071: Misc Functions of the Department of Statistics, Probability Theory Group.^{2}

It has an easy interface with two functions: one for training a model and the other for applying the model:

```r
library(e1071)

model <- naiveBayes(training_data, ground_truths)
result <- predict(model, new_data)
```

When we run this on an Intel Xeon Phi processor, the model training step takes more than 200 seconds for a data set of moderate size (100K observations, 200 features, and 100 classes). And the prediction step for a data set of the same size takes 6,272 seconds. That’s almost one hour and 45 minutes.

This kind of performance precludes any practical use of e1071 Naïve Bayes on Intel Xeon Phi processors. Now we will see how to quickly fix this using Naïve Bayes with Intel DAAL.

### Naïve Bayes with Intel® DAAL

The multinomial Naïve Bayes classifier is one of the classification algorithms that Intel DAAL provides.^{4} Along with superior performance, the implementation in Intel DAAL offers features and flexibility that package e1071 doesn’t. In particular, Naïve Bayes in Intel DAAL supports three processing modes in model training: batch processing, online processing, and distributed processing. The batch mode is the same as what package e1071 supports: the entire data set is processed all at once. The online mode supports a usage model where a data set too big to fit in memory all at once is processed chunk by chunk, and a model is learned after all chunks are processed. The distributed mode supports distributed model training on a cluster. For simplicity, we use only the batch processing mode here. However, the same methodology can be applied to integrate the other two processing modes into R.
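
One way to see why chunk-by-chunk training is possible at all: in the textbook multinomial formulation, the model is derived from per-class observation counts and per-class feature totals, and those totals are additive across chunks. The C++ sketch below illustrates only that idea. It does not use or reproduce Intel DAAL's online API, and the `NBCounts` type and its `update` method are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical running totals from which a multinomial Naive Bayes
// model could be derived once all chunks have been seen.
struct NBCounts {
    std::vector<double> classCount;               // observations per class
    std::vector<std::vector<double>> featureSum;  // per-class feature totals

    NBCounts(int d, std::size_t p)
        : classCount(d, 0.0), featureSum(d, std::vector<double>(p, 0.0)) {}

    // Fold one chunk of observations (rows of X, labels y) into the totals.
    void update(const std::vector<std::vector<double>>& X,
                const std::vector<int>& y) {
        assert(X.size() == y.size());
        for (std::size_t i = 0; i < X.size(); ++i) {
            classCount[y[i]] += 1.0;
            for (std::size_t j = 0; j < X[i].size(); ++j)
                featureSum[y[i]][j] += X[i][j];
        }
    }
};
```

Calling `update` once with the whole data set, or several times with consecutive chunks, leaves the same totals in `NBCounts`, so only one chunk has to be held in memory at a time.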

After going through the integration steps described below, we expect to use our new, Intel DAAL-enabled Naïve Bayes classifier like this:

```r
# DAAL Naive Bayes
test <- loadData("traindata.csv", nfeatures)
model.daal <- nbTrain(test$data, test$labels, nclasses)
labels.daal <- nbPredict(model.daal, test$data, nclasses)
```

## Integrating Intel DAAL’s Naïve Bayes Classifier into R

Intel DAAL has a C++ programming interface (in addition to the Java* and Python* APIs). To use Intel DAAL’s Naïve Bayes algorithm with R, we have to wrap the training and prediction steps into C++ functions and then export these functions to R. The Rcpp package comes in handy for this purpose.^{5,6}

### Tools Setup: Rcpp and Related Packages

The Rcpp package is the de facto way of extending R with C++ code. Hundreds of other R packages use this package to accelerate computation with C++ implementation and to connect to other C++ projects. Rcpp provides seamless R and C++ integration. It allows the direct interchange of R objects between R and C++.

Rcpp can be installed from CRAN. We also need to install the inline package.^{7} This package allows compiling, linking, and loading C++ code directly from R code. Among the multiple methods supported by Rcpp to compile, link, and load C++ functions for use by R, we have found using the inline package in conjunction with Rcpp to be a good balance between ease of use and flexibility. Lastly, we need a C++ compiler. Typically, the Intel® C++ compiler is the preferred compiler for building and optimizing code on Intel Xeon Phi processors. But in this case we are not building a lot of C++ source code. We are linking prebuilt binaries (from Intel DAAL) into a small dynamic lib to be loaded by R. So any compiler will do. If you don’t have a default C++ compiler on your system, then you can do one of the following:

- On Windows*, install Rtools
- On Mac OS*, install Xcode from the App Store
- On Linux*, run `sudo apt-get install r-base-dev` or similar

### The Glue Code to Connect R with C++ Functions

The inline package provides a simple function cxxfunction, which takes the signature of a C++ function, the definition of the C++ function, and a plug-in object. The plug-in object is used to specify additional C++ header files and additional link lines that our project depends on. This becomes clear with an example (**Figure 1**).

Lines 5–9 in **Figure 1** create and register an Rcpp plug-in with R using the Rcpp plug-in maker facility. The plug-in allows us to specify dependence on external libraries. In this case, the dependency is Intel DAAL. The remaining part of the code snippet creates three R functions, `loadData`, `nbTrain`, and `nbPredict`, for reading data from a CSV file, training a Naïve Bayes model, and predicting labels for new data, respectively. We use `cxxfunction` to connect these three R functions to three C++ functions whose definitions are embedded in R code as three character strings called `readCSV`, `train`, and `predict`.

Next, we implement these C++ functions using Intel DAAL data structures and algorithms.

```
 1 library(Rcpp)
 2 library(inline)
 3
 4 # Create and register an Rcpp plugin
 5 plug <- Rcpp:::Rcpp.plugin.maker(
 6     include.before = "#include <daal.h> ",
 7     libs = paste("-L$DAALROOT/lib/ -ldaal_core -ldaal_thread ",
 8                  "-ltbb -lpthread -lm", sep=""))
 9 registerPlugin("daalNB", plug)
10
11 # R function for loading data and labels
12 loadData <- cxxfunction(signature(file="character", ncols="integer"),
13     readCSV, plugin="daalNB")
14
15 # R function for training a model
16 nbTrain <- cxxfunction(signature(X="raw", y="raw", nclasses="integer"),
17     train, plugin="daalNB")
18
19 # R function for scoring
20 nbPredict <- cxxfunction(signature(model="raw", X="raw", nclasses="integer"),
21     predict, plugin="daalNB")
```

Figure 1 - Glue code to create R functions from inline C++ code

### Reading in Data from a CSV File

We could have used R functions such as `read.csv()`, `read.table()`, or `scan()` to easily read data from a CSV file. However, we would then have to convert the data to a representation recognized by Intel DAAL.

Intel DAAL uses NumericTables (a hierarchy of C++ classes) for in-memory data representation. Rather than converting R data into NumericTables, we use Intel DAAL’s data source facility to load the data and build NumericTables directly from the file. Lines 13–16 in **Figure 2** define a data source connecting to a CSV file that contains the training data and the ground truths for the training data. We assume the last column of the CSV table contains the ground truths (labels). Lines 19–24 create Intel DAAL NumericTables to hold the data and the labels to be read. Line 27 loads the entire data set into the NumericTables. Before returning to the R space, we serialize the data NumericTable and the labels NumericTable into two raw byte vectors, and then return them in a List. Later, the C++ model training function will take the raw bytes and restore them back into NumericTables.

```
 1 # load data
 2 readCSV <- '
 3     using namespace daal;
 4     using namespace daal::data_management;
 5
 6     // Inputs:
 7     //   file  - file name
 8     //   ncols - number of columns in file
 9     std::string fname = Rcpp::as<std::string>(file);
10     int k = Rcpp::as<int>(ncols);
11
12     // Data source
13     FileDataSource<CSVFeatureManager> dataSource(
14         fname,
15         DataSource::notAllocateNumericTable,
16         DataSource::doDictionaryFromContext);
17
18     // DAAL NumericTables for data and labels
19     services::SharedPtr<NumericTable> data(
20         new HomogenNumericTable<double>(k-1, 0, NumericTable::notAllocate));
21     services::SharedPtr<NumericTable> labels(
22         new HomogenNumericTable<int>(1, 0, NumericTable::notAllocate));
23     services::SharedPtr<NumericTable> merged(
24         new MergedNumericTable(data, labels));
25
26     // Load data
27     dataSource.loadDataBlock(merged.get());
28
29     // Serialize NumericTables
30     InputDataArchive dataArch, labelsArch;
31     data->serialize(dataArch);
32     labels->serialize(labelsArch);
33     Rcpp::RawVector dataBytes(dataArch.getSizeOfArchive());
34     dataArch.copyArchiveToArray(&dataBytes[0], dataArch.getSizeOfArchive());
35     Rcpp::RawVector labelsBytes(labelsArch.getSizeOfArchive());
36     labelsArch.copyArchiveToArray(&labelsBytes[0], labelsArch.getSizeOfArchive());
37
38     // Return a list of RawVectors
39     return Rcpp::List::create(
40         _["data"] = dataBytes,
41         _["labels"] = labelsBytes);
42 '
```

Figure 2 - C++ function to read data from a CSV file and build Intel® Data Analytics Acceleration Library NumericTables
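
The column layout that Figure 2 assumes (features first, label in the last column) can be illustrated without Intel DAAL. The following is a minimal sketch in plain C++, not a substitute for the data source facility: `Dataset` and `splitCsvSketch` are hypothetical names, and the input is assumed to be purely numeric CSV text with no header row or quoted fields.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container mirroring the two tables built in Figure 2.
struct Dataset {
    std::vector<std::vector<double>> data;  // n x (ncols - 1) feature values
    std::vector<int> labels;                // n labels taken from the last column
};

// Parse numeric CSV text with `ncols` columns per row and split each row
// into its feature values and its label (the last column).
Dataset splitCsvSketch(const std::string& csvText, int ncols) {
    assert(ncols >= 2);
    Dataset ds;
    std::istringstream lines(csvText);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::istringstream cells(line);
        std::string cell;
        std::vector<double> row;
        while (std::getline(cells, cell, ',')) row.push_back(std::stod(cell));
        assert(static_cast<int>(row.size()) == ncols);
        ds.labels.push_back(static_cast<int>(row.back()));
        row.pop_back();  // what remains are the ncols - 1 features
        ds.data.push_back(row);
    }
    return ds;
}
```

As in Figure 2, a file with `ncols` columns yields `ncols - 1` feature columns and one label column.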

### Training a Naïve Bayes Model

When it comes to using any Intel DAAL algorithm in code, the programming model has an easy-to-follow sequence for putting things together:

- Create an algorithm object for a chosen algorithm and a chosen processing mode.
- Set input data using the input.set method of the algorithm object.
- Invoke the compute method on the algorithm object.
- Retrieve the result using the getResult method of the algorithm object.
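
The four steps above can be sketched with a toy algorithm class. The types below only mimic the shape of the sequence and are not Intel DAAL classes: `MeanBatch`, `Input`, `Result`, `InputId`, and `runMeanSketch` are all hypothetical, and the "algorithm" simply computes the mean of a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Toy stand-ins for an algorithm's input identifiers and data container.
enum InputId { inputData };
using Values = std::vector<double>;

// Holds named inputs, in the spirit of an algorithm's `input` member.
struct Input {
    std::map<InputId, std::shared_ptr<Values>> slots;
    void set(InputId id, std::shared_ptr<Values> v) { slots[id] = v; }
};

struct Result { double mean = 0.0; };

class MeanBatch {
public:
    Input input;

    // Run the computation on whatever input has been set.
    void compute() {
        const Values& v = *input.slots.at(inputData);
        assert(!v.empty());
        double sum = 0.0;
        for (double x : v) sum += x;
        result_ = std::make_shared<Result>();
        result_->mean = sum / v.size();
    }

    std::shared_ptr<Result> getResult() const { return result_; }

private:
    std::shared_ptr<Result> result_;
};

double runMeanSketch(const Values& v) {
    MeanBatch algorithm;                                          // 1. create the algorithm object
    algorithm.input.set(inputData, std::make_shared<Values>(v));  // 2. set the input
    algorithm.compute();                                          // 3. compute
    return algorithm.getResult()->mean;                           // 4. retrieve the result
}
```

Keeping input, computation, and result as separate steps means the same calling pattern applies regardless of which algorithm sits behind it.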

Lines 25–31 in **Figure 3** show this sequence at play when implementing the C++ train function. Note that before this sequence we first deserialize the input (training data and the accompanying labels) and restore it into NumericTables. After this sequence, we serialize the result (the model object) into raw bytes. This way, the model can be passed to the predict function to be used for classifying new data.

```
 1 # Naive Bayes: train a model
 2 train <- '
 3     using namespace daal;
 4     using namespace daal::algorithms;
 5     using namespace daal::algorithms::multinomial_naive_bayes;
 6     using namespace daal::data_management;
 7
 8     // Inputs:
 9     //   X        - training dataset
10     //   y        - training data groundtruth
11     //   nclasses - number of classes
12     Rcpp::RawVector Xr(X);
13     Rcpp::RawVector yr(y);
14     int nClasses = Rcpp::as<int>(nclasses);
15
16     // Deserialize data and labels
17     OutputDataArchive dataArch(&Xr[0], Xr.length());
18     services::SharedPtr<NumericTable> ntData(new HomogenNumericTable<double>());
19     ntData->deserialize(dataArch);
20     OutputDataArchive labelsArch(&yr[0], yr.length());
21     services::SharedPtr<NumericTable> ntLabels(new HomogenNumericTable<int>());
22     ntLabels->deserialize(labelsArch);
23
24     // Train a model
25     training::Batch<> algorithm(nClasses);
26     algorithm.input.set(classifier::training::data, ntData);
27     algorithm.input.set(classifier::training::labels, ntLabels);
28     algorithm.compute();
29
30     // Get result
31     services::SharedPtr<training::Result> result = algorithm.getResult();
32     InputDataArchive archive;
33     result->get(classifier::training::model)->serialize(archive);
34
35     Rcpp::RawVector out(archive.getSizeOfArchive());
36     archive.copyArchiveToArray(&out[0], archive.getSizeOfArchive());
37     return out;
38 '
```

Figure 3 - C++ function to train a Naïve Bayes model
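
Figures 2 and 3 pass tables between C++ calls as raw bytes: one function serializes, the next one restores. The round trip can be illustrated with a toy table type. This is a sketch only; `Table`, `toBytes`, and `fromBytes` are hypothetical names, and the byte layout (row count, column count, then the values) is made up for the example and is not the Intel DAAL archive format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy dense table stored in row-major order.
struct Table {
    std::uint64_t rows;
    std::uint64_t cols;
    std::vector<double> values;
};

// Serialize: header (rows, cols) followed by the raw values.
std::vector<unsigned char> toBytes(const Table& t) {
    assert(t.values.size() == t.rows * t.cols);
    std::vector<unsigned char> bytes(2 * sizeof(std::uint64_t) +
                                     t.values.size() * sizeof(double));
    unsigned char* out = bytes.data();
    std::memcpy(out, &t.rows, sizeof t.rows); out += sizeof t.rows;
    std::memcpy(out, &t.cols, sizeof t.cols); out += sizeof t.cols;
    if (!t.values.empty())
        std::memcpy(out, t.values.data(), t.values.size() * sizeof(double));
    return bytes;
}

// Deserialize: read the header, then restore the values.
Table fromBytes(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() >= 2 * sizeof(std::uint64_t));
    Table t{};
    const unsigned char* in = bytes.data();
    std::memcpy(&t.rows, in, sizeof t.rows); in += sizeof t.rows;
    std::memcpy(&t.cols, in, sizeof t.cols); in += sizeof t.cols;
    t.values.resize(t.rows * t.cols);
    assert(bytes.size() == 2 * sizeof(std::uint64_t) +
                           t.values.size() * sizeof(double));
    if (!t.values.empty())
        std::memcpy(t.values.data(), in, t.values.size() * sizeof(double));
    return t;
}
```

A byte vector is a convenient carrier here because R can hold it as a plain raw vector without knowing anything about the C++ object inside it.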

### Predicting Labels for New Data

**Figure 4** shows the C++ predict function. Again, the same sequence of code appears in lines 28–35. The result of prediction is an \(n \times 1\) NumericTable of predicted labels, where \(n\) is the number of observations for our predictions. The last part of the code reads the labels off the resulting NumericTable and assembles them into an `Rcpp::IntegerVector` object. This object then seamlessly slips into the R space and becomes an integer vector.

```
 1 # Naive Bayes: predict
 2 predict <- '
 3     using namespace daal;
 4     using namespace daal::algorithms;
 5     using namespace daal::algorithms::multinomial_naive_bayes;
 6     using namespace daal::data_management;
 7
 8     // Inputs:
 9     //   model    - a trained model
10     //   X        - input data
11     //   nclasses - number of classes
12     Rcpp::RawVector modelBytes(model);
13     Rcpp::RawVector dataBytes(X);
14     int nClasses = Rcpp::as<int>(nclasses);
15
16     // Retrieve model
17     OutputDataArchive modelArch(&modelBytes[0], modelBytes.length());
18     services::SharedPtr<multinomial_naive_bayes::Model> nb(
19         new multinomial_naive_bayes::Model());
20     nb->deserialize(modelArch);
21
22     // Deserialize data
23     OutputDataArchive dataArch(&dataBytes[0], dataBytes.length());
24     services::SharedPtr<NumericTable> ntData(new HomogenNumericTable<double>());
25     ntData->deserialize(dataArch);
26
27     // Predict for new data
28     prediction::Batch<> algorithm(nClasses);
29     algorithm.input.set(classifier::prediction::data, ntData);
30     algorithm.input.set(classifier::prediction::model, nb);
31     algorithm.compute();
32
33     // Return newlabels
34     services::SharedPtr<NumericTable> predictionResult =
35         algorithm.getResult()->get(classifier::prediction::prediction);
36     BlockDescriptor<int> block;
37     int n = predictionResult->getNumberOfRows();
38     predictionResult->getBlockOfRows(0, n, readOnly, block);
39     int* newlabels = block.getBlockPtr();
40     IntegerVector predictedLabels(n);
41     std::copy(newlabels, newlabels+n, predictedLabels.begin());
42     return predictedLabels;
43 '
```

Figure 4 - C++ function to predict labels for new data using a trained model

### Putting All the Pieces Together

We can put all code shown in Figures 1–4 into a single script, for example, NaiveBayesClassifierDaal.R. Within the script, the character strings from Figures 2–4 must be defined before the glue code of Figure 1, which references them. The script can be brought into an R environment using R’s `source()` function. Then the new functions are available as `loadData()`, `nbTrain()`, and `nbPredict()`. **Figure 5** shows how to use these functions in R. Also shown is how we use `microbenchmark()` to benchmark the performance of the training and prediction steps.

Each time the script is sourced into R, a compile and link process automatically kicks off to build the C++ code into R extensions. This process does take time. Users who want to avoid this overhead should consider using Rcpp to write a dedicated R package.

A dedicated R package contains prebuilt dynamic libraries, so no compiling and linking is needed when loading it into R. The exact same C++ code for using Intel DAAL that we’ve used here can be used to build an R package. Writing an R package is beyond the scope of this discussion. If you are interested, see the official *Writing R Extensions* manual for details.^{8}

```r
library(microbenchmark)
source("NaiveBayesClassifierDaal.R")

# DAAL Naive Bayes
# nfeatures and nclasses are assumed to be set beforehand to match the data set
test <- loadData("traindata.csv", nfeatures)
trainperf.daal <- microbenchmark(
    model.daal <- nbTrain(test$data, test$labels, nclasses))
scoreperf.daal <- microbenchmark(
    labels.daal <- nbPredict(model.daal, test$data, nclasses))
```

Figure 5 - Using the Naïve Bayes classifier built with Intel® Data Analytics Acceleration Library
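
Repeated timing is what makes such a benchmark meaningful: `microbenchmark()` evaluates each expression many times (100 by default) and reports the distribution rather than a single run. The same idea can be sketched in C++ for timing code outside R. This is an illustration only, and `medianMillisSketch` is a hypothetical helper, not part of any library used in this article.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Run `work` `runs` times and return the median wall-clock time in
// milliseconds (the upper median when `runs` is even).
double medianMillisSketch(const std::function<void()>& work, int runs) {
    assert(runs > 0);
    std::vector<double> ms;
    ms.reserve(runs);
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(ms.begin(), ms.end());
    return ms[ms.size() / 2];
}
```

The median is less sensitive than the mean to an occasional slow run, for example one that includes first-use initialization.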

## Performance Gains

Our solution leads to significantly faster model training and prediction. As described above, e1071 Naïve Bayes took 200+ seconds to train and 1 hour and 45 minutes to predict. **Now each of these two steps finishes within 0.25 seconds**.
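
As a rough check, the timings quoted above for the 100K-observation data set imply the following lower bounds on the speedup. This assumes 0.25 seconds as an upper bound for each Intel DAAL step and 200 seconds as a lower bound for e1071 training:

\[
\text{training speedup} > \frac{200\ \text{s}}{0.25\ \text{s}} = 800, \qquad
\text{prediction speedup} \ge \frac{6{,}272\ \text{s}}{0.25\ \text{s}} = 25{,}088
\]

These are bounds derived from rounded figures, not measurements, so the actual speedups for this data set may be higher.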

We benchmarked our R extension implemented using Intel DAAL’s Naïve Bayes for data sets of different sizes. We ran the e1071 Naïve Bayes on the same data sets. Not surprisingly, our implementation achieves more than a 1,000x speedup for the training step, and up to a 30,000x speedup for the prediction step. **Chart 1** and **Chart 2** below show the speedup numbers for these data sets.

Chart 1. Speedup over e1071 Naïve Bayes classifier, training step

Chart 2. Speedup over e1071 Naïve Bayes classifier, prediction step

## Conclusion

The Intel Xeon Phi processor is a high-performance, massively parallel platform ideal for machine learning and data analytics workloads. However, mainstream software tools such as R that are popular in the data science community may not yet have been tuned to run effectively on Intel Xeon Phi processors. Here, we introduced a solution for R programmers who want to take advantage of Intel Xeon Phi processors right now.

The solution involves integrating Intel DAAL functions into an R environment by writing R extensions using Rcpp. Intel DAAL is a library offering ready-to-use algorithms for data analytics, machine learning, and deep learning. These algorithms have already been optimized for Intel Xeon Phi processors, as well as for a whole gamut of Intel® Xeon, Core™, and Atom™ processors. Our benchmarking results show that this solution has a vast performance advantage over the native R solution provided by the e1071 package. Our methodology opens the door for many R applications to benefit from Intel Xeon Phi processors.

We used the multinomial Naïve Bayes algorithm as an example to illustrate these points. But the same methodology can be replicated to integrate other Intel DAAL algorithms into R. Intel DAAL provides a rich set of algorithms, including: linear regression, SVM, Naïve Bayes, classification with boosting, recommender system, clustering, deep neural networks, and many others.

See the Intel DAAL open source project on GitHub*.

## Configurations and Tools Used

System configurations used for benchmarking:

- Intel Xeon Phi self-boot system
  - CPU: Intel Xeon Phi-D B0, 68 cores @ 1.40 GHz, 34 MB L2 cache
  - Memory: 16 GB MCDRAM, 96 GB DDR4
  - OS version: Red Hat Enterprise Linux* 7.2 (kernel 3.10.0-327.0.1.el7.x86_64, glibc 2.17-105.el7.x86_64)

Software tools used in this example:

- Intel DAAL 2017 Beta update 1
- R version 3.3.1 (Bug in Your Hair)
- Rcpp package version 0.12.5
- inline package version 0.3.14
- e1071 package version 1.6-7
- g++ version 4.8.5

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.

This sample source code is released under the Intel Sample Source Code License Agreement.

## References

1. The R* Project for Statistical Computing, r-project.org/.
2. e1071: Misc Functions of the Department of Statistics, Probability Theory Group, David Meyer, cran.r-project.org/web/packages/e1071/index.html.
3. Electronic Statistics Textbook, StatSoft, Inc., Tulsa, OK, 2013.
4. Developer Guide and Reference for Intel® Data Analytics Acceleration Library, 2016, http://software.intel.com/sites/products/documentation/doclib/daal/daal-user-and-reference-guides/index.htm.
5. Rcpp: Seamless R* and C++ Integration, Dirk Eddelbuettel, cran.r-project.org/web/packages/Rcpp/index.html.
6. Seamless R* and C++ Integration with Rcpp, D. Eddelbuettel, Springer, 2013.
7. inline: Functions to Inline C, C++, Fortran Function Calls from R*, Oleg Sklyar, cran.r-project.org/web/packages/inline/index.html.
8. Writing R* Extensions, cran.r-project.org/doc/manuals/r-release/R-exts.html.