Get Intel® Parallel Studio XE for free today!
This article recently appeared in Issue 29 of Parallel Universe Magazine.
LeCloud is a leading video cloud service provider that deploys its content delivery network (CDN) nodes in more than 60 countries and regions, with 20 terabytes per second (Tbps) of available bandwidth to support its cloud services―including cloud-on-demand, content sharing, distribution and live broadcasting of commercial events, virtual reality, and more. LeCloud platforms support more than a million live events per year, hundreds of millions of device accesses a day, and millions of concurrent users per second. With the heavy demand for its high-performance public database services, LeCloud uses MySQL*, a popular open-source database, as the basis for its cloud service.
One key challenge for LeCloud is optimizing MySQL database performance―also an important topic for many database administrators (DBA) and operation and maintenance IT developers. Key performance concerns include:
- Queries per second (QPS)
- Transactions per second (TPS)
- Query response time
This article shows how we optimized MySQL using Intel® C++ Compiler and its Interprocedural Optimization (IPO) capability. Without having to modify any code, we compiled the MySQL source code and tested the performance indicators. The result? By using Intel C++ Compiler optimizations on MySQL in a set of test scenarios, performance improved between 5% and 35%―bringing developers another avenue to better MySQL performance.
Intel® C++ Compiler
Intel® C++ Compiler, a component of Intel® Parallel Studio XE, is a C and C++ optimizing compiler that takes advantage of the latest instruction sets and architectural features to maximize performance. Because the Intel compiler development team knows the Intel architecture so well, they can do specialized―and more effective―optimization for CPU features like new SIMD instructions and cache structure for Intel® CPUs compared to other compilers like GNU GCC* or Microsoft Visual C++*. Some of the optimization technologies used in the present case study include:
- Automatic vectorization
- Guided auto parallelism
- Interprocedural optimization
- Profile-guided optimization
The Intel compilers help users get the most benefit out of their Intel-based platforms.
The automatic vectorizer (also called the auto-vectorizer) is a component of the Intel® compiler that automatically uses SIMD instructions in the Intel® Streaming SIMD Extensions (Intel® SSE, Intel® SSE2, Intel® SSE3, and Intel® SSE4), Supplemental Streaming SIMD Extensions (SSSE3) instruction sets, and the Intel® Advanced Vector Extensions (Intel® AVX, Intel® AVX2, Intel® AVX512) instruction sets. The vectorizer detects operations in the program that can be done in parallel and converts the sequential operations to parallel. For example, the vectorizer converts the sequential SIMD instruction that processes up to 16 elements into a parallel operation, depending on the data type. The compiler also supports a variety of auto-vectorizing hints that can help the compiler generate effective vector instructions on the latest processors, including the Intel® Xeon® E5-2699 v4 used in this case study.
Guided Auto Parallelism
The Guided Auto Parallelism (GAP) feature of the Intel C++ Compiler offers advice to improve the performance of sequential applications by suggesting changes that will take advantage of the compiler’s ability to automatically vectorize and parallelize code as well as improve the efficiency of data operations.
Interprocedural Optimization (IPO)
This automatic, multistep process allows the compiler to analyze code to find interprocedural optimizations (i.e., optimizations that go beyond individual program subunits) within source files and across multiple source files. IPO is covered in more detail below, as it is a key component of this case study.
Profile-Guided Optimization (PGO)
In PGO, the compiler analyzes the code while it runs and takes advantage of this profile information during subsequent recompilation. This improves application performance by shrinking code size, reducing branch mispredictions, and reorganizing code layout to minimize instruction-cache problems.
Compiling with IPO
IPO is a key optimization technique in the Intel C++ Compiler. It does code profiling and static topological analysis based on both single-source and multi-source files, and then implements specific optimizations like inlining, constant propagation, and dead function elimination for programs that contain many commonly used small and medium-sized functions. Figure 1 shows the IPO process.
When you compile your source code with the IPO option, for single-file compilation, the compiler performs inline function expansion for calls to procedures defined within the current source file. For multi-file IPO, the compiler may perform some inlining according the multi-source code, such as inlining functions marked with inlining pragmas or attributes (GNU C and C++) and C++ class member functions with bodies included in the class declaration. After each source file is compiled with IPO, the compiler stores an intermediate representation of the source code in mock object files.
When you link with the IPO option, the compiler is invoked a final time to perform IPO across all mock object files. During the analysis process, the compiler reads all intermediate representations in the mock file, object files, and library files to determine if all references are resolved and whether or not a given symbol is defined in a mock object file. Symbols included in the intermediate representation in a mock object file for both data and functions are candidates for manipulation based on the results of whole program analysis.
Figure 1. IPO Compilation Process
Building MySQL with the Intel Compiler
The latest official version of MySQL (v5.6.27) was used in this case study. This section describes how to use Intel C++ Compiler to compile MySQL on Linux*.
Download the MySQL installation package:
tar –zxvf mysql-5.6.27.tar.gz
Compile MySQL with Intel C++ Compiler (Figure 2):
1. Install Cmake:
yum -y install wget make cmake gcc gcc-c++ autoconf automake zlib* libxml2* ncurses-devel libmcrypt* libtool-ltdl-devel*
2. Compile MySQL with Intel C++ Compiler: Set CC to icc, CXX to icpc, and enable IPO with the –ipo option.
Figure 2. Compiling MySQL with Intel C++ Compiler
Create MySQL Grant System Tables
useradd -M -g mysql mysql -s /sbin/nologin ;
chown -R mysql .
chgrp -R mysql .
./scripts/mysql_install_db --user=mysql --collation-server=utf8_general_ci
The purpose of this analysis is to do comparative performance testing between MySQL built with the GNU and Intel compilers. The performance metrics for online transaction processing (OLTP) include queries per second (QPS), and response time (RT). The test tool was Sysbench* (v0.4.12), a modular, cross-platform, multi-threaded benchmarking tool that is commonly used to evaluate database performance. Tables 1 and 2 show the test environment.
Table 1. Hardware Environment
Table 2. Software Environment
We installed two MySQL instances, 3308 and 3318, on the same Intel® SATA SSD disk, and used Sysbench for OLTP testing (Table 3). Before testing, we cleaned the operating system cache and disabled the MySQL query_cache. We created 10 MySQL database tables for testing, with one million records per table. 4, 8, 16, 32, 64, 128, and 512 test threads were used for random read OLTP. MySQL buffer_pool size was set as 2GB, 8GB, and 16GB. Each test was run three times and the average time was used as the final result.
Table 3. Test Steps
Performance Testing Results
The results of the Sysbench TPS test on MySQL compiled with the Intel and GNU compilers are shown in Table 4 and Figure 3. The number of threads and the MySQL buffer_pool size were varied. Across the board, MySQL compiled with the Intel compiler gives significantly higher QPS, especially as the number of threads increases. We see a 5% to 35% improvement in QPS for MySQL compiled with the Intel C++ Compiler (Figure 3).
Table 4. Queries per Second (higher is better)
Average Response Time
The results of the Sysbench average response time (RT) test on MySQL compiled with the Intel and GNU compilers are shown in Table 5 and Figure 4. RT is in milliseconds. The number of threads and the MySQL buffer_pool size were varied as in the QPS test. Once again, MySQL compiled with the Intel compiler gives superior performance, especially at higher numbers of threads.
Table 5. Average Response Time (in milliseconds, lower is better)
Figure 3. Queries per Second for Intel C++ Compiler versus GNU GCC
Figure 4. Response Time, Intel C++ Compiler versus GNU GCC
Maximizing MySQL performance is essential for DBAs, operators, and developers trying to reach a performance goal. This study demonstrates that compiling MySQL with the Intel C++ Compiler can significantly improve database performance. We found that the Intel C++ Compiler improved performance by 5% to 35%, with an average improvement of about 15%.
There are many factors that affect MySQL performance, including the MySQL configuration, the CPU, and the SSD. For example, we used the Intel Xeon processor E5-2620 V3, but upgrading to V4 should improve performance even further. Using a faster SSD, such as the Intel® DC P3700 instead of the Intel DC S3510 used in this article should also further improve performance.
- Intel® Media Server Studio Support
- MySQL* NDB Cluster 7.3–7.4 Reference Guide
- CMake Reference Documentation