## Introduction

In this article, I coded a high performance Monte Carlo Simulation framework. To achieve high performance, I use both parallel processes and parallel multithreading in each process. MPI (message passing interface ) protocol is used to coordinate multiple process ( openmpi implementation of MPI is used in this article ), and C++ 11 threading library is used to implement multithreading. Since I come from financial industry, this Monte Carlo Framework is designed to price financial derivatives. I use static inheritance, in particular, curiously recurring template pattern to make the framework flexible enough to price any type of derivative with different kinds of payoff, while keeping high performance, and staying away from virtual function cost if it were to implement as run time dynamic inheritance.

## Multi Process and Multi Threading

Monte carlo simulation is very good candidate do parallel computing, since all paths are independent to each other, and we need run large number of path in order to converge. In order to achieve high performance, I use two levels of parallels. The first level is on parallel process. MPI interface provide communication between a set of process. First, I divide simulation paths of computation into different processes, then I continue to divide the paths in each process into multiple threads. After all threads done ( using thread join to wait for all threads), then I combine those results as the final results for this process. After done for all processes, then I combine those results coming out of processes by using MPI reduce operation to get final result of Monte Carlo simulation.

The code for MPI and initilization of monte carlo simulation as follows:

int main(int argc, char** argv)
{
std::chrono::duration<double> elapsed_seconds_0;
auto start_0 = std::chrono::high_resolution_clock::now();
int proc_id = 0;
int num_procs = 1;
int namelen;
float result_each_proc;
float result_sum_global;
char proc_name[MPI_MAX_PROCESSOR_NAME];
MPI_Init( &argc, &argv );
MPI_Comm_size( MPI_COMM_WORLD, &num_procs );
MPI_Comm_rank( MPI_COMM_WORLD, &proc_id );
MPI_Get_processor_name( pro_name, &namelen );
int num_path_per_process = SIM_NUM_PATH / num_procs;
Instrument Stock(100, 0.5, 0.01);
AsianCallOptionPayOff AsianPayOff( 100 );
DerivativeInstrument<AsianCallOptionPayOff> DerivedInstr(0.5, &Stock, &AsianPayOff);
MonteCarlo<AsianCallOptionPayOff> MonteCalc(STEP_NUM, num_path_per_process, &DerivedInstr,NUM_THREAD);
std::chrono::duration<double> elapsed_seconds;
auto start = std::chrono::high_resolution_clock::now();
elapsed_seconds_0 = start - start_0;
double dRes = MonteCalc.CalcMultiThreadPlainMC();
MPI_Allreduce ( &result_each_proc, &result_sum_global, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLd );
float final_result = result_sum_global / num_procs;
auto end = std::chrono::high_resolution_clock::now();
elapsed_seconds = end - start;
if ( 0 == proc_id )
{
std::cout << "MPI Initial" << num_procs << " processes spend" << elapsed_seconds_0.count() << "second" << std::endl;
std::cout << SIM_NUM_PATH << "number of " << STEP_NUM << "steps of simulation takes " << elapsed_seconds.count()<<"second" << std::endl;
}
MPI_Finalize();
return 0;

The following table shows time spent for using different number of process and thread in a 2.2GHz 32 core machine.

Thread/Process 1 2 5 10 20 32
1 149.176 74.785 29.726 14.916 10.351 10.136
2 74.698 37.351 15.561 10.419 8.538 7.591
5 29.803 15.040 9.500 7.609 7.530 6.926
10 11.967 9.035 7.298 7.008 6.850 6.757
16 12.470 9.454 7.151 6.822 6.880 6.758

From the performance table, we can conclude that adding either the number of process or the number of thread can increase the performance until reaching a certain limit. The host machine is 32 core machine, and we can not reach the limit by just using 32 process, and adding thread definitely improve the performance. In addition, the time to initialize MPI multi process increase as the number of the process increase.

The following table shows time spent for using different number of process and thread in a 2.2GHz 32 core machine.

Process 1 2 5 10 20 32
Time to
initialize processes 0.049 0.052 0.063 0.165 0.855 1.973

## Monte Carlo Simulation Framework

Monte Carlo Simulation are a broad class of computational algorithms that rely on repeated random sampling to obtain numerial results; typically one runs simulations many times over in order to obtain the distribution of unknown probabilistic entity. Monte Carlo Simulation can be used in numerical integration to calculate expected average, and this framework is designed to calculate expected average of financial derivative prices. The code example in this article use a struct to the stochastic underlying price the as the state variable. Users of this framework certainly can add more variable in this structure, in order to expand to multi-dimension with multiple random process with certain correlation variance. I use vector of State to define a path, and vector of paths to define all paths coming out of Monte Carlo Simulation.

struct State
{
State( float price ) : m_price ( price ) {}
float m_price;
};
typedef std::vector< State > Path;

The stochastic underlying process is coded here to follow the geometric Brownian motion with a risk free interest rate drift term and a stochastic term. The stochastic term dictate that the percentage change from previous state price comes from a random sample from a normal distribution with mean zero and the variance being underlying price variance multiply the time change.

for (int num = 0; num < lSimNum; num++)
{
pVecPath = &(m_vecPaths[pathindex+num]);
(*pVecPath)[0].m_price = InitialPrice;
S = InitialPrice;
fPathSum = 0;
for ( int i = 1; i < m_iSteps; i++ )
{
bPathValid = true;
number = distribution(generator);
volterm = voldt* number;
S = S * exp(drift + volterm);
(*pVecPath)[i].m_price = (*pVecPath)[i-1].m_price * exp( drift + volterm );
}
}

Different kinds of financial derivative have different types of payoff. The payoff types I code here are path independent option like vanilla call or put option, path check barrier like barrier option, and average underlying price along the path. like asian option. I use static inheritance, in particular, curiously recurring template pattern to make the framework flexible enough to price any type of derivative with different kinds of payoff, while keeping high performance, and staying away from virtual function cost if it were to implement as run time dynamic inheritance.

The actual implementation of the base payoff class is as follows:

template < class Derived>
class PayOff
{
public:
float CalcPayOff(float dUnderPrice)
{
return static_cast <Derived *> (this)->CalcPayOff(dUnderPrice);
}
bool CheckPathBarrier( float fPrice )
{
return static_cast < Derived * >(this)->CheckPathBarrier( fPrice );
}
void CalcSumPath( float fPrice, float * fRet )
{
return static_cast< Derived * > (this)->CalcSumPath( fPrice, fRet );
}
PayOffType m_type;
};

The follwing code shows how a European call option inheritate a payoff base class.

class EuropeanCallOptionPayOff : public PayOff <EuropeanCallOptionPayOff >
{
public:
EuropeanCallOptionPayOff(float Strike) : m_strike(Strike)
{
m_type = PathIndependent;
}
float CalcPayOff(float dPrice)
{
if (dPrice > m_strike)
return dPrice - m_strike;
else
return 0;
}
private:
float m_strike;
};

Derivative class is a template class with payoff as the template argument. The code is as follows:

template<typename DerivatedPayOff>
class DerivativeInstrument
{
public:
DerivativeInstrument(float Time, Instrument * pIns, PayOff<DerivatedPayOff> * pPO) : dTimeToMature(Time), pInstrument(pIns), pPayOff(pPO)
{
}
float dTimeToMature;
Instrument * pInstrument;
PayOff<DerivatedPayOff> * pPayOff;
};

## Points of Interest

When you download code, you can download openmpi head and library, and set up them correctly in make file, in order to make the code work. Let me know if you have any questions

## History

08/31/2014, initial publication

9/01/2014, Add code into this article

10/122014, Modify paper and code in order to generalize this framework