Speed Execution Benchmark on Monte Carlo

Today I will try to benchmark the execution speed of several programming languages on a Monte Carlo example. This benchmark involves VBA, C++, C#, Python, Cython and Numpy vectorization. I will try to add progressively other programming languages so that this article will be more thorough.

Execution environment

All the chunks of code have been executed on the same computer so that the benchmark has sense. It’s a MacBook Pro with 8go RAM, i5 Dual-Core 2.4GHZ, and SSD 256go.

Python code is launch from the console, C# code in Visual Studio 2015, VBA in Excel 2016 and C++ in CodeBlocks with GCC compiler (with O2 flag for optimization).

Monte Carlo Recipe

I used Monte Carlo to price a Vanilla European Call. It doesn’t have a real sense to perform Monte Carlo on this kind of option but the point of the article is in the execution time.

The procedure is the following:

  • Compute the final value of your asset (no need for time discretization since the option isn’t path-dependant, only the final value of the asset is interesting for us here) with a Black Scholes dynamic under risk-neutral probability:

S_T = S_0 e^{(r-\frac{\sigma^2}{2})T + \sigma B_T}

where B_T = \sqrt{T}X

  • Apply your payoff function to the final value. Here the payoff is h(S_T) = max(S_T-K,0).
  • Repeat N times.
  • Take the mean of all your simulated payoffs and discount the result.

Review of the languages and packages used

Python is a byte-code interpreted language. Your .py code is first compiled into byte-code (.pyc file) and then is interpreted by the Python Virtual Machine.

Cython generates very efficient C code from your Python code only once and then compiles it with C/C++ compilers. It allows to really improve the time-execution of your code without knowing anything about C.

VBA (Virtual Basic for Application): VBA code is compiled into Microsoft P-code (intermediate language) and then executed by a Virtual Machine.

C# is compiled into an intermediate language which is compiled just-in-time (combination of ahead-of-time compilation and interpretation) into the native assembly language of the machine.

C++ is a compiled language. A compiler program translates your code into bytecode that can be processed by the CPU. It is famous for its speed.

Numpy is powerful Python package provides with very efficient N-dimensional arrays and sophisticated functions. It enables to use vectorization instead of loops for some operations which gives a terrible gain in time execution.

Performance assessment

In order to benchmark the performance of the languages, I wrote the same code in each language. I implemented a function Monte_Carlo_Vanilla_Code that runs with 10.000.000 iterations. For each language, I called this function 1, 10, 20, 50, 100 and 200 times and timed the whole execution time.

Code Snippets

Here are the chunks of code that I used in each programming language:

  • Python with and without vectorization:
import numpy as np
import math
import timeit

def Black_Scholes(S,R,Vol,T,X):
    return S*np.exp((R-(Vol**2)/2)*T + Vol*math.sqrt(T)*X)

#Uses Numpy Vectorization
def Monte_Carlo_Vanilla_Call_Vectorization(S_0,K,R,Vol,T,N_simu):
    X = np.random.randn(N_simu)

    #Compute by vectorization:
    assets_final_value = Black_Scholes(S_0,R,Vol,T,X)

    payoffs = np.maximum(assets_final_value-K,0)

    print("Call Price: {}".format(np.mean(payoffs)*math.exp(-R*T)))

#Uses Loop
def Monte_Carlo_Vanilla_Call(S_0,K,R,Vol,T,N_simu):
    X = np.random.randn(N_simu)
    sum_payoffs = 0

    for i in range(N_simu):
        final_value_asset = Black_Scholes(S_0,R,Vol,T,X[i])
        sum_payoffs += max(final_value_asset-K,0)

    print("Call Price: {}".format((sum_payoffs/N_simu)*math.exp(-R*T)))

S_0 = 100
K = 100
R = 0.01
T = 1
Vol = 0.2
N_simu = 10000000

start_time = timeit.default_timer()
for i in range(1):
print(timeit.default_timer() - start_time)
  • Cython code (.pyx file). (Observe C variable type declaration). This code is called into another Python file.
import cython
cimport numpy as np
import numpy as np
import math
import timeit

#Call this one with loop code
cdef Black_Scholes(double S,double R,double Vol,double T,double X):
   return S*np.exp((R-(Vol**2)/2)*T + Vol*math.sqrt(T)*X)

#Call this one with vectorization
cdef Black_Scholes_Vect(double S,double R,double Vol,double T,np.ndarray[double,mode="c",ndim=1] X):
   return S*np.exp((R-(Vol**2)/2)*T + Vol*math.sqrt(T)*X)

#Uses Numpy Vectorization
cpdef Monte_Carlo_Vanilla_Call_Vectorization(double S_0,double K,double R,double Vol,double T,int N_simu):
   cdef np.ndarray[double,mode="c",ndim=1] X = np.random.randn(N_simu)

   #Compute by vectorization:
   cdef np.ndarray[double,mode="c",ndim=1] assets_final_value = Black_Scholes_Vect(S_0,R,Vol,T,X)  

   cdef np.ndarray[double,mode="c",ndim=1] payoffs = np.maximum(assets_final_value-K,0)

   print("Call Price: {}".format(np.mean(payoffs)*math.exp(-R*T)))

#Uses loop
cpdef Monte_Carlo_Vanilla_Call(S_0, K, R, Vol, T, N_simu):
   X = np.random.randn(N_simu)
   sum_payoffs = 0
   final_value_asset = 0

   for i in range(N_simu):
      final_value_asset = Black_Scholes(S_0,R,Vol,T,X[i])
      sum_payoffs += max(final_value_asset-K,0)

   print("Call Price: {}".format((sum_payoffs/N_simu)*math.exp(-R*T)))
  • C# code:
static void Main(string[] args)
    //Chronometer :
    var chrono = new Stopwatch();
    double vol = 0.3;
    double R = 0.01;
    double S0 = 50;
    double K = 50;
    double T = 1; //1 year

    int nb_simulations = 10000000;

    for(int i = 0; i < 10; i++)
        Monte_Carlo_Vanilla_Call(S0, T, R, vol, K, nb_simulations);

    Console.WriteLine("Time : {0}", chrono.Elapsed);

static void Monte_Carlo_Vanilla_Call(double S,double T,double R,double vol,double K,int nb_simu)
    double final_value_asset;
    double sum_payoffs = 0;
    double final_price;

    //Random number generator:
    Normal norm = new Normal(0, 1);

    for (int i = 0; i < nb_simu; i++)
        final_value_asset = Generate_Asset(norm.Sample(), S, T, R, vol);
        sum_payoffs += Math.Max(final_value_asset - K, 0);

    final_price = (sum_payoffs/nb_simu) * Math.Exp(-R * T);
    Console.WriteLine("Price : {0}", final_price);

static double Generate_Asset(double X,double S,double dt,double R, double vol){
    return S * Math.Exp((R - (Math.Pow(vol, 2) / 2)) * dt + vol * Math.Sqrt(dt) * X);
  •  VBA code:
Option Explicit

Sub Monte_Carlo_Vanilla_Call(S_0 As Double, K As Double, R As Double, T As Double, Vol As Double, nb_simulation As Long)
    Dim sum_payoffs As Long
    Dim final_asset_value As Double
    Dim final_price As Double
    Dim X As Double
    Dim i As Long

    sum_payoffs = 0
    For i = 1 To nb_simulation
        X = Rnd()
        If X <> 0 Then
            X = Application.WorksheetFunction.NormInv(X, 0, 1)
            final_asset_value = Generate_Asset(S_0, R, T, Vol, X)
            sum_payoffs = sum_payoffs + WorksheetFunction.Max(final_asset_value - K, 0)
        End If
    Next i

    final_price = (sum_payoffs / nb_simulation) * Exp(-R * T)
    'Debug.Print (final_price)
    End Sub

Function Generate_Asset(S_0 As Double, R As Double, dt As Double, Vol As Double, X As Double) As Double
    Generate_Asset = S_0 * Exp((R - (Vol ^ 2) / 2) * dt + Vol * Sqr(dt) * X)
End Function

Sub main()
    Dim start As Date
    Dim S_0 As Double
    Dim R As Double
    Dim K As Double
    Dim T As Double
    Dim Vol As Double
    Dim nb_simulation As Long
    Dim i As Integer

    R = 0.01
    S_0 = 50
    K = 50
    Vol = 0.3
    T = 1
    nb_simulation = 1000000

    start = Now()
    For i = 1 To 100
        Call Monte_Carlo_Vanilla_Call(S_0, K, R, T, Vol, nb_simulation)
    Next i
    Debug.Print (Format(Now() - start, "hh:mm:ss"))
    'Debug.Print ("Done")
    End Sub
  • C++ Code (I apologize in advance for experienced C++ programmer, I do not master C++ at all):
#include <iostream>
#include <cmath>
#include <algorithm>
#include <random>
#include <chrono>

double Generate_Asset(double S,double R,double Vol,double dt,double X){
    return S*std::exp((R-(std::pow(Vol,2))/2)*dt + Vol*std::sqrt(dt)*X);

void Monte_Carlo_Vanilla_Call(double S_0,double R,double Vol,double T,double K,double nb_simulation){
    static std::default_random_engine e{};
    static std::normal_distribution<double> d{0, 1};

    double sum_payoffs = 0;
    double final_asset_value = 0;
    double X = 0;
    double call_price;

    for(int i=0;i<nb_simulation;++i){

        X = d(e);
        final_asset_value = Generate_Asset(S_0,R,Vol,T,X)
        sum_payoffs += std::max(final_asset_value-K,0.0);
     call_price = (sum_payoffs/nb_simulation)*std::exp(-R*T);
     std::cout << "Price : " << call_price << std::endl;

int main()
    double S_0 = 50;
    double K = 50;
    double R = 0.01;
    double Vol = 0.3;
    double T = 1;
    double nb_simulation = 10000000;

    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();

    for(int i=0;i<100;++i){

    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();

    std::cout << std::endl << "Execution time : " <<     std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << std::endl;
    std::cout << "End! Press a key to exit." << std::endl;
    return 0;

The results

This chart sums up all the results:

Benchmark results

Actually, the last line at the bottom of the chart is the overlap of:

  • “Cython Numpy Vectorization With Typing”
  • “Cython Numpy Vectorization Without Typing”
  • “Python Numpy Vectorization”

These last three are therefore the fastest way to perform Monte Carlo here, even faster than C++.

The strength of Numpy: 

  • Python with loop (orange line): More than 5.000 seconds to perform the 200 calls to the Monte Carlo function.
  • Python with Numpy vectorization (brown line): Less than 150 seconds to perform the 200 calls to the Monte Carlo function.

The difference is amazing and shows how powerful is Numpy!

The strength of Cython:

  • Python with loop (orange line):  More than 5.000 seconds to perform the 200 calls to the Monte Carlo function.
  • Cython with loop (green line): This is exactly the same code but generated in C code thanks to Cython. Huge improvement without any change!
  • Cython with loop and variable typing (purple line): This is the same code again except that each variable is typed (double, integer, Numpy array…).
    Almost 2 times faster just with Cython and variable declaration.
  • Cython with Numpy vectorization and variables declaration isn’t gaining a lot since Numpy is already really optimized.

C++ (pale green line) is also excellently ranked here and has the expected performance: a little bit less than 300 seconds to perform the 200 calls to the function.

And finally, VBA is slower than C# as expected. Anyway, this is a good performance for VBA, I wasn’t expecting it to be faster than Cython (I wonder if I haven’t done something wrong…).


This benchmark shows the relative performance of several programming languages. I wasn’t aware of the Python’ slowness, it’s pretty disappointing. However, some packages such as Numpy, Cython (and others like Jython) really boost it and make it faster than C++ code!

C++ is undeniably really fast. I understand now why all the books about Monte Carlo pricing deal with C++. However, it’s a language that seems to be pretty hard to master.

VBA and C# give pretty good results but I think it will be interesting to add other programming languages recognized for their swiftness. Therefore, I will try to add soon C, Java, Swift, and Javascript.

By the way, if anyone is interested to provide some results in another language I would be glad to display its code, its result and obviously mention his name.

Thank You!

2 thoughts on “Speed Execution Benchmark on Monte Carlo

  1. Thanks, very interesting !

    In fact we can see that hard computation tasks such as complex math function are the main things slowing down execution of such algorithms.

    By extension, two things are true:
    -performance depends on how well native math library is implemented in each languages you’re using in the benchmark,
    -perfs are also influenced by the general optimization of the binaries (for compiled languages) by the compilers used, and in the case of java or python it depends on the optimization of the virtual machines that run the byte code (we can see the major optimization between Java’s JVM and Google’s DVM (dalvik virtual machine) which is their own implementation of a java runtime environment for Android operating system).

    It might be interesting to see a benchmark of such an algorithm on the Apache Spark environment with Scala.

    Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s