Importance Sampling and Black Magic of Monte Carlo

Notes on Importance Sampling and Exponential Smoothing

Author: Eunki Chung

Published: April 16, 2023

Monte Carlo method

The Monte Carlo method is a computational technique that approximates a target quantity by repeated random sampling. Suppose the goal is to evaluate $E[g(X)]$, where $X$ is some random quantity. One naive approach is to generate random samples $X_1, X_2, \dots, X_n$ by computer simulation using pseudo-random number generation, and take the average of the evaluated $g(X_i)$. That is,

$$E[g(X)] \approx \frac{1}{n}\sum_{i=1}^{n} g(X_i)$$

This is called the direct sampling method and is indeed powerful in many settings. However, it has a severe shortcoming when the function depends on the occurrence of rare events. Assume we have a standard normal random variable $X \sim N(0,1)$ and want to evaluate the function $g(X) = \mathbb{1}(X > 30)$. Note that $E[\mathbb{1}(X > 30)] = P(X > 30)$, so the analytical solution of this problem is tiny but not zero. Notice also that the direct estimate is essentially just counting the number of samples greater than 30 and computing the frequency of that event.

Fortunately, we can easily generate standard normal samples using the Box-Muller algorithm, but it is almost impossible to draw a sample greater than 30. The direct MC estimate will remain 0 until we observe a single sample above 30, which requires a tremendous number of samples unless we are very lucky. Even if we do observe such a sample despite the meager chance, the variance of the estimator, and hence the efficiency of the method, would still be poor.
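To see the failure mode concretely, here is a small Python sketch (the language and sample size are my choices for illustration; the post itself shows no code) that runs the direct estimator on $P(X > 30)$:

```python
import random

random.seed(0)
n = 100_000

# Direct Monte Carlo for P(X > 30), X ~ N(0, 1):
# count how many standard normal draws exceed 30.
hits = sum(random.gauss(0.0, 1.0) > 30.0 for _ in range(n))
estimate = hits / n
print(estimate)  # 0.0 -- no draw comes anywhere near 30
```

With 100,000 draws, the largest sample is typically around 4 to 5, so the estimate is exactly zero and carries no information about the target probability.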

Importance Sampling

Univariate case

One way to address this issue is the importance sampling method. With the following simple algebra, we can convert the estimator into one that is far more efficient than the direct method.

$$\begin{aligned}
E[g(X)] &= \int g(x) f_X(x)\, dx \\
&= \int g(x) \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) dx \\
&= \int g(x) \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2} + \mu x - \frac{\mu^2}{2} - \mu x + \frac{\mu^2}{2}\right) dx \\
&= \int g(x) \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2} - \mu x + \frac{\mu^2}{2}\right) dx \\
&= \int g(x) \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2}\right) \exp\!\left(-\mu x + \frac{\mu^2}{2}\right) dx \\
&= \int g(x) \exp\!\left(-\mu x + \frac{\mu^2}{2}\right) f_{\tilde{X}}(x)\, dx \qquad \text{where } \tilde{X} \sim N(\mu, 1) \\
&= E\!\left[g(\tilde{X}) \exp\!\left(-\mu \tilde{X} + \frac{\mu^2}{2}\right)\right] \\
&\approx \frac{1}{n} \sum_{i=1}^{n} g(\tilde{x}_i) \exp\!\left(-\mu \tilde{x}_i + \frac{\mu^2}{2}\right)
\end{aligned}$$

Here we sample from the shifted normal variable $\tilde{X}$, evaluate $g$ at the samples $\tilde{x}_i$, and apply a correction by multiplying $\exp(-\mu \tilde{x}_i + \mu^2/2)$, the likelihood ratio $f_X(\tilde{x}_i)/f_{\tilde{X}}(\tilde{x}_i)$, to retrieve the original target value. If we choose $\mu = 30$, we can easily sample $x > 30$, and this significantly improves the quality of the Monte Carlo estimate.
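A short Python sketch of this estimator (the shift $\mu = 30$ matches the example above; the sample size is an illustrative choice):

```python
import math
import random

random.seed(0)
n, a, mu = 100_000, 30.0, 30.0  # threshold a and shift mu

# Sample from the shifted law N(mu, 1) and reweight each draw by the
# likelihood ratio exp(-mu*x + mu^2/2) to recover E[1(X > a)] under N(0, 1).
total = 0.0
for _ in range(n):
    x = random.gauss(mu, 1.0)
    if x > a:
        total += math.exp(-mu * x + mu * mu / 2.0)
estimate = total / n
print(estimate)  # on the order of 1e-198, near the true P(X > 30)
```

Roughly half of the shifted samples land above 30, so every batch of draws now carries information about the rare event, and the tiny weights do the bookkeeping that maps the answer back to the original measure.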

Exponential Smoothing

It turns out that there is a direct analog of this approach for the multivariate case, and even for stochastic processes, referred to here as exponential smoothing (the technique is more commonly known in the literature as exponential tilting, or an exponential change of measure). For example, let's consider the following Brownian motion (BM) with drift:

$$X_t = B_t + \mu t, \qquad t \ge 0$$

and suppose we have discrete time points such that $0 = t_0 < t_1 < \cdots < t_n = T$.

Similar to the univariate case, our objective is to evaluate the expectation $E[f(X_{t_1}, X_{t_2}, \dots, X_{t_n})]$, where $f: \mathbb{R}^n \to \mathbb{R}$ is a bounded Borel function. Notice that since $B_t$ is a BM, $X_t$ has independent increments. Let $x_0 = 0$, and let $g: \mathbb{R}^n \to \mathbb{R}$ be such that

$$f(x_1, x_2, \dots, x_n) = g(x_1 - x_0,\, x_2 - x_1,\, \dots,\, x_n - x_{n-1})$$

Because of the independent increments, the joint density is a product of the marginal densities of the increments $X_{t_i} - X_{t_{i-1}}$, which are given by

$$X_{t_i} - X_{t_{i-1}} = B_{t_i} + \mu t_i - B_{t_{i-1}} - \mu t_{i-1} = B_{t_i} - B_{t_{i-1}} + \mu(t_i - t_{i-1})$$

Since $B_{t_i} - B_{t_{i-1}} \sim N(0,\ t_i - t_{i-1})$, we have $X_{t_i} - X_{t_{i-1}} \sim N(\mu(t_i - t_{i-1}),\ t_i - t_{i-1})$. Then the joint pdf is given by

$$\prod_{i=1}^{n} \frac{1}{\sqrt{2\pi (t_i - t_{i-1})}} \exp\!\left(-\frac{\big((x_i - x_{i-1}) - \mu(t_i - t_{i-1})\big)^2}{2(t_i - t_{i-1})}\right)$$

We can separate out the constant term $C = (2\pi)^{-n/2} (t_1 - t_0)^{-1/2} (t_2 - t_1)^{-1/2} \cdots (t_n - t_{n-1})^{-1/2}$ and work on the remaining exponential part:

$$\begin{aligned}
&\prod_{i=1}^{n} \exp\!\left(-\frac{\big((x_i - x_{i-1}) - \mu(t_i - t_{i-1})\big)^2}{2(t_i - t_{i-1})}\right) \\
&= \prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2 - 2(x_i - x_{i-1})\mu(t_i - t_{i-1}) + \mu^2(t_i - t_{i-1})^2}{2(t_i - t_{i-1})}\right) \\
&= \prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})} + (x_i - x_{i-1})\mu - \frac{\mu^2}{2}(t_i - t_{i-1})\right) \\
&= \prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}\right) \exp\!\left((x_i - x_{i-1})\mu - \frac{\mu^2}{2}(t_i - t_{i-1})\right) \\
&= \left\{\prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}\right)\right\} \exp\!\left(\sum_{i=1}^{n} \left[(x_i - x_{i-1})\mu - \frac{\mu^2}{2}(t_i - t_{i-1})\right]\right) \\
&= \left\{\prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}\right)\right\} \exp\!\left(\mu \sum_{i=1}^{n} (x_i - x_{i-1}) - \frac{\mu^2}{2} \sum_{i=1}^{n} (t_i - t_{i-1})\right) \\
&= \left\{\prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}\right)\right\} \exp\!\left(\mu x_n - \frac{1}{2}\mu^2 t_n\right)
\end{aligned}$$

where the last step uses the telescoping sums $\sum_{i=1}^{n} (x_i - x_{i-1}) = x_n - x_0 = x_n$ and $\sum_{i=1}^{n} (t_i - t_{i-1}) = t_n - t_0 = t_n$.

Note that $C \prod_{i=1}^{n} \exp\!\left(-\frac{(x_i - x_{i-1})^2}{2(t_i - t_{i-1})}\right)$ is exactly the joint density of $(B_{t_1} - B_{t_0},\ B_{t_2} - B_{t_1},\ \dots,\ B_{t_n} - B_{t_{n-1}})$. Then we can convert the original expectation as follows.

$$E[f(X_{t_1}, X_{t_2}, \dots, X_{t_n})] = E\!\left[f(B_{t_1}, B_{t_2}, \dots, B_{t_n}) \exp\!\left(\mu B_T - \frac{1}{2}\mu^2 T\right)\right]$$
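A small Python sketch of this identity (the drift, step count, and the particular functional $f(x_1, \dots, x_n) = \mathbb{1}(x_n > c)$ are illustrative choices): it estimates the same expectation once by simulating $X_t = B_t + \mu t$ directly, and once by simulating the driftless $B_t$ and applying the correction factor.

```python
import math
import random

random.seed(1)
mu, T, steps, npaths = 0.5, 1.0, 10, 50_000
dt = T / steps
c = 1.0  # f(x_1, ..., x_n) = 1(x_n > c), a simple bounded functional

direct = tilted = 0.0
for _ in range(npaths):
    B = X = 0.0
    for _ in range(steps):
        dB = random.gauss(0.0, math.sqrt(dt))
        B += dB              # driftless Brownian path
        X += dB + mu * dt    # drifted path X_t = B_t + mu*t
    direct += (X > c)
    # evaluate f on the driftless path, correct by exp(mu*B_T - mu^2*T/2)
    tilted += (B > c) * math.exp(mu * B - 0.5 * mu * mu * T)

direct /= npaths
tilted /= npaths
# both approximate P(X_T > c) = P(N(mu*T, T) > c)
```

Since this event is not rare, both estimates agree with the closed form $P(N(\mu T, T) > c)$; the payoff of the identity comes when the drift is chosen to make an otherwise rare event common, exactly as in the univariate example.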

As in the univariate case, we have converted the expectation so that it is taken with respect to $B_t$ instead of $X_t$: we evaluate $f(\cdot)$ on the driftless path and then apply a correction by multiplying $\exp(\mu B_T - \frac{1}{2}\mu^2 T)$. One interesting fact is that the correction term $M_t = \exp(\mu B_t - \frac{1}{2}\mu^2 t)$ is a martingale:

$$\begin{aligned}
E[M_t \mid \mathcal{F}_s] &= E\!\left[\exp\!\left(\mu B_t - \tfrac{1}{2}\mu^2 t\right) \,\middle|\, \mathcal{F}_s\right] \\
&= E\!\left[\exp\!\left(\mu B_t - \mu B_s + \mu B_s - \tfrac{1}{2}\mu^2(t-s) - \tfrac{1}{2}\mu^2 s\right) \,\middle|\, \mathcal{F}_s\right] \\
&= E\!\left[\exp\!\left(\mu B_s - \tfrac{1}{2}\mu^2 s\right) \exp\!\left(\mu(B_t - B_s) - \tfrac{1}{2}\mu^2(t-s)\right) \,\middle|\, \mathcal{F}_s\right] \\
&= M_s\, E\!\left[\exp\!\left(\mu(B_t - B_s) - \tfrac{1}{2}\mu^2(t-s)\right)\right] \qquad \because\ B_t - B_s \perp \mathcal{F}_s \\
&= M_s\, E\!\left[\exp\!\left(\mu(B_t - B_s)\right)\right] \exp\!\left(-\tfrac{1}{2}\mu^2(t-s)\right) \\
&= M_s \exp\!\left(-\tfrac{1}{2}\mu^2(t-s)\right) E[\exp(\mu Y)] \qquad \text{where } Y \sim N(0,\ t-s) \\
&= M_s \exp\!\left(-\tfrac{1}{2}\mu^2(t-s)\right) \exp\!\left(\tfrac{1}{2}\mu^2(t-s)\right) \qquad \because\ E[\exp(\mu Y)] \text{ is the MGF of } Y \\
&= M_s
\end{aligned}$$
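As a quick numerical sanity check (the parameters $\mu = 1$, $t = 1$ are illustrative), the martingale property implies $E[M_t] = M_0 = 1$ for every $t$, which is easy to verify by simulation:

```python
import math
import random

random.seed(2)
mu, t, n = 1.0, 1.0, 200_000

# M_t = exp(mu*B_t - mu^2*t/2) with B_t ~ N(0, t); a martingale started
# at M_0 = 1 must have constant expectation E[M_t] = 1 for all t.
total = 0.0
for _ in range(n):
    B = random.gauss(0.0, math.sqrt(t))
    total += math.exp(mu * B - 0.5 * mu * mu * t)
mean_M = total / n
print(mean_M)  # close to 1.0
```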

Change of measure

In the previous example, while deriving the joint pdf of the process, we only did algebra and used a few elementary properties of BM. Surprisingly, however, we ended up with the martingale correction term $M_t$. It further turns out that $M_t$ coincides with the Radon-Nikodym derivative we would get by applying Girsanov's theorem to the same problem.