Monte Carlo simulations are a class of computational algorithms that rely on repeated random sampling to obtain numerical results. These simulations are often used when mathematical solutions are complex or even impossible. In this tutorial, we'll take a look at how to implement a simple Monte Carlo simulation in Python.
import numpy as np
import matplotlib.pyplot as plt
For this tutorial, we'll implement a simple Monte Carlo simulation to estimate the value of Pi. This is a common application of Monte Carlo simulations. The idea is to simulate random (x, y) points in a 2-D plane with domain as a square of side 1 unit. Imagine a circle inside this square. The ratio of the number of points that lie inside the circle and the total number of points will give us an approximation of Pi/4.
def estimate_pi(num_points):
inside_circle = 0
for _ in range(num_points):
x, y = np.random.uniform(0, 1), np.random.uniform(0, 1)
distance = np.sqrt(x**2 + y**2)
if distance <= 1:
inside_circle += 1
return 4 * inside_circle / num_points
In this function, we generate num_points random points in the unit square, and then check if the point falls within the unit circle. If it does, we increment inside_circle. Finally, we return our estimate for Pi. Now that we have defined our simulation, we can run it with a certain number of points.
np.random.seed(0)
print(estimate_pi(1000000))
3.140936
Here, we are running the simulation with 1,000,000 points. The np.random.seed(0) is used to ensure that our results are reproducible. For better understanding, it can be helpful to visualize the simulation. Below is a function that not only performs the simulation, but also plots the points:
def visualize_pi_estimation(num_points):
inside_x, inside_y, outside_x, outside_y = [], [], [], []
inside_circle = 0
for _ in range(num_points):
x, y = np.random.uniform(0, 1), np.random.uniform(0, 1)
distance = np.sqrt(x**2 + y**2)
if distance <= 1:
inside_circle += 1
inside_x.append(x)
inside_y.append(y)
else:
outside_x.append(x)
outside_y.append(y)
pi_estimate = 4 * inside_circle / num_points
fig, ax = plt.subplots()
ax.set_aspect('equal')
ax.scatter(inside_x, inside_y, color='b', label='Points Inside Circle')
ax.scatter(outside_x, outside_y, color='r', label='Points Outside Circle')
ax.legend()
plt.show()
return pi_estimate
np.random.seed(0)
print(visualize_pi_estimation(1000))
3.06
This will run the simulation with 1,000 points and also plot these points.
This technique can be used when the integral is too complex to solve analytically, or when you want an approximation.
We will estimate the value of the integral
$\int_0^1 e^{-x^2} dx$
This integral does not have a simple analytical solution. However, we can use a Monte Carlo method to estimate it.
Here is a step-by-step guide on how to do it:
We will sample a large number of points uniformly from the domain (0, 1).\ For each point, we will evaluate the function $e^{-x^2} $ \ The average of these function values will be an estimate of the integral.
import numpy as np
# Define the function to integrate
def f(x):
return np.exp(-x**2)
# Number of points to sample
num_points = 1000000
# Generate random points in the domain
x_values = np.random.uniform(0, 1, num_points)
# Evaluate the function at these points
f_values = f(x_values)
# The average of these values is an estimate of the integral
integral_estimate = np.mean(f_values)
integral_estimate
0.7465597348076702
Let's assume you are managing a project which consists of three tasks. Each task has a minimum, most likely, and maximum completion time (in days) estimated as follows:
Task 1: [5, 7, 9] \ Task 2: [10, 12, 14] \ Task 3: [7, 8, 10]\ The time estimates are based on the "Three Point Estimation Technique," which is a common approach in project management. This is a statistical method used in project management and analysis to predict the time and resources needed to complete a task. It takes into consideration the best-case, most likely, and worst-case scenarios, and then calculates an average of these estimates to provide a single value prediction.
To generate random values for each task duration, we can use a triangular distribution which is commonly used in such scenarios.
We will run a Monte Carlo simulation with 10000 iterations to estimate the total project completion time. For each iteration, we will:
Generate a random duration for each task based on the triangular distribution.\ Sum up the durations to get the total project time.\ Repeat the process to build a distribution of total project time.
Let's implement this in Python.
import matplotlib.pyplot as plt
# Define task estimates
task_estimates = {
"Task 1": [5, 7, 9],
"Task 2": [10, 12, 14],
"Task 3": [7, 8, 10]
}
# Number of iterations for the simulation
num_iterations = 10000
# Store all possible total durations
total_durations = []
for _ in range(num_iterations):
total_duration = 0
for task, estimates in task_estimates.items():
min_estimate, likely_estimate, max_estimate = estimates
# Draw a random sample from the triangular distribution
duration = np.random.triangular(min_estimate, likely_estimate, max_estimate)
total_duration += duration
total_durations.append(total_duration)
# Calculate average total duration
average_duration = np.mean(total_durations)
# Plot histogram of total durations
plt.hist(total_durations, bins=50, alpha=0.5)
plt.axvline(average_duration, color='red', linestyle='dashed', linewidth=1, label=f'Average: {average_duration:.2f}')
plt.legend()
plt.xlabel('Total Project Time (days)')
plt.ylabel('Frequency')
plt.title('Monte Carlo Simulation of Project Completion Time')
plt.show()
average_duration
27.347118705663824
The Monte Carlo simulation resulted in an average estimated project completion time of approximately 27.34 days.
The histogram shows the distribution of the total project time over 10,000 iterations of the simulation. As we can see, the total project time mostly lies between 25 and 30 days, but there are also possibilities of it taking less than 25 or more than 30 days.
This is the power of Monte Carlo simulations: they not only provide a single-point estimate (like the average), but also give a sense of the variability or uncertainty around that estimate.
In this case, as a project manager, you might use this information to communicate that, while you expect the project to take about 27 days, it could potentially take anywhere from around 25 to 30 days, depending on various factors.