Multiprocessing considerations for your simulations#

In this tutorial we’ll explore how the number of cores/processes you use for your cogsworth simulations affects the runtime.

Learning Goals#

By the end of this tutorial you should know how to:

  • Test the runtime of different aspects of cogsworth simulations

  • Make choices about the optimal number of processes to use for your simulations

[1]:
import cogsworth
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import astropy.units as u
import time
[2]:
# this all just makes plots look nice
%config InlineBackend.figure_format = 'retina'

plt.rc('font', family='serif')
plt.rcParams['text.usetex'] = False
fs = 24

# update various fontsizes to match
params = {'figure.figsize': (12, 8),
          'legend.fontsize': fs,
          'axes.labelsize': fs,
          'xtick.labelsize': 0.9 * fs,
          'ytick.labelsize': 0.9 * fs,
          'axes.linewidth': 1.1,
          'xtick.major.size': 7,
          'xtick.minor.size': 4,
          'ytick.major.size': 7,
          'ytick.minor.size': 4}
plt.rcParams.update(params)
pd.options.display.max_columns = 999

Create a base sampled population#

First, let’s sample a (small) population for testing in this tutorial. We’ll create a cogsworth Population of 500 binaries (note that the sampler can return slightly more binaries than the number requested), but not yet perform the stellar evolution or galactic orbit integration.

[14]:
p = cogsworth.pop.Population(500)
p.sample_initial_galaxy()
p.sample_initial_binaries()
[15]:
# this population has some initial conditions
p.initial_binaries
[15]:
index kstar_1 kstar_2 mass_1 mass_2 porb ecc metallicity tphysf mass0_1 mass0_2 rad_1 rad_2 lum_1 lum_2 massc_1 massc_2 radc_1 radc_2 menv_1 menv_2 renv_1 renv_2 omega_spin_1 omega_spin_2 B_1 B_2 bacc_1 bacc_2 tacc_1 tacc_2 epoch_1 epoch_2 tms_1 tms_2 bhspin_1 bhspin_2 tphys binfrac
0 0 0.0 0.0 0.226616 0.196968 7.902619 0.567668 0.008181 10630.704990 0.226616 0.196968 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
1 1 0.0 0.0 0.114457 0.089994 225.922400 0.508927 0.002960 8241.624761 0.114457 0.089994 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
2 2 0.0 0.0 0.302681 0.163874 27.543268 0.173288 0.002527 8820.695128 0.302681 0.163874 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
3 3 0.0 0.0 0.147824 0.133992 9.226240 0.208507 0.004398 8504.835938 0.147824 0.133992 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
4 4 0.0 0.0 0.359252 0.314032 87.137307 0.119351 0.003971 10780.070181 0.359252 0.314032 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
526 526 0.0 0.0 0.241499 0.166543 665.587041 0.358429 0.006649 8102.167199 0.241499 0.166543 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
527 527 0.0 0.0 0.699683 0.353945 1.898187 0.045507 0.030000 1656.614841 0.699683 0.353945 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
528 528 0.0 0.0 0.323788 0.258470 18135.512964 0.501732 0.012245 8690.954261 0.323788 0.258470 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
529 529 0.0 0.0 0.135099 0.096897 13377.016304 0.094308 0.023247 5933.253135 0.135099 0.096897 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5
530 530 0.0 0.0 0.169556 0.132036 2.453343 0.008602 0.002995 9042.664952 0.169556 0.132036 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5

531 rows × 39 columns

[16]:
# but no evolution yet
p._bpp is None
[16]:
True

A function for testing runtimes#

Now let’s make a quick function for testing the runtime of both the stellar evolution and the galactic orbit integration.

[18]:
def test_runtime(pop, processes=1):
    # make a copy of the population
    p = pop.copy()

    # update number of processes
    p.processes = processes

    # start a timer and perform the stellar evolution
    start = time.time()
    p.perform_stellar_evolution()
    end = time.time()
    stellar_runtime = end - start

    # same thing for orbit integration
    start = time.time()
    p.perform_galactic_evolution(progress_bar=False)
    end = time.time()
    galactic_runtime = end - start

    return [stellar_runtime, galactic_runtime]

Apply the function#

Let’s use this function to see how the runtime varies when using up to the 8 cores that my laptop has.

Note

This test isn’t entirely fair: I should really have stopped running everything else on my laptop while testing, and perhaps not run the tests back to back. Be better than me and use a controlled testing environment!

[19]:
processes = range(1, 9)
runtimes = np.zeros((len(processes), 2))
for i, proc in enumerate(processes):
    runtimes[i] = test_runtime(p, processes=proc)
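If you want more reliable numbers than this single back-to-back pass (per the note above), one option is to repeat each measurement a few times and keep the median. Here's a minimal sketch reusing the same test_runtime function; the number of repeats (and the runtimes_med variable) is just an illustrative choice, not something used in the rest of this tutorial.

n_repeats = 3
runtimes_med = np.zeros((len(processes), 2))
for i, proc in enumerate(processes):
    # run each configuration a few times and keep the element-wise median
    # to reduce the impact of background processes on the timings
    trials = np.array([test_runtime(p, processes=proc) for _ in range(n_repeats)])
    runtimes_med[i] = np.median(trials, axis=0)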
[22]:
runtime_total = runtimes.sum(axis=1)

Plotting time!#

Now let’s make a plot of how the runtime changes as we change the number of processes. We can show the contributions from both stellar evolution and galactic orbit integration as well as the relative speedup of the total runtime.

[29]:
fig, ax = plt.subplots(figsize=(11, 8))

# area fill for the stellar evolution
ax.fill_between(processes, 0, runtimes[:, 0], edgecolor="none", lw=0.0,
                color="tab:purple", alpha=0.7, label="Stellar evolution")
# same thing for the orbit integration
ax.fill_between(processes, runtimes[:, 0], runtime_total, edgecolor="none", lw=0.0,
                color="tab:green", alpha=0.5, label="Galactic orbit integration")
# add some markers for the total runtime
ax.plot(processes, runtime_total,
        marker="o", markersize=np.sqrt(50), zorder=10, color="k", label="Total")

# label the number of processes and fix the bottom of y axis to start at 0
ax.set(xticks=processes, xticklabels=[str(p) for p in processes],
       xlabel="Number of processes", ylabel="Runtime [s]")
ax.set_ylim(bottom=0.0)

# add a legend
ax.legend(loc="upper right")

# add a second y-axis for speedup, make sure it's behind
right_ax = ax.twinx()
ax.set_zorder(right_ax.get_zorder()+1)
ax.set_frame_on(False)

# plot the relative speedup
right_ax.plot(processes, runtime_total[0] / runtime_total,
              marker="x", markersize=np.sqrt(100), zorder=-10, color="grey", linestyle="--")

# set some ticks and change the colour of the right side
right_ax.set(xticks=processes, xticklabels=[str(p) for p in processes],
             yticks=processes, yticklabels=[str(p) for p in processes])
right_ax.set_ylabel("Speedup relative to one process", color="grey")
right_ax.spines['right'].set_color('grey')
right_ax.tick_params(axis='y', colors='grey')

plt.show()
[Figure: total runtime split into stellar evolution and galactic orbit integration as a function of the number of processes, with the relative speedup on the right-hand axis]

Interpretation#

So what do we learn from this plot? Firstly, you can see that the galactic orbit integration takes longer than the stellar evolution. This is partly due to our default assumptions: if we integrated shorter orbits (say, only 10 Myr instead of potentially 12 Gyr), this step could be much faster.

We can also see that beyond 3, maybe 4, processes the runtime gains stop keeping up with the additional processes. This is because the population is fairly small and the runtime is often dominated by a few complicated sources; extra processes can’t help when only a couple of binaries are responsible for most of the wall time.

The optimal number of processes is population dependent though, which is why you should run your own tests to decide how many processes to use!

For example, in a case of “here’s one I made earlier”, below is a bigger test of up to 128 processes on a 1000-binary population that I ran more carefully on a computing cluster.

[Figure: the same runtime and speedup test, repeated for up to 128 processes on a 1000-binary population on a computing cluster]

In this case the turnover is closer to 16 processes, so you can see why it’s important to run this test yourself for your own population!
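Once you’ve measured runtimes for your own population, here’s a rough way to turn them into a choice of process count. This is a minimal sketch using the processes and runtime_total arrays from above; the 50% parallel-efficiency threshold is an arbitrary illustrative choice, not a cogsworth recommendation.

# speedup relative to a single process and the corresponding parallel efficiency
speedup = runtime_total[0] / runtime_total
efficiency = speedup / np.asarray(processes)

# keep the largest process count that still uses each process reasonably well
decent = [proc for proc, eff in zip(processes, efficiency) if eff >= 0.5]
suggested = max(decent) if decent else processes[0]
print(f"Suggested number of processes: {suggested}")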

Wrap-up#

And that’s all for some simple testing of how your simulations run with different numbers of cores. Hopefully you can use this information to make decisions about how to run your simulations in the future!

Note

This tutorial was generated from a Jupyter notebook that can be found here.