# PyMC3 MCMC performance with and without Theano's NumPy BLAS warning (updated with PyMC v4 comparison!)

Does the warning ‘Using NumPy C-API based implementation for BLAS functions’ from Theano when using PyMC3 affect the performance of MCMC?

## Introduction

PyMC3 is a Python-based probabilistic programming language used to fit Bayesian models with a variety of cutting-edge algorithms, including NUTS MCMC and ADVI. It is not uncommon for PyMC3 users to receive the following warning:

`WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.`


where Theano is the autodiff engine that PyMC3 uses under the hood. The usual solution is to re-install the library and its dependencies following the operating system-specific instructions in the wiki.
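Before re-installing anything, it can help to see which BLAS library NumPy itself was built against. The following is a minimal sketch, assuming only that NumPy is installed. It reads NumPy's build report and does not inspect Theano's own BLAS settings, so treat it as a first diagnostic rather than a fix. The helper name `numpy_build_info` and the list of library names it searches for are illustrative choices, not part of any library.

```python
import contextlib
import io

import numpy as np


def numpy_build_info() -> str:
    """Return the text that numpy.show_config() prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


info = numpy_build_info()

# Search the build report for the names of some common BLAS implementations.
# This list is an assumption for illustration and is not exhaustive.
known = ("openblas", "mkl", "accelerate", "blis", "atlas")
found = [name for name in known if name in info.lower()]
print("BLAS libraries mentioned in NumPy's build info:", found or "none recognised")
```

Running this in both environments (the one that shows the warning and the one that does not) gives a quick way to spot a difference in how NumPy was built.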

I am working on a project using PyMC3 to fit large hierarchical models and was receiving the warning on the Linux server used to fit the full models, but not on my local MacBook Pro when experimenting with model designs (even after re-creating the conda environment). Therefore, I asked for more help on the PyMC Discourse with two questions:

1. How can I get this warning to go away?
2. Is the problem that Theano is warning about affecting its performance?

## Results

### Resource consumption

To begin, each model used about 53-56 GB of the available 64 GB of RAM, with no substantial difference between conditions.

### Sampling rates

The first plot below shows the time course of each chain, colored by experimental condition. The plot below that displays the duration of each 5-draw interval, indicating the sampling rate over time for each condition, followed by a plot that shows the same data on a single set of axes.

Each chain generally went through three stages: rapid early tuning, slow tuning, and then rapid post-tuning sampling (the draws that will represent the approximate posterior distributions).

One exception is one of the chains from the “warning” condition that, compared with the others, had an incredibly rapid early stage, a prolonged slow stage, and a slower final stage. (This chain took longer than 5 days to fit and my job timed out before it could finish.) A similar result happens reliably when I use ADVI to initialize the chains (this may be a future post), so I think this is just a product of the randomness inherent to MCMC and not necessarily attributable to the Theano warning. Removing this chain shows how similar the results were among the remaining chains.

The sampling durations in each condition are also plotted as histograms (below).

### Summary table

Finally, the following is a table summarizing the sampling rates for each chain. Note that chain #2 of the “warning” condition never finished (>5 days).

Note that the average draw rate (in minutes per 5 draws) during the “sampling” stage is nearly the same for all PyMC3 chains, at 2.6 to 2.9, except for the outlier chain. The PyMC v4 chains are faster, at 1.5 to 1.6.

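The columns from "mean" through "max" describe the draw rate (min.).

| chain group | chain | stage | total duration (hr.) | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|
| BLAS warning | 0 | tune | 30.9 | 9.3 | 11.2 | 0.9 | 2.5 | 2.6 | 10.1 | 40.5 |
| BLAS warning | 0 | sampling | 39.5 | 2.6 | 0.0 | 2.5 | 2.5 | 2.6 | 2.6 | 2.6 |
| BLAS warning | 1 | tune | 35.9 | 10.8 | 12.5 | 0.8 | 2.7 | 2.7 | 10.5 | 42.6 |
| BLAS warning | 1 | sampling | 44.9 | 2.7 | 0.0 | 2.6 | 2.7 | 2.7 | 2.7 | 2.8 |
| BLAS warning | 2 | tune | 73.7 | 22.2 | 14.3 | 0.1 | 10.0 | 20.2 | 39.2 | 40.5 |
| BLAS warning | 2 | sampling | 119.8 | 20.3 | 0.8 | 12.1 | 20.2 | 20.2 | 20.6 | 21.0 |
| BLAS warning | 3 | tune | 28.0 | 8.4 | 8.7 | 0.8 | 2.7 | 3.2 | 10.6 | 42.0 |
| BLAS warning | 3 | sampling | 37.0 | 2.7 | 0.1 | 2.7 | 2.7 | 2.7 | 2.7 | 3.8 |
| no warning | 4 | tune | 31.4 | 9.4 | 11.9 | 0.8 | 2.6 | 2.6 | 10.2 | 41.3 |
| no warning | 4 | sampling | 40.0 | 2.6 | 0.0 | 2.6 | 2.6 | 2.6 | 2.6 | 2.6 |
| no warning | 5 | tune | 31.7 | 9.6 | 12.6 | 1.4 | 2.9 | 2.9 | 11.4 | 45.9 |
| no warning | 5 | sampling | 41.4 | 2.9 | 0.0 | 2.8 | 2.9 | 2.9 | 2.9 | 2.9 |
| no warning | 6 | tune | 21.5 | 6.5 | 7.0 | 0.8 | 2.6 | 2.6 | 10.2 | 41.2 |
| no warning | 6 | sampling | 30.2 | 2.6 | 0.0 | 2.6 | 2.6 | 2.6 | 2.6 | 2.6 |
| no warning | 7 | tune | 31.1 | 9.4 | 11.6 | 0.2 | 2.6 | 2.6 | 10.3 | 41.4 |
| no warning | 7 | sampling | 39.8 | 2.6 | 0.0 | 2.6 | 2.6 | 2.6 | 2.6 | 2.6 |
| PyMC v4 | 8 | tune | 12.8 | 3.8 | 4.1 | 0.4 | 1.3 | 1.5 | 6.1 | 19.2 |
| PyMC v4 | 8 | sampling | 18.0 | 1.6 | 0.1 | 1.5 | 1.6 | 1.6 | 1.6 | 2.2 |
| PyMC v4 | 9 | tune | 25.1 | 7.6 | 10.2 | 0.3 | 1.5 | 2.6 | 9.2 | 47.1 |
| PyMC v4 | 9 | sampling | 30.3 | 1.6 | 0.0 | 1.5 | 1.5 | 1.6 | 1.6 | 1.6 |
| PyMC v4 | 10 | tune | 20.6 | 6.2 | 8.7 | 0.2 | 1.5 | 1.8 | 9.5 | 47.2 |
| PyMC v4 | 10 | sampling | 25.6 | 1.5 | 0.0 | 1.2 | 1.5 | 1.5 | 1.5 | 1.5 |
| PyMC v4 | 11 | tune | 18.5 | 5.6 | 6.6 | 0.2 | 1.5 | 1.6 | 9.4 | 31.1 |
| PyMC v4 | 11 | sampling | 23.6 | 1.5 | 0.0 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 |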
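The statistics in the table above (mean, std, min, quartiles, max) have the shape of a pandas `describe()` summary. As a rough sketch of how such a summary could be computed, and not necessarily how this table was made, assume a hypothetical log with one row per 5-draw checkpoint that records the chain, the stage, and the time the checkpoint was reached. The column names and the timestamps below are invented for illustration.

```python
import pandas as pd

# Hypothetical log: one row per 5-draw checkpoint (all values are made up).
log = pd.DataFrame(
    {
        "chain": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
        "stage": ["tune", "tune", "tune", "sampling", "sampling"] * 2,
        "timestamp": pd.to_datetime(
            [
                "2022-01-01 00:00", "2022-01-01 00:10", "2022-01-01 00:14",
                "2022-01-01 00:17", "2022-01-01 00:20",
                "2022-01-01 00:00", "2022-01-01 00:08", "2022-01-01 00:12",
                "2022-01-01 00:15", "2022-01-01 00:18",
            ]
        ),
    }
)

# Minutes elapsed between consecutive checkpoints within each chain.
# The first checkpoint of each chain has no predecessor, so it becomes NaN.
log["draw_rate_min"] = (
    log.groupby("chain")["timestamp"].diff().dt.total_seconds() / 60
)

# Summary statistics of the draw rate per chain and stage.
summary = (
    log.dropna(subset=["draw_rate_min"])
    .groupby(["chain", "stage"])["draw_rate_min"]
    .describe()
)
print(summary.round(1))
```

Grouping by chain before taking the difference keeps the first checkpoint of one chain from being compared against the last checkpoint of another.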

## Conclusion

My understanding of the inner workings of PyMC and Theano is limited, so it is impossible for me to provide a confident final conclusion. From this experiment, it seems like PyMC3 performed equivalently with and without the warning message. That said, it is probably best to address the underlying issue to ensure optimal PyMC3 performance and behavior.

Update with addition of PyMC v4: PyMC v4 samples faster during both the tuning and sampling stages of MCMC. In addition, version 4 tends to have fewer long-duration steps in the tuning process, dramatically cutting down on overall runtime.
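To put a rough number on the sampling-stage difference, the mean draw rates from the summary table can be compared directly. The values below are copied from the "sampling" rows of the table. They are rounded to one decimal place, so the resulting ratio is only approximate, and the outlier chain is left out.

```python
from statistics import mean

# Mean sampling-stage draw rates (minutes per 5 draws) from the summary table.
sampling_rate = {
    "BLAS warning": [2.6, 2.7, 2.7],  # chain 2 (the outlier) excluded
    "no warning": [2.6, 2.9, 2.6, 2.6],
    "PyMC v4": [1.6, 1.6, 1.5, 1.5],
}

# Average the per-chain means within each condition.
group_mean = {group: mean(rates) for group, rates in sampling_rate.items()}
for group, value in group_mean.items():
    print(f"{group}: {value:.2f} min per 5 draws")

# Ratio of the PyMC3 (no warning) rate to the PyMC v4 rate.
speedup = group_mean["no warning"] / group_mean["PyMC v4"]
print(f"PyMC v4 vs. PyMC3 (no warning): about {speedup:.1f}x faster")
```

With these rounded values, PyMC v4 takes about 1.55 minutes per 5 draws against roughly 2.67 for either PyMC3 condition, a ratio of about 1.7.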

