A template for reproducible papers
At the PINGA lab, we have been experimenting with
ways to increase the reproducibility of our research by publishing the git
repositories that accompany our papers.
You can find them on our
I’ve synthesized the experience of the last 4 years into a template in the
The template reflects the tools we’ve been using and the type of research that
- Most papers are proposing a new methodology rather than the analysis of a
- There is always an application to a dataset to show the method works. We
can’t always publish the data but we include it in the repository whenever we
- All papers include an implementation of the proposed method.
- Our code is usually written in Python and executed in Jupyter notebooks.
- The focus of the paper is usually on the methodology, not the code. As such,
the code is more of a proof-of-concept than a full blown application or
- The paper itself is written in LaTeX with the source usually included in the
This certainly won’t fit everyone’s needs but I hope that you can at least use
a few bits and pieces for inspiration.
Of course, the template code is open-source (BSD license) and you are free to
reuse it however you like.
The template includes a sample application to climate change data, complete
with a Python package, automated tests, an analysis notebook, a notebook that
generates the paper figure, raw data, and a LaTeX text.
Everything, from compilation to building the final PDF, can be done with a
We’ve been using different versions of this template for a few years and I’ve
been tweaking it to address some of the difficulties we encountered along the
- Running experiments in Jupyter notebooks can get messy when people aren’t
diligent about the execution order. It can be hard to remember to “Reset and
run all” before using the results.
- The execution was done manually so you had to remember and document in what
order the notebooks need to be run.
- Experimental parameters (e.g., number of data points, inversion parameters,
model configuration) were copied into the text manually. This sometimes led
to values getting out of sync between the notebooks and paper.
- We only had integration tests implemented in notebooks. More often than not,
the checks were visual and not automated. I think a big reason for this is
the lack of experience in writing tests within the group and setting up all
of the testing infrastructure (mainly how to use pytest and what kind of test
The latest update
addresses all of these pain points.
The main features of the new template are:
Makefiles to automate the workflow. You can build and test the
software, generate results and figures, and compile the PDF with a single
Makefile for building the manuscript PDF with extra rules for
running proselint, counting words,
and opening the PDF.
- A starter conda
for managing dependencies and making sure everyone gets the same version of
- Boilerplate instructions for downloading the code and reproducing the
Makefile for building the Python package, testing it with
pytest, running static code
checks (flake8 and
pylint), and generating results and figures from
- The code
Makefile can run the notebooks using
jupyter nbconvert to
guarantee that the notebooks are executed in sequential order (top to
bottom). I would love to use nbflow but
the SCons requirement puts me off a bit.
fine and the basic syntax is easier to understand.
- An example of using code to write experimental parameters in a
The file defines new variables that are used in the main text. This
guarantees that the values cited in the text are the ones that you actually
used to produce the results.
This last feature is my favorite. For example, the notebook
code/notebooks/estimate-hawaii-trend.ipynb has the following code:
tex = r"""
% Generated by code/notebooks/estimate-hawaii-trend.ipynb
with open('../../manuscript/hawaii_trend.tex', 'w') as f:
It defines the LaTeX commands
can be used in the paper to insert the values estimated by the Python code.
The commands are saved to a
.tex file that can be included in the main
Since this file is generated by the code, the values are guaranteed to be
If you want to use the template to start a new project:
Create a new git repository:
Pull in the template code:
git pull https://github.com/pinga-lab/paper-template.git master
Create a new repository on GitHub.
Push the template code to GitHub:
git remote add origin https://github.com/USER/REPOSITORY.git
git push -u origin master
Follow the instruction in the
Alternatively, you can use the “Import repository” option on GitHub.
I hope that this template will be useful to people outside of our lab.
There is definitely still room for improvement and I’m looking forward to
trying it out on my next project.
What other features would you like to see in the template?
I’d love to know about your experiences and workflows for computational papers.