In the ~4 years that I’ve spent on my PhD now, I’ve run hundreds, nay,
thousands of simulations. Research work is incredibly iterative. I (and I
assume others too) make small changes to their methods and then study how these
changes produce different results, and this cycle continues until a
proposed hypothesis has either been accepted or refuted (or a completely new
insight gained, which happens quite often too!).
Folders, and Dropbox? Please, no.
Keeping track of all these iterations is quite a task. I’ve seen multiple
methods that people use to do this. A popular method is to make a different
folder for each different version of code, and then use something like Dropbox
to store them all.
Since I come from a computing background, I firmly believe that this is not a
good way of going about it. It may work for folks—people I know and work with
use this method—but it is simply a bad way of going about it. This PhDComic
does a rather good job of showing an example situation. Sure, this is about a
document, but when source code is kept in different folders, a similar
situation arises. You get the idea.
Version control, YES!
If there weren’t tools designed to track and manage such projects, one could
still argue for using such methods, but the truth is that there is a plethora
of version control tools available
under Free/Open Source licenses. Not only do these tools manage projects,
they also make collaborating over source code simple.
All my simulation code, for example, lives in a Git repository (which will be
made available under a Free/Open source license as soon as my paper goes out
to ensure that others can read, verify, and build on it). The support scripts
that I use to set up simulations and then analyse the data they produce already
live here on GitHub,
for example. Please go ahead and use them if they fit your purpose.
I have different Git branches for different features that I add to the
simulations—the different hypothesis that I’m testing out. I also keep a
rather meticulous record of everything I do in a research journal in LaTeX that
also lives in a Git repository, and uses Calliope (a simple helper script to
manage various journaling tasks). Everything goes in here—graphs, images,
sometimes patches and source code even, and the deductions and other
My rather simple system is as follows:
- Each new feature/hypothesis gets its own Git branch.
- Each version of its implementation, therefore, gets its own unique commit
(a snapshot of code that Git saves for the user with a unique identifier and
a complete record of the changes that were made to the project, when they
were made and so on.)
- For each run of a snapshot, the generated data is stored in a folder that is
named YYYYMMDDHHMM (Year, month, day, time),
which, unless you figure out how to go back in time, is also unique.
- The commit hash + YYYYMMDD become a unique identifier for each code snapshot
and the results that it generated.
- A new chapter in my research journal holds a summary of the simulation, and
all the analysis that I do. I even name the chapter “git-hash/YYYYMMDDHHMM”.
I know that learning a version control system has a steep initial curve, but I
really do think that this is one tool that is well worth the time.
Using a version control system has many advantages, some of which are:
- It lets you keep the full history of your source code, and go back to any
- You know exactly what you changed between two snapshots.
- If multiple people work on the code, everyone knows exactly who authored
- These tools make changing code, trying out things, and so on, very very easy.
Try something out in a different branch, if it worked, yay, keep the branch
running; maybe even merge it to the main branch? If it didn’t make a note,
delete the branch, and move on!
- With services like GitHub, BitBucket, and GitLab, collaboration becomes
- Ah, and note, that every collaborator has a copy of the source code, so it
has been backed up too! Even if you work alone, there’s always another copy
on GitHub (or whatever service you use).
Here’s a quick beginners guide to using Git and GitHub:
There are many more all over the WWW, of course. Duckduckgo is your friend.
(Why Duckduckgo and not Google?)
What’s Sumatra about, then?
I’ve been meaning to try Sumatra out for a while now. What Sumatra does is
sort of bring the functions of all my scripts together into one well-designed
tool. Sumatra can do the running bit, then save the generated data in a
unique location, and it even lets users add comments about the simulation.
Sumatra even has a web based front end for those that would prefer a graphical
interface instead of the command line. Lastly, Sumatra is written in Python,
so it works on pretty much all systems. Note that Sumatra forces the use of a
version control system (from what I’ve seen yet).
A quick walk-through
python3 -m venv --system-site-packages sumatra-virtual
We then activate the virtual-environment, and install Sumatra:
source sumatra-virtual/bin/activate pip install sumatra
Once it finishes installing, simply mark a version controlled source
repository as managed by Sumatra:
cd my-awesome-project smt init my-awesome-project
Then, one can see the information that Sumatra has on the project, for
smt info Project name : test-repo Default executable : Python (version: 3.6.5) at /home/asinha/dump/sumatra-virt/bin/python3 Default repository : GitRepository at /home/asinha/Documents/02_Code/00_repos/00_mine/sumatra-nest-cluster-test (upstream: email@example.com:sanjayankur31/sumatra-nest-cluster-test.git) Default main file : test.py Default launch mode : serial Data store (output) : /home/asinha/Documents/02_Code/00_repos/00_mine/sumatra-nest-cluster-test/Data . (input) : / Record store : Django (/home/asinha/Documents/02_Code/00_repos/00_mine/sumatra-nest-cluster-test/.smt/records) Code change policy : error Append label to : None Label generator : timestamp Timestamp format : %Y%m%d-%H%M%S Plug-ins :  Sumatra version : 0.7.4
My test script only prints a short message. Here’s how one would run it using
# so that we don't have to specify this for each run smt configure --executable=python3 --main=test.py smt run Hello Sumatra World! Record label for this run: '20180512-200859' No data produced.
One can now see all the runs of this simulation that have been made!
smt list --long -------------------------------------------------------------------------------- Label : 20180512-200859 Timestamp : 2018-05-12 20:08:59.761849 Reason : Outcome : Duration : 0.050611019134521484 Repository : GitRepository at /home/asinha/Documents/02_Code/00_repos/00_mine/sumatra-nest- : cluster-test (upstream: firstname.lastname@example.org:sanjayankur31/sumatra-nest-cluster- : test.git) Main_File : test.py Version : 6f4e1bf05f223a0100ca6f843c11ef4fd70490f3 Script_Arguments : Executable : Python (version: 3.6.5) at /home/asinha/dump/sumatra-virt/bin/python3 Parameters : Input_Data :  Launch_Mode : serial Output_Data :  User : Ankur Sinha (Ankur Sinha Gmail) <email@example.com> Tags : Repeats : None -------------------------------------------------------------------------------- Label : 20180512-181422 Timestamp : 2018-05-12 18:14:22.668655 Reason : Outcome : Well that worked Duration : 0.05211901664733887 Repository : GitRepository at /home/asinha/Documents/02_Code/00_repos/00_mine/sumatra-nest- : cluster-test (upstream: firstname.lastname@example.org:sanjayankur31/sumatra-nest-cluster- : test.git) Main_File : test.py Version : 4f151a368b1fee1fa8f21026c3b6d2c6b2531da8 Script_Arguments : Executable : Python (version: 3.6.5) at /home/asinha/dump/sumatra-virt/bin/python3 Parameters : Input_Data :  Launch_Mode : serial Output_Data :  User : Ankur Sinha (Ankur Sinha Gmail) <email@example.com> Tags : Repeats : None
There’s a lot more that can be done, of course. I’ll quickly show the GUI
One can run the webversion using:
smtweb -p 8001 #whatever port number one wants to use
Then, it’ll open up in your default web-browser at http://127.0.0.1:8001/.
For each project, one can see the various runs, with all the associated
One can then add more information about a run. Sumatra already stores lots of
important information as the image shows:
Pretty neat, huh?
I run my simulations on a cluster, and so have my own system to submit jobs to
the queue system. Sumatra can run jobs in parallel on a cluster, but I’ve
still got to check if it also integrates with the queue system that our cluster
runs. Luckily, Sumatra also provides an API, so I should be able to write a
few Python scripts to handle that bit too. It’s on my TODO list now.
Please use version control and a Sumatra style record keeper
I haven’t found another tool that does what Sumatra does yet. Maybe Jupyter
notebooks would come close, but one would have to add some sort of wrapper
around them to keep proper records. It’ll probably be similar to my current
In summary, please use version control, and use a record keeper to manage and
track simulations. Not only does it make it easier for you, the researcher, it
also makes it easier for others to replicate the simulation since the record
keeper provides all the information required to re-run the simulation.
Free/Open source software promotes Open Science
<video controls=”controls” height=”390″ poster=”//static.fsf.org/nosvn/FSF30-video/fsf30-poster.png” width=”640″>
<source src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/FSF_30_720p.webm” type=”video/webm”>
<track default=”default” kind=”subtitles” label=”English” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.en.vtt” srclang=”en”>
<track kind=”subtitles” label=”Spanish” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_es.vtt” srclang=”es”>
<track kind=”subtitles” label=”French” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.fr.vtt” srclang=”fr”>
<track kind=”subtitles” label=”German” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.de.vtt” srclang=”en”>
<track kind=”subtitles” label=”русский” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.ru.vtt” srclang=”ru”>
<track kind=”subtitles” label=”italiano” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.it.vtt” srclang=”it”>
<track kind=”subtitles” label=”português” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.pt.vtt” srclang=”pt”>
<track kind=”subtitles” label=”српски” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.sr.vtt” srclang=”sr”>
<track kind=”subtitles” label=”fārsi” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.fa.vtt” srclang=”fa”>
<track kind=”subtitles” label=”nederlands” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.nl.vtt” srclang=”nl”>
<track kind=”subtitles” label=”magyar” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.hu.vtt” srclang=”hu”>
<track kind=”subtitles” label=”svenska” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.se.vtt” srclang=”se”>
<track kind=”subtitles” label=”română” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.ro.vtt” srclang=”ro”>
<track kind=”subtitles” label=”lietuvių” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.lt.vtt” srclang=”lt”>
<track kind=”subtitles” label=”hebrew” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.he.vtt” srclang=”lt”>
<track kind=”subtitles” label=”português do Brasil” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.pt-br.vtt” srclang=”pt-br”>
<track kind=”subtitles” label=”chinese” src=”http//phdcomics.com//static.fsf.org/nosvn/FSF30-video/captions/FSF_30_720p.zh-cn.vtt” srclang=”lt”>
(The original video is at the Free Software Foundation’s website.)
As a concluding plea, I request everyone to please use Free/Open source
software for all research. Not only are these available free of cost, they
provide everyone with the right to read, validate, study, copy, share, and
modify the software. One can learn so much from reading how research tools are
built. One can be absolutely sure of their results if they can see the code
that carries out the analysis. One can build on others’ work if the source is
available for all to use and change. How easy does replication become when the
source and all related resources are given out for all to use?
Do not use Microsoft Word, for example. Not everyone, even today, has access
to Microsoft software. Should researchers be required to buy a Microsoft
license to be able to collaborate with us? The tools are here to enable
science, not hamper it. Proprietary software and formats do not enable
science, they restrict it to those that can pay for such software. This is not
a restriction we should endorse in any way.
Yes, I know that sometimes there aren’t Free/Open source software
alternatives that carry the same set of features, but a little bit of extra
work, for me, is an investment towards Open Science. Instead of Word, as an
example, use Libreoffice, or LaTeX. Use Open formats. There will be bugs, but until we report
them, they will not be fixed. Until these Free/Open source tools replace
restricted software as the standard for science, they will only have small
communities around them that build and maintain them.
Open Science is a necessity. Researchers from the neuroscience community
recently signed this letter
committing to the use of Free/Open source software for their research. There
are similar initiatives in other fields too, and of course, one must be aware
of the Open Access movement etc.
I’ve made this plea in the context of science, but the video should also show
you how in everyday life, it is important to use Free/Open source resources.
Please use Free/Open source resources, as much as possible.
Source From: fedoraplanet.org.
Original article title: Ankur Sinha “FranciscoD”: Testing out Sumatra: a tool for managing iterations of simulations/analyses.
This full article can be read at: Ankur Sinha “FranciscoD”: Testing out Sumatra: a tool for managing iterations of simulations/analyses.