## Scientific Python within a virtualenv: from 0 to $\infty$

### October 20, 2014

Hello there Internets! So you’re starting up with Python for data analysis and all that, yes?

Here I outline the steps and requirements for setting up a Python library installation with virtualenv and pip that is suitable for scientific applications (number crunching, e.g. linear algebra and statistics, along with quick plotting of data).

Python tends to have somewhat obscure policies for library visibility, which can be intimidating to a beginner. Virtualenv addresses these concerns by letting you maintain self-contained Python installations, which simplifies maintenance. It amounts to a number of hacks (with a number of caveats described here), but I find it very effective nonetheless if you really need Python libraries in your project. In particular, it saved me from Python Package Hell, and I hope it will streamline your workflow as well.

I do not assume much knowledge on the part of the reader; you are welcome to ask for clarifications in the comments and I’ll reply ASAP. This tutorial addresses UNIX-like operating systems (e.g. Linux distributions, OS X). Tags delimited by angle brackets, such as <venv>, are placeholders for the user to customize.

1) virtualenv: the first thing to install. (If you have already installed it, skip to point 2.)

Do NOT use the system Python installation: it leads to all sorts of inconsistencies. Either

• pip install virtualenv

OR “clone” (make a local copy of) the GitHub repository.

2) Create the virtualenv in a given directory <venv> (in this example the current directory, represented by . in UNIX systems):

• virtualenv .

This will copy a number of commands (e.g. python, pip) and configuration files into the <venv> directory, along with the activate script that sets up the environment variables.

Alternatively, the virtualenv can be made to use system-wide installed packages with a flag. This option might lead to inconsistencies. Use at your own risk:

• virtualenv --system-site-packages .
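To see what the flag actually changes, here is a minimal sketch in Python. Note the swap: it uses the standard-library venv module (Python 3) rather than virtualenv itself, on the assumption that the two behave alike for this purpose. The helper name create_env is mine, not part of either tool. It creates an environment and reads back the pyvenv.cfg file that venv writes into it.

```python
import configparser
import tempfile
import venv
from pathlib import Path


def create_env(target, system_site_packages=False):
    """Create a virtual environment in `target` and return its pyvenv.cfg settings.

    Hypothetical helper for illustration; it relies on the stdlib `venv`
    module, not on the `virtualenv` command used in the text.
    """
    # with_pip=False keeps the sketch fast; virtualenv itself does put pip
    # into the environment it creates.
    builder = venv.EnvBuilder(system_site_packages=system_site_packages,
                              with_pip=False)
    builder.create(target)

    # pyvenv.cfg has no section header, so prepend a dummy one for configparser.
    cfg_text = (Path(target) / "pyvenv.cfg").read_text(encoding="utf-8")
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string("[env]\n" + cfg_text)
    return dict(parser["env"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        isolated = create_env(Path(tmp) / "isolated")
        shared = create_env(Path(tmp) / "shared", system_site_packages=True)
        # The only setting that differs between the two environments:
        print("isolated:", isolated["include-system-site-packages"])
        print("shared:  ", shared["include-system-site-packages"])
```

The isolated environment only sees packages installed inside it, while the shared one also falls back to the system-wide site-packages, which is exactly where the inconsistencies can creep in.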

3) Activate the virtualenv, which means sourcing the activate script (with the current directory as <venv>, this is source ./bin/activate):

• source <venv>/bin/activate

As a result of this step, the shell prompt should change and display (<venv>)
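Activation is less magical than it looks: apart from changing the prompt, the script mostly edits a few environment variables. The following is a rough Python sketch of that effect, not the script itself; the function name activated_environment and the example paths are made up for illustration.

```python
import os


def activated_environment(venv_dir, base_env=None):
    """Return a copy of the environment as it would look after activation.

    Hypothetical helper: it mimics the main effects of bin/activate on a
    UNIX-like system without touching the current process.
    """
    env = dict(os.environ if base_env is None else base_env)
    venv_dir = os.path.abspath(venv_dir)
    bin_dir = os.path.join(venv_dir, "bin")

    # The activate script records where the virtualenv lives ...
    env["VIRTUAL_ENV"] = venv_dir
    # ... puts its bin directory first on PATH, so python and pip resolve there ...
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    # ... and unsets PYTHONHOME, which would otherwise override the virtualenv.
    env.pop("PYTHONHOME", None)
    return env


if __name__ == "__main__":
    demo = activated_environment("/tmp/demo-venv", {"PATH": "/usr/bin"})
    print(demo["VIRTUAL_ENV"])
    print(demo["PATH"])
```

A dictionary like this can be passed as the env argument of subprocess.run to launch a command "inside" the virtualenv from a script, where sourcing a shell file is not an option.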

4) Test the virtualenv, by verifying that pip and python refer to the newly-created local commands:

• which pip
• which python

Both should point to the bin directory contained within the current virtualenv.
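The same check can be done from within Python, which is handy at the top of a script. This is a small sketch under the assumption that the interpreter was started from a virtualenv (or from the standard-library venv); the function names are my own.

```python
import shutil
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv and recent virtualenv releases keep the original interpreter's
    # location in sys.base_prefix; older virtualenv releases set
    # sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def resolve_commands(names):
    """Map each command name to the path found on PATH (None if not found)."""
    # shutil.which performs the same lookup as the shell's `which`.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
    for name, path in resolve_commands(["python", "pip"]).items():
        print(name, "->", path)
```

If in_virtualenv() returns False after you thought you had activated the environment, the usual suspect is a shell that was opened before the activation or a script with a hard-coded interpreter path.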

When you are done using the virtualenv, don’t forget to deactivate it with the deactivate command. If necessary, rm -rf <venv> will delete the virtualenv, i.e. all the packages installed within it. Think twice before doing this.

5) Install all the things!

From now on, all install commands use pip, i.e. have the form pip install <package>, e.g. pip install scipy:

scipy (built on top of numpy, which it requires, so both are fundamental)

pandas (labelled data structures such as data frames, plus helper functions for data analysis)

scikit-learn (machine learning libraries, can be handy)

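matplotlib (plotting functions, upon which much Python plotting is built)

pyreadline for tab completion of commands (mainly relevant on Windows; on UNIX-like systems Python normally relies on the system readline library)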

ipython, esp. with browser-based notebooks. The install syntax will be

• pip install "ipython[notebook]"

bokeh (pretty plots)

ggplot for those who need R-style plots. The requirements for ggplot are

• matplotlib, pandas, numpy, scipy, statsmodels and patsy
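After a round of installs it is worth checking that everything is actually importable from the virtualenv. A minimal sketch follows; the mapping from pip names to import names is my assumption (note that scikit-learn is imported as sklearn and ipython as IPython), and the function name is hypothetical.

```python
import importlib.util

# pip package name -> top-level module name used in `import ...`
# (assumed mapping for the packages listed above)
IMPORT_NAMES = {
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "ipython": "IPython",
    "bokeh": "bokeh",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose module cannot be found by this interpreter."""
    # find_spec looks a module up without importing it, so the check is cheap.
    return sorted(
        pip_name
        for pip_name, module_name in import_names.items()
        if importlib.util.find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("still to install: pip install " + " ".join(missing))
    else:
        print("all packages found")
```

Run it with the virtualenv's python; running it with the system interpreter will report on the system installation instead, which is a quick way to see the difference between the two.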

6) Develop your scientific Python applications with this powerful array of technologies

7) Once you’re ready to distribute your application to third parties, freeze its dependencies using pip. This is another hack, but hey, we’re in a hurry to do science, right? The following two commands cover the situation in which one needs to install the same dependencies on a second computer, account or virtualenv: run the first in the original virtualenv and the second in the new one.

• pip freeze > requirements.txt
• pip install -r requirements.txt
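The frozen file is just plain text with one name==version line per package, so it is easy to inspect. Here is a small sketch that compares two such listings, e.g. the requirements.txt from the first machine against the pip freeze output of the second. The function names are hypothetical, the version numbers are illustrative only, and lines in any other form (editable installs, URLs) are simply skipped.

```python
def parse_pins(text):
    """Parse `name==version` lines into a dict; other lines are ignored."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "==" not in line:
            continue  # blank line, pip option, or a form not handled here
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def pin_differences(wanted, installed):
    """Return {name: (wanted_version, installed_version_or_None)} for mismatches."""
    return {
        name: (version, installed.get(name))
        for name, version in wanted.items()
        if installed.get(name) != version
    }


if __name__ == "__main__":
    # Illustrative contents, not real output from any machine.
    first_machine = "numpy==1.9.0\nscipy==0.14.0\n"
    second_machine = "numpy==1.9.0\n"
    diff = pin_differences(parse_pins(first_machine), parse_pins(second_machine))
    for name, (wanted, found) in sorted(diff.items()):
        print(name, "wanted", wanted, "found", found)
```

An empty result means the second environment matches the frozen one for every pinned package.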

That’s it for now; comment if anything is unclear, or if you find errors, or would like to suggest alternative/improved recipes.

Ciao!