Angry white grad student on the keyboard – the Monday edition

August 24, 2015

Academic publishers behave more like libraries (hosting knowledge, curating collections).
All the _actual work_ (intellectual/experimental, writing, proofreading, peer review, typesetting) is done on a voluntary basis by mostly tax-funded academics. Therefore publishers should

1. die in a fire if unwilling to change, or otherwise become tax-funded public institutions,
2. OR (the two scenarios being mutually exclusive) start _paying their suppliers_, like everyone else.

There, I said it.

You know why this doesn’t happen? Because academia is an ego- and jealousy-driven enterprise, and branding one’s work under prestigious logos is the only tangible* metric of success most academics can aspire to. We are nothing but neurotic shaved monkeys, deal with it.

edit: I’d like to deconstruct what I wrote above: is any of it true? And if so, does it necessarily have to be so? I.e. can this be turned into a positive statement: what drives academia (I’m referring to its research aspect only; let’s leave education aside for the time being), and why? To drive the human spirit forward by expanding knowledge of, and insight into, the workings of the tangible (or intangible? here’s looking at you, theoreticians) world. To form the people who do so into heralds of positive change.

What do paywalled journals have to do with this? Why do we accept being reduced to currency by an unfair economic lock-in mechanism? (This is what makes us neurotic, I think …)

October 20, 2014

Hello there Internets! So you’re starting up with python for data analysis and all that, yes?

Here I outline the installation steps and requirements for setting up a Python library installation with virtualenv and pip, suitable for scientific applications (number crunching, i.e. linear algebra, statistics etc., along with quick plotting of data).

Python tends to have somewhat obscure policies for library visibility, which can be intimidating to a beginner. Virtualenv addresses these concerns and lets you maintain self-contained Python installations, which simplifies maintenance. It amounts to a number of hacks (with a number of caveats described here), but I find it very effective nonetheless if you really need Python libraries in your project. In particular, it saved me from Python Package Hell, and I hope it will streamline your workflow as well.

I do not assume much knowledge on the part of the reader; however, you are welcome to ask for clarifications in the comments and I’ll reply ASAP. In this tutorial we address UNIX-like operating systems (e.g. Linux distributions, OSX etc.). The tags delimited by angle brackets, <>, are placeholders for you to fill in.

1) virtualenv: the first thing to install (if you have already installed it, skip to point 2).

Do NOT use the system Python installation for your projects; it leads to all sorts of inconsistencies. Either

• pip install virtualenv

OR “clone” (make a local copy of) the GitHub repository

2) Create the virtualenv in a given directory <venv> (in this example the current directory, represented by . in UNIX systems):

• virtualenv .

This will copy a number of commands (e.g. python, pip) and configuration files into the <venv> directory, along with activation scripts that set up the environment variables.

Alternatively, the virtualenv can be given access to the system-wide installed packages with a flag. This option might lead to inconsistencies, so use it at your own risk:

• virtualenv --system-site-packages .

3) Activate the virtualenv, which means sourcing its activate script:

• source <venv>/bin/activate   (here, with the virtualenv in the current directory: source bin/activate)

As a result of this step, the shell prompt should change and display (<venv>)

4) Test the virtualenv, by verifying that pip and python refer to the newly-created local commands:

• which pip
• which python

Both should point to the bin directory inside the current virtualenv.
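The same check can be done from inside Python, which is handy in scripts and notebooks. A minimal sketch: the function name in_virtualenv is my own, and it relies on the two conventions I know of, namely that older virtualenv releases set sys.real_prefix, while the stdlib venv module (and newer virtualenv releases) make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter lives inside a virtual environment.

    Hypothetical helper, not part of virtualenv itself. Older virtualenv
    releases set sys.real_prefix; venv and newer virtualenv releases make
    sys.prefix differ from sys.base_prefix instead, so both are checked.
    """
    if getattr(sys, "real_prefix", None) is not None:
        return True
    # sys.base_prefix only exists on Python 3.3+; fall back to sys.prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
```

Run it with the virtualenv’s python: the interpreter path should lie under <venv>/bin and the check should report True.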

When you are done using the virtualenv, don’t forget to deactivate it by running deactivate. If necessary, rm -rf <venv> will delete the virtualenv, i.e. all the packages installed within it. Think twice before doing this.

5) Install all the things!

From now on, all install commands use pip, i.e. they have the form pip install <package>, e.g. pip install scipy:

scipy (built on top of numpy, which must be installed as well; with older pip/scipy versions, pip install numpy first)

pandas (labeled, table-like data structures such as the DataFrame, plus tools to manipulate and analyze them)

scikit-learn (machine learning libraries, can be handy)

matplotlib (plotting functions; much of the Python plotting ecosystem is built on it)

pyreadline for tab completion of commands (mainly needed on Windows, where Python has no built-in readline module)

ipython, esp. with browser-based notebooks. The install syntax will be

• pip install "ipython[notebook]"

bokeh (pretty plots)

ggplot for those who need R-style plots. The requirements for ggplot are

• matplotlib, pandas, numpy, scipy, statsmodels and patsy
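Once everything is installed, a quick way to see what actually imports (and in which version) is a small script along these lines. It is a sketch: the PACKAGES dictionary and the check_imports name are my own choices, and note that some import names differ from the pip names (scikit-learn is imported as sklearn, ipython as IPython).

```python
import importlib

# pip package name -> importable module name (hypothetical selection; extend as needed).
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "ipython": "IPython",
    "bokeh": "bokeh",
    "statsmodels": "statsmodels",
    "patsy": "patsy",
    "ggplot": "ggplot",
}


def check_imports(packages):
    """Try to import each module; map pip name -> version string, or None on failure."""
    report = {}
    for pip_name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except Exception:  # broken installs can raise more than ImportError
            report[pip_name] = None
            continue
        # Not every module exposes __version__, hence the fallback.
        report[pip_name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_imports(PACKAGES).items():
        print(f"{name:15s} {version or 'NOT IMPORTABLE'}")
```

Anything reported as NOT IMPORTABLE is either missing or broken; reinstalling it with pip inside the activated virtualenv is the first thing I would try.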

6) Develop your scientific Python applications with this powerful array of technologies

7) Once you’re ready to distribute your application to third parties, freeze its dependencies using pip. This is another hack, but hey, we’re in a hurry to do science, right? The first command below records the packages installed in the current virtualenv; the second installs them on a second computer, account or virtualenv (run it with that virtualenv activated).

• pip freeze > requirements.txt
• pip install -r requirements.txt
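pip freeze writes one pinned requirement per line, in the form name==version. If you want to double-check that the second machine ended up with the same versions, two hypothetical helpers along these lines can compare the two lists (e.g. the saved requirements.txt and a fresh pip freeze on the other side). They only understand plain name==version lines and skip everything else, such as comments or -e editable installs.

```python
def parse_requirements(text):
    """Map package name -> pinned version from pip-freeze-style text.

    Hypothetical helper: only 'name==version' lines are handled; blank lines,
    comments and other forms (e.g. '-e ...' editable installs) are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        # pip treats package names case-insensitively
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_requirements(wanted, installed):
    """Return (missing, mismatched): names absent from installed, and names pinned differently."""
    missing = sorted(set(wanted) - set(installed))
    mismatched = sorted(
        name for name in wanted
        if name in installed and wanted[name] != installed[name]
    )
    return missing, mismatched
```

For example, diff_requirements(parse_requirements(old_text), parse_requirements(new_text)) returns the packages missing on the new side and those whose versions differ.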

That’s it for now; comment if anything is unclear, or if you find errors, or would like to suggest alternative/improved recipes.

Ciao!

Visual poetry

December 21, 2013

“It is said that paradise with virgins is delightful, I find only the juice of the grape enchanting! Take this penny and let go of a promised treasure, because the war drum sound is exhilarating only from a distance.” – Omar Khayyam (1048-1131), Iranian polymath and poet

The above is an example of a Ruba’i, a traditional Persian form of quatrain poetry. I find it beautiful on so many levels.

The size of what can be known

December 15, 2013

The Planck length is estimated at $1.616199(97) \times 10^{-35}$ meters, whereas the radius of the observable Universe (comoving distance to the Cosmic Microwave Background) is $46.6 \times 10^{9}$ light years, i.e. $4.41 \times 10^{26}$ meters.
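As a quick sanity check of the conversion, and of how many orders of magnitude separate the two lengths, a few lines of Python suffice. The only number not taken from the text above is the length of a light year, rounded here to 9.4607 × 10^15 m.

```python
import math

PLANCK_LENGTH_M = 1.616199e-35   # Planck length quoted above, in meters
LIGHT_YEAR_M = 9.4607e15         # one light year in meters (rounded)
RADIUS_LY = 46.6e9               # comoving radius quoted above, in light years

radius_m = RADIUS_LY * LIGHT_YEAR_M   # convert light years to meters
ratio = radius_m / PLANCK_LENGTH_M    # how many Planck lengths fit in the radius

print(f"radius of observable Universe: {radius_m:.3g} m")       # 4.41e+26 m
print(f"ratio to Planck length:        {ratio:.3g}")            # 2.73e+61
print(f"orders of magnitude:           {math.log10(ratio):.1f}")  # 61.4
```

In other words, the knowable range of lengths spans roughly 61 orders of magnitude.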

Both represent the metric limits of what we can perceive, regardless of the observation technique: the Planck length is commonly regarded as the smallest measurable distance, whereas the radius of the observable Universe corresponds to the most ancient observable radiation (the CMB is the redshifted light released at recombination, roughly 380,000 years after the Big Bang, when the Universe first became transparent to light).

The existence of universes whose Planck length is larger than the observable size of ours (or, conversely, whose entire observable extent fits within our Planck length) is not provable. A fractal nesting of turtles.

Some life-saving Ubuntu tips

December 2, 2013

• In case you manage to break your installation so badly that it won’t get much past the bootloader (say, when the contents of /etc/init/ are read .. runlevel 2?), you may want to edit the kernel command line in order to gain shell access: in the GRUB menu, highlight the boot entry and press e, append init=/bin/bash at the end of the line starting with ‘linux’, then boot with Ctrl-x (or F10).
• Your HD might be mounted in read-only mode at this stage, so you might want to remount it in read-write mode: first note down the device name (e.g. /dev/sda3), then call mount with the appropriate remount options: mount -o remount,rw /dev/sda3 .
• You might need to know a couple of vi commands in order to edit the relevant configuration files (pray that you know what you’re doing). For example, i enters insert mode, ESC returns vi to command mode, x (in command mode) deletes a character, and from command mode you can either quit without saving with :q! or save and quit with :wq
• Ubuntu 12 waits at boot until the network interfaces listed in /etc/network/interfaces are brought up (see man ifup). One can override this by replacing start on (filesystem and static-network-up) or failsafe-boot with start on (filesystem) or failsafe-boot in /etc/init/rc-sysinit.conf .
• In general, having a proven working operating system on another partition/neighboring PC helps a lot. Happy sysadministration!

Area – Giro giro tondo (live 1977)

August 31, 2013

Area were an Italian progressive rock band, fronted by Demetrio Stratos until his premature death in 1979.
I find the sheer creativity and sense of freedom conveyed by their sound to be so refreshing.

Below, one of their albums. Enjoy!

Percolation threshold estimation

August 30, 2013

The first assignment for Algorithms 1 is to estimate the percolation threshold of an n-by-n array of sites, each of which is either open or closed; the system is said to percolate whenever an uninterrupted chain of adjacent open sites connects the top edge to the bottom one.
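To make the definition concrete (and to have a slow but obviously correct reference to check a union-find implementation against), here is a brute-force check using breadth-first search. This is a different technique from the union-find approach of this post; the function name and the grid-of-strings format are my own choices, reusing the post’s ‘.’ (open) and ‘#’ (closed) tags.

```python
from collections import deque


def percolates(grid):
    """Brute-force check: does a chain of open sites join the top row to the bottom row?

    grid is a list of equal-length strings, '.' for open and '#' for closed.
    Sites connect only to their left/right/up/down neighbors.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    # Start a breadth-first search from every open site in the top row.
    queue = deque((0, c) for c in range(n_cols) if grid[0][c] == '.')
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == n_rows - 1:
            return True  # reached the bottom edge
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < n_rows and 0 <= cc < n_cols
                    and grid[rr][cc] == '.' and (rr, cc) not in seen):
                seen.add((rr, cc))
                queue.append((rr, cc))
    return False
```

For instance, percolates(["#.#", "#..", "##."]) is True (the path bends right in the middle row), whereas percolates([".#.", "#.#", ".#."]) is False, since diagonal contacts do not count.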

We need a fast lookup of the neighborhood of any given site. Sites are numbered 0 to n*n-1 row by row, and the neighbors() function checks whether a site lies in the interior, on an edge or at a corner of the grid:

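def neighbors(ii, n):
    "returns list of neighbors of site ii in an n-by-n grid (row-major numbering)"
    l = []
    if ii % n > 0:            # not in the leftmost column
        l.append(ii - 1)
    if (ii + 1) % n > 0:      # not in the rightmost column
        l.append(ii + 1)
    if ii > n - 1:            # not in the top row
        l.append(ii - n)
    if ii < n * (n - 1):      # not in the bottom row
        l.append(ii + n)
    return l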


I use two lists: L is a pointer array as in the previous post, and O contains the state tags (‘#’ for a closed site, ‘.’ for an open one). Two extra (virtual) sites at the end of L serve as roots for the top and bottom rows, respectively.

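def root(L, i):
    "follow the pointers in L up to the root of i's component"
    while L[i] != i:
        i = L[i]
    return i


def percolation_frac(L, O, sites):
    """Open the sites in the given (random) order and return the fraction
    of open sites at which the system first percolates.

    Uses the module-level n (grid side), N = n*n, the virtual-site indices
    ridxTop and ridxBot, and wqupc2() from the previous post.
    """
    for ii, isite in enumerate(sites):
        O[isite] = '.'                  # open the site
        if isite < n:                   # top row: attach to the top root
            L[isite] = ridxTop
        elif isite >= n*(n-1):          # bottom row, including its first site
            L[isite] = ridxBot
        for inb in neighbors(isite, n):
            if O[inb] == '.':           # only open neighbors are joined
                L = wqupc2(L, isite, inb)
        # percolation: the top and bottom virtual sites belong to the
        # same component (compare roots, not just the raw pointers)
        if root(L, ridxTop) == root(L, ridxBot):
            return float(ii + 1) / N    # ii + 1 sites are open at this point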


As there is no known analytical means of predicting when the percolation phase transition occurs, we just run the experiment in Monte Carlo mode (many times, with randomized initialization) and average the per-run percolation threshold estimates.
Numerical studies put this value, for site percolation on a square lattice, at around 0.5927.
My implementation gets a few runs wrong, possibly due to the order in which the union operation is performed, thereby skewing the average up toward 0.6-something. Need to look into this.
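For comparison, here is a self-contained sketch of the whole Monte Carlo experiment. It is not the post’s code: it swaps in its own weighted quick-union with path halving (a hypothetical UnionFind class standing in for wqupc2 from the previous post), attaches boundary sites to the two virtual sites through union() rather than by overwriting pointers, and detects percolation by comparing the roots of the two virtual sites instead of their raw pointer entries. Grid size, number of trials and seed are arbitrary choices.

```python
import random


class UnionFind:
    """Weighted quick-union with path halving (hypothetical stand-in for wqupc2)."""

    def __init__(self, size):
        self.parent = list(range(size))
        self.size = [1] * size

    def root(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, p, q):
        rp, rq = self.root(p), self.root(q)
        if rp == rq:
            return
        if self.size[rp] < self.size[rq]:  # hang the smaller tree under the larger
            rp, rq = rq, rp
        self.parent[rq] = rp
        self.size[rp] += self.size[rq]


def percolation_threshold(n, rng):
    """Open the sites of an n-by-n grid in random order; return the fraction
    of open sites when the top and bottom edges first become connected."""
    N = n * n
    top, bottom = N, N + 1  # the two virtual sites
    uf = UnionFind(N + 2)
    is_open = [False] * N
    order = list(range(N))
    rng.shuffle(order)
    for count, site in enumerate(order, start=1):
        is_open[site] = True
        if site < n:                 # top row
            uf.union(site, top)
        if site >= N - n:            # bottom row
            uf.union(site, bottom)
        row, col = divmod(site, n)
        for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
            if 0 <= r < n and 0 <= c < n and is_open[r * n + c]:
                uf.union(site, r * n + c)
        if uf.root(top) == uf.root(bottom):
            return count / N
    return 1.0  # not reached: a fully open grid always percolates


def estimate(n=50, trials=200, seed=0):
    """Monte Carlo estimate: mean threshold over independent random runs."""
    rng = random.Random(seed)
    samples = [percolation_threshold(n, rng) for _ in range(trials)]
    return sum(samples) / trials
```

With grids of a few dozen sites per side, estimate() should come out close to 0.59; finite grids and a finite number of trials leave some scatter around 0.5927. Comparing uf.root(top) with uf.root(bottom) is the robust test, because two sites can belong to the same component while their parent pointers still differ.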