title: Diving into the Python toolchain
date: 2020-06-13T00:12:15+02:00
tags: python, audio, hacking
draft: false

I've wanted to build some audio software for a while and have been hacking a lot with WebAssembly, because, in principle at least, its potential for doing exciting or even incredible things in the browser is very clear (and it's also a great excuse to play with Rust and/or Emscripten). But I've also found myself wanting to build software for which I can't find the supporting libraries in Rust or even C++. This is especially the case for audio analysis, a field in which many of the most interesting libraries are written in Python - a language which doesn't compile to wasm, and is unlikely ever to do so.

But inspired by the cool audino project, I decided to consider embracing Python on the backend. After all, in these days of containers and Kubernetes, it's trivial to deploy a backend in whichever language is most appropriate, even if it means jumping out of the browser's JavaScript environment to actually do the work.

So off I go to play with the first interesting-looking library that I find, which is librosa.

Getting an isolated environment for an application up and running in Python doesn't seem as easy as it could be. First I try pipenv, which seems popular and, on the surface, promises to do exactly what I want: manage a simple bundle of isolated dependencies for a small throwaway project. But sadly it also seems inexplicably slow, spending minutes stuck on a Locking message when trying to install a couple of simple dependencies, for example. A little web searching suggests that it's possible to avoid these delays by passing --skip-lock to the pipenv install command, but I don't have any idea what this implies exactly, and it's not an interface that inspires confidence.

So next I spend half an hour experimenting with Anaconda, which also looks promising on the surface but has unsettling aspects too - not least its threat of installing 3 GB of scientific packages by default, while also claiming to be able to create virtual environments and be a package repository for several languages(!). There is a minimal variant which claims to avoid the worst of this behaviour, but even this raises some questions, like how it interacts with the Python binary that I installed using asdf earlier today. So, sighing slightly, I make a note to return to this tool and investigate further later.

Then I try the Python stdlib venv tool, which does seem lightweight and efficient but also seems to do a bit more than I want, including adding Python binaries and various directories to the source tree. Other languages seem to have this nailed, and have made it trivial to set up a bundle of dependencies for a given project with a couple of simple commands. Hopefully I'm missing something obvious.
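One way to keep the stdlib tool's output out of the way is to point it at a single hidden directory. Here is a minimal sketch using the `venv` module's Python API. The helper name `make_env` and the directory name `.venv` are assumptions chosen for illustration; any path would work.

```python
import venv
from pathlib import Path


def make_env(project_dir, name=".venv", with_pip=True):
    """Create an isolated environment inside project_dir and return its path."""
    env_dir = Path(project_dir) / name
    # venv.create builds the directory tree and writes pyvenv.cfg;
    # with_pip=True additionally bootstraps pip into the environment.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


# Example (hypothetical project path):
#   make_env("/path/to/project")
```

Everything the tool generates then lives under that one directory, so a single ignore rule in version control should be enough to keep it out of the source tree.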

So I end up deciding to leave the dependency management fun for another day, and fall back to installing my libraries globally instead. librosa installed perfectly fine like this, and I swiped a simple example script from their tutorials:

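import librosa

# Path to the example audio clip bundled with librosa
fname = librosa.util.example_audio_file()
# Decode the file into PCM samples (a NumPy array) and its sample rate
pcm, sr = librosa.load(fname)

print("Got PCM data {} and samplerate {}".format(pcm, sr))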

but when I run it, an unpleasant backtrace is spat out, ending with ModuleNotFoundError: No module named 'numba.decorators'. The plot thickens once again. I spend some time getting confused by the librosa installation instructions, which mention the numba library specifically in a way that suggests a possible linking or build issue, but a few attempts to uninstall and reinstall the two libraries in differing orders fail to make a difference. Eventually I discover a mention of the numba API being updated so that numba.decorators becomes numba.core.decorators, which would neatly explain the error, so I check the librosa GitHub repo and find a mention of numba at version 0.43.0. [1]

This sounds like a good path to follow, but in turn sparks a battle with pip to install a specific version of a library. Apparently, despite the version existing on the PyPI website, this isn't enough for pip to be able to install it. This, however, is quickly forgiven when I discover that pip does allow an arbitrary archive URL to be passed to it (so pip install https://github.com/numba/numba/archive/0.43.0.tar.gz works exactly as you'd expect). This is a nice touch and something that other dependency management tools would do well to support.
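For reference, both kinds of install target can be driven from a script. The sketch below only builds the command lines and does not run them; `pip_install_command` and `install` are hypothetical helper names, and the version and URL are the ones mentioned in this post. The `name==version` form is pip's standard syntax for an exact pin, though whether pip can resolve it may depend on which builds are published for the running interpreter.

```python
import subprocess
import sys


def pip_install_command(requirement):
    """Build a pip command that targets the interpreter running this script."""
    # "python -m pip" avoids accidentally using a pip that belongs to
    # a different Python installation on the PATH.
    return [sys.executable, "-m", "pip", "install", requirement]


def install(requirement):
    """Run the install, raising CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(requirement), check=True)


# Either form is accepted by pip as an install target:
pinned = pip_install_command("numba==0.48.0")
from_archive = pip_install_command(
    "https://github.com/numba/numba/archive/0.43.0.tar.gz"
)
print(" ".join(pinned))
print(" ".join(from_archive))
```

Running pip through `sys.executable` is a deliberate choice here: with several Pythons on one machine (asdf, system, Anaconda), it keeps the install tied to the interpreter that will actually run the script.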

Now, I try to run my Python script again and it appears to find numba.decorators successfully! However, it then immediately falls over with a new and even less comprehensible error: numba/_dynfunc.cpython-38-x86_64-linux-gnu.so: undefined symbol: _PyObject_GC_UNTRACK. At this point it is almost time to give up and collapse into bed, but something that I've forgotten gives me the inspiration to bump numba up a couple of versions - this time to 0.48.0. I re-run the script and this time see the magic output:

Got PCM data [0. 0. 0. ... 0. 0. 0.] and samplerate 22050

Most of an evening has been spent on this and it's taken me back to the days of ad-hoc dependency management in Ruby's pre-Bundler era. But I've learned a lot about the Python toolchain in the process, and am now ready to start some audio hacking.


  1. The issue was already fixed in this PR.