diff --git a/content/posts/_index.md b/content/posts/_index.md
index 1955dec..9aaf7d1 100644
--- a/content/posts/_index.md
+++ b/content/posts/_index.md
@@ -1,4 +1,5 @@
 ---
 type: post
+title: Posts
 url: /
 ---
diff --git a/content/posts/diving-into-python-toolchain.md b/content/posts/diving-into-python-toolchain.md
new file mode 100644
index 0000000..850661f
--- /dev/null
+++ b/content/posts/diving-into-python-toolchain.md
@@ -0,0 +1,43 @@
+---
+title: "Diving into the Python toolchain"
+date: 2020-06-13T00:12:15+02:00
+tags: ["python", "audio", "hacking"]
+draft: false
+---
+
+I've been inspired to build some audio software for a while and have been hacking a lot with WebAssembly, because in principle at least the potential for doing exciting or even incredible things in the browser with it is very clear (and it's also a great excuse to play with Rust and/or Emscripten). But I've also found myself wanting to build software for which I can't find the supporting libraries in Rust or even C++. This is especially the case for audio analysis, a field for which many of the most interesting libraries are written in Python - a language which doesn't compile to wasm and is unlikely ever to do so.
+
+But inspired by the cool [audino](https://github.com/midas-research/audino/) project, I decided to consider embracing Python on the backend - after all, in these days of containers and Kubernetes, it's trivial to deploy a backend in whatever language is the most appropriate one, even if it means we have to jump out of the browser's JavaScript environment to actually do the work.
+
+So off I go to play with the first interesting-looking library that I find, which is [librosa](https://librosa.github.io/librosa/).
+
+Getting an isolated environment for an application up and running in Python doesn't seem as easy as it could be. First I try [pipenv](https://github.com/pypa/pipenv), which seems popular and on the surface promises to do exactly what I want - which is to manage a simple bundle of isolated dependencies for a small throwaway project. But sadly it also seems inexplicably slow to use - spending minutes stuck on a _Locking_ message when trying to install a couple of simple dependencies, for example. A little web searching [suggests](https://github.com/pypa/pipenv/issues/1914) that it's possible to avoid these delays by passing `--skip-lock` to the `pipenv install` command, but I don't have any idea what this implies exactly, and it's not an interface that inspires confidence.
+
+So next I spend half an hour experimenting with [Anaconda](https://docs.conda.io), which also looks promising on the surface but has unsettling aspects too - not least its threat of installing 3 GB of scientific packages by default, while also claiming to be able to create virtual environments and be a package repository for several languages(!). There is a [minimal variant](https://docs.conda.io/en/latest/miniconda.html) which claims to avoid the worst of this behaviour, but even this raises some questions, like how it interacts with the Python binary that I installed using [asdf](https://github.com/asdf-vm/asdf) earlier today. So, sighing slightly, I make a note to return to this tool and investigate further later.
+
+Then I try the Python stdlib `venv` tool, which does seem lightweight and efficient but also seems to do a bit more than I want, including adding Python binaries and various directories to the source tree. Other languages seem to have this nailed, and have made it trivial to set up a bundle of dependencies for a given project with a couple of simple commands. Hopefully I'm missing something obvious.
+
+So I end up deciding to leave the dependency management fun for another day, and fall back to installing my libraries globally instead. `librosa` installs perfectly fine like this, and I swipe a simple example script from its tutorials:
+
+```python
+import librosa
+
+fname = librosa.util.example_audio_file()
+pcm, sr = librosa.load(fname)
+
+print("Got PCM data {} and samplerate {}".format(pcm, sr))
+```
+
+but when I run it, an unpleasant backtrace is spat out, ending with `ModuleNotFoundError: No module named 'numba.decorators'`. The plot thickens once again. I spend some time getting confused by the librosa [installation instructions](https://librosa.github.io/librosa/install.html), which mention the `numba` library specifically in a way that suggests to me a possible linking or build issue, but a few attempts to uninstall and reinstall the two libraries in differing orders fail to make a difference. Eventually I discover a mention of the `numba` API being updated so that `numba.decorators` changes to `numba.core.decorators`, which would neatly explain the error, so I check the librosa GitHub repo and find a mention of `numba` at version `0.43.0`.[^1]
+
+This sounds like a good path to follow but in turn sparks a battle with `pip` to install a specific version of a library. Apparently, despite the version [existing on the PyPI website](https://pypi.org/project/numba/0.43.0/), this isn't enough for pip to be able to install it. This, however, is quickly forgiven when I discover that pip does allow an arbitrary archive URL to be passed to it (so `pip install https://github.com/numba/numba/archive/0.43.0.tar.gz` works exactly as you'd expect). This is a nice touch and something that other dependency management tools would do well to support.
+
+Now I try to run my Python script again and it appears to find `numba.decorators` successfully! However, it then immediately falls over with a new and even less comprehensible error: `numba/_dynfunc.cpython-38-x86_64-linux-gnu.so: undefined symbol: _PyObject_GC_UNTRACK`. At this point it is almost time to give up and collapse into bed, but something that I've since forgotten gives me the inspiration to bump `numba` up a couple of versions - this time to `0.48.0`. I re-run the script and this time see the magic output:
+
+```
+Got PCM data [0. 0. 0. ... 0. 0. 0.] and samplerate 22050
+```
+
+Most of an evening has been spent on this and it's taken me back to the days of ad-hoc dependency management in Ruby's pre-Bundler era. But I've learned a lot about the Python toolchain in the process, and am now ready to start some audio hacking.
+
+[^1]: The issue was already fixed in [this PR](https://github.com/librosa/librosa/pull/1107).