pam-unshare: a PAM module that switches into a PID namespace

15 April 2016

Today in my 10% time at PythonAnywhere (we’re a bit less lax than Google) I wrote a PAM module that lets you configure a Linux system so that when someone sus, sudos, or sshes in, they are put into a private PID namespace. This means that they can’t see anyone else’s processes, either via ps or via /proc. It’s definitely not production-ready, but any feedback on it would be very welcome.

In this blog post I explain why I wrote it, and how it all works, including some of the pitfalls of using PID namespaces like this and how I worked around them.

Why write it?

At PythonAnywhere we use a variety of tools to sandbox our users. To a certain extent, we’ve hand-rolled our own containerisation system using the amazing primitives provided by the Linux kernel.

One of the problems with our sandboxes right now is that they don’t allow listing of processes using normal tools like ps. This is because, for security, we don’t mount a /proc inside the filesystem visible from our users’ code. The reason for that is that we don’t want people to see each other’s processes, because — if you’re careless — there can be secret information on the command lines, and command lines are visible from /proc and thus from ps. Our one and only security incident so far came from an error in the system that handles this.

The right way to solve this kind of problem in Linux is to use a combination of PID namespaces and mount namespaces.


There are two kinds of namespaces we’re interested in for this module:

PID namespaces

As the docs say, “PID namespaces isolate the process ID number space, meaning that processes in different PID namespaces can have the same PID.” Allowing different processes to have the same PID isn’t important to us for this — but the isolation is what we want. We want the processes that a user uses when they log in to the system to be in a separate namespace to every other user’s.

Mount namespaces

These were the first kind of namespaces to be introduced into Linux, so they’re sometimes confusingly referred to simply as “namespaces”. Again, going to the docs: “Mount namespaces isolate the set of filesystem mount points, meaning that processes in different mount namespaces can have different views of the filesystem hierarchy.” This is useful because we want each of our process namespaces to have access to its own /proc. When you go into a process namespace, you may have a set of process IDs that are different to the external system. But if you have access to the external filesystem, then you can still see the /proc on the external filesystem — so, ps ax will show you processes outside.

What we need is to get our processes into both a PID namespace and a mount namespace, then umount /proc so that we don’t see the external filesystem’s one, then mount it again so that we see the one appropriate to our PID namespace.

This is actually pretty simple to do from the command line, if you have a recent version of Linux with util-linux 2.23 or higher (for Ubuntu, that’s Vivid or later — or you can upgrade Trusty using this PPA from Ivan Larionov). If you’re on a Linux command line (as root) and you have the right version, you can try it out:

# unshare --pid -- /bin/sh -c /bin/bash
# echo $$

The first command is a slightly complicated way of getting into a PID namespace — unshare --pid on its own doesn’t work, for reasons that are still hazy in my mind… Anyway, once that’s done, we echo the PID of the current bash process, and we get 1 — so we’re definitely in our own process namespace. However, if you run ps ax you’ll see all of the processes in the parent PID namespace, because (as I said before) the /proc that we see in our filesystem is the one associated with the parent. Naturally, we can’t umount /proc because we’d be trying to umount the directory everyone else in the system is using — the system would complain that it’s busy. So the next thing is to switch into our own mount namespace, then umount our own private /proc, then mount a fresh one:

# unshare --mount
# umount /proc
# mount -t proc proc /proc
# ps ax
    1 pts/0    S      0:00 /bin/bash
   42 pts/0    S      0:00 -bash
   57 pts/0    R+     0:00 ps ax
# ls /proc
1          consoles   execdomains  ipmi       kpagecount     misc          schedstat  sys            version
42         cpuinfo    fb           irq        kpageflags     modules       scsi       sysrq-trigger  version_signature
58         crypto     filesystems  kallsyms   latency_stats  mounts        self       sysvipc        vmallocinfo
buddyinfo  devices    fs           kcore      loadavg        net           slabinfo   timer_list     vmstat
bus        diskstats  interrupts   keys       locks          pagetypeinfo  softirqs   timer_stats    xen
cgroups    dma        iomem        key-users  mdstat         partitions    stat       tty            zoneinfo
cmdline    driver     ioports      kmsg       meminfo        sched_debug   swaps      uptime

Awesome! We’re in our own namespace.


Now, if we had complete control over the code that runs whenever we want to go into namespaces, the above would be entirely sufficient. For example, on PythonAnywhere we have web-based consoles. When someone connects to one of those, we have complete control over the code that is executed before they can start typing. We could do the two unshare commands, then the /proc remount, then su to the appropriate user account, and then we’d be done.

But we don’t always have control over this code path. For example, people can log in using ssh. And controlling what’s done when someone does that is the domain of PAM.

PAM is Pluggable Authentication Modules. A program can link with PAM and hand over all of its authentication to it. For example, when you ssh in, the ssh daemon asks PAM to authenticate your credentials.

PAM itself delegates the authentication process to a set of modules that are implemented as shared libraries. For example, there’s one to do normal Unix authentication using /etc/passwd or nsswitch — but you could also have ones to do biometric authentication or whatever.

The directory /etc/pam.d contains configuration files saying which auth modules should be used for each PAM client app — what to use to auth ssh, what to use to auth sudo, and so on, along with some common stuff for everything. The syntax is, frankly, vile, but it’s just about understandable if you put your mind to it.
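To make that a bit more concrete, here is roughly the shape of an /etc/pam.d/su on a Debian-ish system. This is an illustrative sketch rather than a copy of any particular distro's file, but the format is the important bit: a module type, a control flag, the module library, optional arguments, plus @include lines pulling in shared configuration.

# <type>    <control>   <module>      <args>
auth        sufficient  pam_rootok.so
session     required    pam_env.so    readenv=1
@include common-auth
@include common-account
@include common-session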

Anyway, what’s all this got to do with our problem? Well, PAM has four kinds of plugins:

  1. Authentication management modules, which handle checking people’s credentials.
  2. Account management modules, which can allow/disallow access even for people who’d be otherwise authorised, based on other factors (eg. time of day).
  3. Authentication token management modules which do things like allowing people to change their passwords.
  4. Session management modules, which do session setup and teardown stuff. A standard module of this type is pam_env, which sets up environment variables.

The last kind of module is the place where we can hook in our code. There’s already a pam-chroot, which is a session management module that puts the user into a chroot jail. So my goal with this module was essentially to write something like that, but for process namespaces.


Here’s a minimal PAM session module that just prints stuff when people enter and leave a session (for example, when their su session starts, and when it ends):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <security/pam_modules.h>

PAM_EXTERN int pam_sm_open_session(pam_handle_t *pamh, int flags, int argc, const char **argv) {
    printf("pam_basic pam_sm_open_session\n");
    return PAM_SUCCESS;
}

PAM_EXTERN int pam_sm_close_session(pam_handle_t *pamh, int flags, int argc, const char **argv) {
    printf("pam_basic pam_sm_close_session\n");
    return PAM_SUCCESS;
}

Save it as pam_basic.c and you can compile it with this:

gcc -c -fPIC -fno-stack-protector -Wall pam_basic.c
ld --shared -o pam_basic.so pam_basic.o -lpam

…then install it like this:

sudo cp pam_basic.so /lib/security/

…and enable it by adding this line towards the end of /etc/pam.d/su (before the @includes):

session required pam_basic.so

Then try suing to another user. You’ll see the “open_session” and “close_session” messages as you enter and exit the sued environment.

Enter the namespaces

So, you’d think that getting this to work with PID namespaces would be really simple; just make the appropriate system calls in the pam_sm_open_session function to switch to a new PID namespace, then to a new mount namespace, then umount and then mount /proc, and you’re all set. The system function to switch into a new namespace is even called unshare, just like the command-line tool.

But, of course, it’s a little bit more complicated than that. It comes down to processes.

When you make the unshare system call to enter a PID namespace, your current process’s PID namespace is unaffected. Instead, the new namespace is used for any child processes you create using (eg.) fork. When you spin off your first child process after calling unshare, then that process is the “owner” of the PID namespace — kind of like init is for the machine as a whole.
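If you want to see that behaviour in isolation, here’s a minimal standalone sketch (not part of the module, just an illustration, and it needs to be run as root): the parent’s PID is unchanged after the unshare, but the first child it forks sees itself as PID 1.

// Illustrative sketch, not part of pam-unshare: unshare(CLONE_NEWPID)
// affects children only, and the first child becomes the namespace's "init".
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    if (unshare(CLONE_NEWPID) != 0) {
        perror("unshare");  // needs root (CAP_SYS_ADMIN)
        return 1;
    }
    // The calling process stays in its original PID namespace...
    printf("parent is still PID %d\n", (int) getpid());
    pid_t child = fork();
    if (child == 0) {
        // ...but the first child it forks is PID 1 in the new one.
        printf("child thinks it is PID %d\n", (int) getpid());
        return 0;
    }
    waitpid(child, NULL, 0);
    return 0;
}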

By contrast, the unshare for mount namespaces switches you into a new namespace right away.

Now, when you’re doing an su, your PAM module is executed in-process by su, before it spins off the child process that will handle the user-switched session. So you can do the two unshares in there, and you’ll wind up with a child process that has its own mount and PID namespaces. But that will still have the external system’s /proc mounted, so ps ax will still show all processes. No problem — you can also umount /proc inside the PAM code. Now the user can’t do ps at all.

But the re-mounting of /proc can’t happen in the PAM process, because it’s not in the new PID namespace. Remember, only its children will be. If we were to do the re-mount in the PAM process, we’d still get the /proc for the parent PID namespace.

So the trick is to do the re-mount in a child process. But the child process that’s spun off by su is out of our control; it’s a shell or whatever the user specified. Even worse, the child process will be run as the user we’re suing to, and only root can mount /proc.

OK, you might think — perhaps, after setting things up so that the su process, thanks to the PAM module, is in the right mount namespace, and its children will be in the right PID namespace, we could umount /proc, then spin off a short-lived child process to do the re-mount of /proc, then when it’s exited, continue?

What happens when you do that is that the PID namespace dies when your short-lived child process exits. Remember, the first child process you create after doing the unshare to enter the PID namespace is the “init” equivalent. When it dies, the PID namespace dies with it (and the kernel kills all of its child processes). (BTW I think this is why, when you kill the process you’ve specified in a docker run command, all of its child processes die — even if you’ve detached them.)

My solution to this is a bit of a hack. I spin off a child process, which, being in a fresh PID namespace, will have PID 1. This is the namespace’s “init”, and when it exits, the PID namespace will be shut down. But it’s running as root, so it can mount /proc. We know that the next process to be started in the namespace will have PID 2. So, the child process mounts /proc, then waits until it sees a process with PID 2 — then it waits for that process to die:

        while (kill(2, 0) == -1 && errno == ESRCH) {
            // short-lived busy wait
        }
        while (kill(2, 0) != -1 && errno != ESRCH) {
            // long-lived, poll twice a second
            usleep(500000);
        }

(If you’re wondering why I’m using kill(pid, 0) and polling, rather than using waitpid to wait for the process to die, it’s because process 2 isn’t a child of process 1, and you can only use waitpid with your own child processes.)

This seems to work fine! Here’s the complete source code of the current version, annotated. GitHub repo here.

#define _GNU_SOURCE

#include <syslog.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <errno.h>
#include <unistd.h>
#include <signal.h>

#include <sched.h>

#include <sys/mount.h>

#define PAM_SM_SESSION
#include <security/pam_modules.h>

The standard import-y stuff. The only points of note are the #define _GNU_SOURCE, which is needed to use the unshare function, and the #define PAM_SM_SESSION, which sets things up so that PAM knows we’re writing a session management module.

static void _pam_log(int err, const char *format, ...) {
  va_list args;

  va_start(args, format);
  openlog("pam_unshare", LOG_PID, LOG_AUTHPRIV);
  vsyslog(err, format, args);
  va_end(args);
}

A nice wrapper around syslog, shamelessly stolen from pam-chroot.

PAM_EXTERN int pam_sm_open_session(pam_handle_t *pamh, int flags, int argc, const char **argv) {

So this is our entry point when a PAM session is started:

    const char *username;
    if (pam_get_user(pamh, &username, NULL) != PAM_SUCCESS) {
        _pam_log(LOG_ERR, "pam_unshare pam_sm_open_session: could not get username");
        return PAM_SESSION_ERR;
    }
    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: start", username);

Get the username of the person we’re suing to, or who we’re sshing in as, or whatever. Useful for logging.

    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: about to unshare", username);
    int unshare_err = unshare(CLONE_NEWPID | CLONE_NEWNS);
    if (unshare_err) {
        _pam_log(LOG_ERR, "pam_unshare pam_sm_open_session: %s: error unsharing: %s", username, strerror(errno));
        return PAM_SESSION_ERR;
    }
    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: successfully unshared", username);

This does both of the unshares; the CLONE_NEWPID means that our child processes will be in their own PID namespace, and the CLONE_NEWNS puts the current process, and all of its future children, into a new mount namespace.

    if (access("/proc/cpuinfo", R_OK)) {
        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: no need to umount /proc", username);
    } else {

If we’re already in a situation where we don’t have /proc then we don’t want to blow up when we try to umount it, so this is a simple guard against that…

        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: about to umount /proc", username);
        int umount_err = umount("/proc");
        if (umount_err) {
            _pam_log(LOG_ERR, "pam_unshare pam_sm_open_session: %s: error umounting /proc: %s", username, strerror(errno));
            return PAM_SESSION_ERR;
        }
        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: successfully umounted /proc", username);
    }

And here we do the umount if we need to.

    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: about to kick off a subprocess", username);
    int pid = fork();

We’ve kicked off our subprocess:

    if (pid == 0) {

If we’re in the new child process…

        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: in subprocess, about to mount /proc", username);
        if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL)) {
            _pam_log(LOG_ERR, "pam_unshare pam_sm_open_session: %s: subprocess: error mounting /proc: %s", username, strerror(errno));
        }
        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: in subprocess, successfully mounted /proc", username);

Do the mount.

        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: in subprocess, about to busy-wait for second child", username);
        while (kill(2, 0) == -1 && errno == ESRCH) {
            // busy-wait until PID 2 appears
        }
        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: in subprocess, second child has appeared, switching to slow-poll", username);

        while (kill(2, 0) != -1 && errno != ESRCH) {
            // poll twice a second until PID 2 goes away
            usleep(500000);
        }
        _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: in subprocess, done waiting, exiting", username);
        // PID 2 has gone, so our work is done; exiting shuts down the namespace
        exit(0);
    }


Then we wait for PID 2, and exit once it has gone.

    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_open_session: %s: done", username);
    return PAM_SUCCESS;
}

This is run if we’re not in the child process — just continue as normal.

PAM_EXTERN int pam_sm_close_session(pam_handle_t *pamh, int flags, int argc, const char **argv) {
    const char *username;
    if (pam_get_user(pamh, &username, NULL) != PAM_SUCCESS) {
        _pam_log(LOG_ERR, "pam_unshare pam_sm_close_session: could not get username");
        return PAM_SESSION_ERR;
    }
    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_close_session: %s: start", username);
    _pam_log(LOG_DEBUG, "pam_unshare pam_sm_close_session: %s: done", username);
    return PAM_SUCCESS;
}

And that, of course, is just a dummy pam_sm_close_session, which needs to be there for completeness.

That’s basically it.

What’s next?

I’m pretty pleased with how this worked out (especially given that I didn’t really understand PAM or namespaces when I started working on this stuff this morning). But it’s not quite what we need. We already have some pretty powerful code that sets up sandboxed filesystems, and this wouldn’t be compatible with the module as I’ve written it. Possibly we’ll simply use the unsharing portion of this, and then use another mechanism to handle the remounting of /proc.

But I figured it might be worth putting this code out there, just in case anyone else is interested in how PAM and namespaces interact, and what some of the pitfalls — and their workarounds — are.

Comments welcome!


Many thanks to Ed Schmollinger for pam-chroot, which was the inspiration for all this, and to Jameson Little for simple-pam, which was simple enough that I had the confidence to start off coding a PAM module.

SHA-1 sunset in Chromium, and libnss3

6 August 2015

This post is a combination of a description of a Chrome bug (fixed in May), a mea culpa, and an explanation of the way HTTPS certificates work. So there’s something for everyone! :-)

Here’s the situation — don’t worry if you don’t understand all of this initially, a lot of it is explained later. Last year, the Chromium team decided that they should encourage site owners to stop using HTTPS certificates signed using the SHA-1 algorithm, which has security holes. The way they are doing this is by making the “padlock” icon in the URL bar show that a site is not secure if its certificate expires after the end of 2015 and either the certificate itself, or any certificate in its chain, is signed with SHA-1. I encountered some weird behaviour related to this when we recently got a new certificate for PythonAnywhere. Hopefully by posting about it here (with a bit of background covering the basics of how certificates work, including some stuff I learned along the way) I can help others who encounter the same problem.

tl;dr for people who understand certificates in some depth — if any certificate in your chain, including your own cert, is signed with multiple hashing algorithms, then if you’re using Chrome and have a version of libnss < 3.17.4 installed, Chrome's check to warn about SHA-1 signatures, instead of looking at the most-secure signature for each cert, will look at the least-secure one. So your certificate will look like it's insecure even if it's not. Solution for Ubuntu (at least for 14.04 LTS): sudo apt-get install libnss3. Thank you so much to Vincent G on Server Fault for working out the fix.

Here’s the background. It’s simplified a bit, but I think it’s largely accurate — any corrections from people who know more about this stuff than I do would be much appreciated!

Public/private keys

Most readers here probably have a decent working knowledge of asymmetrical cryptography, so I’m going to skip a lot of detail here; there are excellent primers on public key encryption all over the Internet and if you need to know more, google for one that suits you.

But if you just want to get through this post, here’s the stuff you need to know: public and private keys are large numbers, generated in public/private pairs. Each of them can be used on its own to encrypt data. Stuff that is encrypted with the private key can be decrypted with the public key, and vice versa.

It is almost impossibly unlikely that a particular bit of data encrypted with one private key would be the same as the same data encrypted with a different private key. So, if I want to prove that I sent you something, and you already have a copy of my public key (let’s ignore how you got hold of it for now), I can send you the data encrypted with my private key, and if you can decrypt it then it’s pretty much guaranteed that it was me who sent it — or at least it was someone who has a copy of my private key.

Furthermore, we can extend this to a concept of digital signatures. If I want to send you some data and to prove that it came from me, I can use a hash function to reduce that data down to a short(ish) number, then I can encrypt that hash with my private key. I then send you the data, with the encrypted hash tacked onto the end as a signature. To verify that it really came from me, you hash the data using the same algorithm, then decrypt the signature using my public key. If they match, then you know I really did sign it.

This has a couple of advantages over simply encrypting the whole thing using my private key — the most relevant for this post being that the data itself is in the clear: we’re only using the encryption to prove who provided it.


Certificates

When you want to run an HTTPS site, you need to get a certificate. This is basically some data that states the domain that’s covered by the certificate and who owns it, and a public key. It claims “the owner of the private key associated with this public key is the person that this data relates to”. It also has some data saying “this is who says that this claim is true” — the issuer. So, for the certificate that you bought from, say, Joe’s SSL Certificates, the issuer will be some identifier for that company.

So, the question is, how do we stop people from just issuing themselves certificates saying that Joe has said that the private/public key pair they’ve just generated is the correct one for your domain? The certificate is digitally signed using Joe’s SSL Certificates’ private key. So, assuming that the browser has Joe’s SSL Certificates’ public key, and it trusts Joe to only sign certificates that he really knows are OK, it just uses Joe’s public key to validate the signature.

Browsers come with a bunch of public keys installed as ones they should trust (actually, some rely on a list provided by the operating system). They’re called “root CAs”, and in this case, we’re saying that Joe’s SSL Certificates is one of them. But in the real world, not every browser necessarily trusts the specific issuer who signed your certificate.

Certificate chains

What happens under these circumstances is that Joe gets his own certificate. This includes his public key, and is signed by someone else. So, the browser receives your certificate and Joe’s when they visit your website. They check your certificate against Joe’s, using the public key in Joe’s certificate, and then they check Joe’s against the certificate they have stored for whoever signed Joe’s one.

And obviously this can go on and on; perhaps Joe’s certificate was signed by Amy, who your browser doesn’t trust… so your web server has to send your certificate, and Joe’s certificate, and Amy’s certificate, and Amy’s certificate is signed by someone the browser does trust and we’re done.

This could in theory go on pretty much indefinitely, with hundreds of certificates being sent, giving a chain of trust from your certificate up to someone the browser really does trust — that is, back to an issuer and their associated public key that came packaged with Chrome or Firefox or whatever browser is being used. But in practice, chains are normally between two and five certificates long.

So, if you’ve ever set up SSL for a domain on PythonAnywhere or on your own server, now you know what all that stuff with the “certificate chain” or “bundle” was. You were putting together a set of certificates that started with your own one, then went from there across a sequence of signatures and public keys to one that all web browsers trust by default.
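In practice (for nginx-style configuration, at least), that bundle is typically just the PEM files concatenated together: your own certificate first, then each intermediate, working up towards the root. The file names below are made up for the example:

# leaf certificate first, then the intermediate(s)
cat www_example_com.crt intermediate.crt > bundle.crt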

Signature hashing algorithms

One thing we’ve glossed over a bit until now is the hashing algorithm. It’s really important that different certificates hash to different numbers. Let’s imagine we had a somewhat naive algorithm:

def hash(certificate):
    return 4

I could get a certificate issued for my own domain, which would include as its signature the number 4 encrypted by my issuer’s private key. Because I know that every other certificate in the world also hashes to the number 4, I can just change my certificate so that it says that it relates to some other domain, keeping the same public key, and it will still pass the signature validation browsers do. This means that if I can somehow trick your computer (through some kind of clever network hackery) into thinking that my website is that other domain, then I can present my hacked certificate when you visit, and your browser will accept it and show you a green padlock in the URL bar. This is obviously not a Good Thing.

This is where we can actually start talking about SHA-1. It’s a hashing algorithm designed by the NSA back in 1995. It’s no longer regarded as being very good — initial problems started surfacing in 2005. This is for cryptographic values of “not very good” — it’s still pretty much impossible to produce a valid certificate that would hash to the same value as one that you’re trying to impersonate. But it’s possible in theory, and has been for quite a while, so it’s time to move on to SHA-2. SHA-2 is also an NSA creation — it’s worth noting that the Snowden revelations don’t seem to have done the hashing algorithm’s reputation any harm, and it’s still regarded as the way to go.

(Just to confuse matters a bit, SHA-2 is actually a family of hash functions, called SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. So if someone is talking about something hashed with SHA-256, they could equally, if less precisely, say that it was hashed with SHA-2. I’ll try to use SHA-2 in the remainder of this blog post to stop my own brain from melting when I reread it…)

Sign it once, sign it twice

So there are different hash algorithms that can be used to sign things. But that obviously could cause problems — given a particular signature, how does the person checking the signature know which one was used? The simple fix is to add an extra field to the signature saying which algorithm was used.

And what if the checker doesn’t know how to perform the particular hash that the signature requires? After all, this is the Internet, and there are computers running software of all different ages knocking around. The answer is that each certificate can have multiple signatures, all using different hashing algorithms but encrypted with the same private key (and so, decryptable with the same public key). It’s simple to do — hash the data once using SHA-1, encrypt that with your private key, and store that as the SHA-1 signature. Then hash the original data again using SHA-2, encrypt the result with the same private key, and then store that as the SHA-2 one.

Once that’s done, a browser trying to validate a certificate can go through the signatures, and find the most secure signature it understands the hashing algorithm for. Then it can check that one and ignore the rest. Old software that doesn’t understand the latest and greatest hashing algorithms has lower security than new software, but it’ll still have something. And new software can stay ahead and always choose the newest algorithm.
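If you want to check what you’ve actually been issued, the openssl command-line tool will show you the signature algorithm on a certificate, or dump everything a live site sends in its chain so you can inspect each one. The file name and domain here are just placeholders:

openssl x509 -in mycert.pem -noout -text | grep "Signature Algorithm"
openssl s_client -connect www.example.com:443 -showcerts </dev/null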

The sunset

So, SHA-1 bad, SHA-2 good. Google quite understandably wanted to reduce the number of sites using certificates signed with the old, broken hashing algorithm. So they decided that when your browser loads a website, it will look along the chain and see if there are any certificates there that are SHA-1 only. If the site’s certificate expires soon (as in, before the end of 2015), they’ll accept it anyway. That means that no-one with an old SHA-1 certificate was forced to upgrade quickly. But if it expires later than that, then they show a “broken security” warning. If you click on that, it tells you that one of the certificates in the chain uses SHA-1 so it’s not secure.

The problem

Phew! Finally I can tell you what triggered this whole infodump :-)

We recently got a new certificate for PythonAnywhere. Our current one expires in August 2015, so we needed a new one for another year. I was checking it out prior to pushing it to our live service, and got the “broken security” warning. “Tsk”, I thought, “what a useless certificate issuer we have!” A bit of googling around led me to pages that seemed to be claiming that our issuer had a habit of issuing SHA-1 certs and you had to kick them to get an SHA-2 one. So I sent them a grumpy email asking for an SHA-2 version of our cert, or whichever one in the chain it was. (At this point I didn’t realise that a certificate could have multiple signatures, and — I think — the version of Chrome I was using to explore the certificate chain on the installed certificate on the test servers only showed me the SHA-1 signatures when I checked.)

Then a client provided us with a new cert for their own domain. We installed it, and *bang* — they got the same warning. I let them know what the problem appeared to be. But after a couple of back-and-forths with their certificate issuers, mediated by them (and I have to thank them for their patience in this), I started doubting myself. My computer was showing the SHA-1 errors in Chrome. A Crunchbang virtual machine I was using, likewise. But the client themselves couldn’t see it. They checked in Firefox, but Mozilla aren’t doing this sunsetting thing, so that was expected. But then they checked in Chrome on Windows, and they didn’t get it. The same Chrome version as I was running, but no warning.

The clincher for me was when my colleague Glenn checked it in his browser, running on Ubuntu, just like me. Same Chrome version… no error. There was obviously something wrong with my own machine! A whole bunch of Googling later and I found this answer to a Server Fault post, by Vincent G.

There’s an Ubuntu library, libnss3, described as “Network Security Service libraries”. In version 3.17.4, there was a fix to a bug described as “NSS incorrectly preferring a longer, weaker chain over a shorter, stronger chain”. It looks like this was pushed live in May.
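If you want to see whether your own machine is affected, this is the sort of thing to check on Ubuntu (the tl;dr above has the fix):

apt-cache policy libnss3    # anything below 3.17.4 has the bug described above
sudo apt-get update && sudo apt-get install libnss3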

I use i3 as my window manager, which means that the normal Ubuntu “you have updates you need to install” stuff doesn’t happen, so I need to update my OS manually. It looks like it was a while since I did that… (Cue spearphishing attacks.)

I updated, and suddenly both our certificate and the client’s looked OK. Sincere apologies emailed to both clients and to our respective CAs…

So, just to reiterate what happened… both we and our clients were issued with new certificates. These expired after the end of 2015. Each certificate in the chain up to a root cert was signed with SHA-2 hashing, and also (for backward compatibility) was also signed with SHA-1. When loaded into a Chrome with no buggy libraries, the browser would look along the chain and recognise that every certificate had an SHA-2 signature, so it would decide it was fine. But in my version with the buggy libnss3, it would look along the chain and spot the SHA-1 signatures. It would erroneously decide to ignore the SHA-2 ones, and would report the certificate as broken.

The moral of the story? Keep updating your system. And if something looks broken, check it on as many OS/browser combinations as possible… On the other hand, when select really is broken, it’s a real pain to debug.

Does #EUVAT make accepting bitcoins impossible for EU-based digital services businesses?

19 December 2014

Earlier on today I blogged a description of what we had to do at PythonAnywhere to handle the upcoming EU VAT (~= sales tax) changes for digital services. It’s a long post (though I tried to keep it as light as possible), but the short form is “it was hard, and at least in part unnecessarily so”.

In a comment, Hristo suggested a workaround: “Enter Bitcoin”.

It’s possible he meant “and ignore the tax authorities entirely”, but let’s consider the impact on businesses that want to play by the rules. I think Bitcoin could be worse than credit card billing. In fact, if I’m reading the regulations right (and I might not be, I was up at 5am for a deployment today), the EU VAT changes might make accepting bitcoins pretty much impossible for law-abiding online EU businesses selling digital services like hosting, music downloads, and ebooks.

Here’s why:

Under the new rules, we need two pieces of non-conflicting evidence as to a customer’s location. The IP address can be one of them, and for our credit card/PayPal customers we can use their billing address as another.

But for Bitcoin, there is no billing address — it’s essentially digital cash. And regarding the other kinds of acceptable evidence:

  • “location of the bank” — not applicable using Bitcoin.
  • “the country code of SIM card used by the customer” — not available for an ordinary Internet business.
  • “the location of the customer’s fixed land line through which the service is supplied to him” — not available for an ordinary Internet business.
  • “other commercially relevant information (for example, product coding information which electronically links the sale to a particular jurisdiction)” — not applicable for a standardised product. Possibly usable if you’re selling SaaS tax-filing or something else that’s entirely country-specific.

Now, perhaps that list is non-exhaustive. It’s hard to tell, because the guidance says we must “obtain and to keep in your records 2 pieces of non-contradictory evidence from the following list”, which implies that it’s an exhaustive list, but then says “Examples include”, which implies that it’s not. [UPDATE: they’ve updated the guidance, and it’s definitely non-exhaustive.] But even if it is non-exhaustive, and, say, you can use a scan of someone’s utility bill or the other proof-of-address documents you need to provide when you open a bank account, I can’t think of anything that anyone would be likely to be willing to provide for digital services like web hosting, music, or ebooks.

All of this means that, at least from my reading of the rules, we cannot now accept bitcoins as a means of payment. I’ve asked our accountants their professional opinion. But I’m not holding out much hope.

What do you think? Am I missing something — perhaps some kind of other proof of location that an online business accepting Bitcoin could easily gather?

Or is Bitcoin now effectively sunk as a means of payment for digital goods sold by businesses in the EU?

A fun bug

28 March 2014

While I’m plugging the memory leaks in my epoll-based C reverse proxy, I thought I might share an interesting bug we found today on PythonAnywhere. The following is the bug report I posted to our forums.

So, here’s what was happening.

Each web app someone has on PythonAnywhere runs on a backend server. We have a cluster of these backends, and the cluster is behind a loadbalancer. Every backend server in the cluster is capable of running any web app; the loadbalancer’s job is to spread things out between them so that each one at any given time is only running an appropriately-sized subset of them. It has a list of backends, which we can update in realtime as we add or remove backends to scale up or down, and it looks at incoming requests and uses the domain name to work out which backend to route a request to.

That’s all pretty simple. The twist comes when we add the code that reloads web apps to the mix.

Reloading a PythonAnywhere web app is simply a case of making an authenticated request to a specific URL. For example, right now (and this might change, it’s not an official API, so don’t do anything that relies on it), to reload a web app owned by user fred, you’d hit a URL whose path includes fred’s username and the domain of the web app to be reloaded.

Now, the PythonAnywhere website itself is just another web app running on one of the backends (a bit recursive, I know). So most requests to it are routed based on the normal loadbalancing algorithm. But calls specifically to that “reload” URL need to be routed differently — they need to go to the specific backend that is running the site that needs to be reloaded. So, for that URL, and that URL only, the loadbalancer uses the domain name that’s specified second-to-the-end in the path bit of the URL to choose which backend to route the request to, instead of using the hostname at the start of the URL.

So, what happened here? Well, the clue was in the usernames of the people who were affected by the problem — IronHand and JoeButy. Both of you have mixed-case usernames, and your web apps are served from lower-case versions of those names.

But the code on the “Web” tab that specifies the URL for reloading the selected domain specifies it using your mixed-case usernames — that is, it builds reload URLs containing the mixed-case versions of those domain names.

And you can probably guess what the problem was — the backend selection code was case-sensitive. So requests to your web apps were going to one backend, but reload messages were going to another different backend. The fix I just pushed made the backend selection code case-insensitive, as it should have been.
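To illustrate the class of bug (this is a toy sketch, not our actual loadbalancer code): if you pick a backend by hashing the domain, the hash has to be computed over a canonicalised (here, lower-cased) version of it, otherwise two spellings of the same domain can land on different backends.

// Toy sketch, not PythonAnywhere's real code: choose a backend by hashing
// the domain name.  Lower-casing inside the hash is the fix described above.
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

static unsigned long hash_domain(const char *domain) {
    unsigned long h = 5381;
    for (; *domain; domain++) {
        // fold case so "www.Example.com" and "www.example.com" hash identically
        h = h * 33 + (unsigned char) tolower((unsigned char) *domain);
    }
    return h;
}

int main(void) {
    const char *backends[] = { "backend-1", "backend-2", "backend-3", "backend-4", "backend-5" };
    size_t n = sizeof(backends) / sizeof(backends[0]);
    printf("%s\n", backends[hash_domain("www.Example.com") % n]);
    printf("%s\n", backends[hash_domain("www.example.com") % n]);  // same backend now
    return 0;
}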

The remaining question — why did this suddenly crop up today? My best guess is that it’s been there for a while, but it was significantly less likely to happen, and so it was written off as a glitch when it happened in the past.

The reason it’s become more common is that we actually more than doubled the number of backends yesterday. Because of the way the backend selection code works, when there’s a relatively small number of backends it’s actually quite likely that the lower-case version of your domain will, by chance, route to the same backend as the mixed-case one. But the doubling of the number of servers changed that, and suddenly the probability that they’d route differently went up drastically.
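(Back-of-the-envelope: with N backends and a hash that mixes well, the lower-case and mixed-case spellings of a domain land on the same backend with probability of roughly 1/N. So with a handful of backends, a decent fraction of mixed-case users were accidentally fine, and more than doubling N shrank that fraction accordingly.)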

Why did we double the number of servers? Previously, backends were m1.xlarge AWS instances. We decided that it would be better to have a larger number of smaller backends, so that problems on one server impacted a smaller number of people. So we changed our system to use m1.large instances instead, spun up slightly more than twice as many backend servers, and switched the loadbalancer across.

So, there you have it. I hope it was as interesting to read about as it was to figure out :-)

…just resting…

12 December 2013

Just a quick note to say that I’m still here! Using rsp as a front-end for this site has usefully shown up some weird bugs, and I’m tracking them down. I’ll do a new post about it when there’s something useful to say…

Writing a reverse proxy/loadbalancer from the ground up in C, part 4: Dealing with slow writes to the network

10 October 2013

This is the fourth step along my road to building a simple C-based reverse proxy/loadbalancer, rsp, so that I can understand how nginx/OpenResty works — more background here. Here are links to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, to the second part, where I added the code to handle multiple connections by using epoll, and to the third part, where I started using Lua to configure the proxy.

This post was unplanned; it shows how I fixed a bug that I discovered when I first tried to use rsp to act as a reverse proxy in front of this blog. The bug is fixed, and you’re now reading this via rsp. The problem was that when the connection from a browser to the proxy was slower than the connection from the proxy to the backend (that is, most of the time), then when new data was received from the backend and we tried to send it to the client, we sometimes got an error telling us that the client was not ready. This error was being ignored, so a block of data would be skipped, and the pages you got back would be missing chunks. There’s more about the bug here.
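The shape of the fix (a sketch of the general technique rather than rsp’s exact code) is to treat a not-ready client as “send nothing now, buffer the unsent data, and wait for epoll to say the socket is writable again”:

// Sketch of handling a slow client on a non-blocking socket; not rsp's
// actual code.  write() may accept only part of the data, or fail with
// EAGAIN/EWOULDBLOCK -- ignoring either of those drops data.
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

// Returns the number of bytes actually sent; the caller must buffer the
// rest and retry when epoll reports EPOLLOUT, or -1 on a real error.
ssize_t send_some(int fd, const char *buf, size_t len) {
    ssize_t written = write(fd, buf, len);
    if (written >= 0) {
        return written;  // may be less than len: a partial write
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return 0;  // client not ready: send nothing now, retry later
    }
    return -1;  // real error: the caller should close the connection
}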

A brief sidetrack: Varnish

2 October 2013

In order to use this blog as a decent real-world test of rsp, I figured that I should make it as fast as possible. The quickest way to do that was to install Varnish, which is essentially a reverse proxy that caches stuff. You configure it to say what is cachable, and then it runs in place of the web server and proxies anything it can’t cache back to it.

I basically used the instructions from Ewan Leith’s excellent “10 Million hits a day with WordPress using a $15 server” post.

So now, this server has:

  • rsp running on port 80, proxying everything to port 83.
  • varnish running on port 83, caching what it can and proxying the rest to port 81.
  • nginx running on port 81, serving static pages and sending PHP stuff to php5-fpm on port 9000.

I’ve also got haproxy running on port 82, doing the same as rsp — proxying everything to varnish — so that I can do some comparative speed tests once rsp does enough for such tests to give interesting results. Right now, all of the speed differences seem to be in the noise, with a run of ab pointed at varnish actually coming out slower than the two proxies.
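When the time comes, the comparison itself will just be something along these lines (illustrative only; the ports are as listed above and the request counts are picked arbitrarily):

ab -n 1000 -c 10 http://localhost:80/    # via rsp
ab -n 1000 -c 10 http://localhost:82/    # via haproxy
ab -n 1000 -c 10 http://localhost:83/    # varnish directly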

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: fixed it!

30 September 2013

It took a bit of work, but the bug is fixed: rsp now handles correctly the case when it can’t write as much as it wants to the client side. I think this is enough for it to properly work as a front-end for this website, so it’s installed and running here. If you’re reading this (and I’ve not had to switch it off in the meantime) then the pages you’re reading were served over rsp. Which is very pleasing :-)

The code needs a bit of refactoring before I can present it, and the same bug still exists on the communicating-to-backends side (which is one of the reasons it needs refactoring — this is something I should have been able to fix in one place only) so I’ll do that over the coming days, and then do another post.