Does #EUVAT make accepting bitcoins impossible for EU-based digital services businesses?

19 December 2014

Earlier on today I blogged a description of what we had to do at PythonAnywhere to handle the upcoming EU VAT (~= sales tax) changes for digital services. It’s a long post (though I tried to keep it as light as possible), but the short form is “it was hard, and at least in part unnecessarily so”.

In a comment, Hristo suggested a workaround: “Enter Bitcoin”.

It’s possible he meant “and ignore the tax authorities entirely”, but let’s consider the impact on businesses that want to play by the rules. I think Bitcoin could be worse than credit card billing. In fact, if I’m reading the regulations right (and I might not be, I was up at 5am for a deployment today), the EU VAT changes might make accepting bitcoins pretty much impossible for law-abiding online EU businesses selling digital services like hosting, music downloads, and ebooks.

Here’s why:

Under the new rules, we need two pieces of non-conflicting evidence as to a customer’s location. The IP address can be one of them, and for our credit card/PayPal customers we can use their billing address as another.

But for Bitcoin, there is no billing address — it’s essentially digital cash. And regarding the other kinds of acceptable evidence:

  • “location of the bank” — not applicable using Bitcoin.
  • “the country code of SIM card used by the customer” — not available for an ordinary Internet business.
  • “the location of the customer’s fixed land line through which the service is supplied to him” — not available for an ordinary Internet business.
  • “other commercially relevant information (for example, product coding information which electronically links the sale to a particular jurisdiction)” — not applicable for a standardised product. Possibly usable if you’re selling SaaS tax-filing or something else that’s entirely country-specific.

Now, perhaps that list is non-exhaustive. It’s hard to tell, because the guidance says we must “obtain and to keep in your records 2 pieces of non-contradictory evidence from the following list”, which implies that it’s an exhaustive list, but then says “Examples include”, which implies that it’s not. [UPDATE: they’ve updated the guidance, it’s definitely non-exhaustive] But even if it is non-exhaustive, and, say, you can use a scan of someone’s utility bill or the kind of proof of address you need to provide when you open a bank account, I can’t think of anything that anyone would be likely to be willing to provide for digital services like web hosting, music, or ebooks.

All of this means that, at least from my reading of the rules, we cannot now accept bitcoins as a means of payment. I’ve asked our accountants for their professional opinion, but I’m not holding out much hope.

What do you think? Am I missing something — perhaps some kind of other proof of location that an online business accepting Bitcoin could easily gather?

Or is Bitcoin now effectively sunk as a means of payment for digital goods sold by businesses in the EU?

A fun bug

28 March 2014

While I’m plugging the memory leaks in my epoll-based C reverse proxy, I thought I might share an interesting bug we found today on PythonAnywhere. The following is the bug report I posted to our forums.

So, here’s what was happening.

Each web app someone has on PythonAnywhere runs on a backend server. We have a cluster of these backends, and the cluster is behind a loadbalancer. Every backend server in the cluster is capable of running any web app; the loadbalancer’s job is to spread things out between them so that each one at any given time is only running an appropriately-sized subset of them. It has a list of backends, which we can update in realtime as we add or remove backends to scale up or down, and it looks at incoming requests and uses the domain name to work out which backend to route a request to.

That’s all pretty simple. The twist comes when we add the code that reloads web apps to the mix.

Reloading a PythonAnywhere web app is simply a case of making an authenticated request to a specific URL. For example, right now (and this might change, it’s not an official API, so don’t do anything that relies on it) to reload www.foo.com owned by user fred, you’d hit the URL `http://www.pythonanywhere.com/user/fred/webapps/www.foo.com/reload`

Now, the PythonAnywhere website itself is just another web app running on one of the backends (a bit recursive, I know). So most requests to it are routed based on the normal loadbalancing algorithm. But calls specifically to that “reload” URL need to be routed differently — they need to go to the specific backend that is running the site that needs to be reloaded. So, for that URL, and that URL only, the loadbalancer uses the domain name that’s specified second-to-the-end in the path bit of the URL to choose which backend to route the request to, instead of using the hostname at the start of the URL.
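
To make that routing rule concrete, here’s a minimal sketch in C. It is not the real loadbalancer code: the function name routing_key, its signature, and the assumption that the request has already been split into a hostname and a path are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the real loadbalancer code): copies the name
 * that should be used for backend selection into `out`.
 *
 * For paths of the form /user/<name>/webapps/<domain>/reload the key is
 * <domain>; for everything else it is the request's hostname.
 * Returns 0 on success, or -1 if `out` is too small. */
int routing_key(const char *host, const char *path, char *out, size_t out_size)
{
    assert(host != NULL && path != NULL && out != NULL);

    const char *key = host;
    size_t key_len = strlen(host);

    if (strncmp(path, "/user/", 6) == 0) {
        const char *marker = strstr(path + 6, "/webapps/");
        if (marker != NULL) {
            const char *domain = marker + strlen("/webapps/");
            const char *slash = strchr(domain, '/');
            /* The domain must be non-empty and followed by exactly "/reload". */
            if (slash != NULL && slash > domain && strcmp(slash, "/reload") == 0) {
                key = domain;
                key_len = (size_t)(slash - domain);
            }
        }
    }

    if (key_len + 1 > out_size) {
        return -1;
    }
    memcpy(out, key, key_len);
    out[key_len] = '\0';
    return 0;
}
```

With this sketch, a request for /user/fred/webapps/www.foo.com/reload on www.pythonanywhere.com yields the key www.foo.com, while any other path on the same host yields www.pythonanywhere.com.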

So, what happened here? Well, the clue was in the usernames of the people who were affected by the problem — IronHand and JoeButy. Both of you have mixed-case usernames. And your web apps are ironhand.pythonanywhere.com and joebuty.pythonanywhere.com.

But the code on the “Web” tab that specifies the URL for reloading the selected domain specifies it using your mixed-case usernames — that is, it specifies that the reload calls should go to the URL for IronHand.pythonanywhere.com or JoeButy.pythonanywhere.com.

And you can probably guess what the problem was — the backend selection code was case-sensitive. So requests to your web apps were going to one backend, but reload messages were going to another different backend. The fix I just pushed made the backend selection code case-insensitive, as it should have been.
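
Here’s a rough sketch in C of what case-insensitive backend selection can look like. It rests on an assumption this post doesn’t confirm: that the backend is chosen by hashing the domain name and taking the result modulo the number of backends. The name pick_backend and the choice of the FNV-1a hash are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical selection function: hashes the domain with 32-bit FNV-1a,
 * lower-casing each byte first so that "IronHand.pythonanywhere.com" and
 * "ironhand.pythonanywhere.com" always land on the same backend index. */
size_t pick_backend(const char *domain, size_t backend_count)
{
    assert(domain != NULL && backend_count > 0);

    uint32_t hash = 2166136261u;                  /* FNV-1a offset basis */
    for (const unsigned char *p = (const unsigned char *)domain; *p != '\0'; p++) {
        hash ^= (uint32_t)tolower(*p);            /* the case-folding step */
        hash *= 16777619u;                        /* FNV-1a prime */
    }
    return (size_t)(hash % backend_count);
}
```

Without the tolower call, the two spellings of a domain would hash independently, and whether they agreed would be down to chance.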

The remaining question — why did this suddenly crop up today? My best guess is that it’s been there for a while, but it was significantly less likely to happen, and so it was written off as a glitch when it happened in the past.

The reason it’s become more common is that we actually more than doubled the number of backends yesterday. Because of the way the backend selection code works, when there’s a relatively small number of backends it’s actually quite likely that the lower-case version of your domain will, by chance, route to the same backend as the mixed-case one. But the doubling of the number of servers changed that, and suddenly the probability that they’d route differently went up drastically.

Why did we double the number of servers? Previously, backends were m1.xlarge AWS instances. We decided that it would be better to have a larger number of smaller backends, so that problems on one server impacted a smaller number of people. So we changed our system to use m1.large instances instead, spun up slightly more than twice as many backend servers, and switched the loadbalancer across.

So, there you have it. I hope it was as interesting to read about as it was to figure out :-)

…just resting…

12 December 2013

Just a quick note to say that I’m still here! Using rsp as a front-end for this site has usefully shown up some weird bugs, and I’m tracking them down. I’ll do a new post about it when there’s something useful to say…

Writing a reverse proxy/loadbalancer from the ground up in C, part 4: Dealing with slow writes to the network

10 October 2013

This is the fourth step along my road to building a simple C-based reverse proxy/loadbalancer, rsp, so that I can understand how nginx/OpenResty works — more background here. Here are links to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, to the second part, where I added the code to handle multiple connections by using epoll, and to the third part, where I started using Lua to configure the proxy.

This post was unplanned; it shows how I fixed a bug that I discovered when I first tried to use rsp to act as a reverse proxy in front of this blog. The bug is fixed, and you’re now reading this via rsp. The problem was that when the connection from a browser to the proxy was slower than the connection from the proxy to the backend (that is, most of the time), then when new data was received from the backend and we tried to send it to the client, we sometimes got an error to tell us that the client was not ready. This error was being ignored, so a block of data would be skipped, so the pages you got back would be missing chunks. There’s more about the bug here.

A brief sidetrack: Varnish

2 October 2013

In order to use this blog as a decent real-world test of rsp, I figured that I should make it as fast as possible. The quickest way to do that was to install Varnish, which is essentially a reverse proxy that caches stuff. You configure it to say what is cachable, and then it runs in place of the web server and proxies anything it can’t cache back to it.

I basically used the instructions from Ewan Leith’s excellent “10 Million hits a day with WordPress using a $15 server” post.

So now, this server has:

  • rsp running on port 80, proxying everything to port 83.
  • varnish running on port 83, caching what it can and proxying the rest to port 81.
  • nginx running on port 81, serving static pages and sending PHP stuff to php5-fpm on port 9000.

I’ve also got haproxy running on port 82, doing the same as rsp — proxying everything to varnish — so that I can do some comparative speed tests once rsp does enough for such tests to give interesting results. Right now, all of the speed differences seem to be in the noise, with a run of ab pointed at varnish actually coming out slower than the two proxies.

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: fixed it!

30 September 2013

It took a bit of work, but the bug is fixed: rsp now correctly handles the case where it can’t write as much as it wants to the client side. I think this is enough for it to properly work as a front-end for this website, so it’s installed and running here. If you’re reading this (and I’ve not had to switch it off in the meantime) then the pages you’re reading were served over rsp. Which is very pleasing :-)

The code needs a bit of refactoring before I can present it, and the same bug still exists on the communicating-to-backends side (which is one of the reasons it needs refactoring — this is something I should have been able to fix in one place only) so I’ll do that over the coming days, and then do another post.

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: non-blocking output

28 September 2013

Before moving on to the next step in my from-scratch reverse proxy, I thought it would be nice to install it on the machine where this blog runs, and proxy all access to the blog through it. It would be useful dogfooding and might show any non-obvious errors in the code. And it did.

I found that while short pages were served up perfectly well, longer pages were corrupted and interrupted halfway through. Using curl gave various weird errors, e.g. “curl: (56) Problem (3) in the Chunked-Encoded data”, which is a general error saying that it’s receiving chunked data and the chunking is invalid.

Doubly strangely, these problems didn’t happen when I ran the proxy on the machine where I’m developing it and got it to proxy the blog; only when I ran it on the same machine as the blog. They’re different versions of Ubuntu, the blog server being slightly older, but not drastically so — and none of the stuff I’m using is that new, so it seemed unlikely to be a bug in the blog server’s OS. And anyway, select isn’t broken.

After a ton of debugging with printfs here, there and everywhere, I tracked it down. You’ll remember that our code to transfer data from the backend to the client looks like this:

void handle_backend_socket_event(struct epoll_event_handler* self, uint32_t events)
{
    struct backend_socket_event_data* closure = (struct backend_socket_event_data*) self->closure;

    char buffer[BUFFER_SIZE];
    int bytes_read;

    if (events & EPOLLIN) {
        bytes_read = read(self->fd, buffer, BUFFER_SIZE);
        /* Nothing to read right now; wait for the next event. */
        if (bytes_read == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return;
        }

        /* End of stream or a real error: close both sides of the connection. */
        if (bytes_read == 0 || bytes_read == -1) {
            close_client_socket(closure->client_handler);
            close_backend_socket(self);
            return;
        }

        /* Pass the data on to the client. */
        write(closure->client_handler->fd, buffer, bytes_read);
    }

    /* Error or hangup on the backend socket: close both sides. */
    if (events & (EPOLLERR | EPOLLHUP | EPOLLRDHUP)) {
        close_client_socket(closure->client_handler);
        close_backend_socket(self);
        return;
    }
}

If you look closely, there’s a system call there where I’m not checking the return value — always risky. It’s this:

        write(closure->client_handler->fd, buffer, bytes_read);

The write function returns the number of bytes it managed to write, or -1 on error, with the reason in errno. The debugging code revealed that sometimes it was returning -1, and errno was set to EAGAIN, meaning that the operation would have blocked on a non-blocking socket.

This makes a lot of sense. Sending stuff out over the network is a fairly complex process. There are kernel buffers of stuff to send, and as we’re using TCP, which is connection-based, I imagine there’s a possibility that the client being slow or transmission of data over the Internet might be causing things to back up. Possibly sometimes it was returning a non-error code, too, but was still not able to write all of the bytes I asked it to write, so stuff was getting skipped.
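
As a small illustration of what checking that return value involves, here’s a sketch of a wrapper around write. The name write_checked and its return convention are invented for this example; it isn’t the fix that went into rsp, just the three outcomes a caller has to tell apart.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper: makes one write attempt on a (possibly
 * non-blocking) descriptor and reports what happened.
 *
 * Returns the number of bytes the kernel accepted, which may be less than
 * `len` (a short write), 0 if the descriptor would have blocked, or -1 on
 * any other error (errno is left as set by write). */
ssize_t write_checked(int fd, const char *data, size_t len)
{
    assert(data != NULL || len == 0);

    ssize_t n;
    do {
        n = write(fd, data, len);
    } while (n == -1 && errno == EINTR);   /* interrupted by a signal: retry */

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 0;                          /* not ready: nothing was written */
    }
    return n;
}
```

The caller compares the result with the length it asked for; anything that wasn’t accepted has to be kept somewhere and sent later rather than dropped.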

So that means that even for this simple example of an epoll-based proxy to work properly, we need to do some kind of buffering in the server to handle cases where we’re getting stuff from the backend faster than we can send it to the client. And possibly vice versa. It’s possible to get epoll events on an FD when it’s ready to accept output, so that’s probably the way to go — but it will need a bit of restructuring. So the next step will be to implement that, rather than the multiple-backend handling stuff I was planning.
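
A rough sketch of that idea, under some loudly stated assumptions: the struct pending_output and the three function names are hypothetical, the epoll registration is assumed to be level-triggered, and the descriptor is assumed to have been added to the epoll instance already. It is a simplified illustration, not the code that ended up in rsp.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection queue of bytes the kernel hasn't accepted yet. */
struct pending_output {
    char *data;
    size_t len;
};

/* Adds bytes to the end of the queue. Returns 0, or -1 if out of memory. */
int pending_append(struct pending_output *q, const char *bytes, size_t n)
{
    assert(q != NULL);
    if (n == 0) {
        return 0;
    }
    char *grown = realloc(q->data, q->len + n);
    if (grown == NULL) {
        return -1;
    }
    memcpy(grown + q->len, bytes, n);
    q->data = grown;
    q->len += n;
    return 0;
}

/* Writes as much of the queue as fd will take right now.
 * Returns 1 if the queue is now empty, 0 if data is still waiting,
 * or -1 on a real write error. */
int pending_flush(struct pending_output *q, int fd)
{
    assert(q != NULL);
    size_t sent = 0;
    while (sent < q->len) {
        ssize_t n = write(fd, q->data + sent, q->len - sent);
        if (n > 0) {
            sent += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;                       /* interrupted: try again */
        } else if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK) {
            break;                          /* socket is full for now */
        } else {
            return -1;
        }
    }
    if (sent > 0) {
        /* Keep only the unsent tail at the front of the buffer. */
        memmove(q->data, q->data + sent, q->len - sent);
        q->len -= sent;
    }
    return q->len == 0 ? 1 : 0;
}

/* Asks epoll to report EPOLLOUT only while there is queued data. */
int watch_for_output(int epoll_fd, int fd, void *handler, int want_output)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.data.ptr = handler;
    ev.events = EPOLLIN | (want_output ? EPOLLOUT : 0);
    return epoll_ctl(epoll_fd, EPOLL_CTL_MOD, fd, &ev);
}
```

The intended flow is: append whatever arrives from the backend, try a flush, and if anything is left over call watch_for_output with want_output set to 1; when the EPOLLOUT event arrives, flush again and switch the flag back off once the queue is empty.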

This is excellent. Now I know a little more about why writing something like nginx is hard, and have a vague idea of why I sometimes see stuff in its logs along the lines of “an upstream response is buffered to a temporary file”. Which is entirely why I started writing this stuff in the first place :-)

Here’s a run-through of the code I had to write to fix the bug.

Writing a reverse proxy/loadbalancer from the ground up in C, part 3: Lua-based configuration

11 September 2013

This is the third step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works — more background here. Here’s a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, and to the second part, where I added the code to handle multiple connections by using epoll.

This post is much shorter than the last one. I wanted to make the minimum changes to introduce some Lua-based scripting — specifically, I wanted to keep the same proxy with the same behaviour, and just move the stuff that was being configured via command-line parameters into a Lua script, so that just the name of that script would be specified on the command line. It was really easy :-) — but obviously I may have got it wrong, so as ever, any comments and corrections would be much appreciated.