LLM Quantisation Weirdness

Posted on 27 February 2024 in Programming, Python, AI

I bought myself an Nvidia RTX 3090 for Christmas to play around with local AI models. Serious work needs larger, more powerful cards, and it's easy (and not that expensive) to rent such cards by the minute from the likes of Paperspace. But the way I see it, I'm not going to be doing any serious work -- and what I really want to do is be able to run little experiments quickly and easily without worrying about spinning up a machine, getting stuff onto it, and so on.

One experiment that I tried the other day was to try to get a mental model of how model size and quantisation affect the quality of responses from LLMs. Quantisation is the process of running a model that has, say, 16 bits for each of its parameters with the parameters clipped to eight bits, four bits, or even less -- people have found that it often has a surprisingly small effect on output quality, and I wanted to play with that. Nothing serious or in-depth -- just trying stuff out with different model sizes and quantisations, and running a few prompts through them to see how the outputs differed.

I was comparing three sizes of the Code Llama HF model, with different quantisations:

Code Llama is a model from Meta, and the HF version ("Human Feedback") is designed to receive questions about programming (with specific formatting), and to reply with code. I chose those particular quantisations because the 13b model wouldn't fit in the 3090's 24GiB RAM without quantisation to a least 8-bit, and the 34b model would only fit if it was 4-bit quantised.

The quality of the response to my test question was not too bad with any of these, apart from codellama/CodeLlama-34b-Instruct-hf in 4-bit, which was often (but not always) heavily glitched with missing tokens -- that is, it was worse than codellama/CodeLlama-7b-Instruct-hf in 4-bit. That surprised me!

I was expecting quantisation to worsen the results, but not to make a larger model worse than a smaller one at the same level of quantisation. I've put a repo up on GitHub to see if anyone can repro these results, and to find out if anyone has any idea why it's happening.

[ Read more ]

Giving up on the AI chatbot tutorial (for now)

Posted on 27 February 2024 in Programming, Python, AI

I'm a big fan of learning in public, and early last year I started trying to do that by writing an AI chatbot tutorial as I learned the technology myself. But somehow it just wasn't working -- perhaps because my understanding was evolving so quickly that each time I sat down to write, I spotted dozens of errors in the previous posts, and felt I should fix those first. So I've decided to give up on that one, at least for now.

So, back to something a bit more achievable! Some lab notes will be coming on things I've been working on, including -- later on this evening -- a post about an oddity I found the other day.

In the meantime, here's a blog post I did for PythonAnywhere late last year: Five steps to create your own PythonAnywhere AI guru, on PythonAnywhere.

Building an AI chatbot for beginners: part 2

Posted on 4 April 2023 in Programming, Python, AI

[Note that this series kind of dried up; when I started the series, I knew that I knew very little about the subject, but I was hoping to learn better by learning in public. However, as time went by it turned out that this wasn't working. There are a lot of better tutorials out there!]

Welcome to the second part of my tutorial on how to build a chatbot using OpenAI's interface to their Large Language Models (LLMs)! You can read the introduction here, and the first part here. As a reminder, I'm writing this not because I'm an expert, but because I'm learning how to do it myself, and writing about it helps me learn faster. Caveat lector :-)

In this post, we'll give the bot some memory of the conversation so far.

At the end of the first part, we had a program that would accept input from a user, combine it with some static text to make a prompt that an LLM would complete in the character of a chatbot (stopping at the point that the chatbot should stop, and not trying to carry on the conversation), then send it to OpenAI's API specifying an LLM model, and print out the result.

[ Read more ]

Building an AI chatbot for beginners: part 1

Posted on 19 March 2023 in Programming, Python, AI

[Note that this series kind of dried up; when I started the series, I knew that I knew very little about the subject, but I was hoping to learn better by learning in public. However, as time went by it turned out that this wasn't working. There are a lot of better tutorials out there!]

Welcome to the first part of my tutorial on how to build a chatbot using OpenAI's interface to their Large Language Models (LLMs)! You can read the introduction here.

If you're reading this and want to get the best out of it, I strongly recommend that you run the code on your own machine as you go along: trust me, it will stick in your mind much better if you do that.

The goal in this post is to write a basic bot script that accepts user input, and just bounces it off an OpenAI LLM to generate a response.

[ Read more ]

Building an AI chatbot for beginners: part 0

Posted on 19 March 2023 in Programming, Python, AI

[Note that this series kind of dried up; when I started the series, I knew that I knew very little about the subject, but I was hoping to learn better by learning in public. However, as time went by it turned out that this wasn't working. There are a lot of better tutorials out there!]

Like a lot of people, I've been blown away by the capabilities of Large Language Model (LLM) based systems over the last few months. I'm using ChatGPT regularly for all kinds of things, from generating basic code to debugging errors to writing emails.

I wanted to understand more about how these tools worked, and feel strongly that there's no better way to learn something than by doing it. Building an LLM is, at least right now, super-expensive -- in the millions of dollars (although maybe that will be coming down fast?). It also requires a lot of deep knowledge to get to something interesting. Perhaps something to try in the future, but not right now.

However, using LLMs to create something interesting -- that's much easier, especially because OpenAI have a powerful API, which provides ways to do all kinds of stuff. Most relevantly, they provide access to a Completion API. That, as I understand it, is the lowest-level way of interacting with an LLM, so building something out of it is probably the best bang for the buck for learning.

Over the last few weeks I've put together a bunch of things I found interesting, and have learned a lot. But it occurred to me that an even better way to learn stuff than by building it is to build it, and then explain it to someone else, even if that person is an abstract persona for "someone out there on the Internet". So: time for a LLM chatbot tutorial!

[ Read more ]

Acquired!

Posted on 28 September 2022 in PythonAnywhere, Business of Software

As those of you who know me (and probably a fair few that don't) will already know, PythonAnywhere was acquired by Anaconda, Inc back in June of this year. We're still the same team, and I'm still leading it, but now we're part of a larger company.

It's been quite a ride. Due diligence and negotiation in the months up to the close was just as tough as I'd always been told it would be (and that's despite the fact that according to our lawyers it was a pretty smooth one as these things go). And now I have to get used to having a boss again, which is weird... but is helped by the fact that said boss is a great guy, and is aligned with us (you can tell from the lingo that I work for a larger company now, right?) on keeping the platform up and running as it was, while investing into it so that it can get better and grow faster.

So, all good news :-)

I've been vaguely considering putting together a few blog posts outlining what happens during an acquisition -- just a general discussion of the steps and what they involve. I wouldn't be putting anything in about this particular deal, of course -- there are strict non-disclosures about the terms and so on -- but just a description of what happens might be useful for other people in the position I was in earlier on this year. I had to learn a lot of stuff very quickly, and while our lawyers were awesome and explained things brilliantly, it would have been useful to have some kind of layman's background information.

What do you think -- worth posting?

A somewhat indirect way of reporting stolen cards to the bank

Posted on 6 February 2022 in PythonAnywhere, Business of Software

One of the interesting things about having a business that accepts cards on the Internet is seeing what odd things people do when trying to use your site. A case in point is someone we've noticed over the last few months, who appears to be using our site as a rather indirect way to report stolen cards.

The behaviour that we see is that they run some kind of script that signs up for a bunch of accounts, with randomly-generated usernames, and then try to upgrade them all using stolen card numbers.

Naturally, our fraud-prevention systems pick that up pretty much immediately, and we run our own script that identifies every account that they've created, finds the card details used for them, and reports every transaction and attempted transaction as fraudulent. This means that our payment processor, Stripe, can flag the card numbers as stolen, so that they can't be used elsewhere without triggering fraud alerts to the other merchants. And, if a charge actually goes through (most of the cards tend to be pre-paid with no money on them, so most charges fail), then we refund it as fraudulent, which not only notifies Stripe, but I believe notifies the bank that the card number is circulating amongst card fraudsters.

Now, the fact that we do this should be obvious to them. Every time they run their scripts, it causes a minor inconvenience to us (the scripts that we have to handle the problem are getting ever-simpler to use), and it means that every card that they tried on our site is now significantly less valuable as an asset to them. They're essentially paying money for lists of stolen card numbers, and then burning it up.

Given that we're doing this, and they must know that we're doing it, the only explanation I can think of is that they're actually running some kind of strange public service where they buy lists of stolen card details and then get them blocked. It does seem a very roundabout way to do it, though. Surely it would be easier to just tell the banks directly?

But perhaps there's something I'm missing.

Or perhaps they really are dim enough to be using us to check stolen cards for validity, and haven't yet noticed that doing so against a site that reports every fraudulent transaction to the card processor is not a terribly good idea...

COVID-19 breakthrough / re-infection: a personal tale

Posted on 30 November 2021 in Personal

I'm just recovering from (PCR-confirmed) covid after (I believe) having had it in 2020, and having been double-jabbed with AstraZeneca over the course of the last year. I'm completely fine, and listening to people moaning about their health is rather dull, so I won't bore you by posting at length here. But a number of people I know were really surprised to hear about it, thinking that re-infection and breakthrough infections were rare. Given that I, my partner Sara, and a close friend have all had it again (PCR tested in each case) over the last month, it seems that it might be more common than generally suspected -- so I figured that a first-person account might be of some interest.

[ Read more ]

Fun with network namespaces, part 1

Posted on 13 March 2021 in Linux, Programming

Linux has some amazing kernel features to enable containerization. Tools like Docker are built on top of them, and at PythonAnywhere we have built our own virtualization system using them.

One part of these systems that I've not spent much time poking into is network namespaces. Namespaces are a general abstraction that allows you to separate out system resources; for example, if a process is in a mount namespace, then it has its own set of mounted disks that is separate from those seen by the other processes on a machine -- or if it's in a process namespace, then it has its own cordoned-off set of processes visible to it (so, say, ps auxwf will just show the ones in its namespace).

As you might expect from that, if you put a process into a network namespace, it will have its own restricted view of what the networking environment looks like -- it won't see the machine's main network interface,

This provides certain advantages when it comes to security, but one that I thought was interesting is that because two processes inside different namespaces would have different networking environments, they could both bind to the same port -- and then could be accessed from outside via port forwarding.

To put that in more concrete terms: my goal was to be able to start up two Flask servers on the same machine, both bound to port 8080 inside their own namespace. I wanted to be able to access one of them from outside by hitting port 6000 on the machine, and the other by hitting port 6001.

Here is a run through how I got that working; it's a lightly-edited set of my "lab notes".

[ Read more ]

Comments are back!

Posted on 22 February 2021 in Blogging, Meta, Website design

Comments are now back up and running. They were interesting to put together; as a concept they don't play well with a static site, as they are by their very nature dynamic.

I was considering using Disqus, but I do want to try to keep my data to myself with this blog. I wound up putting together a separate site, comments.gilesthomas.com, which is non-static, and handles all of the comments -- some simple JavaScript injects them into each post page. It uses Akismet -- the one external dependency I feel I can allow myself -- to filter spam.

Should be interesting to see how it works! I'll give the new system a few days to bed in, and for a spot of code-tidying, then I'll post on the design of the new blog as a whole. I feel that I have Things To Say.