A classifier using Qwen3

Posted on 24 October 2025 in AI

I wanted to build on what I'd learned in chapter 6 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)". That chapter takes the LLM we've built over the course of the book and turns it into a spam/ham classifier. I wanted to see how easy it would be to take another LLM -- say, one from Hugging Face -- and do the same "decapitation" trick on it: removing the output head and replacing it with a small linear layer that outputs class logits.

Turns out it was really easy! I used Qwen/Qwen3-0.6B-Base, and you can see the code here.
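The decapitation trick itself can be sketched in a few lines. This is a conceptual illustration, not the repo's actual code: it uses a tiny stand-in model (a made-up `TinyLM` class) rather than downloading the real Qwen3 weights, which you'd load with `AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B-Base")`.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a pretrained causal LM: embeddings, a transformer
# block, and a vocabulary-sized output head (the part we'll chop off).
class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.block = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=4, batch_first=True
        )
        self.lm_head = nn.Linear(hidden, vocab_size, bias=False)

    def forward(self, ids):
        return self.lm_head(self.block(self.embed(ids)))

model = TinyLM()

# "Decapitate": swap the vocab-sized head for a 2-class (spam/ham) head.
num_classes = 2
model.lm_head = nn.Linear(model.lm_head.in_features, num_classes, bias=False)

# Freeze the pretrained body; train only the new head (the book also
# unfreezes the final transformer block and layer norm).
for p in model.parameters():
    p.requires_grad = False
for p in model.lm_head.parameters():
    p.requires_grad = True

ids = torch.randint(0, 100, (1, 8))
class_logits = model(ids)[:, -1, :]  # classify from the last token's logits
```

Following the book, the classification logits are read off the last token's position, since with causal attention that position has attended to the whole sequence.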

The only real difference between our from-scratch PyTorch LLMs and a Hugging Face one is the return value when you call the model: you get a ModelOutput object rather than a bare tensor. But it has a logits field holding the raw output, and with that one change the rest of the code carries over largely unchanged. The only other tweak was to replace the hard-coded padding token 50256 from the book's code with tokenizer.pad_token_id.

ChatGPT wrote a nice, detailed README for it, so hopefully it's a useful standalone artifact.