Moving from Fabric3 to Fabric
I decided to see how long I could go without coding anything after starting my sabbatical -- I was thinking I could manage a month or so, but it turns out that the answer was one week. Ah well. This post is a little less deep-tech than normal, though -- it's just the story of some cruft removal I've wanted to do for some time, and I'm posting it because I'm pretty sure I know some people who are planning to do the same upgrade -- hopefully it will be useful for them :-)
I have a new laptop on the way, and wanted to polish up the script I have to install the OS. I like all of my machines to have a pretty consistent Arch install, with a few per-machine tweaks for different screen sizes and whatnot. The process is:
- Boot the new machine from the Arch install media, and get it connected to the LAN.
- Run the script on another machine on the same network -- provide it with the root password (having temporarily enabled root login via SSH) and the IP.
- Let it do its thing, providing it with extra information every now and then -- for example, after reboots it will ask whether the IP address is still the same.
This works pretty well, and I've been using it since 2017. The way it interacts with the machine is by using Fabric. I've never taken to declarative machine setup systems like Ansible -- I always find you wind up re-inventing procedural logic in them eventually, and it winds up being a mess -- so a tool like Fabric is ideal. You can just run commands over the network, upload and download files, and so on.
The problem was that I was using Fabric3. When I started writing these scripts in 2017, it was a bit of a weird time for Fabric. It didn't support Python 3 yet, so the only way to work in a modern Python was to use Fabric3, a fork that just added that.
I think the reason behind the delay in Python 3 support for the main project was that the team behind it were in the process of redesigning it with a new API, and wanted to batch the changes together; when Fabric 2.0.0 came out in 2018, with a completely different usage model, it was Python 3 compatible. (It does look like they backported the Python 3 stuff to the 1.x series later -- at least on PyPI there is a release of 1.15.0 in 2022 that added it.)
So, I was locked in to an old dependency, Fabric3, which hadn't been updated since 2018. This felt like something I should fix, just to keep things reasonably tidy. But that meant completely changing the model of how my scripts ran -- this blog post is a summary of what I had to do. The good news is: it was actually really simple, and the new API is definitely an improvement.
What it looked like and why
Here's an example of some of the code I had:
def fabric_settings(address, username, password=None, key=None, connection_attempts=6, **kwargs):
    host_string = '%s@%s' % (username, address)
    return settings(
        host_string=host_string, password=password, key=key, disable_known_hosts=True,
        connection_attempts=connection_attempts,
        keepalive=15,
        **kwargs
    )
...
def main():
    ...
    with fabric_settings(ip, "root", password=root_password):
        run("fdisk -l")
        ...
    with fabric_settings(ip, "giles", password=giles_password):
        sudo("pacman -S --noconfirm ntfs-3g")
        for mount_point, owner, mount in config["extra_mounts"]:
            sudo("mkdir -p {}".format(mount_point))
            sudo("chown {} {}".format(owner, mount_point))
            append(
                "/etc/fstab",
                mount,
                use_sudo=True,
            )
        if config["network_manager"] == "NetworkManager":
            put(
                "deploy-files/etc/NetworkManager/NetworkManager.conf",
                "/etc/NetworkManager/NetworkManager.conf",
                use_sudo=True
            )
        put_interpolated(
            "deploy-files/router/etc/named.conf",
            "/etc/named.conf",
            dict(
                zone=config["router_info"]["zone"],
                primary_subnet=config["router_info"]["primary_subnet"],
            ),
            use_sudo=True
        )
So let's look into what that was doing. The model of the original Fabric API was that you had a bunch of utility functions -- put, run, sudo, and so on -- which ran in the current context. The context was provided by a settings context manager, which told it which remote host to connect to, which user and credentials to use, and so on. So you can see that in that code I had a utility function to create a context using the parameters I preferred, and then I was using it to connect firstly as root (for the code that formats the disk and sets up the giles user with sudo rights), and then as the giles user later on to do the real setup via sudo.
There were also some useful convenience functions -- the ones I used in particular were sed, which was primarily aimed at find and replace, and append, which did exactly what it said on the tin. And I'd written a put_interpolated function for the (very common during machine setup) process of taking a local file that has stuff that needs to be changed before it's uploaded.
There were issues with this design; it felt clever, and was quite clean to use, but it was using context managers to mangle what was essentially global state, and that abstraction leaked from time to time. For example, I had this reboot function (because the built-in one was unreliable):
def reboot():
    print("Rebooting")
    with settings(warn_only=True):
        sudo("shutdown -r +0")
    time.sleep(10)
    print("Disconnecting")
    disconnect_all()
You see that disconnect_all() at the end? That closed all of Fabric's open connections. That's because it cached all of the connections it was currently using, and it was actually rather hard to identify which of them might relate to the machine that you just rebooted (remember you can have multiple connections using different credentials). So it was just easier to close them all and let the next attempt to use them kick off a reconnect.
How it changed: first pass
So what did Fabric 2 change? Basically, all of that. Now, you have a Connection object, and that is primary. In order to perform operations, you call methods on it, like put, run, sudo, and so on:
conn = Connection(host=host_ip, user=user)
result = conn.run('hostname')
This doesn't feel as clever as the old system, but to be honest, I think it's actually better in practice.
So, the first change was to replace all of my blocks like this:
with fabric_settings(ip, "giles", password=giles_password):
...with code to create a Connection. I wrote a helper function with the same parameters as my existing one:
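from fabric import Connection

# connection_attempts is still accepted so that existing call sites keep
# working, but it is no longer used; any extra kwargs are passed through
# to the underlying connect call via connect_kwargs.
def fabric_connection(host, username, password=None, key=None, connection_attempts=6, **kwargs):
    return Connection(
        host=host,
        user=username,
        connect_kwargs={
            "password": password,
            "pkey": key,
            **kwargs
        }
    )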
Note that the disable_known_hosts, connection_attempts, and keepalive stuff has all gone -- these were all workarounds for specific issues I'd encountered in the past, and the new Fabric defaults seem to avoid those problems.
So now I could use it:
conn = fabric_connection(ip, "giles", password=giles_password)
...and then dedent all of the code inside the old with block, replacing every run with conn.run, every put with conn.put, and every sudo with conn.sudo. (For this task I was working with Claude but had decided to let it be the navigator while I drove -- an interesting experiment, but I should have let it drive for this bit as it's the kind of thing AIs excel at.)
There was one little tweak that we discovered during this, though -- the new Fabric requires the password to be used for sudo to be set on the connection, as otherwise it will prompt for one. (The old one just automatically used the provided password, I think.) So a quick tweak was to add this to the fabric_connection function:
    conn = Connection(....)
    if password:
        conn.config.sudo.password = password
    return conn
append and sed
That was a good start, but we didn't have the append and sed helpers that we previously did; they were part of fabric.contrib, which was removed in the new version of Fabric. There is a higher-level library called Patchwork which provides stuff like that, but looking at the code, Claude and I figured that the actual uses I was making of those functions were so simple that we might as well inline them. So, sed was just replaced with conn.run calls that actually used sed itself, and for append we could either conn.run an
echo SOMETHING >> filename.txt
...or, for more complex multi-line cases we could use a here-document [1] -- that is, the << syntax you often see in shell scripts, like this:
conn.run("cat >> /home/giles/.bashrc << 'EOF'" + BASH_PROMPT_GIT_CODE + "EOF")
With those covered, we had something that worked! Almost.
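One thing worth spelling out about the here-document version: the shell only recognises the closing EOF if it sits on a line of its own, and the text to append has to start on the line after the opening marker, so the one-liner above only works if BASH_PROMPT_GIT_CODE begins and ends with a newline. Here's a minimal sketch of a helper that builds the command and takes care of the newlines itself. The name heredoc_append_command is made up for this post (it's not part of Fabric), and it assumes a POSIX shell on the remote side.

```python
import shlex


def heredoc_append_command(path, text, delimiter="EOF"):
    """Build a shell command that appends `text` to `path` via a here-document.

    Hypothetical helper, not part of Fabric: it only builds the command string.
    """
    # If the text itself contains the delimiter on its own line, the shell
    # would stop reading there, so refuse rather than write a truncated file.
    if delimiter in text.splitlines():
        raise ValueError("text contains the here-document delimiter on its own line")
    # Make sure the body ends with a newline so that the closing delimiter
    # lands on a line of its own.
    body = text if text.endswith("\n") else text + "\n"
    # Quoting the delimiter ('EOF') stops the remote shell from expanding
    # $variables and backticks inside the body.
    return f"cat >> {shlex.quote(path)} << '{delimiter}'\n{body}{delimiter}\n"
```

With that in place, the append becomes something like conn.run(heredoc_append_command("/home/giles/.bashrc", BASH_PROMPT_GIT_CODE)), whatever the leading and trailing whitespace of the text happens to be.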
reboot
The reboot function had also gone. I remember having all kinds of problems with it anyway, which is why I had written my own. I decided I'd kick off by making it as simple as possible:
def reboot(conn):
    conn.sudo("reboot", warn=True)
    time.sleep(10)
That seemed to work fine!
use_sudo
This was the trickiest one. Remember this code, running as non-root:
put(
    "deploy-files/etc/NetworkManager/NetworkManager.conf",
    "/etc/NetworkManager/NetworkManager.conf",
    use_sudo=True
)
The conn.put method did not have that use_sudo kwarg. Claude and I brainstormed about this a bit, and eventually decided that the best way to implement it would be to write our own put_sudo function that took a connection and uploaded the file to /tmp using conn.put, then used conn.sudo to move it to the desired location. That worked fine.
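For anyone who wants to see the shape of that, here's a rough sketch of the upload-then-move idea. It's an illustration rather than the exact code from my scripts: the staging file name is my own invention, and conn is assumed to be a Fabric Connection (anything with put and sudo methods would do).

```python
import posixpath
import shlex
import uuid


def put_sudo(conn, local_path, remote_path):
    """Upload as the normal user to /tmp, then move into place with sudo."""
    # A unique staging name avoids clashing with anything already in /tmp.
    staging_path = posixpath.join("/tmp", f"put_sudo_{uuid.uuid4().hex}")
    # The plain upload runs with the connecting user's permissions.
    conn.put(local_path, staging_path)
    # Only the move needs root; quote both paths in case they contain spaces.
    conn.sudo(f"mv {shlex.quote(staging_path)} {shlex.quote(remote_path)}")
```

One thing to keep in mind with this approach is that mv generally preserves the ownership of the uploaded file, so the result may still belong to the connecting user; a conn.sudo call to chown it afterwards might be needed where that matters.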
put_interpolated
This, of course, was already a helper function, so it could just be modified to use either conn.put or put_sudo as appropriate based on its use_sudo argument.
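As a sketch of what such a helper can look like: the version below assumes the local file uses $name placeholders and fills them in with string.Template from the standard library. That choice of template syntax is an assumption for the sake of the example, not necessarily what my real files use, and to keep the block self-contained the use_sudo branch repeats the upload-to-/tmp-then-move approach inline rather than calling a separate function.

```python
import os
import shlex
import string
import tempfile


def put_interpolated(conn, local_path, remote_path, values, use_sudo=False):
    """Fill in $placeholders in a local file, then upload the result."""
    with open(local_path) as f:
        # substitute() raises KeyError if a placeholder has no value, which
        # is safer than silently uploading a half-filled config file.
        rendered = string.Template(f.read()).substitute(values)

    # Write the rendered text to a temporary local file so it can be
    # uploaded by path.
    with tempfile.NamedTemporaryFile("w", suffix=".rendered", delete=False) as tmp:
        tmp.write(rendered)
    try:
        if use_sudo:
            # Stage in /tmp as the normal user, then move with root rights.
            staging = f"/tmp/{os.path.basename(tmp.name)}"
            conn.put(tmp.name, staging)
            conn.sudo(f"mv {shlex.quote(staging)} {shlex.quote(remote_path)}")
        else:
            conn.put(tmp.name, remote_path)
    finally:
        # Always clean up the local temporary file.
        os.unlink(tmp.name)
```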
sudo weirdness
In my code I had a place where I had to temporarily change sudo settings to say that when running pacman, it should not prompt for the password. [2] It looked like this (after being ported to the new connection system):
conn.sudo('echo "giles ALL = (root) NOPASSWD: /usr/bin/pacman" >> /etc/sudoers')
I kept getting permission errors -- even though the command was being run with conn.sudo, it didn't have permission to write to /etc/sudoers.
I think that what was happening here was that the old Fabric system would run the whole command with sudo, so the redirection at the end happened with elevated permissions. But the new one, I think, just prefixes the command with sudo, so what is actually being run is this:
sudo echo "giles ALL = (root) NOPASSWD: /usr/bin/pacman" >> /etc/sudoers
So only the echo has superuser permissions; the redirection is done by the non-root shell that launched it, and the write fails. The workaround was simple:
conn.sudo('bash -c \'echo "giles ALL = (root) NOPASSWD: /usr/bin/pacman" >> /etc/sudoers\'')
Success!
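The bash -c trick means a second layer of quoting, which is fiddly to get right by hand. Here's a minimal sketch that lets shlex.quote from the standard library do it instead. The names sudo_shell_command and sudo_append_line are made up for this post, not part of Fabric, and conn is assumed to be a Fabric Connection with the sudo password already configured.

```python
import shlex


def sudo_shell_command(command):
    """Wrap `command` so that a shell started by sudo runs all of it.

    Without this, only the first program gets root rights, and any
    redirection is performed by the unprivileged calling shell.
    """
    return f"bash -c {shlex.quote(command)}"


def sudo_append_line(conn, line, path):
    """Append `line` to `path`, with the redirection itself running as root."""
    inner = f"echo {shlex.quote(line)} >> {shlex.quote(path)}"
    return conn.sudo(sudo_shell_command(inner))
```

Usage would then be something like sudo_append_line(conn, "giles ALL = (root) NOPASSWD: /usr/bin/pacman", "/etc/sudoers"), with no hand-escaped quotes in sight.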
With that relatively minimal work -- maybe a couple of elapsed hours, but a lot of that was testing by firing up a fresh VM and running the script against it to make sure that it worked -- I had ported everything over to the most recent version of Fabric.
Red, green, refactor
All of that was quite nice, but it felt a bit messy -- using a helper function to put files if we needed to use sudo, but using the method on the connection if we didn't.
So I created a new ExtendedConnection class that re-introduced the use_sudo kwarg, and added reboot and put_interpolated to it too. Its __init__ also does the sudo password wrangling, and takes parameters in a format I prefer so that I could get rid of my fabric_connection helper function and just instantiate it directly from my code.
I've put that into a public MIT-licensed repo with the catchy name fabric-utils-extended for anyone that wants to use it -- you can just directly pip install the repo:
pip install git+https://github.com/gpjt/fabric-utils-extended.git
All done
So now I just need to wait for my new laptop to try it out in earnest. Still, it was a well-spent couple of hours clearing out a bit of cruft that had been annoying me for years.
Back to AI on Monday, I think!
[1] I must admit, although I'd used this trick in the past, this was the first time I'd heard what it was called.
[2] This is because I was using yay to install AUR packages; you cannot run yay as root, but it needs to sudo run pacman as part of its operation to install the packages. So before running it I wanted to make that happen without prompting for a password -- I roll the config back later, of course, as it's not something you would want to keep on a machine long-term.