I’ve been adding to scriv, my changelog management tool.
The latest release is 1.2.0, with improvements to creating GitHub releases.
As I mentioned on last
month’s podcast, I think it’s important to have one source of truth for
information, and then build tooling to publish it wherever you want it to be
read.
I think GitHub Releases are a fine way to publicize new versions of software
and summarize the changes. But I don’t like them as the one place to store that
information. The CHANGELOG file is an ideal source of truth: it’s a text file
in your repo, so it’s version-controlled, can be modified in pull requests, can
work with other tools, is portable beyond GitHub, and so on. The change
information should be authored in the CHANGELOG.
Once the CHANGELOG has the information, a GitHub Release is a fine place to
publish it.
To help support this approach, scriv has a “github-release” command which
parses your CHANGELOG file and creates a corresponding GitHub Release. Scriv
also has commands for building the CHANGELOG file, but this github-release
command will work even if your CHANGELOG wasn’t created by scriv.
In fact, once scriv 1.2 was ready, I was able to remove
some custom tooling from the coverage.py repo that did the same job, a clear
sign that things are moving in the right direction. The coverage.py CHANGES.rst
file is hand-edited, but is published to GitHub releases by scriv.
Scriv started small, and has been slowly gaining features and users. I’d be
interested to know what you think of it.
One of the difficulties when comparing programming languages is that they
sometimes use the same words to describe similar things, but always with
differences. Sometimes the differences are large enough that we want to use
different words, but often they are not.
This came up the other day when someone said, “__init__() is not a
constructor.” I understand this idea; I’ve heard it before. I just don’t think
it’s a good way to help people understand Python.
He said it because in C++ (and maybe other languages), an
object is not truly a usable object of its class until the constructor returns.
The function is called the “constructor” because it is the code that constructs
the object.
In Python, an object called self is passed into __init__, and it is already a
usable object of its class before __init__ is called. So a C++ expert will say,
“__init__() is not a constructor.”
I look at it differently: a constructor is the code a class author writes
to fully get the object ready for use. That’s what __init__() does in Python.
To deal with the difference between Python and C++, I’d say, “constructors in
Python and C++ are different: in Python, the object is ready to use before the
constructor is called, but in C++, it isn’t until the constructor finishes.”
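To make this concrete, here’s a minimal sketch (Point is just an example class) showing that self is already a real instance of its class when __init__ runs:

    class Point:
        def __init__(self, x, y):
            # self is already a usable Point before this body runs.
            assert isinstance(self, Point)
            self.x = x
            self.y = y

    p = Point(1, 2)   # "construction," from the caller's point of view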
If you haven’t used C++, you might not even see what the big deal is. There
are aspects of C++ behavior where this dividing line between “isn’t of the
class” and “is of the class” makes a difference (non-virtual methods, for
example). Python doesn’t have those behaviors, so drawing a strict line isn’t
as necessary.
So let’s use “constructor” for the user’s view of the code, rather than the
internal implementation’s view. Part of Python’s construction of an object is
handled by __new__, but you almost never need to deal with __new__, so let’s
keep it simple.
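For the curious, here’s a minimal sketch of how the two methods divide the work; in everyday code you can ignore __new__ entirely:

    class Widget:
        def __new__(cls):
            # __new__ creates and returns the new instance...
            obj = super().__new__(cls)
            return obj

        def __init__(self):
            # ...and __init__ then gets it ready for use.
            self.ready = True

    w = Widget()
    assert w.ready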
Constructor is just one example of two languages using the same word for
similar but slightly different things:
- A classic example is “variable.” Some people say, “Python has no variables,”
when what they mean is, “Python variables work differently than C variables.”
They work exactly the same as JavaScript variables; there’s a short sketch
after this list. (For more on this, see my talk Python Names and Values.)
- Python lists (which are just like JavaScript arrays) are different than Lisp
lists (which are linked lists).
- Java classes have many differences from Python classes: access modifiers on
the Java side, multiple inheritance and special methods on the Python side.
- Python functions (which can have side-effects) are not like Haskell
functions (which cannot).
- And let’s not even get started on strings!
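To illustrate the point about variables, here’s a small sketch; JavaScript arrays behave exactly the same way:

    a = [1, 2, 3]
    b = a             # b is another name for the same list, not a copy
    b.append(4)
    print(a)          # [1, 2, 3, 4]: the change shows through both names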
Different programming languages are different, but with similarities. We
have to re-use words to talk about them. Then we can explore where the
constructs are the same, and where they are different.
Michael Kennedy invited me on his podcast to chat about a round-up of tools
for README maintenance and other similar project automation:
Talk Python to Me #395: Tools for README.md Creation and
Maintenance.
We talked about almost 20 different tools, intermixed with philosophical
discussion about the problems maintainers face and the solutions they use.
Project maintainers need to write about their work, and present it to their
users. How to do that in a way that is convenient for both the maintainer and
the user? What to automate, and how?
Of course, I have opinions about this, which come out in the course of the
episode. Here are some of my top principles:
- Writing about your work helps you understand your work.
- You need to write with your audience in mind, and “the same thing” may need
to be said multiple times, in different ways, for different audiences.
- As much as possible, have a single source of truth, published to multiple
places as needed.
- It’s good to automate collecting information together for publishing,
but a human editing pass makes the result much better.
It was a fun discussion. Because we are both nerds caught up in the energy
of the moment, along the way we also touched on other topics like Twitter vs
Mastodon, ChatGPT and so on. Give it a listen. I’m always interested to
hear about other approaches and tools.
Picking up from Secure maintainer workflow, especially
the comments there (thanks!), here are some more things I’m doing to keep my
maintainer workflow safe.
1Password ssh: I’m using 1Password as my SSH agent. It works really
well, and uses the Mac Touch ID for authorization. Now I have no private keys
in my ~/.ssh directory. I’ve been very impressed with 1Password’s helpful and
comprehensive approach to configuration and settings.
Improved environment variables: I’ve updated my
opvars and unopvars shell functions that set environment
variables from 1Password. Now I can name sets of credentials (defaulting to the
current directory name), and apply multiple sets. Then unopvars knows all that
have been set, and clears all of them.
Public/private GitHub hosts: There’s a problem with using a
fingerprint-gated SSH agent: some common operations want an SSH key but aren’t
actually security sensitive. When pulling from a public repo, you don’t want to
be interrupted to touch the sensor. Reading public information doesn’t need
authentication, and you don’t want to become desensitized to the importance of
the sensor. Pulling changes from a git repo with a “git@” address always
requires SSH, even if the repo is public. It shouldn’t require an alarming
interruption.
Git lets you define “insteadOf” aliases so that you can pull using “https:”
and push using “git@”. The syntax seems odd and backwards to me, partly
because I can define pushInsteadOf, but there’s no pullInsteadOf:
[url "git@github.com:"]
# Git remotes of "git@github.com" should really be pushed using ssh.
pushInsteadOf = git@github.com:
[url "https://github.com/"]
# Git remotes of "git@github.com" should be pulled over https.
insteadOf = git@github.com:
This works great, except that private repos still need to be pulled using
SSH. To deal with this, I have a baroque arrangement using a fake
URL scheme “github_private:” like this:
[url "git@github.com:"]
pushInsteadOf = git@github.com:
# Private repos need ssh in both directions.
insteadOf = github_private:
[url "https://github.com/"]
insteadOf = git@github.com:
Now if I set the remote URL to “github_private:nedbat/secret.git”, then
activity will use “git@github.com:nedbat/secret.git” instead, for both pushing
and pulling. (BTW: if you start fiddling with this, “git remote -v” will show
you the URLs after these remappings, and “git config --get-regexp ‘remote.*.url’”
will show you the actual settings before remapping.)
But how to set the remote to “github_private:nedbat/secret.git”? I can set
it manually for specific repos with “git remote”, but I also clone entire
organizations and don’t want to have to know which repos are private. I automate
the remote-setting with an aliased git command I
can run in a repo directory that sets the remote correctly if the repo is
private:
[alias]
    # If this is a private repo, change the remote from "git@github.com:" to
    # "github_private:". You can remap "github_private:" to "git@" like this:
    #
    #   [url "git@github.com:"]
    #       insteadOf = github_private:
    #
    # This requires the gh command: https://cli.github.com/
    #
    fix-private-remotes = "!f() { \
        vis=$(gh api 'repos/{owner}/{repo}' --template '{{.visibility}}'); \
        if [[ $vis == private ]]; then \
            for rem in $(git remote); do \
                echo Updating remote $rem; \
                git config remote.$rem.url $(git config remote.$rem.url | \
                    sed -e 's/git@github.com:/github_private:/'); \
            done; \
        fi; \
    }; f"
This uses GitHub’s gh command-line
tool, which is quite powerful. I’m using it more and more.
This is getting kind of complex, and is still a work in progress, but it’s
working. I’m always interested in ideas for improvements.
Like many people these days, I’m looking around at alternatives to Twitter,
and Mastodon is a clear favorite. Comparing the
two, the big differences, both good and bad, come down to servers.
Twitter is one web site, run by one corporation. This is easy to understand,
and makes it easy to find lots of people, but has the disadvantage that a
deranged over-leveraged misanthropic moron can buy it and make it terrible.
Mastodon is both a web application and a galaxy of
tens of thousands of separate servers running that
software. Each server is run by a different person (or team), each with their
own idea of what the server is for, and how it should be run. The servers are
not isolated islands, but federated: they communicate with each
other so that people on one server can see the posts from other servers.
This is the big difference between Twitter (centralized) and Mastodon
(distributed and federated), and it plays out in both good and bad ways. I’m
new to Mastodon, so take all this with a grain of salt. Some of what I label as
bad is due to growing pains and can be improved over time. Some is essential:
distributed grass-roots organization is messy and requires work. But it doesn’t
suffer from Twitter’s current problem, so it’s worth working at.
Here are my impressions of the pros and cons of distributed federation:
Initial hurdles (Bad) When you try to join
Mastodon, you are immediately asked to choose a server. This is not an easy
decision. Coming from a centralized service like Twitter (or Facebook, YouTube,
Reddit, etc, etc), the question makes no sense. The list of servers to choose
from is a tiny fraction of all the ones out there, and they seem to be very
strange niche communities. The implications of the choice are not obvious
either: how will it affect my Mastodon experience, and can I change my mind
later? The choice cannot be avoided: you need to pick a server to be on. But
over time, Mastodon can make the choice easier.
Independence (Good) Because there are many servers, Mastodon can’t be
bought by a crazed billionaire. There is no company, no centralized thing that
can suddenly change hands. If part of the galaxy goes in a bad direction, the
rest of the servers are unaffected.
Moderation (Mixed) Moderation is handled on each server by the server
admins. This is good: you can join a server with a moderation policy you approve
of. But it also means that server operators have to handle this delicate and
complicated job. It might become a chore they don’t want to do, or can’t do
because of volume or shifting interests. A well-moderated server could slide
into disrepair.
Server as community (Mixed) One philosophy of server choice is that a
server will be for a certain community. There are knitting servers and
pet-lover servers, liberal servers and conservative servers, and so on. This
can be a good way to cluster together with people you have an especially tight
bond with. But most people are not members of a single community. I live in
Boston, make software, like math, and am a parent. Which of those community
identities should I use to choose a server? I don’t have to use any of them.
Not all servers are organized around a topic at all. No matter what server you
choose, you are not limited to just one topic. On the other hand, some servers
are so topic-focused that they might consider some posts offensive that other
servers see as perfectly normal. Some aspects of Mastodon’s design and
culture grew from this idea of “server as community,” like the Local Timeline,
which is posts by anyone on your server. As Mastodon attracts more people, the
value and feasibility of those aspects are weakening.
Re-centralization (Bad) As people join Mastodon, there’s a natural
tendency to avoid the question of choosing a server by joining what appear to be
the “default servers.” This leads to over-sized servers that don’t fit well
with the original design of Mastodon and can lead to problems. Right now,
the largest servers are closed to new accounts, pushing people to spread out
more, which is good.
Defederation (Mixed) A server can decide to never communicate with
some other specific server. Your vegan community server could decide not to
federate with the safari hunter community server, because of the clash of
interests. That’s good, it helps focus you on what you want to see and keeps
communities cohesive. It gives server admins a giant kill switch to keep their
servers safe. The downside is that one malicious user on a large server could
cause other admins to defederate that entire server, and now all of that
server’s users are invisible to you. Wholesale defederation is frowned upon and
a last resort, but as small-server moderators try to handle the growth of
Mastodon, it could happen more, and you have little say over it.
Incomplete visibility (Bad) Federation is not total: there will always
be posts that you cannot see. Roughly speaking, you can see posts by anyone on
your server, and by anyone on the home server of any person followed by people
on your server. This will be a lot of posts, but it will never be complete.
Server turnover (Bad) Your server could go away. Many servers are run
by people because it seemed like an interesting thing to do. Maybe their
interests shift. Maybe their server attracted more people than they were willing
to support, either in infrastructure costs and effort, or in moderation effort.
The server could stop operating, or could just silently wither. It could even be
purchased by an egomaniacal tyrant. But it would only be one server, and you can
choose a new server and move to it, bringing along your followers.
Open source (Mostly Good) Mastodon software is open source, which
means it is safe from takeover, because anyone can take a copy of the software
and extend it if they need to. It also means that many people can help improve
it, although open source can sometimes be weak on the processes for how to
decide what changes to make. A downside is that different servers can be
running different versions of the same software, or can be running different
forks of the software, so there can be differences between them in what they
support and how they work.
Funding (Mixed) Because there is no central company, every server
admin has to fund their own server, both the technical infrastructure, and the
time they put into it. This can be difficult, and given the recent surge of
interest, unpredictable. Some servers ask for donations, some even charge money
to join. The good news is there are no ads (yet?), but you might need to chip
in a few dollars to keep your server going. At least you can choose who to
give your money to.
Following and liking (Bad) When you see a post in Mastodon, it is easy
to follow the author: there’s a Follow button just like on Twitter. You can
also favorite (like) their post easily. But if someone sends you a link to a
Mastodon post, you will be visiting that post’s server instead of your own. You
won’t be logged in there, because you have an account on your own server, but
not on any others. Then following the author or liking the post is more
difficult: you have to copy a URL and paste it into the search box on your own
Mastodon. This makes off-Mastodon discovery of new people harder.
User names (Bad) Your handle on Mastodon is more complicated than on
Twitter, because you need to name your server also. I am
@nedbat@hachyderm.io. It’s more to remember than just
@nedbat on Twitter, and it doesn’t fit as well in
places like the footer of presentations. Also, someone else can take
@nedbat@somewhere.else.xyz, and there’s nothing I can do about it.
ActivityPub (Good) The rabbit-hole goes deeper: Mastodon is just one
application built on a standard protocol called
ActivityPub. Other services can also implement it,
and be part of the universe of federated applications known as the Fediverse.
Tumblr has announced they will support it, so you could
see many services all mingling together.
• • •
So, what to make of all this? Twitter seems on a bad track, and it’s hard to
imagine any new centralized social network gaining Twitter-scale adoption.
Mastodon is harder than Twitter, but has its advantages. Decentralization is
messy, and while authoritarian centralized services are simpler to understand,
they offer little freedom, and bring great risk.
But as Dan Sinker said:
*whispers* Most people don’t care about decentralization and
governance models, they just want someplace to hang out that’s not a
hassle.
Twitter was never going to last forever. I don’t know if Mastodon will grow
to anything close to its size, or what else might. But it’s interesting to watch
the evolution and try it out.
I’m trying to establish a more secure workflow for maintaining public
packages.
Like most developers, I have terminal sessions with implicit access to
credentials. For example, I can make git commits and push to GitHub without
having to type a password.
There are two ways this might be a problem. The first is unlikely: a bad guy
gets onto my computer and uses the credentials to cause havoc. This is unlikely
mostly because a bad guy won’t get my computer, but also, if it does fall into
the wrong hands, it will probably be someone looking to resell the laptop, not
use my coverage.py credentials maliciously.
The second way is a more serious concern: I could unknowingly run evil or
buggy code that uses my credentials in bad ways. People write
bug reports for coverage.py, and if I am lucky, they
include steps to reproduce the problem. Sometimes the instructions involve
small self-contained examples, and I can just run them
without fear. But sometimes the steps are “clone this repo, and run this
large test suite.” It’s impossible to review all of that code. I
don’t know what the code will do, but if I want to see and diagnose the problem,
I have to run it.
I’m trying to reduce the possibilities for bad outcomes, in a few ways:
1Password: where possible, I store credentials in
1Password, and use tooling to get them
into environment variables. I have two shell functions
(opvars / unopvars) that find values in a vault based on
the current directory, and can set and unset them in the environment.
With this, I can have the credentials in the environment for just long enough
to use them. This works well for things like PyPI credentials, which are used
rarely and could cause significant damage.
But I still also have implicit credentials in my ~/.ssh directory and
~/.netrc file. I’m not sure of the best approach to keep them from being available
to programs that shouldn’t have them.
Docker: To really isolate unknown code, I use a Docker container. I
start with a base image with many versions of Python:
base.dockerfile, and then build on it to create a
main image that doesn’t even have sudo. In the container,
there are no credentials, so I don’t have to worry about malice or accidents.
For involved debugging, I might write another Dockerfile FROM these to reduce
the re-work that has to happen when starting over.
What else can I be doing to keep safe?