Scriv 1.2: create GitHub releases

Wednesday 18 January 2023

I’ve been adding to scriv, my changelog management tool. The latest release is 1.2.0, with improvements to creating GitHub releases.

As I mentioned on last month’s podcast, I think it’s important to have one source of truth for information, and then build tooling to publish it wherever you want it to be read.

I think GitHub Releases are a fine way to publicize new versions of software and summarize the changes. But I don’t like them as the one place to store that information. The CHANGELOG file is an ideal source of truth: it’s a text file in your repo, so it’s version-controlled, can be modified in pull requests, can work with other tools, is portable beyond GitHub, and so on. The change information should be authored in the CHANGELOG.

Once the CHANGELOG has the information, a GitHub Release is a fine place to publish it.

To help support this approach, scriv has a “github-release” command which parses your CHANGELOG file and creates a corresponding GitHub Release. Scriv also has commands for building the CHANGELOG file, but this github-release command will work even if your CHANGELOG wasn’t created by scriv.
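
A sketch of what that looks like, assuming a GitHub personal access token is available in the GITHUB_TOKEN environment variable:

# Parse the changelog, match entries to git version tags, and create
# or update the corresponding GitHub releases.
$ scriv github-release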

In fact, once scriv 1.2 was ready, I was able to remove some custom tooling from the coverage.py repo that did the same job, a clear sign that things are moving in the right direction. The coverage.py CHANGES.rst file is hand-edited, but is published to GitHub releases by scriv.

Scriv started small, and has been slowly gaining features and users. I’d be interested to know what you think of it.

Same words, different meanings

Tuesday 10 January 2023

One of the difficulties when comparing programming languages is that they sometimes use the same words to describe similar things, but always with differences. Sometimes the differences are large enough that we want to use different words, but often they are not.

This came up the other day when someone said, “__init__() is not a constructor.” I understand this idea; I’ve heard it before. I just don’t think it’s a good way to help people understand Python.

The reason he said it is that in C++ (and maybe other languages), an object is not truly a usable object of its class until the constructor returns. The function is called the “constructor” because it is the code that constructs the object.

In Python, an object called self is passed into __init__, and it is already a usable object of its class before __init__ is called. So a C++ expert will say, “__init__() is not a constructor.”

I look at it differently: a constructor is the code a class author writes to fully get the object ready for use. That’s what __init__() does in Python. To deal with the difference between Python and C++, I’d say, “constructors in Python and C++ are different: in Python, the object is ready to use before the constructor is called, but in C++, it isn’t until the constructor finishes.”
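
Here’s a minimal sketch of what that means in Python:

class Point:
    def __init__(self, x, y):
        # self is already a usable Point object before this code runs.
        assert isinstance(self, Point)
        self.x = x
        self.y = y

p = Point(1, 2)     # what a user thinks of as "constructing" a Point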

If you haven’t used C++, you might not even see what the big deal is. There are aspects of C++ behavior where this dividing line between “isn’t of the class” and “is of the class” makes a difference (calls to virtual methods during construction, for example). Python doesn’t have those behaviors, so drawing a strict line isn’t as necessary.

So let’s use “constructor” for the user’s view of the code, rather than the internal implementation’s view. Part of Python’s construction of an object is handled by __new__, but you almost never need to deal with __new__, so let’s keep it simple.

“Constructor” is just one example of two languages using the same word for similar but slightly different things:

  • A classic example is “variable.” Some people say, “Python has no variables,” when what they mean is, “Python variables work differently than C variables.” They work exactly the same as JavaScript variables. (For more on this, see my talk Python Names and Values, or the sketch just after this list.)
  • Python lists (which are just like JavaScript arrays) are different than Lisp lists (which are linked lists).
  • Java classes have many differences from Python classes: access modifiers on the Java side, multiple inheritance and special methods on the Python side.
  • Python functions (which can have side-effects) are not like Haskell functions (which cannot).
  • And let’s not even get started on strings!
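
To make the “variable” point concrete, here’s a tiny sketch of how Python (and JavaScript) variables are names referring to values:

nums = [1, 2, 3]
other = nums        # two names now refer to the one list object
other.append(4)
print(nums)         # [1, 2, 3, 4]: the list changed; nothing was copied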

Different programming languages are different, but with similarities. We have to re-use words to talk about them. Then we can explore where the constructs are the same, and where they are different.

Talk Python to Me: Tools for README

Monday 26 December 2022

Michael Kennedy invited me on his podcast to chat about a round-up of tools for README maintenance and other similar project automation: Talk Python to Me #395: Tools for README.md Creation and Maintenance.

We talked about almost 20 different tools, intermixed with philosophical discussion about the problems maintainers face and the solutions they use. Project maintainers need to write about their work, and present it to their users. How to do that in a way that is convenient for both the maintainer and the user? What to automate, and how?

Of course, I have opinions about this, which come out in the course of the episode. Here are some of my top principles:

  • Writing about your work helps you understand your work.
  • You need to write with your audience in mind, and “the same thing” may need to be said multiple times, in different ways, for different audiences.
  • As much as possible, have a single source of truth, published to multiple places as needed.
  • It’s good to automate collecting information together for publishing, but a human editing pass makes the result much better.

It was a fun discussion. Because we are both nerds caught up in the energy of the moment, along the way we also touched on other topics like Twitter vs Mastodon, ChatGPT and so on. Give it a listen. I’m always interested to hear about other approaches and tools.

Secure maintainer workflow, continued

Thursday 22 December 2022

Picking up from Secure maintainer workflow, especially the comments there (thanks!), here are some more things I’m doing to keep my maintainer workflow safe.

1Password ssh: I’m using 1Password as my SSH agent. It works really well, and uses the Mac Touch ID for authorization. Now I have no private keys in my ~/.ssh directory. I’ve been very impressed with 1Password’s helpful and comprehensive approach to configuration and settings.
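
The setup is a couple of lines in ~/.ssh/config pointing at the 1Password agent socket (this is the macOS path; other platforms differ):

Host *
    IdentityAgent "~/Library/Group Containers/2BUA8C4S2C.com.1password/t/agent.sock"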

Improved environment variables: I’ve updated my opvars and unopvars shell functions that set environment variables from 1Password. Now I can name sets of credentials (defaulting to the current directory name), and apply multiple sets. Then unopvars knows everything that has been set, and clears it all.
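
A simplified sketch of the idea, not the real implementation (it assumes the 1Password op CLI and jq, with an item whose fields are the variables to set):

opvars() {
    # Default to a 1Password item named after the current directory.
    local item="${1:-$(basename "$PWD")}"
    # Export each of the item's label/value fields into the environment.
    eval "$(op item get "$item" --format json |
        jq -r '.fields[] | select(.value) | "export \(.label)=\(.value | @sh)"')"
}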

Public/private GitHub hosts: There’s a problem with using a fingerprint-gated SSH agent: some common operations want an SSH key but aren’t actually security-sensitive. When pulling from a public repo, you don’t want to be interrupted to touch the sensor. Reading public information doesn’t need authentication, and you don’t want to become desensitized to the importance of the sensor. But pulling changes from a git repo with a “git@” address always requires SSH, even if the repo is public. It shouldn’t require an alarming interruption.

Git lets you define “insteadOf” aliases so that you can pull using “https:” and push using “git@”. The syntax seems odd and backwards to me, partly because I can define pushInsteadOf, but there’s no pullInsteadOf:

[url "git@github.com:"]
    # Git remotes of "git@github.com" should really be pushed using ssh.
    pushInsteadOf = git@github.com:

[url "https://github.com/"]
    # Git remotes of "git@github.com" should be pulled over https.
    insteadOf = git@github.com:

This works great, except that private repos still need to be pulled using SSH. To deal with this, I have a baroque contraption using a fake URL scheme “github_private:” like this:

[url "git@github.com:"]
    pushInsteadOf = git@github.com:
    # Private repos need ssh in both directions.
    insteadOf = github_private:

[url "https://github.com/"]
    insteadOf = git@github.com:

Now if I set the remote URL to “github_private:nedbat/secret.git”, then activity will use “git@github.com:nedbat/secret.git” instead, for both pushing and pulling. (BTW: if you start fiddling with this, “git remote -v” will show you the URLs after these remappings, and “git config --get-regexp ‘remote.*.url’” will show you the actual settings before remapping.)

But how to set the remote to “github_private:nedbat/secret.git”? I can set it manually for specific repos with “git remote”, but I also clone entire organizations and don’t want to have to know which repos are private. I automate the remote-setting with an aliased git command I can run in a repo directory that sets the remote correctly if the repo is private:

[alias]
    # If this is a private repo, change the remote from "git@github.com:" to
    # "github_private:".  You can remap "github_private:" to "git@" like this:
    #
    #   [url "git@github.com:"]
    #       insteadOf = github_private:
    #
    # This requires the gh command: https://cli.github.com/
    #
    fix-private-remotes = "!f() { \
        vis=$(gh api 'repos/{owner}/{repo}' --template '{{.visibility}}'); \
        if [[ $vis == private ]]; then \
            for rem in $(git remote); do \
                echo Updating remote $rem; \
                git config remote.$rem.url $(git config remote.$rem.url | \
                    sed -e 's/git@github.com:/github_private:/'); \
            done; \
        fi; \
    }; f"

This uses GitHub’s gh command-line tool, which is quite powerful. I’m using it more and more.
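
With the alias in place, fixing up a fresh clone looks something like this (the repo name is hypothetical; the output comes from the echo in the alias):

$ cd ~/src/secret-project
$ git fix-private-remotes
Updating remote origin
$ git config remote.origin.url    # the raw setting, before remapping
github_private:nedbat/secret-project.git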

This is getting kind of complex, and is still a work in progress, but it’s working. I’m always interested in ideas for improvements.

Mastodon: servers, good and bad

Sunday 27 November 2022

Like many people these days, I’m looking around at alternatives to Twitter, and Mastodon is a clear favorite. Comparing the two, the big differences, both good and bad, come down to servers.

Twitter is one web site, run by one corporation. This is easy to understand, and makes it easy to find lots of people, but has the disadvantage that a deranged over-leveraged misanthropic moron can buy it and make it terrible.

Mastodon is both a web application and a galaxy of tens of thousands of separate servers running that software. Each server is run by a different person (or team), each with their own idea of what the server is for, and how it should be run. The servers are not isolated islands, they are federated together: they communicate with each other so that people on one server can see the posts from other servers.

This is the big difference between Twitter (centralized) and Mastodon (distributed and federated), and it plays out in both good and bad ways. I’m new to Mastodon, so take all this with a grain of salt. Some of what I label as bad is due to growing pains and can be improved over time. Some is essential: distributed grass-roots organization is messy and requires work. But it doesn’t suffer from Twitter’s current problem, so it’s worth working at.

Here are my impressions of the pros and cons of distributed federation:

Initial hurdles (Bad) When you try to join Mastodon, you are immediately asked to choose a server. This is not an easy decision. Coming from a centralized service like Twitter (or Facebook, YouTube, Reddit, etc, etc), the question makes no sense. The list of servers to choose from is a tiny fraction of all the ones out there, and they seem to be very strange niche communities. The implications of the choice are not obvious either: how will it affect my Mastodon experience, and can I change my mind later? The choice cannot be avoided, you need to pick a server to be on. But over time, Mastodon can make the choice easier.

Independence (Good) Because there are many servers, Mastodon can’t be bought by a crazed billionaire. There is no company, no centralized thing that can suddenly change hands. If part of the galaxy goes in a bad direction, the rest of the servers are unaffected.

Moderation (Mixed) Moderation is handled on each server by the server admins. This is good: you can join a server with a moderation policy you approve of. But it also means that server operators have to handle this delicate and complicated job. It might become a chore they don’t want to do, or can’t do because of volume or shifting interests. A well-moderated server could slide into disrepair.

Server as community (Mixed) One philosophy of server choice is that a server will be for a certain community. There are knitting servers and pet-lover servers, liberal servers and conservative servers, and so on. This can be a good way to cluster together with people you have an especially tight bond with. But most people are not members of a single community. I live in Boston, make software, like math, and am a parent. Which of those community identities should I use to choose a server? I don’t have to use any of them. Not all servers are organized around a topic at all. No matter what server you choose, you are not limited to just one topic. On the other hand, some servers are so topic-focused that they might consider some posts offensive that other servers see as perfectly normal. Some aspects of Mastodon’s design and culture grew from this idea of “server as community,” like the Local Timeline, which is posts by anyone on your server. As Mastodon attracts more people, the value and feasibility of those aspects are weakening.

Re-centralization (Bad) As people join Mastodon, there’s a natural tendency to avoid the question of choosing a server by joining what appear to be the “default servers.” This leads to over-sized servers that don’t fit well with the original design of Mastodon and can lead to problems. Right now, the largest servers are closed to new accounts, pushing people to spread out more, which is good.

Defederation (Mixed) A server can decide to never communicate with some other specific server. Your vegan community server could decide not to federate with the safari hunter community server, because of the clash of interests. That’s good, it helps focus you on what you want to see and keeps communities cohesive. It gives server admins a giant kill switch to keep their servers safe. The downside is that one malicious user on a large server could cause other admins to defederate that entire server, and now all of that server’s users are invisible to you. Wholesale defederation is frowned upon and a last resort, but as small-server moderators try to handle the growth of Mastodon, it could happen more, and you have little say over it.

Incomplete visibility (Bad) Federation is not total: there will always be posts that you cannot see. Roughly speaking, you can see posts by anyone on your server, and by anyone followed by someone on your server. This will be a lot of posts, but it will never be complete.

Server turnover (Bad) Your server could go away. Many servers are run by people because it seemed like an interesting thing to do. Maybe their interests shift. Maybe their server attracted more people than they were willing to support, either in infrastructure costs and effort, or in moderation effort. The server could stop operating, or could just silently wither. It could even be purchased by an egomaniacal tyrant. But it would only be one server, and you can choose a new server and move to it, bringing along your followers.

Open source (Mostly Good) Mastodon software is open source, which means it is safe from takeover, because anyone can take a copy of the software and extend it if they need to. It also means that many people can help improve it, although open source can sometimes be weak on the processes for how to decide what changes to make. A downside is that different servers can be running different versions of the same software, or can be running different forks of the software, so there can be differences between them in what they support and how they work.

Funding (Mixed) Because there is no central company, every server admin has to fund their own server, both the technical infrastructure, and the time they put into it. This can be difficult, and given the recent surge of interest, unpredictable. Some servers ask for donations, some even charge money to join. The good news is there are no ads (yet?), but you might need to chip in a few dollars to keep your server going. At least you get to choose who to give your money to.

Following and liking (Bad) When you see a post in Mastodon, it is easy to follow the author: there’s a Follow button just like on Twitter. You can also favorite (like) their post easily. But if someone sends you a link to a Mastodon post, you will be visiting that post’s server instead of your own. You won’t be logged in there, because you have an account on your own server, but not on any others. Then following the author or liking the post is more difficult: you have to copy a URL and paste it into the search box on your own Mastodon. This makes off-Mastodon discovery of new people harder.

User names (Bad) Your handle on Mastodon is more complicated than on Twitter, because you need to name your server also. I am @nedbat@hachyderm.io. It’s more to remember than just @nedbat on Twitter, and it doesn’t fit as well in places like the footer of presentations. Also, someone else can take @nedbat@somewhere.else.xyz, and there’s nothing I can do about it.

ActivityPub (Good) The rabbit-hole goes deeper: Mastodon is just one application built on a standard protocol called ActivityPub. Other services can also implement it, and be part of the universe of federated applications known as the Fediverse. Tumblr has announced they will support it, so you could see many services all mingling together.

•    •    •

So, what to make of all this? Twitter seems on a bad track, and it’s hard to imagine any new centralized social network gaining Twitter-scale adoption. Mastodon is harder than Twitter, but has its advantages. Decentralization is messy, and while authoritarian centralized services are simpler to understand, they offer little freedom, and bring great risk.

But as Dan Sinker said:

*whispers* Most people don’t care about decentralization and governance models, they just want someplace to hang out that’s not a hassle.

Twitter was never going to last forever. I don’t know if Mastodon will grow to anything close to its size, or what else might. But it’s interesting to watch the evolution and try it out.

Secure maintainer workflow

Monday 21 November 2022

I’m trying to establish a more secure workflow for maintaining public packages.

Like most developers, I have terminal sessions with implicit access to credentials. For example, I can make git commits and push to GitHub without having to type a password.

There are two ways this might be a problem. The first is unlikely: a bad guy gets onto my computer and uses the credentials to cause havoc. This is unlikely mostly because a bad guy probably won’t get my computer, but also because if it does fall into the wrong hands, it will probably be someone looking to resell the laptop, not use my coverage.py credentials maliciously.

The second way is a more serious concern: I could unknowingly run evil or buggy code that uses my credentials in bad ways. People write bug reports for coverage.py, and if I am lucky, they include steps to reproduce the problem. Sometimes the instructions involve small self-contained examples, and I can just run them without fear. But sometimes the steps are, “clone this repo and run this large test suite.” It’s impossible to review all of that code. I don’t know what the code will do, but if I want to see and diagnose the problem, I have to run it.

I’m trying to reduce the possibilities for bad outcomes, in a few ways:

1Password: where possible, I store credentials in 1Password, and use tooling to get them into environment variables. I have two shell functions (opvars / unopvars) that find values in a vault based on the current directory, and can set and unset them in the environment.

With this, I can have the credentials in the environment for just long enough to use them. This works well for things like PyPI credentials, which are used rarely and could cause significant damage.

But I still also have implicit credentials in my ~/.ssh directory and ~/.netrc file. I’m not sure the best approach to keep them from being available to programs that shouldn’t have them.

Docker: To really isolate unknown code, I use a Docker container. I start with a base image with many versions of Python: base.dockerfile, and then build on it to create a main image that doesn’t even have sudo. In the container, there are no credentials, so I don’t have to worry about malice or accidents. For involved debugging, I might write another Dockerfile FROM these to reduce the re-work that has to happen when starting over.
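
Day-to-day use looks something like this (a sketch: “multipy” stands in for my image name, and the repo is hypothetical):

$ docker run -it --rm multipy bash
# Inside the container now, with no ssh keys, .netrc, or other credentials:
$ git clone https://github.com/someone/buggy-project
$ cd buggy-project
$ python3.10 -m pip install -r requirements.txt
$ python3.10 -m pytest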

What else can I be doing to keep safe?
