Loop targets

Tuesday 19 November 2024

I posted a Python tidbit about how for loops can assign to things other than simple variables, and many people were surprised or even concerned:

params = {
    "query": QUERY,
    "page_size": 100,
}

# Get page=0, page=1, page=2, ...
for params["page"] in itertools.count():
    data = requests.get(SEARCH_URL, params).json()
    if not data["results"]:
        break
    ...

This code makes successive GET requests to a URL, with a params dict as the data payload. Each request uses the same data, except the “page” item is 0, then 1, 2, and so on. It has the same effect as if we had written it like this:

for page_num in itertools.count():
    params["page"] = page_num
    data = requests.get(SEARCH_URL, params).json()

One reply asked if there was a new params dict in each iteration. No: loops in Python do not create a scope, and do not make new variables. The loop target is assigned to exactly as if it were an assignment statement.

As a Python Discord helper once described it,

While loops are “if” on repeat. For loops are assignment on repeat.

A loop like for <ANYTHING> in <ITER>: will take successive values from <ITER> and do an assignment exactly as this statement would: <ANYTHING> = <VAL>. If the assignment statement is ok, then the for loop is ok.

We’re used to seeing for loops that do more than a simple assignment:

for i, thing in enumerate(things):
    ...

for x, y, z in zip(xs, ys, zs):
    ...

These work because Python can assign to a number of variables at once:

i, thing = 0, "hello"
x, y, z = 1, 2, 3

Assigning to a dict key (or an attribute, or a property setter, and so on) in a for loop is an example of Python having a few independent mechanisms that combine in uniform ways. We aren’t used to seeing exotic combinations, but you can reason through how they would behave, and you would be right.

You can assign to a dict key in an assignment statement, so you can assign to it in a for loop. You might decide it’s too unusual to use, but it is possible and it works.
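The same reasoning extends to other assignment targets. Here is a small sketch (the class and names are made up for illustration) using an attribute and a subscripted tuple target as loop targets:

```python
class Config:
    """A made-up class to demonstrate an attribute as a loop target."""
    pass

cfg = Config()

# The loop target is an attribute: each iteration does cfg.page = <value>.
for cfg.page in range(3):
    pass

print(cfg.page)     # 2: the last value the loop assigned

# Tuple targets unpack left to right, so row[idx] sees the new idx.
row = [None, None, None]
for idx, row[idx] in enumerate("abc"):
    pass

print(row)          # ['a', 'b', 'c']
```

And as the loop-scope question above showed, cfg.page keeps its last value after the loop ends, just as a plain loop variable would.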

Coverage.py originally

Saturday 2 November 2024

Something many people don’t realize is that I didn’t write the original coverage.py. It was written by Gareth Rees in 2001. I’ve been extending and maintaining it since 2004. This ancient history came up this week, so I grabbed the 2001 version from archive.org to keep it here for posterity.

I already had a copy of Gareth’s original page about coverage.py, which now links to my local copy of coverage.py from 2001. BTW: that page is itself a historical artifact now, with the header from this site as it looked when I first copied the page.

The original coverage.py was a single file, so the “coverage.py” name was literal: it was the name of the file. It only had about 350 lines of code, including a few to deal with pre-2.0 Python! Some of those lines remain nearly unchanged to this day, but most of it has been heavily refactored and extended.

Coverage.py now has about 20k lines of Python in about 100 files. The project now has twice as much C code as the original file had Python. I guess in almost 20 years a lot can happen!

It’s interesting to see this code again, and to reflect on how far it’s come.

GitHub action security: zizmor

Wednesday 30 October 2024

Zizmor is a new tool to check your GitHub action workflows for security concerns. I found it really helpful to lock down actions.

Action workflows can be esoteric, and continuous integration is not everyone’s top concern, so it’s easy for them to have subtle flaws. A tool like zizmor is great for drawing attention to them.

When I ran it, I had a few issues to fix:

  • Some data available to actions is manipulable by unknown people, so you have to avoid interpolating it directly into shell commands. For example, you might want to add the branch name to the action summary:
    - name: "Summarize"
      run: |
        echo "### From branch ${{ github.ref }}" >> $GITHUB_STEP_SUMMARY
    But github.ref is a branch name chosen by the author of the pull request. It could contain a shell injection that could let an attacker exfiltrate secrets. Instead, put the value into an environment variable, then use the variable in the shell command:
    - name: "Summarize"
      env:
        REF: ${{ github.ref }}
      run: |
        echo "### From branch ${REF}" >> $GITHUB_STEP_SUMMARY
  • The actions/checkout step should avoid persisting credentials:
    - name: "Check out the repo"
      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      with:
        persist-credentials: false
  • In steps where I was pushing to GitHub, this meant I needed to explicitly set a remote URL with credentials:
    - name: "Push digests to pages"
      env:
        GITHUB_TOKEN: ${{ secrets.token }}
      run: |
        git config user.name nedbat
        git config user.email ned@nedbatchelder.com
        git remote set-url origin https://x-access-token:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git

There were some other things that were easy to fix, and of course, you might have different issues. One possible improvement to zizmor: it could link to explanations of how to fix the problems it finds. But it wasn’t hard to find resources, like GitHub’s Security hardening for GitHub Actions guide.

William Woodruff is zizmor’s author. He was incredibly responsive when I had problems or questions about using zizmor. If you hit a snag, write an issue. It will be a good experience.

If you are like me, you have repos lying around that you don’t think about much. These are a special concern, because their actions could be years old, and not well maintained. These dusty corners could be a good vector for an attack. So I wanted to check all of my repos.

With Claude’s help I wrote a shell script to find all git repos I own and run zizmor on them. It checks the owner of the repo because my drive is littered with git repos I have no control over:

#!/bin/bash
# zizmor-repos.sh

echo "Looking for workflows in repos owned by: $*"

# Find all git repositories in current directory and subdirectories
find . \
    -type d \( \
        -name "Library" \
        -o -name "node_modules" \
        -o -name "venv" \
        -o -name ".venv" \
        -o -name "__pycache__" \
    \) -prune \
    -o -type d -name ".git" -print 2>/dev/null \
| while read -r gitdir; do
    # Get the repository directory (parent of .git)
    repo_dir="$(dirname "$gitdir")"

    # Check if .github/workflows exists
    if [ -d "${repo_dir}/.github/workflows" ]; then
        # Get the GitHub remote URL
        remote_url=$(git -C "$repo_dir" remote get-url origin 2>/dev/null)

        # Check if it's our repository
        # Handle both HTTPS and SSH URL formats
        for owner in "$@"; do
            if echo "$remote_url" | grep -q "github.com[/:]$owner/"; then
                echo ""
                echo "Found workflows in $owner repository: $repo_dir"
                ~/.cargo/bin/zizmor "$repo_dir/.github/workflows"
            fi
        done
    fi
done

After fixing issues, it’s very satisfying to see:

% zizmor-repos.sh nedbat BostonPython
Looking for workflows in repos owned by: nedbat BostonPython

Found workflows in nedbat repository: ./web/stellated
🌈 completed ping-nedbat.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./web/nedbat_nedbat
🌈 completed build.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./scriv
🌈 completed tests.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./lab/gh-action-tests
🌈 completed matrix-play.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./aptus/trunk
🌈 completed kit.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./cog
🌈 completed ci.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./dinghy/nedbat
🌈 completed test.yml
🌈 completed daily-digest.yml
🌈 completed docs.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./dinghy/sample
🌈 completed daily-digest.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./coverage/badge-samples
🌈 completed samples.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./coverage/django_coverage_plugin
🌈 completed tests.yml
No findings to report. Good job!

Found workflows in nedbat repository: ./coverage/trunk
🌈 completed dependency-review.yml
🌈 completed publish.yml
🌈 completed codeql-analysis.yml
🌈 completed quality.yml
🌈 completed kit.yml
🌈 completed python-nightly.yml
🌈 completed coverage.yml
🌈 completed testsuite.yml
No findings to report. Good job!

Found workflows in BostonPython repository: ./bospy/about
🌈 completed past-events.yml
No findings to report. Good job!

Nice.

Git aliases: switch to mainster, etc

Saturday 26 October 2024

I use a lot of git aliases because I work in the terminal and aliases give me short commands for common operations. They are defined in my global git config file and range from simple to powerful but twisty.

First, some basic aliases for operations I do often:

[alias]
    br = branch
    co = checkout
    sw = switch
    d = diff
    di = diff
    s = status -s -b --show-stash

These are simple, but others could use some explanation.

Committing

I have a few aliases for committing code. The “ci” alias provides the default option “--edit” so that even if I provide a message on the command line with “git ci -m”, it will pop me into the editor to provide more detail. “git amend” is for updating the last commit with the latest file edits I’ve made, and “git edit” is for updating the commit message on the latest commit:

[alias]
    ci = commit --edit
    amend = commit --amend --no-edit
    edit = commit --amend --only

Returning home

I work in many repos. Many have a primary branch called “main” but in some it’s called “master”. I don’t want to have to remember which is which, so I have an alias “git ma” that returns me to the primary branch however it’s named. It uses a helper alias to find the name of the primary branch:

[alias]
    # Find the name of the primary branch, either "main" or "master".
    primary = "!f() { \
        git branch -a | \
        sed -n -E -e '/remotes.origin.ma(in|ster)$/s@remotes/origin/@@p'; \
    }; f"

If you haven’t seen this style of alias before, the initial exclamation point means it’s a shell command, not a git command. Then we use shell f() {···}; f syntax to define a function and immediately invoke it. This lets us use shell commands in a pipeline, access arguments with $1, and so on. (Fetching GitHub pull requests has more about this technique.)

This alias uses the “git branch -a” command to list all the branches, then pipes the output into the Unix sed command to find the remote one named either “main” or “master”.
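To see the filter in action, here’s a sketch that feeds it a made-up sample of “git branch -a” output:

```shell
# A made-up sample of "git branch -a" output:
branches='  fix-bug
  remotes/origin/HEAD -> origin/main
  remotes/origin/fix-bug
  remotes/origin/main'

# The sed filter from the alias: select the remote branch named main or
# master, strip the "remotes/origin/" prefix, and print what's left.
primary=$(printf '%s\n' "$branches" |
    sed -n -E -e '/remotes.origin.ma(in|ster)$/s@remotes/origin/@@p')
echo "$primary"
```

This prints “main” (with git’s leading indentation, which the unquoted $(git primary) substitution in the aliases below discards through word splitting). Note the “HEAD -> origin/main” line doesn’t match, because the regex requires the remotes/origin/ prefix immediately before the branch name.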

With “git primary” defined, we can define the “ma” alias to switch to the primary branch and pull the latest code. I like “ma” because it’s short for both main and master, and because it feels like coming home (“Hi ma!”):

[alias]
    # Switch to main or master, whichever exists, and update it.
    ma = "!f() { \
        git checkout $(git primary) && \
        git pull; \
    }; f"

For repos with an upstream, I need to pull their latest code and also push to my fork to get everything in sync. For that I have “git mma” (like ma but more):

[alias]
    # Pull the upstream main/master branch and update our fork.
    mma = "!f() { \
        git ma && \
        git pull upstream $(git primary) --ff-only && \
        git push; \
    }; f"

Merging and finishing branches

For personal projects, I don’t use pull requests to make changes. I work on a branch and then merge it to main. The “brmerge” alias merges a branch and then deletes the merged branch:

[alias]
    # Merge a branch, and delete it here and on the origin.
    brmerge = "!f() { \
        : git show; \
        git merge $1 && \
        git branch -d $1 && \
        git push origin --delete $1; \
    }; f"

This shows another technique: the : git show; command does nothing, but tells zsh’s tab completion that this command takes the same arguments as “git show”: in other words, the name of a branch. That argument is available as $1 so we can use it in the aliased shell commands.

Often what I want to do is switch from my branch to main, then merge the branch. The “brmerge-” alias does that. The “-” is similar to “git switch -” which switches to the branch you last left:

[alias]
    # Merge the branch we just switched from.
    brmerge- = "!f() { \
        git brmerge $(git rev-parse --abbrev-ref @{-1}); \
    }; f"

Finally, “git brdone” is what I use from a branch that has already been merged in a pull request. I return to the main branch, and delete the work branch:

[alias]
    # I'm done with this merged branch, ready to switch back to another one.
    brdone = "!f() { \
        : git show; \
        local brname=\"$(git symbolic-ref --short HEAD)\" && \
        local primary=\"$(git primary)\" && \
        git checkout ${1:-$primary} && \
        git pull && \
        git branch -d $brname && \
        git push origin --delete $brname; \
    }; f"

This one is a monster, and uses “local” to define shell variables I can use in a few places.

There are other aliases in my git config file, some of which I’d even forgotten I had. Maybe you’ll find other useful pieces there.

Changelog automation

Sunday 29 September 2024

I have two main approaches for producing changelogs, but both are based on the same principles: make it convenient for the author to create them, then make it possible to use the information automatically to benefit the readers.

The first way is with a tool such as scriv, which I wrote, but which was inspired by previous similar tools like towncrier and CPython’s blurb. They let you write your changelog one entry at a time in the same pull request as the product change itself. The entries are individual uniquely named files that are collected together when a release is made. This avoids the merge conflicts that would happen if many developers all had to edit the same changelog file.

The second way I maintain a changelog is how I do it for coverage.py. This predates scriv, and is more custom-coded, so I’ll walk through the steps. Maybe you will be inspired to add bits to other tooling.

I hand-edit a CHANGES.rst file. An entry there might look like this:

CHANGES.rst

- Fix: we failed calling
  :func:`runpy.run_path <python:runpy.run_path>`, as described
  in `issue 1234`_.  This is now fixed, thanks to `Debbie Developer
  <pull 2345_>`_.  Details are on the :ref:`configuration page
  <config_report_format>`.

.. _issue 1234: https://github.com/nedbat/coveragepy/issues/1234
.. _pull 2345: https://github.com/nedbat/coveragepy/pull/2345

This lets me use semantic linking mechanisms. GitHub displays .rst files, but unfortunately doesn’t understand the :ref: style of links.

The changelog is part of the docs for the project, pulled into the docs/ tree with a Sphinx directive. The :end-before: option lets me have end-page content in CHANGES.rst that doesn’t appear in the docs:

doc/changes.rst

.. include:: ../CHANGES.rst
    :end-before: scriv-end-here

It’s great when researching a bug fix in other projects to see an issue closed with a comment about the commit that fixed it. Even better is when the issue mentions what release first had the fix. I automate that process for coverage.py.

To do that and a few other things, I have some custom tooling. It’s a bit baroque because it grew over time, but it suits my purposes. First I need to get the changelog into a more easily understood form. Sphinx has a little-known feature to produce .rst files as output. It sounds paradoxical, but the benefit is that all links are reduced to their simplest form. The entry above becomes:

tmp/changes.rst

*  Fix: we failed calling
   https://docs.python.org/3/library/runpy.html#runpy.run_path, as
   described in `issue 1234
   <https://github.com/nedbat/coveragepy/issues/1234>`_.  This is now
   fixed, thanks to `Debbie Developer
   <https://github.com/nedbat/coveragepy/pull/2345>`_.  Details are on
   the `configuration page <config.rst#config-report-format>`_.

Then pandoc converts it to Markdown and my parse_relnotes.py creates a JSON file to make it easy to find entries for each version:

[
    {
        "version": "7.6.1",
        "text": "-   Fix: coverage used to fail when measuring code using ...",
        "prerelease": false,
        "when": "2024-08-04"
    },
    ...

Finally(!) comment_on_fixes.py gets the latest release from the JSON file, regexes it for GitHub URLs in the text, and adds a comment to closed issues and merged pull requests:

This is now released as part of [coverage 7.x.y](https://pypi.org/project/coverage/7.x.y).
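The URL-scraping step could be sketched like this; the pattern and function are illustrative, not the actual comment_on_fixes.py code:

```python
import re

# Illustrative pattern: match issue and pull request URLs for one repo.
URL_RX = re.compile(
    r"https://github\.com/nedbat/coveragepy/(issues|pull)/(\d+)"
)

def find_fixed_items(release_text):
    """Return (kind, number) pairs for GitHub URLs in a release's text."""
    return [(kind, int(num)) for kind, num in URL_RX.findall(release_text)]

text = (
    "Fix: described in [issue 1234]"
    "(https://github.com/nedbat/coveragepy/issues/1234), fixed by "
    "[pull 2345](https://github.com/nedbat/coveragepy/pull/2345)."
)
print(find_fixed_items(text))   # [('issues', 1234), ('pull', 2345)]
```

Each (kind, number) pair would then drive a GitHub API call to post the “released in” comment on the corresponding closed issue or merged pull request.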

The other automated output from my CHANGES.rst file is a GitHub release. GitHub releases are both convenient and problematic. I don’t like the idea of authoring content on GitHub that is only available on GitHub. The history of my project is an important part of my project, so I want the source of truth to be a version-controlled text file in my source distribution. But people want to see GitHub releases. So I author in CHANGES.rst, but publish to GitHub releases.

Using github_releases.py I automatically generate a GitHub release from the JSON file. This was useful enough that I added a github-release command to scriv to do a similar thing, but coverage.py still has the custom code to take advantage of the rst link simplifications I showed above.

One of the things I don’t like about GitHub releases is that they always have “Assets” appended to the end, with links to .zip and .tar.gz snapshots of the repo. Those aren’t the right way to get the package, so I include the link to the PyPI page and the correct command to install the package.

Describing all this, it sounds complicated, and I guess it is. I like being able to publish information to people who want it, and this automation accomplishes that.
