Ordered dict surprises

Monday 12 October 2020

Since Python 3.6, regular dictionaries retain their insertion order: when you iterate over a dict, you get the items in the same order they were added to the dict. (In 3.6 this was a CPython implementation detail; the language has guaranteed it since 3.7.) Before 3.6, dicts were unordered: the iteration order was seemingly random.

Here are two surprising things about these ordered dicts.

You can’t get the first item

Since the items in a dict have a specific order, it should be easy to get the first (or Nth) item, right? Wrong. It’s not possible to do this directly. You might think that d[0] would be the first item, but it’s not, it’s the value of the key 0, which could be the last item added to the dict.

The only way to get the Nth item is to iterate over the dict, and wait until you get to the Nth item. There’s no random access by ordered index. This is one place where lists are better than dicts. Getting the Nth element of a list is an O(1) operation. Getting the Nth element of a dict (even if it is ordered) is an O(N) operation.
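A minimal sketch of that iteration, using itertools.islice to skip ahead. The helper name nth_item is made up for this illustration, not something from the standard library, and it still walks the dict from the start, so it stays O(N):

```python
from itertools import islice


def nth_item(d, n):
    """Return the (key, value) pair at position n in insertion order.

    nth_item is a hypothetical helper for illustration. It iterates from
    the beginning of the dict, so the cost grows with n.
    """
    if n < 0:
        raise IndexError("negative positions are not supported in this sketch")
    try:
        # islice skips the first n items; next() takes the one after them.
        return next(islice(d.items(), n, None))
    except StopIteration:
        raise IndexError(f"dict has no item at position {n}") from None


d = {"x": 10, 0: 20, "y": 30}

first_key = next(iter(d))  # the first key by insertion order: "x"
print(nth_item(d, 0))      # ('x', 10), the first item
print(d[0])                # 20, the value stored under the key 0
print(nth_item(d, 2))      # ('y', 30)
```

If you need positional access often, keeping a separate list of the keys is probably a better fit than repeatedly walking the dict.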

OrderedDict is a little different

If dicts are ordered now, collections.OrderedDict is useless, right? Well, maybe. It won’t be removed because that would break code using that class, and it has some methods that regular dicts don’t. But there’s also one subtle difference in behavior. Regular dicts don’t take order into account when comparing dicts for equality, but OrderedDicts do:

>>> d1 = {"a": 1, "b": 2}
>>> d2 = {"b": 2, "a": 1}
>>> d1 == d2
True
>>> list(d1)
['a', 'b']
>>> list(d2)
['b', 'a']

>>> from collections import OrderedDict
>>> od1 = OrderedDict([("a", 1), ("b", 2)])
>>> od2 = OrderedDict([("b", 2), ("a", 1)])
>>> od1 == od2
False
>>> list(od1)
['a', 'b']
>>> list(od2)
['b', 'a']
>>>
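As for the methods that regular dicts lack, here is a short sketch of two of them, move_to_end and popitem(last=False). The variable names are only for illustration. The last lines also show that comparing an OrderedDict with a regular dict ignores order, just like comparing two regular dicts:

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

# move_to_end() repositions an existing key without changing its value.
od.move_to_end("a")              # order is now b, c, a
od.move_to_end("c", last=False)  # order is now c, b, a
order_after_moves = list(od)

# popitem(last=False) removes the oldest item; a regular dict's popitem()
# takes no arguments and always removes the most recently added item.
first = od.popitem(last=False)   # ('c', 3)
remaining = list(od)             # ['b', 'a']

# Order only matters when both sides are OrderedDicts.
od1 = OrderedDict([("a", 1), ("b", 2)])
mixed_equal = od1 == {"b": 2, "a": 1}  # True
```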

BTW, this post is the result of a surprisingly long and contentious discussion in the Python Discord.

Working with many git repos

Monday 12 October 2020

Some of my work on the Open edX team at edX requires working with the three dozen or so repos that form the backbone of the Open edX software. That often means doing the same thing to all of them (tagging, logs, etc).

To make it easier to work with a collection of repos, I have a shell function to run the same command on the git directories found under the current directory. It gets a little more complicated than that: I might have 100 repos in the current directory, but only the ones that have certain master branches should be included in an operation.

My function is called “gittreeif”: it takes a branch name and a command, and walks the current directory tree looking for git repos that have that branch. For each one, it executes the command:

$ gittreeif origin/juniper.master git status

I also define “gittree”, which runs on every repo regardless of its branches.

Here are the definitions of gittreeif and gittree. Put them in your shell startup file (.bashrc, .zshrc, whatever):

# Run a command for every repo found somewhere beneath the current directory.
#
#   $ gittree git fetch --all --prune
#
# To only run commands in repos with a particular branch, use gittreeif:
#
#   $ gittreeif branch_name git fetch --all --prune
#
# If the command has subcommands that need to run in each directory, quote the
# entire command:
#
#   $ gittreeif origin/foo 'git log --format="%s" origin/foo ^$(git merge-base origin/master origin/foo)'
#
# The directory name is printed before each command.  Use -q to suppress this,
# or -r to show the origin remote url instead of the directory name.
#
#   $ gittreeif origin/foo -q git status
#
gittreeif() {
    local test_branch="$1"
    shift
    local show_dir=true show_repo=false
    if [[ $1 == -r ]]; then
        # -r means, show the remote url instead of the directory.
        shift
        local show_dir=false show_repo=true
    fi
    if [[ $1 == -q ]]; then
        # -q means, don't echo the separator line with the directory.
        shift
        local show_dir=false show_repo=false
    fi
    # IFS= and -r keep leading spaces and backslashes in directory names intact.
    find . -name .git -type d -prune | while IFS= read -r d; do
        local d=$(dirname "$d")
        git -C "$d" rev-parse --verify -q "$test_branch" >& /dev/null || continue
        if [[ $show_dir == true ]]; then
            echo "---- $d ----"
        fi
        if [[ $show_repo == true ]]; then
            echo "----" $(git -C "$d" config --get remote.origin.url) "----"
        fi
        # A single argument containing a space is a quoted command line: eval it.
        if [[ $# == 1 && $1 == *' '* ]]; then
            (cd "$d" && eval "$1")
        else
            (cd "$d" && "$@")
        fi
    done
}

gittree() {
    # @ is shorthand for HEAD, which resolves in every repo, so this runs on all repos
    gittreeif @ "$@"
}
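For comparison, the same walk can be sketched in Python. This is an illustration of the idea, not a drop-in replacement for the shell function: the names find_repos, has_ref, and tree_if are invented here, the -q and -r options are left out, and the command is passed as a list of arguments rather than evaluated by a shell.

```python
import os
import subprocess


def find_repos(root="."):
    """Yield each directory under root that contains a .git directory."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".git" in dirnames:
            # Don't descend into .git itself, much like find's -prune.
            # Sibling directories are still walked.
            dirnames.remove(".git")
            yield dirpath


def has_ref(repo, ref):
    """Return True if `git rev-parse --verify -q ref` succeeds in repo."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", "-q", ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def tree_if(ref, command, root="."):
    """Run command (a list of arguments) in every repo under root that has ref."""
    for repo in find_repos(root):
        if not has_ref(repo, ref):
            continue
        print(f"---- {repo} ----")
        subprocess.run(command, cwd=repo)
```

With those definitions, tree_if("origin/juniper.master", ["git", "status"]) would do roughly what the gittreeif example above does.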

Let’s say I want to summarize the changes between two tags. Here’s a convenient alias to put in your ~/.gitconfig:

[alias]
    relnotes = log --pretty='%h %ad %an: %s' --date=short --no-merges

The git command to show the changes between “old-commit” and “new-commit” (that is, the commits reachable from new-commit but not from old-commit) is:

git log new-commit ^old-commit

Putting it all together: to see the changes between juniper.2 and juniper.3 in all the repos that have Juniper branches, using “relnotes” to get the summary style I like:

$ gittreeif \
    open-release/juniper.master \
    git relnotes open-release/juniper.3 ^open-release/juniper.2
---- ./ecommerce ----
 ca9cddb4 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./devstack ----
 10f02ca 2020-08-17 Zachary Trabookis: Remove `xqueue` as `DEFAULT_SERVICES` for
 8ff8dd0 2020-08-17 Zachary Trabookis: Make additional adjustments to the docume
 57455fe 2020-08-10 Zachary Trabookis: Add `xqueue` to default services to provi
 3ca4c9d 2020-07-29 Zachary Trabookis: Make sure to pass in `DOCKER_COMPOSE_FILE
 cef4aa2 2020-07-28 Zachary Trabookis: Updated `README` to include necessary inf
 9415683 2020-07-27 Zachary Trabookis: Update `docker` commands to be `docker-co
 67c7c9b 2020-08-16 morenol: Do not use openedx release for registrar and edx-mk
 56312bc 2020-08-04 Guruprasad Lakshmi Narayanan: Remove duplicate section
 34a46a3 2020-07-24 Guruprasad Lakshmi Narayanan: Remove the non-release service
---- ./xqueue ----
 f004caa 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./edx-e2e-tests ----
---- ./edx-platform ----
 d9e0ca5e70 2020-08-12 Ali-D-Akbar: This commit contains security fixes for the
 c8421f66fc 2020-08-07 uzairr: Fix xss vulnerabilities in templates
 47ab6af637 2020-08-06 Attiya Ishaque: [YONK-1759] Version bump of studio-fronte
 8dd78619c9 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
 b295389e96 2020-07-23 Zachary Trabookis: Set `SESSION_COOKIE_SAMESITE=Lax` for
 91af099933 2020-07-23 uzairr: Fix xss in templates
 0e45ecb743 2020-07-22 Ali-D-Akbar: Sustaining xss fixes This commit contains xs
 3757f0d11e 2020-07-06 Florian Haas: Fix profile image URLs for image storage on
---- ./edx-analytics-pipeline ----
---- ./repo-tools/repo-tools ----
---- ./edx-notes-api ----
 ad53edd 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./cs_comments_service ----
 3079804 2020-08-19 Samuel Walladge: Bump codecov to latest version
---- ./course-discovery ----
 e984f273 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./credentials ----
 7a7aab55 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./src/edx-analytics-configuration ----
---- ./src/edx-documentation ----
---- ./src/configuration ----
 05bb4edcf 2020-08-24 Feanil Patel: Improve sandboxing. (#5953) (#5960)
 860994c0d 2020-08-21 Feanil Patel: Timmc/codejail improvements (#5956)
---- ./src/enterprise-catalog ----
 f886da6 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./src/blockstore ----
---- ./src/edx-analytics-data-api ----
 64b4c7f 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./src/frontend-app-publisher ----
---- ./src/edx-app-android ----
---- ./src/notifier ----
---- ./src/edx-analytics-dashboard ----
 b8dfa559 2020-08-05 Ned Batchelder: Upgrade Django to 2.2.15
---- ./src/frontend-app-support-tools ----
---- ./src/edx-app-ios ----
---- ./src/edx-demo-course ----
---- ./src/ecommerce-worker ----
---- ./src/frontend-app-learning ----
---- ./src/edx-certificates ----
---- ./src/frontend-app-profile ----
---- ./src/license-manager ----
 85003a6 2020-08-05 Ned Batchelder: Upgrade to Django 2.2.15
---- ./src/testeng-ci ----
---- ./src/frontend-app-gradebook ----
---- ./src/edx-developer-docs ----
---- ./src/frontend-app-account ----

This is how I do it. There are probably other tools to do the same job. Maybe someone will point them out... :)

Değişken Deyince Ne Anlamalı?

Saturday 10 October 2020

Enes Başpınar has translated one of my popular pages into Turkish: Değişken Deyince Ne Anlamalı? is his translation of my Facts and myths about Python names and values.

Google Translate tells me the Turkish title means, “What Should It Understand When You Say Variables?,” which I guess is better translated as, “What Do We Mean When We Say Variables?”

It’s flattering that a piece is liked enough for someone to translate it. The only previous page that was translated was Cog, into Russian (twice!).

If you want to translate something on this site, let me know.

Scriv

Sunday 20 September 2020

I’ve written a tool for managing changelog files, called scriv. It focuses on a simple workflow, but with lots of flexibility.

I’ve long felt that it’s enormously beneficial for engineers to write about what they do, not only so that other people can understand it, but to help the engineers themselves understand it. Writing about a thing gives you another perspective on it, your own code included.

The philosophy behind scriv, and a quick list of other similar tools, is on the Philosophy page in the docs.

Scriv only does a few things now, but I’m interested to hear about other changelog workflows that could use better tooling.

Song-basket

Sunday 13 September 2020

I threw together a Spotify API program called song-basket. I have a few large themed playlists (for example, Instrumental Funk). This app is to help me add songs to them. I can choose a playlist (the basket), and then as I surf around Spotify, it lets me add the current song to the basket with one click. It also shows me whether the current song is already in the basket or not, which it often is. If the song is already in the basket, I don’t have to think about whether to add it, and I don’t have to deal with the annoying “Add duplicate?” question.

This started as an example in the Tekore docs, and I hacked at it until it did what I wanted. A lot of it is wrong: no templating, incorrect HTML, a stateful web application, horrid styling, and so on. It doesn’t matter, it’s a quick app to do what I need. If I want, I can polish it later.

How to be helpful online

Thursday 10 September 2020

Helping people online is difficult. We expect technical questions and discussions, but everyone involved is just a person, so it doesn’t always go smoothly. There’s no way to guarantee a good outcome, but there are things we as helpers can do to improve the interactions.

There are plenty of pieces out there explaining how to ask good questions. This piece is different: it’s aimed at the helpers, not the askers. We helpers are the experts and the regulars. We are the constants in the help forums. How we behave sets the tone for everyone. We can’t expect to “fix” the askers.

Mostly, these ideas came from my experience in the #python IRC channel on Freenode, but they apply anywhere people are trying to communicate. The less emotional signal a communication channel carries, the harder it will be to keep things going smoothly, so IRC is a tough medium and a good laboratory.

Let me say at the outset though: I have done, and still do, all of the wrong things. Helping people online is not easy. Perhaps askers don’t know how to ask, or how to interpret our answers. Perhaps English is not their first language. We don’t know what they already know. And we are all human, so we are bringing our complex emotional state with us.

So I know this is hard. I’m hoping that talking about how things can go wrong will help us make them go right.

Answer the question first

When someone shows their code, it’s easy to lose sight of the question, and jump to other things in the code. Answer their actual question first. Once you’ve helped someone, and built a rapport with them, it will be easier to talk to them about other problems you see in their work.

No third rails

There are some topics which are so disliked that any mention of them brings immediate scorn. This is especially troublesome when we should instead be answering the question first. It should be OK for someone to ask for help with a program using sockets, and not have to defend using sockets, especially if the specific question has nothing to do with sockets.

Other third-rail Python topics include: threads, pickle, multiprocessing, globals, singletons. I know you don’t like them. I know you have been burned by them. I know you have better ideas of how to do what they do. Don’t let that derail the conversation. The goal is to help people. Strong reactions can make the asker feel attacked.

No dog-piling

The #python channel is large, and there are many energetic helpers. Everyone wants to help, that’s why we are there. But there can be too many voices. If people are already helping, you don’t have to chime in. If you have something truly different to say, say it. But if you don’t have a new thing to say, adding another voice could make things more difficult. Beginners can be overwhelmed, or feel like we are ganging up on them.

Conversely, if you have already been helping, and it’s getting frustrating, let someone else take over. You can step away. Maybe someone else will have better luck. I know it’s difficult, because we get invested. “I’m explaining it, and it’s not getting through” is a very frustrating feeling. Hammering away at it probably won’t fix it.

Meet their level

If you are a helper, you are an expert. You have learned tools and techniques over the years that have served you well. Askers may not be ready to hear about those things. The things that work for you might be over their head. Try to determine what they know, and give them a reasonable next step, not the ultimate solution.

A suboptimal solution they understand is better than a gold standard they can’t make use of.

As an expert, it’s tempting to present the full picture of a topic. You’ve mastered intricate details, and you want to share them, and to be accurate. But those extra details added to an ongoing conversation with a beginner can be distracting or confusing. It’s most important to give the beginner what they can use next, not what they will need eventually.

Say yes

As much as possible, answer with Yes answers instead of No answers.

“Is len([1,2,3]) how I get the length of an array?”
Bad: “That’s not an array.” (A “no” answer)
Good: “Yes, though we call them lists, not arrays.” (A “yes” answer)

It’s easy to pounce on incorrect things. It’s also unfriendly and gets in the way of actually providing help.

You are right: that isn’t an array. But you are here to help people, not to ding them for inaccuracies. Find the essence of their question, and answer it with a positive response.

Avoid absolutes

You are an expert, you know things. But strong absolute stances can come off as inflexible, off-putting or even confrontational. Add some doubt words. Even just “in my experience” helps to soften your message and put you on a more equal footing with others. This makes the ideas easier to consider and accept.

Step back

Sometimes interactions go poorly. Misunderstandings accumulate. Small friction makes understanding difficult, which leads to larger friction. When this happens, try to step back.

There are two ways to step back. One is to withdraw from the discussion: someone else will probably take over. The other is to step back together with the asker: you can talk about the difficulty itself.

Take some blame

It’s easy when things are going badly to think the other person isn’t trying, can’t be bothered, or won’t be able to understand. Frankly, maybe some of that is true. But try taking some of the blame. Instead of, “are you listening to me,” try saying, “maybe I didn’t explain it well,” or, “I don’t feel like I’m getting through.”

As much as possible, try to avoid “you’re doing it wrong” responses, and try to find ways to share in the effort and troubleshooting of the discussion.

Talking about yourself is always better than talking about them. Talking about the asker sounds accusatory and confrontational.

Use more words

IRC and other online mediums encourage quick, short responses, which are exactly the kinds of responses that will be easy to misinterpret. Try to use more words, especially encouraging, optimistic words.

Understand your motivations

We want to help, but let’s be honest: there are other forces driving us. There’s a dark appeal in pointing out where someone is wrong. We can retaliate against poor language stewardship by ranting to others. It feels good to win the competition for highest mastery of language implementation arcana.

It’s natural to find outlets for this kind of negative energy, but we have to keep it in check. Focus on helping people.

Humility

A lot of the above advice boils down to being humble. It feels good to help people, to know the answers. Being an expert and knowing things other people don’t know is very satisfying. But you can help people better if you approach the job with humility. Maybe you don’t know everything. Maybe some of it was your fault. You can be gracious in overlooking small mistakes.

Make connections

Another theme running through this advice is: making a connection with a person is more important than the technical details of the conversation. Points of correctness are useless without points of connection. Establish a rapport with people, and then deliver your technical message.

Finally: It’s hard

Again, I know this is all difficult. I know that some people are just not ready to be helped in IRC. Sometimes things will go badly. We can’t fix everything.

But I want things to go as well as they can, and I want us, the helpers, to handle ourselves as well as we can.

If you are looking for other thoughts about this, the Freenode Catalyst Guidelines are also good tips for how to be useful online.

Thanks for helping.
