Scriv

Sunday 20 September 2020

I’ve written a tool for managing changelog files, called scriv. It focuses on a simple workflow, but with lots of flexibility.

I’ve long felt that it’s enormously beneficial for engineers to write about what they do, not only so that other people can understand it, but to help the engineers themselves understand it. Writing about a thing gives you another perspective on it, your own code included.

The philosophy behind scriv, and a quick list of other similar tools, is on the Philosophy page in the docs.

Scriv only does a few things now, but I’m interested to hear about other changelog workflows that could use better tooling.

Song-basket

Sunday 13 September 2020

I threw together a Spotify API program called song-basket. I have a few large themed playlists (for example, Instrumental Funk). This app is to help me add songs to them. I can choose a playlist (the basket), and then as I surf around Spotify, it lets me add the current song to the basket with one click. It also shows me whether the current song is already in the basket, which it often is. If the song is already in the basket, I don’t have to think about whether to add it, and I don’t have to deal with the annoying “Add duplicate?” question.

This started as an example in the Tekore docs, and I hacked at it until it did what I wanted. A lot of it is wrong: no templating, incorrect HTML, a stateful web application, horrid styling, and so on. It doesn’t matter, it’s a quick app to do what I need. If I want, I can polish it later.

How to be helpful online

Thursday 10 September 2020

Helping people online is difficult. We expect technical questions and discussions, but everyone involved is a person, so it doesn’t always go smoothly. There’s no way to guarantee a good outcome, but there are things we as helpers can do to improve the interactions.

There are plenty of pieces out there explaining how to ask good questions. This piece is different: it’s aimed at the helpers, not the askers. We helpers are the experts and the regulars. We are the constants in the help forums. How we behave sets the tone for everyone. We can’t expect to “fix” the askers.

Mostly, these ideas came from my experience in the #python IRC channel on Freenode, but they apply anywhere people are trying to communicate. The less emotional bandwidth a communication channel has, the harder it is to keep things going smoothly, so text-only IRC is a tough medium and a good laboratory.

Let me say at the outset though: I have done, and still do, all of the wrong things. Helping people online is not easy. Perhaps askers don’t know how to ask, or how to interpret our answers. Perhaps English is not their first language. We don’t know what they already know. And we are all human, bringing our complex emotional states with us.

So I know this is hard. I’m hoping that talking about how things can go wrong will help us make them go right.

Answer the question first

When someone shows their code, it’s easy to lose sight of the question, and jump to other things in the code. Answer their actual question first. Once you’ve helped someone, and built a rapport with them, it will be easier to talk to them about other problems you see in their work.

No third rails

There are some topics which are so disliked that any mention of them brings immediate scorn. This is especially troublesome when we should instead be answering the question first. It should be OK for someone to ask for help with a program using sockets, and not have to defend using sockets, especially if the specific question has nothing to do with sockets.

Other third-rail Python topics include: threads, pickle, multiprocessing, globals, singletons. I know you don’t like them. I know you have been burned by them. I know you have better ideas of how to do what they do. Don’t let that derail the conversation. The goal is to help people. Strong reactions can make the asker feel attacked.

No dog-piling

The #python channel is large, and there are many energetic helpers. Everyone wants to help, that’s why we are there. But there can be too many voices. If people are already helping, you don’t have to chime in. If you have something truly different to say, say it. But if you don’t have a new thing to say, adding another voice could make things more difficult. Beginners can be overwhelmed, or feel like we are ganging up on them.

Conversely, if you have already been helping, and it’s getting frustrating, let someone else take over. You can step away. Maybe someone else will have better luck. I know it’s difficult, because we get invested. “I’m explaining it, and it’s not getting through” is a very frustrating feeling. Hammering away at it probably won’t fix it.

Meet their level

If you are a helper, you are an expert. You have learned tools and techniques over the years that have served you well. Askers may not be ready to hear about those things. The things that work for you might be over their head. Try to determine what they know, and give them a reasonable next step, not the ultimate solution.

A suboptimal solution they understand is better than a gold standard they can’t make use of.

As an expert, it’s tempting to present the full picture of a topic. You’ve mastered intricate details, and you want to share them, and to be accurate. But those extra details added to an ongoing conversation with a beginner can be distracting or confusing. It’s most important to give the beginner what they can use next, not what they will need eventually.

Say yes

As much as possible, give “yes” answers instead of “no” answers.

“Is len([1,2,3]) how I get the length of an array?”
Bad: “That’s not an array.” (A “no” answer)
Good: “Yes, though we call them lists, not arrays.” (A “yes” answer)

It’s easy to pounce on incorrect things. It’s also unfriendly and gets in the way of actually providing help.

You are right: that isn’t an array. But you are here to help people, not to ding them for inaccuracies. Find the essence of their question, and answer it with a positive response.

Avoid absolutes

You are an expert, you know things. But strong absolute stances can come off as inflexible, off-putting or even confrontational. Add some doubt words. Even just “in my experience” helps to soften your message and put you on a more equal footing with others. This makes the ideas easier to consider and accept.

Step back

Sometimes interactions go poorly. Misunderstandings accumulate. Small friction makes understanding difficult, which leads to larger friction. When this happens, try to step back.

There are two ways to step back: one is to withdraw from the discussion. Someone else will probably take over. Another is to step back with them. You can talk about the difficulty.

Take some blame

It’s easy when things are going badly to think the other person isn’t trying, can’t be bothered, or won’t be able to understand. Frankly, maybe some of that is true. But try taking some of the blame. Instead of, “are you listening to me,” try saying, “maybe I didn’t explain it well,” or, “I don’t feel like I’m getting through.”

As much as possible, try to avoid “you’re doing it wrong” responses, and try to find ways to share in the effort and troubleshooting of the discussion.

Talking about yourself is better than talking about the asker; talking about them sounds accusatory and confrontational.

Use more words

IRC and other online media encourage quick, short responses, which are exactly the kind most easily misinterpreted. Try to use more words, especially encouraging, optimistic words.

Understand your motivations

We want to help, but let’s be honest: there are other forces driving us. There’s a dark appeal in pointing out where someone is wrong. It can feel good to vent to others about an asker’s poor code, or to win the competition for the deepest mastery of language-implementation arcana.

It’s natural to find outlets for this kind of negative energy, but we have to keep it in check. Focus on helping people.

Humility

A lot of the above advice boils down to being humble. It feels good to help people, to know the answers. Being an expert and knowing things other people don’t know is very satisfying. But you can help people better if you approach the job with humility. Maybe you don’t know everything. Maybe some of it was your fault. You can be gracious in overlooking small mistakes.

Make connections

Another theme running through this advice is: making a connection with a person is more important than the technical details of the conversation. Points of correctness are useless without points of connection. Establish a rapport with people, and then deliver your technical message.

Finally: It’s hard

Again, I know this is all difficult. I know that some people are just not ready to be helped in IRC. Sometimes things will go badly. We can’t fix everything.

But I want things to go as well as they can, and I want us, the helpers, to handle ourselves as well as we can.

If you are looking for other thoughts about this, the Freenode Catalyst Guidelines also offer good tips for being helpful online.

Thanks for helping.

Do a pile of work better

Saturday 22 August 2020

A few days ago I wrote about doing a pile of work with concurrent.futures. Since then, I discovered a problem with the code: exceptions raised by the work function were silently ignored.

Here’s the improved code that logs exceptions:

import concurrent.futures as cf
import logging

from tqdm import tqdm

log = logging.getLogger(__name__)

def wait_first(futures):
    """
    Wait for the first future to complete.

    Returns:
        (done, not_done): two sets of futures.

    """
    return cf.wait(futures, return_when=cf.FIRST_COMPLETED)

def do_work(threads, argsfn, workfn):
    """
    Do a pile of work, maybe in threads, with a progress bar.

    Two callables are provided: `workfn` is the unit of work to be done,
    many times.  Its arguments are provided by calling `argsfn`, which
    must produce a sequence of tuples.  `argsfn` will be called a few
    times, and must produce the same sequence each time.

    Args:
        threads: the number of threads to use.
        argsfn: a callable that produces tuples, the arguments to `workfn`.
        workfn: a callable that does work.

    """
    total = sum(1 for _ in argsfn())
    with tqdm(total=total, smoothing=0.02) as progressbar:
        if threads:
            limit = 2 * threads
            not_done = set()

            def finish_some():
                nonlocal not_done
                done, not_done = wait_first(not_done)
                for done_future in done:
                    exc = done_future.exception()
                    if exc is not None:
                        log.error("Failed future:", exc_info=exc)
                progressbar.update(len(done))

            with cf.ThreadPoolExecutor(max_workers=threads) as executor:
                for args in argsfn():
                    while len(not_done) >= limit:
                        finish_some()
                    not_done.add(executor.submit(workfn, *args))
                while not_done:
                    finish_some()
        else:
            for args in argsfn():
                workfn(*args)
                progressbar.update(1)

This might also be the first time I’ve used “nonlocal” in real code...
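In case “nonlocal” is unfamiliar: it lets finish_some rebind the enclosing function’s not_done, rather than creating a new local of the same name. A minimal sketch of just that mechanism:

```python
# Minimal illustration of `nonlocal`: the inner function rebinds a
# name in the enclosing scope instead of creating its own local.

def outer():
    not_done = {1, 2, 3}

    def finish_some():
        nonlocal not_done      # rebind outer's not_done
        not_done = set()       # without nonlocal, this would be a new local

    finish_some()
    return not_done

print(outer())  # set()
```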

Do a pile of work

Wednesday 19 August 2020

I had a large pile of data to feed through an expensive function. The concurrent.futures module in the Python standard library has always worked well for me as a simple way to farm out work across threads or processes.

Update: this code swallows exceptions. An improved version is at Do a pile of work better.

For example, if my work function is “workfn”, and it takes tuples of arguments as produced by “argsfn()”, this is how you could run them all:

for args in argsfn():
    workfn(*args)

This is how you would run them on a number of threads:

import concurrent.futures as cf

with cf.ThreadPoolExecutor(max_workers=nthreads) as executor:
    for args in argsfn():
        executor.submit(workfn, *args)

But this will generate all of the arguments up-front. If I have millions of work invocations, this could be a problem. I wanted a way to feed the tasks in as they are processed, to keep the queue small. And I wanted a progress bar.

I started from this Stack Overflow answer, added in tqdm for a progress bar, and made this:

import concurrent.futures as cf
from tqdm import tqdm

def wait_first(futures):
    """
    Wait for the first future to complete.

    Returns:
        (done, not_done): two sets of futures.

    """
    return cf.wait(futures, return_when=cf.FIRST_COMPLETED)

def do_work(nthreads, argsfn, workfn):
    """
    Do a pile of work, maybe in threads, with a progress bar.

    Two callables are provided: `workfn` is the unit of work to be done,
    many times.  Its arguments are provided by calling `argsfn`, which
    must produce a sequence of tuples.  `argsfn` will be called a few
    times, and must produce the same sequence each time.

    Args:
        nthreads: the number of threads to use.
        argsfn: a callable that produces tuples, the arguments to `workfn`.
        workfn: a callable that does work.

    """
    total = sum(1 for _ in argsfn())
    with tqdm(total=total, smoothing=0.1) as progressbar:
        if nthreads:
            limit = 2 * nthreads
            not_done = set()
            with cf.ThreadPoolExecutor(max_workers=nthreads) as executor:
                for args in argsfn():
                    if len(not_done) >= limit:
                        done, not_done = wait_first(not_done)
                        progressbar.update(len(done))
                    not_done.add(executor.submit(workfn, *args))
                while not_done:
                    done, not_done = wait_first(not_done)
                    progressbar.update(len(done))
        else:
            for args in argsfn():
                workfn(*args)
                progressbar.update(1)

There might be a better way. I don’t like the duplication of the wait_first call, but this works, and produces the right results.

BTW: my actual work function spawned subprocesses, which is why a thread pool worked to give me parallelism. A pure-Python work function wouldn’t get a speed-up this way, but a ProcessPoolExecutor could help.
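The swap is small if you do need processes. A sketch with a made-up workfn (for a process pool, the work function must be defined at module level so it can be pickled):

```python
import concurrent.futures as cf

def workfn(x):
    return x * x  # placeholder for CPU-bound pure-Python work

if __name__ == "__main__":
    # Processes sidestep the GIL, at the cost of pickling
    # the arguments and results between processes.
    with cf.ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(workfn, range(10)))
    print(results)
```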

You should include your tests in coverage

Tuesday 11 August 2020

This seems to be a recurring debate: should you measure the coverage of your tests? In my opinion, definitely yes.

Just to clarify: I’m not talking about using coverage measurement with your test suite to see what parts of your product are covered. I’ll assume we’re all doing that. The question here is, do you measure how much of your tests themselves are executed? You should.

The reasons all boil down to one idea: tests are real code. Coverage measurement can tell you useful things about that code:

  • You might have tests you aren’t running. It’s easy to copy and paste a test to create a new test, but forget to change the name. Since test names are arbitrary and never used except in the definition, this is a very easy mistake to make. Coverage can tell you where those mistakes are.
  • In any large enough project, the tests directory has code that is not itself a test, but is a helper for the tests. This code can become obsolete, or can have mistakes. Helpers might have logic meant for tests to use that somehow isn’t being used. Coverage can point you to these problems.
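The duplicate-name mistake looks like this (hypothetical tests): Python quietly lets the second def replace the first, so the first test never runs, and coverage of the test file shows its body as unexecuted.

```python
# Hypothetical example of the copy-paste mistake.

def test_addition():
    result = "first"           # coverage would flag this line: never run
    return result

def test_addition():           # oops: same name, silently replaces the one above
    result = "second"
    return result

# Only one function is left bound to the name:
print(test_addition())  # "second"
```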

Let’s flip the question around: why not measure coverage for your tests? What’s the harm?

  • “It skews my results”: This is the main complaint. A project has a goal for coverage measurement: coverage has to be above 80%, or some other number. Measuring the tests feels like cheating, because for the most part, tests are straight-line code executed by the test runner, so it will all be close to 100%.

    Simple: change your goal. 80% was just a number your team picked out of the air anyway. If your tests are 100% covered, and you include them, your total will go up. So use (say) 90% as a goal. There is no magic number that is the “right” level of coverage.

  • “It clutters the output”: Coverage.py has a --skip-covered option that will leave all the 100% files out of the report, so that you can focus on the files that need work.
  • “I don’t intend to run all the tests”: Some people run only their unit tests in CI, saving integration or system tests for another time. This will require some care, but you can configure coverage.py to measure only the part of the test suite you mean to run.
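The “change your goal” arithmetic is easy to check with made-up numbers: say 10,000 product lines at 80% and 5,000 test lines at 99%.

```python
# Back-of-the-envelope: combined coverage when tests are included.
# These line counts and percentages are invented for illustration.
product_lines, product_pct = 10_000, 0.80
test_lines, test_pct = 5_000, 0.99

covered = product_lines * product_pct + test_lines * test_pct
total = covered / (product_lines + test_lines)
print(f"{total:.1%}")  # 86.3%: same project, so the goal number moves up
```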

Whenever I discuss this idea with people, I usually get one of two responses:

  • “There are people who don’t measure their tests!?”
  • “Interesting, I had a problem this could have found for me.”

If you haven’t been measuring your tests, give it a try. I bet you will learn something interesting. There’s no downside to measuring the coverage of your tests, only benefits. Do it.
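Mechanically, one way to include the tests is to list both the product and the tests directories as sources in the coverage.py configuration. A minimal .coveragerc sketch (the directory names here are assumptions):

```ini
# Hypothetical .coveragerc: measure the product package and the tests.
[run]
source =
    myproduct
    tests
```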
