What I think is good and bad

Sunday 24 May 2020

I’m in the #python IRC channel on Freenode a lot. The people there are often quite opinionated. Julian had the idea of processing the logs to see what we thought was good, and what was bad, using sophisticated sentiment analysis.

Finding out what I liked and didn’t like wasn’t hard, since the “sophisticated sentiment analysis” was two regexes: “<nedbat>.* is good” and “<nedbat>.* is bad”!
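As a rough sketch of how little machinery that takes: the sample lines below are invented for illustration, and the real log format is an assumption, since the logs themselves aren't shown here.

```python
import re

# Hypothetical log lines; the real logs and their exact format are assumptions.
SAMPLE_LOG = [
    "<nedbat> eval(input()) is bad.",
    "<someone> tabs are good",
    "<nedbat> textwrap.dedent is good",
    "<nedbat> what are you trying to do?",
]

# The two regexes quoted above.
GOOD = re.compile(r"<nedbat>.* is good")
BAD = re.compile(r"<nedbat>.* is bad")

def classify(lines):
    """Return (good, bad): the lines matched by each regex."""
    good = [line for line in lines if GOOD.search(line)]
    bad = [line for line in lines if BAD.search(line)]
    return good, bad

good, bad = classify(SAMPLE_LOG)
print("good:", good)
print("bad:", bad)
```

Because `.*` is greedy and unanchored, a line only needs " is good" or " is bad" somewhere after the nick, which is why the samples include things like "it is bad to modify a list you are iterating."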

Without further commentary, here is a sampling of things that I said were bad:

  • vertical alignment is bad because it means you might have to change many lines just because one of them got wider.
  • eval(input()) is bad.
  • blindly following stuff is bad.
  • trolling is bad. Find a way to use your brains for good.
  • floats for currency is bad.
  • any class that you can only instantiate once is bad.
  • that sounds like a singleton, which is bad.
  • some people say, “you should start by learning assembler,” and i think that is bad advice.
  • del is fine. __del__ is bad
  • implicit copying is bad
  • texture (repetition you can see when you squint) is bad in code
  • aligning indents with the opening delimiter is bad.
  • this is bad: def main(nums=[1,2,3]).
  • monkeypatching is bad
  • it is bad to modify a list you are iterating.
  • __import__ is bad
  • import * is bad stuff
  • the python doc search is bad
  • and-or is bad
  • singletons are hidden global state, and global state is bad
  • checking types is bad
  • that “:type myparam:” syntax is bad, it’s not readable. use google style instead: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html#google-vs-numpy
  • python is bad at recursion deeper than 1000
  • there is a package on pypi called time, which is bad.
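A few of these are easier to see than to describe. The following is a minimal sketch of three of them (floats for currency, a mutable default argument, and modifying a list while iterating); the function and variable names are made up for the example.

```python
from decimal import Decimal

# Floats for currency: binary floats can't represent most decimal fractions.
float_total = 0.10 + 0.20
decimal_total = Decimal("0.10") + Decimal("0.20")
print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True

# A mutable default is created once, when the def statement runs,
# and is then shared by every call that omits the argument.
def append_shared(item, items=[]):
    items.append(item)
    return items

def append_fresh(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

shared_a = append_shared(1)
shared_b = append_shared(2)
print(shared_a is shared_b, shared_a)    # True [1, 2]
print(append_fresh(1), append_fresh(2))  # [1] [2]

# Modifying a list while iterating over it skips elements.
nums = [1, 2, 2, 3]
for n in nums:
    if n == 2:
        nums.remove(n)
print(nums)                              # [1, 2, 3]: one 2 survived

# Building a new list avoids the problem.
kept = [n for n in [1, 2, 2, 3] if n != 2]
print(kept)                              # [1, 3]
```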

And here are some things I said were good:

  • “python -m app_name” is good, but recent.
  • endswith is good
  • xpath is good at selecting nodes in an xml dom tree.
  • recursion is good for recursive structures. Iteration is good for iterative structures
  • madlibs is good because you can do string manipulation with puerile humor
  • learning is good! :)
  • python is good for full applications.
  • re.sub is good.
  • mock(spec=thing) is good
  • excitement is good :)
  • pip will know how to install into virtualenvs, which is good.
  • gist.github.com is good
  • argparse is good for simple things
  • it is good to cover your tests, though others disagree with me
  • textwrap.dedent is good
  • requirements.txt is good for recreating environments.
  • the csv module is good at writing out dicts as rows.
  • this is good for seeing how rst will be formatted: http://rst.ninjs.org/
  • colorama is good
  • yaml is good
  • setUp is good. tearDown is better done as addCleanup
  • yield from is good for when you want one generator to produce all the values from another generator.
  • duck typing is good when there’s an operation supported across a number of types, and you can just use the operation without worrying about the type.
  • learning is good! :)
  • Django is good if you like having lots of things handled for you. Flask is good if you like to put together all the pieces yourself.
  • tox is good for testing against multiple environments
  • for validating email addresses, this is good: [^@ ]+@[^@ ]+\.[^@ ]+ https://nedbatchelder.com/blog/200908/humane_email_validation.html
  • the python.org tutorial is good if you have programmed in other languages before.
  • “pip install -e .” is good
  • the interactive interpreter is good for experimentation, but isn’t good for real development.
  • bpaste.net is good
  • Think Python is good
  • atexit is good.
  • python is good, and we are helpful and friendly :)
  • pandas is good if you need to manipulate tables of data. If you don’t, then don’t use pandas
  • you want to do something for each thing in a list, that’s what a for loop is good for :)
  • there’s an old habit of using “:type:blah:” or whatever, which is horrible. Sphinx now supports the “napoleon” style natively, which is good: http://www.sphinx-doc.org/en/1.4.8/ext/napoleon.html
  • pytest does cool assert rewriting, which 99.9999% of the time is good magic.
  • pudb is good in the terminal
  • trying to be efficient is good.
  • sql is good for some kinds of data. nosql is good for others.
  • .format is good
  • a decorator is good for wrapping functions in new functionality.
  • the pytest -k option is good at that.
  • i would not try to jam everything into setup.py. this feels like something a makefile is good at.
  • pytest is good at parameterized tests.
  • “if not list:” is good python
  • @classmethod is good for alternate constructors, yes.
  • rg is good: https://github.com/BurntSushi/ripgrep
  • the prompt is good for doing small experiments. Once you have larger programs, put them in .py files, and run them: python myprog.py
  • low tech is good tech
  • Learning is good.
  • numpy is good when you can do whole-matrix operations at once. If you need to iterate over elements and do individual operations, it doesn’t provide any benefit.
  • any is good when you have an iterable of true/false
  • choice is good. Why should there be only one implementation?
  • gist.github.com is good, or paste.pound-python.org
  • click is good
  • you’re not using a shell, which is good.
  • whatever helps you learn is good
  • numpy is good when you are doing matrix and array operations. lists are good for ordered collections of things
  • obfuscation isn’t something Python is good at.
  • the ast module is good for one thing: representing python programs as a tree of nodes. It provides tools for parsing source text into that tree.
  • python is good as a first language
  • isolation is good, but doing it with mocks can be a problem in itself
  • subclassing is good for when SubClass, by its essence is a ParentClass.
  • talking is good :)
  • coverage.py is good.
  • recursion is good for recursive structures (trees). iteration is better for linear structures (lists)
  • Jupyter is good for visualizations, graphing, tables, etc. interactive experimentation
  • lxml.html is good
  • sha256 is good too
  • Fluent Python is good, if you like books
  • for speed, PyPy is good. Or Cython. if you want to write C code, you can use cffi to call it from Python
  • curiosity is good
  • collections.Counter is good at counting things, and would do this in O(N).
  • .encode makes the conversion explicit, which is good. my_bytes = my_unicode.encode(“utf8”)
  • they is good.
  • pudb is good
  • in python 3, super() is good, but it doesn’t work in python 2.
  • the -k option is good for this, or you can define markers.
  • virtualenv is good for separating different projects’ needs
  • tig is good too
  • rst is good at multi-page docs, without assuming it will be html. markdown just shrugs and says, “use html when you have to”
  • writing is good just for its own sake.
  • bpaste.net is good.
  • learning is good
  • dependencies are good. using other people’s solutions to your problems is good.
  • split has a much better PR agency, but partition is good too
  • attribute access is good.
  • i won’t say that loops should introduce scopes, but it is good to be able to understand the interplay between scoping and closur-ing
  • https://pypi.org/project/appdirs/ is good for answering that question
  • automate the boring stuff is good. What kind of software will you be writing?
  • iso8601 is good
  • prompt_toolkit is good
  • numpy is good when you have an array full of data, and you can do one operation that works on all of it at once
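For a taste of a few of the standard-library items on that list, here is a small sketch using `collections.Counter`, `str.partition`, `textwrap.dedent`, `yield from`, and `any`. The sample data and the `chain_two` helper are invented for the example.

```python
import collections
import textwrap

# collections.Counter counts things in a single pass over the data.
counts = collections.Counter("mississippi")
print(counts["s"], counts["p"])          # 4 2

# partition splits on the first separator and always returns three parts.
key, sep, value = "key=value=more".partition("=")
print(key, value)                        # key value=more

# textwrap.dedent removes the common leading whitespace.
text = textwrap.dedent("""\
    hello
      world
    """)
print(repr(text))                        # 'hello\n  world\n'

# yield from lets one generator produce all the values from another iterable.
def chain_two(first, second):
    yield from first
    yield from second

print(list(chain_two([1, 2], "ab")))     # [1, 2, 'a', 'b']

# any works on an iterable of true/false values.
has_ing = any(word.endswith("ing") for word in ["learn", "learning"])
print(has_ing)                           # True
```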

Not much

Sunday 3 May 2020

My son Nat is 30 and has autism. His expressive language is somewhat limited, and he relies on rote answers when he can. A few years ago, one of his caregivers taught him that if someone asks, “What’s up?” you can answer, “Not much.”

At first I thought, “that’s not always a good answer,” but then I started paying attention to how people around me responded, and sure enough, not only is it a good answer, but it’s almost always the answer people give.

It’s especially appropriate these days. Nat has been living with us through these COVID-19 times since the middle of March, and he is frustrated at how limited his days have become.

One of Nat’s favorite things is a three-week calendar showing what’s going to happen. It was a regular routine when he would visit on weekends from his group home: we’d sit down and update his calendar. Every day would be marked with where he’d wake up, what he would do during the day, and where he would go to bed. Unusual activities and special events in particular would be noted and recited. He would often sit and study the calendar, or would ask to review it with us.

When he first moved in for the lock-down, we tried updating his calendar, but the result was too depressingly accurate: every day was the same, and every day was at home. The only special event was Passover, which had been changed from in-person to Zoom:

A Zoom seder with 19 people on 8 screens

Soon the weekly calendar was abandoned as uninteresting, and Nat started saying, “April,” meaning, I want it to be April when this will be over and I can go back to my regular life.

Of course, April came without a let-up of the lockdown, so he started saying, “May.” Now that we are in May, he says, “Summer.” We can’t give him a definite answer. The best we can do is to remind him that we have to wait, and that everyone he knows is also at home waiting.

I have been walking with Nat, a long-favored activity of ours. We’re up to about five miles a day, which is good for both of us.

My wife Susan has been handling most of the weekday activities other than the walks, trying to find things for Nat to do. They have been doing a lot of Facebook, a baking project most days, a little street basketball, and chores around the house.

She has taken to calling this “Suzie’s Little Day Program”:

Suzie’s Little Day Program pros and cons:

Pros: 1) Great staff-to-client ratio; 2) Lots of love; 3) Lots of napping; 4) Great treats; 5) Strong exercise component; 6) No ABA Whatsoever.

Cons: 1) Over-reliance on sugar; 2) Over-napping; 3) Not enough variety in peer group; 4) Moody staff; 5) Unreliable hours; 6) No ABA Whatsoever; 7) Often boring AF.

Overall, Nat is taking this very well. He has settled into this underwhelming routine. He dutifully wears his face mask on walks, and now knows to walk far around other people on the sidewalk without me prompting him. He likes getting on Zoom calls with the groups he is part of (MUSE foundation, Special Olympics, his day program), even if it’s just to watch because jumping in is difficult in those chaotic events.

We have gotten used to having him around full-time. There are 12 nearly empty bottles of shampoo in the shower (hard to explain). A few favorite Disney movies are on tight rotation. He keeps us a little more regimented, since ad-hoc is not his style.

I keep a close eye on him. He will not let us know if he starts to feel sick, so we have to be alert for him. He can be very passive, so it’s easy to feel guilty if he is staring into space or napping too much. We feel like we should be filling his time somehow, but it’s not possible to keep him busy ten hours a day.

Luckily, he has been in a calm period overall. There were other times in his life when these days would have been much stormier. We hope that his even temper continues. It’s been seven weeks so far, and we don’t know how much longer we are going to be together like this.

This is just our life right now. We’re all doing what we can. What’s going on? I’d have to say, not much.

Please report bugs in this site

Monday 27 April 2020

tl;dr: If you see something wrong on my site, please let me know. I won’t be annoyed or offended. You’ll be helping me. It’s a contribution.

•    •    •

This morning I tweeted out a link to my Lato’s unfortunate ligatures blog post. Łukasz Langa tweeted back at me:

This is low pri but since we're already nitpicking: your website's design doesn't do image resizing proportionally. That's how it looks on the iPhone in horizontal mode (vertical is worse).

OMG! Are images looking like that on my site and I didn’t know? (I’ve since fixed the problem.) Turns out it wasn’t just iPhones: Safari on desktop also showed stretched images like this! Why didn’t anyone tell me?

I could have done more thorough testing to begin with, but bugs will happen no matter what.

If you see something wrong, get in touch to let me know. As a creator, I want to make good things. If you can point me to a problem, you are helping me make better things. Even small things like dropped words or mispellings are good to report (see what I did there?).

I know it might feel like you are being a nit-picker, but it shows me that you care enough about what I do to respond. It’s a good thing. Probably other blog authors feel the same way.

Thanks in advance.

Letter boxed

Saturday 18 April 2020

With more time on my hands during this quarantine time, I started doing the Letter Boxed puzzle. You are given a square with three letters on each side:

A blank Letter Boxed puzzle: ECI AXY OTU HRN

You form words by choosing letters in sequence. The only rule is that the next letter must be on a different side of the square than the previous letter:

The same puzzle with THEOCRACY played

Your next word has to start with the last letter of the previous word:

The same puzzle with YENTA added in

The goal is to use all the letters in a certain number of words. For this puzzle, the challenge is five words.
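The rules so far can be written down as a small checker. This is only a sketch: `check_chain` is a made-up name, and it assumes lowercase words and sides given as strings.

```python
def check_chain(sides, words):
    """Return (ok, unused_letters) for a proposed chain of words.

    unused_letters is None when the chain breaks a rule.
    """
    side_of = {c: i for i, side in enumerate(sides) for c in side}
    prev_word = None
    for word in words:
        # Every word must be non-empty and use only letters on the square.
        if not word or any(c not in side_of for c in word):
            return False, None
        # Consecutive letters must come from different sides.
        if any(side_of[a] == side_of[b] for a, b in zip(word, word[1:])):
            return False, None
        # Each word starts with the last letter of the previous word.
        if prev_word is not None and word[0] != prev_word[-1]:
            return False, None
        prev_word = word
    unused = set(side_of) - set("".join(words))
    return True, unused

sides = ["eci", "axy", "otu", "hrn"]
ok, unused = check_chain(sides, ["theocracy", "yenta"])
print(ok, sorted(unused))   # True ['i', 'u', 'x']
```

A chain is a full solution when `ok` is true and `unused` is empty. THEOCRACY and YENTA are legal plays, but they still leave I, U, and X to be covered.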

But if you look at the answers they provide for yesterday’s puzzle, it’s always just two words. So now I’m tormenting myself trying to find two-word solutions. And of course I started thinking about writing a program to find them.

I found a giant list of words and started hacking on some code. I wasn’t sure if I’d need some fancy tree structure for searching the solution space. I figured I would start simpler than that, and maybe it would work.

My strategy was to whittle down the list of words a few steps at a time. First, keep only the words formed from just the 12 letters in the puzzle. Then reduce the list to only those that can be formed following the “not same side” rule. Then find pairs of words that end and start with the same letter, and use all 12 letters:

import collections
import itertools
import sys

def print_sample(label, words, n=10):
    print(f"{label}: {len(words)} words: {', '.join(itertools.islice(words, n))}")

with open("words2.txt") as fwords:
    words = set(w.strip() for w in fwords)

print_sample("All words", words)

sides = sys.argv[1:]
alphabet = set("".join(sides))

# <= rather than <, so that a word using every letter is not thrown away.
words = {w for w in words if set(w) <= alphabet}
print_sample("Only the letters", words)

numbered_sides = {c: i for i, side in enumerate(sides) for c in side}

def is_possible(word):
    for first, second in zip(word, word[1:]):
        if numbered_sides[first] == numbered_sides[second]:
            return False
    return True

possible = {w for w in words if is_possible(w)}
print_sample("Possible words", possible)

starts = collections.defaultdict(list)
for word in possible:
    starts[word[0]].append(word)

print("Solutions:")
for word1 in possible:
    last = word1[-1]
    for word2 in starts[last]:
        if set(word1 + word2) == alphabet:
            print(word1, word2)

This code is just stream-of-consciousness coding, intermixing running statements with functions I needed. On some previous puzzles, it came up with way too many solutions. For “riu pgh lcs yao” it found 1002 pairs of words! Solutions like “hypacusia argol” are not satisfying...

My giant list of words has 466,551 words. I also have a smaller file with only 45,404 words. I refactored the code to make it more flexible. Now it will try the small word list first, and only go to the second, larger list if there are no solutions:

import collections
import itertools
import sys

def print_sample(label, words, n=10):
    print(f"{label}: {len(words)} words: {', '.join(itertools.islice(words, n))}")


class LetterBoxed:
    def __init__(self, sides):
        self.sides = sides
        self.alphabet = set("".join(sides))
        self.numbered_sides = {c: i for i, side in enumerate(self.sides) for c in side}

    def is_possible(self, word):
        for first, second in zip(word, word[1:]):
            if self.numbered_sides[first] == self.numbered_sides[second]:
                return False
        return True

    def solutions(self, words):
        # <= rather than <, so that a word using every letter is not thrown away.
        alpha_words = {w for w in words if set(w) <= self.alphabet}
        print_sample("Using only the letters", alpha_words)
        possible = {w for w in alpha_words if self.is_possible(w)}
        print_sample("Possible words", possible)

        starts = collections.defaultdict(list)
        for word in possible:
            starts[word[0]].append(word)

        for word1 in possible:
            last = word1[-1]
            for word2 in starts[last]:
                if set(word1 + word2) == self.alphabet:
                    yield (word1, word2)

def main(sides):
    letter_boxed = LetterBoxed(sides)
    for wordfile in ["words.txt", "words2.txt"]:
        with open(wordfile) as fwords:
            words = set(w.strip() for w in fwords)
        print_sample("All words", words)
        solutions = list(letter_boxed.solutions(words))
        if not solutions:
            print("No solutions with these words")
            continue
        print(f"{len(solutions)} solutions:")
        for word1, word2 in solutions:
            print(word1, word2)
        break

if __name__ == "__main__":
    main(sys.argv[1:])

With this, “riu pgh lcs yao” finds five solutions from the short word list, using words I actually know, like “gracious splashy.”

Unfortunately, sometimes the official solution uses words that aren’t in even the enormous word list. A previous puzzle was “tub pxi snq oja”, and the solution offered was “juxtaposition niqab,” which is frustrating. Playing word puzzles inevitably brings you face to face with the differences between accepted word lists.

After I wrote my code, I found:

  • Caleb Robinson’s blog post about his solution, which involved the fancier tree structures I avoided.
  • Art Steinmetz’s blog post about both generating the puzzle and solving it, in R. He mentions the generalizations of different numbers of sides, and numbers of letters per side. I was amused to realize that my code doesn’t care how many sides or letters per side are used, so it works for any number.

Just for grins, I’m wondering if there’s some crazy way to abuse regexes to do most of the work. Too much time on my hands...
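One possible way to push the first two filtering steps into a regex is a negative lookahead per side. This is a sketch, not a tested solver: `side_rule_regex` is a made-up helper, and it assumes the sides are plain lowercase letters so nothing needs escaping inside the character classes.

```python
import re

def side_rule_regex(sides):
    """Build a regex for words that use only the puzzle letters and
    never take two letters in a row from the same side."""
    # For each side: one of its letters, as long as the next character
    # is not also from that side (negative lookahead).
    parts = [f"[{side}](?![{side}])" for side in sides]
    return re.compile("(?:" + "|".join(parts) + ")+")

rx = side_rule_regex(["eci", "axy", "otu", "hrn"])

for word in ["theocracy", "yenta", "tot", "zebra"]:
    print(word, bool(rx.fullmatch(word)))
# theocracy True
# yenta True
# tot False    (t and o are on the same side)
# zebra False  (z and b are not in the puzzle)
```

Using `fullmatch` makes the pattern cover both the "only these letters" check and the "not same side" check. Pairing the words up and checking that all the letters are used would still be done in ordinary Python.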

How long did it take you to learn Python?

Friday 27 March 2020

Wait, don’t answer that. It doesn’t matter.

Beginners seem to ask this question when they are feeling daunted by the challenge before them. Maybe they are hoping for a helpful answer, but it seems like most answers will just be a jumping off point for feeling bad about their own progress.

Everyone learns differently. They learn from different sources, at different paces. Suppose you ask this question and someone answers “one month”? Will you feel bad about yourself because you’ve been at it for six weeks? Suppose they say, “ten years”? Now what do you think?

The question doesn’t even make sense in a way. What do we mean by “learn”? If you can write a number guessing game in Python, have you learned Python? Are we talking about basic familiarity, or deep memorization? Does something have to be second nature, or is it OK if you are still looking through the docs for details? “Learned” is not a binary state. There isn’t a moment where you don’t know Python, and then suddenly you do.

And what do we even mean by “Python”? Are we talking about the basic syntax, or do you need to be able to write a metaclass, a descriptor, and a decorator with arguments? Is it just the language, or also the standard library? How many of the 200+ modules in the standard library do you need to be familiar with? What about commonly used third-party libraries? Are we also including the skills needed to write large (10k lines) programs in Python? “Python” is a large and varied landscape, and you will be finding out new things about it for years and years.

Especially since it keeps changing! Python isn’t sitting still, so you will never be done “learning Python.” I have been using Python for more than 20 years, and been deeply involved with it for at least half that time. I thought I knew Python well, then they added “async”. I will have to figure that out one of these days...

Since Python is used in many different domains, the things you need to learn could be completely different from someone else. These days, lots of people are learning Python to get into data science. I don’t do data science. Here are more things I don’t know (taken from a random sampling of “libraries you should know” blog posts): TensorFlow, Scikit-Learn, Numpy, Keras, PyTorch, SciPy, Pandas, Matplotlib, Theano, NLTK, etc. How should I compare my learning to a data scientist’s?

My advice to beginners is: don’t compare your learning to other people’s. Everyone learns differently, using different materials, at different speeds. Everyone has different definitions of “learn,” and of “Python.” Understand your goals and your learning style. Find materials that work for you. Study, and learn in your own way. You can do it.

Functional strategies in Python

Friday 13 March 2020

I got into a debate about Python’s support for functional programming (FP) with a friend. One of the challenging parts was listening to him say, “Python is broken” a number of times.

Python is not broken. It’s just not a great language for writing pure functional programs. Python seemed broken to my friend in exactly the same way that a hammer seems broken to someone trying to turn a screw with it.

I understand his frustration. Once you have fully embraced the FP mindset, it is difficult to understand why people would write programs any other way.

I have not fully embraced the FP mindset. But that doesn’t mean that I can’t apply some FP lessons to my Python programs.

In discussions about how FP and Python relate, I think too much attention is paid to the tactics. For example, some people say, “no need for map/filter/lambda, use list comprehensions.” Not only does this put off FP people because they’re being told to abandon the tools they are used to, but it gives the impression that list comprehensions are somehow at odds with FP constructs, or are exact replacements.

Rather than focus on the tactics, the important ideas to take from FP are strategies, including:

  • Write small functions with no side-effects
  • Don’t change existing data, make new data
  • Combine functions to make larger functions

These strategies all lead to modularized code, free from mysterious action at a distance. The code is easier to reason about, debug, and extend.
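Here is one way those three strategies might look in plain Python. It is a minimal sketch with invented names (`normalize`, `compose`, and so on), not a recommended library.

```python
from functools import reduce

def normalize(word):
    """Small function, no side effects: returns a new string."""
    return word.strip().lower()

def is_long(word, minimum=5):
    return len(word) >= minimum

def clean_words(words):
    # Make new data rather than changing the list we were given.
    return [normalize(w) for w in words]

def long_words(words):
    return [w for w in words if is_long(w)]

def compose(*funcs):
    """Combine functions, applied left to right, into one larger function."""
    return lambda value: reduce(lambda acc, f: f(acc), funcs, value)

pipeline = compose(clean_words, long_words, sorted)

raw = ["  Gracious ", "splashy", "Yenta", "ox"]
result = pipeline(raw)
print(result)   # ['gracious', 'splashy', 'yenta']
print(raw)      # the input list is unchanged
```

Each piece can be tested on its own, and nothing the pipeline does is visible anywhere except in its return value.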

Of course, languages that are built from the ground up with these ideas in mind will have great tools to support them. They have tactics like:

  • Immutable data structures
  • Rich libraries of higher-order functions
  • Good support for recursion

Functional languages like Scheme, Clojure, Haskell, and Scala have these tools built-in. They are of course going to be way better for writing Functional programs than Python is.

FP people look at Python, see none of these tools, and conclude that Python can’t be used for functional programming. As I said before, Python is not a great language for writing purely functional programs. But it’s not a lost cause.
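The standard library does offer modest stand-ins for some of those tactics. The sketch below uses only real stdlib pieces (`dataclasses`, `functools.reduce`, `sys.getrecursionlimit`); the `Point` class and the sample data are hypothetical, and none of this is claimed to match what a functional language provides.

```python
import sys
from dataclasses import FrozenInstanceError, dataclass, replace
from functools import reduce

@dataclass(frozen=True)
class Point:
    """A hypothetical immutable record."""
    x: int
    y: int

origin = Point(0, 0)
moved = replace(origin, x=5)      # a new Point; origin is untouched

try:
    origin.x = 1
except FrozenInstanceError:
    print("frozen dataclasses refuse assignment")

# Tuples and frozensets are immutable built-in containers.
sides = ("eci", "axy", "otu", "hrn")
letters = frozenset("".join(sides))

# Higher-order functions: reduce folds a function over a sequence.
total_letters = reduce(lambda acc, side: acc + len(side), sides, 0)
print(total_letters)              # 12

# Recursion works, but only to a limited depth (typically 1000 in CPython).
print(sys.getrecursionlimit())
```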

Even without those FP tools in Python, we can keep the FP strategies in mind. Although list comprehensions are presented as the alternative to FP tools, they help with the FP strategies, because they force you to make new data instead of mutating existing data.

If other FP professionals are like my friend, they are probably saying to themselves, “Ned, you just don’t get it.” Perhaps that is true, how would I know? That doesn’t mean I can’t improve my Python programs by thinking Functionally. I’m only just dipping my toes in the water so far, but I want to do more.

For more thoughts about this:

  • Gary Bernhardt: Boundaries, a PyCon talk that discusses Functional Core and Imperative Shell.
  • If you want more Functional tools, there are third-party Python packages like:
    • pyrsistent, providing immutable data structures
    • pydash, providing functional tools
    • fnc, providing functional tools
