Since Python 3.6, regular dictionaries retain their insertion order: when you iterate over a dict, you get the items in the same order they were added to the dict. Before 3.6, dicts were unordered: the iteration order was seemingly random.
Here are two surprising things about these ordered dicts.
You can’t get the first item
Since the items in a dict have a specific order, it should be easy to get the
first (or Nth) item, right? Wrong. It’s not possible to do this directly. You
might think that d[0]
would be the first item, but it’s
not, it’s the value of the key 0, which could be the last item added to the
dict.
The only way to get the Nth item is to iterate over the dict, and wait until you get to the Nth item. There’s no random access by ordered index. This is one place where lists are better than dicts. Getting the Nth element of a list is an O(1) operation. Getting the Nth element of a dict (even if it is ordered) is an O(N) operation.
OrderedDict is a little different
If dicts are ordered now, collections.OrderedDict is useless, right? Well, maybe. It won’t be removed because that would break code using that class, and it has some methods that regular dicts don’t. But there’s also one subtle difference in behavior. Regular dicts don’t take order into account when comparing dicts for equality, but OrderedDicts do:
>>> d1 = {"a": 1, "b": 2}
>>> d2 = {"b": 2, "a": 1}
>>> d1 == d2
True
>>> list(d1)
['a', 'b']
>>> list(d2)
['b', 'a']
>>> from collections import OrderedDict
>>> od1 = OrderedDict([("a", 1), ("b", 2)])
>>> od2 = OrderedDict([("b", 2), ("a", 1)])
>>> od1 == od2
False
>>> list(od1)
['a', 'b']
>>> list(od2)
['b', 'a']
>>>
BTW, this post is the result of a surprisingly long and contentious discussion in the Python Discord.
Comments
Specifically:
OrderedDict([("a", 1), ("b", 2)]) == {"a": 1, "b": 2} == OrderedDict([("b", 2), ("a", 1)])
but OrderedDict([("a", 1), ("b", 2)]) != OrderedDict([("b", 2), ("a", 1)])
list(d)[500]
So I checked that. And to make sure, that I checked it right I did 100 runs and aggregated the results. Here they are. Left column is the index of the value that we're getting. Right column: mean of the time it took to get it. As you can see, there is no difference in retrieval time. And for once it would be nice to hear some argumentation from your side, apart from "oh it's iterating" and "please stop". Here's my code used for testing: By the way, just for the laughs, I did the same test on a dictionary with TEN MILLION records. That's a lot of records for a simple dict, but who cares. And guess what: it took exactly the same amount of time to retrieve the 10-millionth record as it took to retrieve the 0th. If in some way I'm wrong, and I just fail to see the iteration and growing execution time -- by all means enlighten me, and I will admit I was wrong. Otherwise be a gentleman ;)
Dicts don't offer numeric indexing, you have to iterate them to get it. The N could either mean the index you are looking for, if you iterate until you find what you want, or it could mean the size of the dict, if you iterate the entire dict as you are doing.
Your experiment demonstrates that once you have a list or tuple, accessing the element by index is a constant-time operation regardless of the index. This is the exact point being made in the post: lists are better than dicts at random access by numeric index. Your timing experiment shows that. But you've paid the price of iterating the entire dict first, and that total iteration is O(N) in the size of the dict.
Even if there is iteration, though, it's handled on the level of Python's C engine, so in practicality it won't matter. Especially since you can cache the result of if you need to acces items by index.
But I agree that you're better off with a list for that.
Add a comment: