blameall.py

One thing I’ve missed from Perforce since using Subversion is the “p4 annotate -a” command. This annotates a file with the revisions that introduced each line, much like the “svn blame” command. But the -a switch tells it to include every revision of every line. This is a way of getting the complete history of the file in one textual output. It’s great for finding a snippet that you suspect existed somewhere in the file’s past.

Blameall.py provides the same feature, but for Subversion.

For example, let’s say you have a file with a number of revisions. Revision 26:

Shopping List

- Milk
- Bread
- Eggs

Revision 27:

Shopping List

- Milk
- Juice

Revision 28:

Shopping List

- Juice
- Cereal
- Ice Cream

Running blameall shows the history of the file in one series of lines:

$ python blameall.py shoplist.txt
26 27 28
   26  head Shopping List
   26  head
   26    27 - Milk
   26    26 - Bread
   27  head - Juice
   26    26 - Eggs
   28  head - Cereal
   28  head - Ice Cream
   26  head

This shows us that “Milk” appeared in revision 26 and was present through revision 27. “Shopping List” appeared in 26 and is still in the file in the head revision; “head” in the second column marks every line that is still present.
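The post doesn't show blameall.py's internals, so here is a minimal sketch of one way to produce this kind of listing. It assumes each revision's text is already in hand as a list of lines, and uses difflib.SequenceMatcher to compare each revision with the one before it: unchanged lines get their last revision bumped, new lines are spliced into one merged list, and removed lines keep the last revision they appeared in. `blame_all` and `index_of` are hypothetical names, not functions from the script.

```python
import difflib


def blame_all(revisions):
    """Merge every revision of a file into one annotated list of lines.

    revisions: list of (rev, lines) pairs, oldest first.
    Returns (first_rev, last_rev, text) tuples; last_rev is "head" for
    lines still present in the newest revision.
    """
    merged = []   # every line ever seen, as a mutable [first, last, text]
    current = []  # the merged entries that make up the newest revision
    prev = []
    for rev, lines in revisions:
        matcher = difflib.SequenceMatcher(None, prev, lines, autojunk=False)
        new_current = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines are still alive in this revision.
                for entry in current[i1:i2]:
                    entry[1] = rev
                new_current.extend(current[i1:i2])
            elif tag in ("insert", "replace"):
                # Place new lines after the last old line of this region
                # (so after any lines being removed here), or at the top.
                pos = index_of(merged, current[i2 - 1]) + 1 if i2 > 0 else 0
                added = [[rev, rev, text] for text in lines[j1:j2]]
                merged[pos:pos] = added
                new_current.extend(added)
            # "delete": removed lines simply keep their last revision.
        current, prev = new_current, lines
    for entry in current:
        entry[1] = "head"
    return [tuple(entry) for entry in merged]


def index_of(entries, target):
    """Find target by identity, since identical texts can repeat."""
    for k, entry in enumerate(entries):
        if entry is target:
            return k
    raise ValueError("entry not found")


if __name__ == "__main__":
    revs = [
        (26, ["Shopping List", "", "- Milk", "- Bread", "- Eggs", ""]),
        (27, ["Shopping List", "", "- Milk", "- Juice", ""]),
        (28, ["Shopping List", "", "- Juice", "- Cereal", "- Ice Cream", ""]),
    ]
    for first, last, text in blame_all(revs):
        print("%5s %5s %s" % (first, last, text))
```

On the shopping-list revisions this gives the same first and last revision for every line as the output above, but the interleaving of removed and added lines can differ: this sketch puts “- Juice” after “- Eggs” rather than between “- Bread” and “- Eggs”.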

It can be slow to get all the revisions, but it’s faster than manually searching through old revisions for that piece you know was back there somewhere.

You can provide a -r argument to blameall to limit its attention to a particular range of revisions.

Getting it

Blameall is a single-file Python script, so there's nothing to install. Just download it and run it:

Download: blameall.py

Comments

[gravatar]
Excellent. Thanks Ned!
[gravatar]
Cool tool! I don't see anything that says whether your blog
comments support markup, so this comment might look
weird.

The log lines in my personal repository look like this:

r218 | (no author) | 2007-11-05 13:07:43 -0500 (Mon, 05 Nov 2007) | 2 lines

(I'm the only person who uses it, so I've never bothered
getting a username to show up.) To get blameall.py to work
for me, I had to change the regex on line 122:

--- blameall.py.original 2007-11-28 12:00:48.531250000 -0500
+++ blameall.py.corrected 2007-11-28 12:03:09.078125000 -0500
@@ -119,7 +119,7 @@
         if not revline and not log:
             # End of the log.
             break
-        m = re.match(r"r(?P<rev>[0-9]+) \| (?P<user>[^ ]+) \| [^|]+ \| (?P<lines>[0-9]+) line.*", revline)
+        m = re.match(r"r(?P<rev>[0-9]+) \| (?P<user>(?:\(no author\)|[^ ]+)) \| [^|]+ \| (?P<lines>[0-9]+) line.*", revline)
         if not m:
             raise Exception("Couldn't scrape log line: %r\nRemaining: %r" % (revline, log))
         revs.append((int(m.group('rev')), m.group('user')))
[gravatar]
The diff in my last comment is hard to read and the angle
brackets got swallowed; the short version is that I changed

> (?P<user>[^ ]+)

to

> (?P<user>(?:\(no author\)|[^ ]+))
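The fix is easy to check in isolation. Below, the original and patched patterns from the diff, with the group names rev, user and lines filled back in (a later comment lists them), are run against the `(no author)` log line quoted earlier. ORIGINAL, PATCHED and LOG_LINE are just names for this sketch.

```python
import re

# Original pattern: the user field must be a single run of non-spaces.
ORIGINAL = re.compile(
    r"r(?P<rev>[0-9]+) \| (?P<user>[^ ]+) \| [^|]+ \| (?P<lines>[0-9]+) line.*")
# Patched pattern: also accept the literal "(no author)", which has a space.
PATCHED = re.compile(
    r"r(?P<rev>[0-9]+) \| (?P<user>(?:\(no author\)|[^ ]+)) \| [^|]+ \| (?P<lines>[0-9]+) line.*")

LOG_LINE = "r218 | (no author) | 2007-11-05 13:07:43 -0500 (Mon, 05 Nov 2007) | 2 lines"

print(ORIGINAL.match(LOG_LINE))          # None
m = PATCHED.match(LOG_LINE)
print(m.group("rev"), m.group("user"))   # 218 (no author)
```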
[gravatar]
No need to introduce more alternatives ;)

(?P<user>[^|]+)

catches everything between the two pipe symbols except the single leading and trailing space (so any extra trailing spaces, all but the last, would be caught too).
If you look at the next expression, you can see that [^|]+ already works there :)

And for that matter, the trailing .* is unnecessary, too (or even the whole line.*).

So a more compact and also more general version would be:
r(?P<rev>[0-9]+) \| (?P<user>[^|]+) \| [^|]+ \| (?P<lines>[0-9]+)
[gravatar]
The symbolic group names rev, user and lines were lost when I submitted the comment; they belong in angle brackets after each ?P, as in (?P<rev>[0-9]+).

And you obviously have a Unicode problem in this blog (see my name in the last comment ;)

It would be nice if you could also capture the revision timestamp and add an option to display it next to the revision number.
Then we could more easily blame long-gone colleagues in front of the project leader ;)
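Rene's compact pattern extends naturally to the timestamp request: the third field, previously skipped with [^|]+, becomes a date group followed by the parenthesized day name. This is a minimal sketch assuming the default (non-XML) `svn log` header format shown in the earlier comment; LOG_RE and rev_info are hypothetical names, not part of blameall.py.

```python
import re

# Compact pattern from the comment above, with a date group added.
LOG_RE = re.compile(
    r"r(?P<rev>[0-9]+) \| (?P<user>[^|]+) \| "
    r"(?P<date>[^|(]+) \([^)]*\) \| (?P<lines>[0-9]+)"
)


def rev_info(log_line):
    """Return (rev, user, date) from an `svn log` header line, or None."""
    m = LOG_RE.match(log_line)
    if m is None:
        return None
    return int(m.group("rev")), m.group("user"), m.group("date")


line = "r218 | (no author) | 2007-11-05 13:07:43 -0500 (Mon, 05 Nov 2007) | 2 lines"
rev, user, date = rev_info(line)
print("%5s %s %s" % (rev, date, user))   #   218 2007-11-05 13:07:43 -0500 (no author)
```

Greedy matching plus backtracking drops the single space before each delimiter, so neither user nor date carries a trailing space.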
[gravatar]
Subversion's command-line client has an "--xml" option for most commands (most notably "svn log"), which comes out in a readily parseable format -- no need for "tweaking the regex".

Even better, the Python bindings for Subversion offer a direct interface -- no parsing required!
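For comparison, a minimal sketch of the `--xml` route using only the standard library's ElementTree, so no regex or extra dependency is involved. It assumes the usual `<log>`/`<logentry revision="...">` layout; when a revision has no author, svn leaves out the `<author>` element, so the sketch falls back to the same "(no author)" text the plain log prints. `parse_log_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def parse_log_xml(xml_data):
    """Return (rev, author, date) tuples from `svn log --xml` output."""
    root = ET.fromstring(xml_data)
    entries = []
    for entry in root.findall("logentry"):
        rev = int(entry.get("revision"))
        # No <author> element when the revision has no author.
        author = entry.findtext("author", default="(no author)")
        date = entry.findtext("date", default="")
        entries.append((rev, author, date))
    return entries
```

In use, the bytes from `subprocess.check_output(["svn", "log", "--xml", "shoplist.txt"])` can be passed straight in.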
[gravatar]
@Aaron: didn't realize that syntax existed, thanks for the pointer. I've updated the code to account for it.

@Rene: thanks for the fine-tuning of the regex, and I'm sorry about the Unicode. My PHP skills are not that polished, but I can't switch my blog infrastructure just yet.

@Rob: I didn't know about the --xml switch, but the regex is working pretty well for me at the moment, so I guess it stays. And the Python bindings feel like another dependency, so I'm glad to rely only on the command-line client.
[gravatar]
Ned, thanks - I've already found this a really useful script (discovered via your recent svn user list email)
[gravatar]
Hi! This looks like just the tool I need. Unfortunately, I'm a total Python ignoramus, and I'm getting this error when I try to run it:

C:\myDir>python blameall.py myFile

Traceback (most recent call last):
  File "blameall.py", line 189, in <module>
    main(sys.argv[1:])
  File "blameall.py", line 180, in main
    for i1, i2, line in multidiff.blame_data():
  File "blameall.py", line 98, in blame_data
    return self.blame
AttributeError: MultiDiff instance has no attribute 'blame'

C:\myDir>


I'm using Python 2.7.

Thanks for any help,

Andy
[gravatar]
Here's another data point: When I try it with Python 3.1.2, I get this error:

C:\myDir>python blameall.py myFile
  File "blameall.py", line 186
    print "%5s %5s %s" % (r1, r2, line)
                     ^
SyntaxError: invalid syntax

C:\myDir>

Thanks again for any help,

Andy
[gravatar]
And more progress, but still no cigar - I got Python installed on my SunOS 5.8 machine, and got this error when I tried to run blameall:

> python blameall.py myFile
Traceback (most recent call last):
  File "blameall.py", line 9, in <module>
    import optparse, re, subprocess, sys
  File "/nobackup/user1/acohen/python/Python-2.7/Lib/subprocess.py", line 432, in <module>
    import pickle
  File "/nobackup/user1/acohen/python/Python-2.7/Lib/pickle.py", line 1266, in <module>
    import binascii as _binascii
ImportError: No module named binascii
>


So, are there some libraries I need to get, or point to, or something?
[gravatar]
thanks, this was a great help.
[gravatar]
Great tool! Thanks much for sharing!
[gravatar]
A slight modification to allow spaces in filenames (this works on Windows, and I think it *should* work for Bash and other *NIX shells):
    fpath = args[0] # (this is line 164 in the current version)
    # Permit spaces in the filename, but don't interfere if the user has already
    # quoted it.
    if ' ' in fpath and not fpath[0] == fpath[-1] in '"\'':
        fpath = '"%s"' % fpath
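Whether adding quotes helps depends on how blameall.py hands the filename to svn, which the post doesn't show. If the command is built as a list and passed to subprocess without shell=True, no quoting is needed: each list element arrives as one argument, spaces included, and literal quote characters added by a snippet like the one above would become part of the filename. A small demonstration, using a Python child process in place of svn; `svn_cat_args` and `child_argv` are hypothetical names.

```python
import subprocess
import sys


def svn_cat_args(fpath, rev):
    """Hypothetical helper: argv for `svn cat -r REV PATH`, no quoting needed."""
    return ["svn", "cat", "-r", str(rev), fpath]


def child_argv(args):
    """Show exactly what a child process receives when given an argument list."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.argv[1:])"] + args,
        universal_newlines=True,
    )
    return out.strip()


print(child_argv(["my shopping list.txt"]))   # ['my shopping list.txt']
```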

I'm guessing Andy has either already found a solution or given up, but for future reference, this is a Python 2 script and will NOT work with Python 3.
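The Python 3 SyntaxError in Andy's second traceback comes from Python 2's print statement on line 186. That particular line can be written as a call with one parenthesized argument, which behaves the same on both versions: Python 2 treats the parentheses as plain grouping, Python 3 as a function call. This sketch covers only that one line; other parts of the script may need similar changes. `show_row` is a hypothetical wrapper for illustration.

```python
def show_row(r1, r2, line):
    # One parenthesized argument: valid on Python 2 and Python 3 alike.
    print("%5s %5s %s" % (r1, r2, line))


show_row(26, "head", "Shopping List")
```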
[gravatar]
...actually, it looks like Andy already has Python 2. I'm not sure why their installation doesn't include the `binascii` module, but it's been in Python since at least version 2.6, so the installation itself is apparently faulty.
