blameall.py
Created 27 November 2007
One thing I’ve missed from Perforce since using Subversion is the “p4 annotate -a” command. This annotates a file with the revisions that introduced each line, much like the “svn blame” command. But the -a switch tells it to include every revision of every line. This is a way of getting the complete history of the file in one textual output. It’s great for finding a snippet that you suspect existed somewhere in the file’s past.
Blameall.py provides the same feature, but for Subversion.
For example, let’s say you have a file with a number of revisions. Revision 26:
Shopping List
- Milk
- Bread
- Eggs
Revision 27:
Shopping List
- Milk
- Juice
Revision 28:
Shopping List
- Juice
- Cereal
- Ice Cream
Running blameall shows the history of the file in one series of lines:
$ python blameall.py shoplist.txt
26 27 28
   26  head Shopping List
   26  head
   26    27 - Milk
   26    26 - Bread
   27  head - Juice
   26    26 - Eggs
   28  head - Cereal
   28  head - Ice Cream
   26  head
This shows us that “Milk” appeared in revision 26 and was present through revision 27. “Shopping List” appeared in 26, and is still in the file in the head revision.
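To make the idea concrete, here is a minimal sketch of how such a merged history could be built. It is not the code in blameall.py: it assumes the revisions are already in memory as lists of lines, and it uses Python's difflib rather than Subversion's own diff. The names blame_all and REVISIONS are made up for the illustration.

```python
import difflib

def blame_all(revisions):
    """Merge successive revisions into one annotated history.

    revisions: list of (revnum, lines) pairs, oldest first.
    Returns (first_rev, last_rev, text) tuples, where last_rev is "head"
    for lines still present in the newest revision.
    """
    entries = []      # [first_rev, last_rev or None while alive, text]
    prev_rev = None
    for rev, lines in revisions:
        # Only lines alive in the previous revision take part in the diff.
        live_idx = [i for i, e in enumerate(entries) if e[1] is None]
        live_text = [entries[i][2] for i in live_idx]
        matcher = difflib.SequenceMatcher(None, live_text, lines)
        inserts = {}  # position in entries -> new entries to place before it
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag in ("delete", "replace"):
                for k in range(a1, a2):
                    # The line was last seen in the previous revision.
                    entries[live_idx[k]][1] = prev_rev
            if tag in ("insert", "replace"):
                # New lines go just before the next surviving line.
                pos = live_idx[a2] if a2 < len(live_idx) else len(entries)
                inserts.setdefault(pos, []).extend(
                    [rev, None, text] for text in lines[b1:b2])
        merged = []
        for i, entry in enumerate(entries):
            merged.extend(inserts.get(i, []))
            merged.append(entry)
        merged.extend(inserts.get(len(entries), []))
        entries = merged
        prev_rev = rev
    return [(first, "head" if last is None else last, text)
            for first, last, text in entries]

# The three shopping-list revisions from the example (blank lines left out).
REVISIONS = [
    (26, ["Shopping List", "- Milk", "- Bread", "- Eggs"]),
    (27, ["Shopping List", "- Milk", "- Juice"]),
    (28, ["Shopping List", "- Juice", "- Cereal", "- Ice Cream"]),
]

for first, last, text in blame_all(REVISIONS):
    print("%5s %5s %s" % (first, last, text))
```

The revision ranges agree with the output above, but the order in which removed and added lines are interleaved depends on how the diff pairs them up, so it need not match blameall's ordering exactly.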
It can be slow to get all the revisions, but it’s faster than manually searching through old revisions for that piece you know was back there somewhere.
You can provide a -r argument to blameall to limit its attention to a particular range of revisions.
Getting it
Blameall is a single-file Python script, so there is nothing to install. Just download and run:
Download: blameall.py
Comments
comments have markup characters, so this comment might look
weird.
The log lines in my personal repository look like this:
r218 | (no author) | 2007-11-05 13:07:43 -0500 (Mon, 05 Nov 2007) | 2 lines
(I'm the only person who uses it, so I've never bothered
getting a username to show up.) To get blameall.py to work
for me, I had to change the regex on line 122:
--- blameall.py.original 2007-11-28 12:00:48.531250000 -0500
+++ blameall.py.corrected 2007-11-28 12:03:09.078125000 -0500
@@ -119,7 +119,7 @@
if not revline and not log:
# End of the log.
break
- m = re.match(r"r(?P[0-9]+) \| (?P[^ ]+) \| [^|]+ \| (?P[0-9]+) line.*", revline)
+ m = re.match(r"r(?P[0-9]+) \| (?P(?:\(no author\)|[^ ]+)) \| [^|]+ \| (?P[0-9]+) line.*", revline)
if not m:
raise Exception("Couldn't scrape log line: %r\nRemaining: %r" % (revline, log))
revs.append((int(m.group('rev')), m.group('user')))
brackets got swallowed; the short version is that I changed
> (?P<user>[^ ]+)
to
> (?P<user>(?:\(no author\)|[^ ]+))
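A quick way to check the change is to run both patterns against the log line quoted above. This is only a sketch: the group names rev and user are the ones the script reads back with m.group(), but the name of the third group was lost in the diff, so nlines below is a placeholder.

```python
import re

# "rev" and "user" are the names used by m.group() in the script;
# "nlines" is a placeholder for the third group, whose real name is unknown.
ORIGINAL = (r"r(?P<rev>[0-9]+) \| (?P<user>[^ ]+) \| [^|]+ \| "
            r"(?P<nlines>[0-9]+) line.*")
PATCHED = (r"r(?P<rev>[0-9]+) \| (?P<user>(?:\(no author\)|[^ ]+)) \| [^|]+ \| "
           r"(?P<nlines>[0-9]+) line.*")

revline = ("r218 | (no author) | 2007-11-05 13:07:43 -0500 "
           "(Mon, 05 Nov 2007) | 2 lines")

# [^ ]+ cannot cross the space inside "(no author)", so the original fails.
original_match = re.match(ORIGINAL, revline)
patched_match = re.match(PATCHED, revline)

print(original_match)
print(patched_match.group("rev"), patched_match.group("user"))
```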
(?P[^|]+)
catches everything between the two pipe symbols except the leading and trailing single space (so even trailing spaces, except the last one, would be caught).
If you look into the next expression, you can see that [^|]+ is already working :)
And for that matter, the trailing .* is unnecessary, too (or even the whole line.*).
So a more compact and yet more general version would be:
r(?P[0-9]+) \| (?P[^|]+) \| [^|]+ \| (?P[0-9]+)
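As a rough check of that compact pattern, the sketch below runs it over two log lines. The group names were stripped from the comment, so rev and user are taken from the script and nlines is a placeholder. The second sample line and its user name jane are invented for the test.

```python
import re

# "nlines" is a placeholder name; the original third group name is unknown.
COMPACT = re.compile(
    r"r(?P<rev>[0-9]+) \| (?P<user>[^|]+) \| [^|]+ \| (?P<nlines>[0-9]+)")

SAMPLES = [
    # The real log line quoted earlier in the comments.
    "r218 | (no author) | 2007-11-05 13:07:43 -0500 (Mon, 05 Nov 2007) | 2 lines",
    # An invented line with an ordinary, made-up user name.
    "r219 | jane | 2007-11-06 09:15:00 -0500 (Tue, 06 Nov 2007) | 1 line",
]

def scrape(revline):
    """Return (rev, user, nlines) as strings, or None if the line does not match."""
    m = COMPACT.match(revline)
    if m is None:
        return None
    return m.group("rev"), m.group("user"), m.group("nlines")

for line in SAMPLES:
    print(scrape(line))
```

Because [^|]+ is greedy, it first swallows the space before the pipe and then backtracks one character so the literal " | " can match, which is why the captured user carries no trailing space.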
And you obviously have a Unicode problem in this blog (see my name in the last comment ;)
It would be nice if you would catch the revision timestamp and add an option to display it next to the revision number, too.
Then it would be easier to blame long-gone colleagues in front of the project leader ;)
Even better, the Python bindings for Subversion offer a direct interface -- no parsing required!
@Rene: thanks for the fine-tuning of the regex, and I'm sorry about the Unicode. My PHP skills are not that polished, but I can't switch my blog infrastructure just yet.
@Rob: I didn't know about the --xml switch, but the regex is working pretty well for me at the moment, so I guess it stays. And the Python bindings feel like another dependency, so I'm glad to rely only on the command line client.
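For anyone who wants to try the --xml route anyway, here is a small sketch of what reading that output might look like. It is not part of blameall.py. The sample below is hand-written to resemble what svn log --xml prints (a log element containing logentry elements, with no author element when the revision has no author), and the commit messages, the second entry and the name parse_log_xml are made up. It also picks up the timestamp asked for above.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like "svn log --xml" output (an assumption,
# not captured from a real repository). Revision 218 has no <author>.
SAMPLE_XML = """<log>
<logentry revision="218">
<date>2007-11-05T18:07:43.000000Z</date>
<msg>Tidy up.</msg>
</logentry>
<logentry revision="219">
<author>jane</author>
<date>2007-11-06T14:15:00.000000Z</date>
<msg>Add juice.</msg>
</logentry>
</log>"""

def parse_log_xml(xml_text):
    """Return a list of (rev, user, date) tuples from svn-log-style XML."""
    revs = []
    for entry in ET.fromstring(xml_text).findall("logentry"):
        rev = int(entry.get("revision"))
        # A missing <author> element is reported the way the text log does.
        user = entry.findtext("author", default="(no author)")
        date = entry.findtext("date", default="")
        revs.append((rev, user, date))
    return revs

for rev, user, date in parse_log_xml(SAMPLE_XML):
    print(rev, user, date)
```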
C:\myDir>python blameall.py myFile
Traceback (most recent call last):
  File "blameall.py", line 189, in <module>
    main(sys.argv[1:])
  File "blameall.py", line 180, in main
    for i1, i2, line in multidiff.blame_data():
  File "blameall.py", line 98, in blame_data
    return self.blame
AttributeError: MultiDiff instance has no attribute 'blame'
C:\myDir>
I'm using Python 2.7.
Thanks for any help,
Andy
C:\myDir>python blameall.py myFile
File "blameall.py", line 186
print "%5s %5s %s" % (r1, r2, line)
^
SyntaxError: invalid syntax
C:\myDir>
Thanks again for any help,
Andy
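A SyntaxError pointing at a print statement is typically what a Python 3 interpreter reports for Python 2 code, so this run was probably picked up by Python 3 rather than 2.7. As a sketch only (format_row is a hypothetical helper, not something in blameall.py), the same line can be written so that it behaves identically under both versions by passing print a single parenthesized string:

```python
def format_row(r1, r2, line):
    """Format one output row the way line 186 of the script does."""
    return "%5s %5s %s" % (r1, r2, line)

# A single parenthesized argument prints the same under Python 2 and 3.
print(format_row(26, "head", "Shopping List"))
```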
> python blameall.py myFile
Traceback (most recent call last):
  File "blameall.py", line 9, in <module>
    import optparse, re, subprocess, sys
  File "/nobackup/user1/acohen/python/Python-2.7/Lib/subprocess.py", line 432, in <module>
    import pickle
  File "/nobackup/user1/acohen/python/Python-2.7/Lib/pickle.py", line 1266, in <module>
    import binascii as _binascii
ImportError: No module named binascii
>
So, are there some libraries I need to get, or point to, or something?