D. Richard Hipp’s software universe

Sunday 3 January 2010

You may not have heard the name D. Richard Hipp, but you’ve used his software: SQLite is his creation, and it’s everywhere. SQLite is an impressive piece of work, but it’s not alone. Along the way, Hipp also wrote LEMON, his own parser generator for parsing SQL in SQLite.

And now he has his own distributed source control, Fossil, which hosts the SQLite development stream. Fossil is interesting because it’s also a distributed wiki and bug tracker, kind of like Trac meets Mercurial or something. As with all of his work, the Fossil documentation is very clear about the design principles and internals, again, very impressive.

SQLite’s documentation includes a detailed page about how it is tested, including how its coverage is measured. Needless to say, it is well measured: 100% condition coverage! The description there of the use of C macros to enhance measurement is a good example of how macros can be extremely useful in building complex software, and makes me wish for something with similar capabilities in Python.

I admire Hipp’s output, but I worry that it might be somewhat insular. SQLite obviously has great acceptance, but what will happen to Fossil? It has a huge uphill climb to get users, what with Git and Mercurial slugging it out, and a dozen others competing for attention.

It’s the age-old dilemma about using the best technology or the most widely accepted. In this case, I don’t even know if Fossil is better than the alternatives. At this point it doesn’t have the critical mass that would even move it from the Curiosities category to the Look Into It list.


I think Fossil is a very cool idea, but believe that Richard's choice of C as a development language will severely limit its adoption: the bang-for-the-buck ratio is lower than it would be for Perl/Python/Ruby/whatever. I think a Fossil-like tool on top of Git or Mercurial has a much greater chance of being the replacement for Trac we've all been waiting for.
There are several problems with the "insular" model. Not only is Fossil unique, but there is no way to convert into or out of other version control systems. At least providing a plugin to Tailor would allow that. Additionally, Fossil has very limited functionality: for example, you can't search in tickets, only get a report and use your browser to search titles. Similarly, there are no hooks to send email on ticket changes.

DRH also codes as though C has made no progress since 1989 and all the world is still a VAX, which results in several issues. For example, size_t and off_t are not used; plain ints are used instead. This is a bad thing to do on 64-bit platforms, and it took quite a bit of effort to convince him just how bad a problem it was, and the fixes aren't exactly models of how code should be written. Similarly, enums are avoided in favor of manually picked types, which assumes the C author knows better than the compiler which sizes are most efficient.
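To make the int-versus-size_t point concrete, here is a small sketch that uses Python's `ctypes` to mimic what happens when a large size is stored in a C `int`. The helper names and the 3 GiB figure are illustrative assumptions and are not taken from the SQLite tickets. The wraparound shown assumes a platform where C `int` is 32 bits, which is the common case.

```python
import ctypes

# Hypothetical size of a large file or buffer: 3 GiB.
THREE_GIB = 3 * 2**30


def as_c_int(n):
    """Store n in a C int; ctypes does no overflow checking, so it wraps."""
    return ctypes.c_int(n).value


def as_c_size_t(n):
    """Store n in a C size_t, which is at least 32 bits and unsigned."""
    return ctypes.c_size_t(n).value


if __name__ == "__main__":
    print("int is", ctypes.sizeof(ctypes.c_int), "bytes")
    print("as int:   ", as_c_int(THREE_GIB))
    print("as size_t:", as_c_size_t(THREE_GIB))
```

Where `int` is 32 bits, the value comes back negative from `as_c_int`, while `as_c_size_t` preserves it.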

The claim of 100% coverage is true, but the code is not complete. For example the error return codes of several functions are ignored.

Apparently I look like a spammer so you have to look these bugs up manually as citations of the problems. Use http://www.sqlite.org/cvstrac/tktview?tn=9999 and replace 9999 with the bug numbers.

Type confusion: 2125, 3246

Ignoring error codes: 3946, 3507, 3394

Python is actually rather good at this sort of thing, especially if you make a debug build. In my test suite I return errors at all possible points and test for them, with execution flowing through my code, Python's C code, and SQLite, often crossing those boundaries multiple times.
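The general shape of that kind of fault-injection testing can be sketched as follows. This is a minimal illustration, not the actual test suite described above: `FaultInjector`, `save_record`, and `exercise_every_fault_point` are hypothetical names, and a real suite would inject faults inside C code and SQLite rather than through an explicit `check()` call.

```python
class FaultInjector:
    """Raise an error on the Nth call to check(); N counts from 1."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.calls = 0
        self.fired = False

    def check(self, what):
        self.calls += 1
        if self.calls == self.fail_at:
            self.fired = True
            raise OSError(f"injected fault in {what}")


def save_record(store, key, value, faults):
    """Hypothetical code under test with two fallible steps."""
    faults.check("open")
    staged = dict(store)      # work on a copy so a failure changes nothing
    staged[key] = value
    faults.check("write")
    store.update(staged)      # only reached when no fault fired


def exercise_every_fault_point():
    """Fail at call 1, then call 2, ... until a run completes cleanly."""
    exercised = 0
    fail_at = 1
    while True:
        store = {"a": 1}
        faults = FaultInjector(fail_at)
        try:
            save_record(store, "b", 2, faults)
        except OSError:
            # A failed save must leave the store untouched.
            assert store == {"a": 1}
            exercised += 1
        if not faults.fired:
            # Clean run: every fault point has been tried.
            assert store == {"a": 1, "b": 2}
            return exercised
        fail_at += 1
```

The loop needs no advance knowledge of how many fallible points exist; it simply keeps moving the failure later until the code runs to completion.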
@Greg: Fossil isn't really written entirely in C. Large amounts are generated C code using TCL scripting (especially templating) and much logic is in SQL.

The final compiled result is fantastic for deployment. It is a single binary with no shared libraries and can run as a standalone web server, be invoked from CGI or inetd. There is no dependency hell.
Greg: I doubt Dr. Hipp is interested in converting Fossil to run on top of Git etc. That'd mean migrating away from the SQLite back-end, which I'm sure he sees as a compelling feature. Indeed, it's compelling in a lot of ways--sans one.

Monotone also uses SQLite to store its repository. The feeling I got from the Monotone camp was "and that's why we will never, ever, ever be competitive speed-wise with the rest of the SCMs". SQLite is an astonishing project in many ways, but there is no chance it will outperform an intelligently designed custom-written SCM back-end.

Ned: The upside of being insular is that you distance yourself from the turbulence of the real world. Dr. Hipp and the contributors to SQLite/Fossil/LEMON can all happily work away in blissful ignorance of the ongoing SCM superiority battles. And they won't have painful "please rebuild your repository" hairy changeovers and "wait, what version of the SCM are you using?" support issues unless they're self-inflicted. Only Mercurial has yet to ever change its repository format; while this is charmingly backwards compatible, it's also costing them in the fastest-SCM arms race.

I doubt that Fossil will be a major impediment to people contributing patches; the UI looks basically sane. And I assume it's easy to incorporate universal diffs. So it's probably not a big deal.
Of course they'll have "please rebuild your repository" issues, why wouldn't they? Precisely because they're so insular, they have no incentive to maintain backwards compatibility, since anytime you change something all you need to do is tell the 5 people using it to update and recompile.
Yes, I concur with some of the other commenters.

I would rather see the features of Fossil achieved by improving the existing tools: work with Ditz etc. to get distributed bug tracking the way you want, work with Bazaar etc. to get the distributed VCS behaviour you want, work with Ikiwiki to get the wiki behaviour you want, etc.

Each of those is deliberately extensible (and I'm sure there are other existing tools that can be substituted), so the need for wheel re-invention is lost on me.
Thanks for mentioning Fossil, Ned! My server logs tell me that lots of people have seen Fossil for the first time as a result of your blog.

Some of the comments suggest that folks think I am trying to compete with Monotone, git, hg, and other "more established" DVCSes. This is not the case. I wrote Fossil to meet my own needs. If others find it useful, great. If not, I'll use it myself and be happy. It has never been my goal to create the Next Great DVCS.

That said, I think there are many original ideas in Fossil that other DVCSes would do well to consider and perhaps incorporate into their own designs. The built-in wiki and bug tracking are new. The "fossil ui" command that starts a local webserver and launches the user's web browser to view it has proven to be a very powerful idea. The "embedded documentation" has worked well for us. The "autosync" mode works better than the usual DVCS for my work practices. Bandwidth efficiency and the ability to penetrate restrictive firewalls are important features for many users. And many people like the fact that Fossil is a single stand-alone executable, making it drop-dead simple to install or uninstall or even run in a chroot jail. I think it would be great if, as others have suggested, some or all of these and other features of Fossil were added to other DVCSes. I promise that no one will hurt my feelings by stealing the ideas. Hack away. Just don't expect me to do the hacking for you, since I'm happy with Fossil.

Many folks seem put off by the idea that Fossil is written in C. They believe that a scripting language (e.g., Python) would be a better choice. Actually, I did several early prototypes of Fossil using TCL, but what I found is that the high-level features of a scripting language did not really help. I'm the first to admit that languages like TCL or Python are usually a much better choice for implementing a big project like this, and I was surprised to see that C worked as well or better in this application. I'm not exactly sure why that is, but I think the comment from Roger above (Roger Binns?) is probably closest to the truth when he points out that the scripting language used by Fossil is really SQL. Most of the work that isn't SQL tends to be low-level byte twiddling that is easier to do in C. Perhaps the take-away here is that the best language for an application might not always be what you expect.

Thanks to all for the criticism and feedback.
@Richard, I admire your "I'm not trying to take over the world, this is what works for me" attitude.

We seem to have gotten onto the "C is not a great implementation choice" theme. While I see the benefits of implementing in Python (or other higher-level languages), as I point out concerning C macros, C has some advantages that other languages do not. And I don't see anyone criticizing Git over Linus' decision to code in C.
I would like to add one more thing which I think is important to the success of an SCM these days: how easy it is to make an SCM's functionality available as a plugin to IDEs and editors.
As soon as converters from and to other SCMs are available I will give fossil a try.
I am impressed by Fossil with my first quick run, especially after my struggles to get Trac & Mercurial working together (I am not saying this is true for all). Going forward a few things are definitely going to become important (just picking a few things from comments here):

* The ability to quickly search through issues & notifications
* Integration with IDEs
* Ability to migrate from/to other VCSs (what I'm looking for, to deal with a just-in-case scenario)
Started having a look at fossil thanks to, among others, this blog post.

Just thought I'd point out though, for the commenters worried about lack of converters to/from other SCMs, that fossil already does this for git, via import and export.

So, there's one escape hatch if you're feeling cornered. :)
...heh... Getting used to the new year. Looks like I'm a little late with the "news" of git import/export. Still, I guess it's good to update the record for slow-pokes like me catching up on this.
@Roger, Greg
Let me pipe up and say:
I feel bad for you son, I got 99 problems but C ain't one.

(in fairness, they raise good points. And this comes from a guy who thought that C was a crutch for the weak just a couple decades ago. Your worst critics are often your best advisors...)

I spent the morning looking at a variety of ticket/product defect tracking systems. Is a distributed version control system a reasonable substitute? Maybe not, but after three days of research (for the umpty-frazillionth time in 10 years) I stumbled on Fossil and it was the first package I've demoed that IMMEDIATELY made sense and felt intuitive.

Oh, sure, if it was written in python or php it would be easier to extend. This is a real problem, actually, but I suspect all he has to do is cobble up a python api and/or the ability to shell out and we're good from there on. Of course, I haven't looked at session handling, but I suspect the package can be extended readily enough.

There were complaints that it is written in turn-of-century C style. Actually, that's a good thing, because the code will be easier to modify with some chance of success by average procedural programmers, who probably constitute 80% of those who want to modify/improve it.

But truthfully, even if it were written in Cobol, I'd still be installing it. It has the mark of excellent software written by a guy who makes good guesses about usability: I was able to log in, and immediately intuit/grok/guess how it would work.

Secondly, it looks attractive. It's not quite sexy, but the design decisions are professional and lack the usual "look, mom, I wrote a program!" design mistakes common in open source. You know, the common sort with a gui that looks like the equivalent of a 5 year old painting a dinosaur with acrylics for the first time.

We have a pretty well developed Hudson system, but I'm pretty certain that this will be integrated into our work flow within a week. The only drawback I can see is that we may have to add some functionality by attaching to the SQLite database and generating some reports/graphs/etc. But let's face it - if reading and writing to a SQLite database is the biggest hurdle we face, I'm OK.
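As a rough idea of how little code that kind of reporting takes, here is a sketch using Python's standard `sqlite3` module. The `ticket` table with `title` and `status` columns is a simplified stand-in built in memory for the example; it is an assumption, not Fossil's actual schema, so inspect a real repository file before relying on any table or column names.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical ticket table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ticket (title TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO ticket (title, status) VALUES (?, ?)",
        [
            ("Crash on startup", "Open"),
            ("Typo in docs", "Closed"),
            ("Slow sync over proxy", "Open"),
        ],
    )
    return conn


def ticket_counts_by_status(conn):
    """Return a {status: count} report for the ticket table."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM ticket GROUP BY status ORDER BY status"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    for status, count in ticket_counts_by_status(conn).items():
        print(f"{status}: {count}")
```

Pointing `sqlite3.connect()` at a copy of the real database file instead of `":memory:"` would be the obvious next step, with the query adjusted to whatever schema is actually there.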

Also, we write a lot of software using the local webserver model, and it has solved a myriad of support problems for us. Increasingly, we find that it's the only way to sidestep Microsoft's "burn down the old buildings and salt the fields as you pass by" OS security improvements.

I'm pretty sure fossil is about to catch fire. Congrats!
Fossil hits one niche that goes begging a lot: the small team. It hits it so well that I'd be willing to bet that most of the people who come and go as contributors to Fossil are part of that niche.

And that's quite interesting, if you look at some of the FOSS that's been very successful so far--one person with an idea that they're willing to code up, and then let others help with. It's kind of "fundamental FOSS." It brings Wall, Ousterhout, or Torvalds to mind.

But as for Fossil, there really is no substitute for it. RCS is too limited, CVS is too hairy, everything else is just too big and nothing else comes close to the convenience for a small team.

Fossil may have been written as the perfect SCM system for one person, but it comes pretty close to perfect for a whole lot of others.

Arguments about "the future" of source control, the "best language" for development of anything, the extensibility of the program--and whatever else causes furrowed brows for people who worry about the tool being perfect for all purposes miss the point entirely--like the blind men examining the elephant.

It works for me. I have a clone of the fossil repo -- if all else went away tomorrow, it would still work for me.
Good for you, D. Richard! You scratched that itch and scratched it well.

We could all learn a lesson here. For far too long those of us who are open source developers have allowed others in the community to hinder our creative development.

Every time I hear someone say:

Why re-invent the wheel?
Don't create your own project; work on xyz's project.

It pisses me off.

A. First off the wheel sucks I want a F%$*ing ROCKET PACK!
B. No one tells me what to work on but my wife!
