Running a database from a ramdisk

Thursday 12 May 2011

One project I'm working on these days is a large test suite for a Django project, in particular, trying to make it go faster. In one subset of 16 tests, total execution time is about 250 seconds! The tests clearly spend a lot of time in the database: 50% of the time is creating the test db, and another 30% is installing test fixtures.

A classic first step is to run the tests from an in-memory SQLite database. In this case, it's a Postgres database using GIS features, so changing engines isn't a simple settings change. Postgres doesn't have an in-memory feature, but I thought, why not force the database into memory without Postgres knowing, simply by creating the database in a ramdisk?

On Linux, this is a simple matter of creating the database in /dev/shm instead of a real disk-backed filesystem. And good thing it was simple, too, because it didn't help at all: the test times were completely indistinguishable between the two filesystems.

Which in retrospect makes perfect sense. Disk filesystems employ aggressive caching to keep data in memory in the first place, so it was unlikely that my simple trick was going to change things significantly. The test database is small enough to fit into RAM anyway, and for the tests, Postgres was already set to disable fsync.

The remaining mystery: why is SQLite in-memory faster than other databases on disk? If it is about the memory, then it should be possible to make other databases faster by forcing them into memory. And if isn't about the memory, then it's just that SQLite is faster no matter what (because it's in-process, for example) and using a disk-based SQLite should be faster than your real database also.

Comments

[gravatar]
Ludek Vodicka 7:50 AM on 12 May 2011

In our product ORM Designer (http://orm-designer.com) we're using SQLite database as back-end for holding all model data.

During the trying of speed-up some operations we tried to use in-memory database instead of db stored on the disk. The speed improvement was evident (~20-30%) only when executing lot of separated queries out of the transaction.

When INSERTs/UPDATEs/... are executed in the transaction, speed results are very similar (+/- 5%) to in-memory DB.

So the reason for the very good SQLite performance is probably due to a very good optimization within core ;-)

[gravatar]
Wu 10:47 AM on 12 May 2011

Good point. Obviously it depends on how big your tests database will grow (in size) but storing it in memory will speed up things a lot.

Just in case anyone is wondering if it is a linux-only feature, you can do that in any of the BSD OS too, for example, in FreeBSD you can create memory-based filesystems easily using mdconfig or mdmfs:

http://www.freebsd.org/doc/handbook/disks-virtual.html#DISKS-MD-FREEBSD5

[gravatar]
Evgeny 11:15 AM on 12 May 2011

Obviously, SQLite is designed as a single-user, single-connection database. Multi-user access control, ACID transaction isolation, SQL query optimization, and so on, name it yourself -- SQLite saves on skipping all this advanced stuff.

[gravatar]
Sam Penrose 11:44 AM on 12 May 2011

Is the test DB being "created" by a series of INSERT and UPDATE statements with data that is static across many runs? The simplest optimization might be to look for a serialization format that Postgres can load in a single call. I have no idea whether Postgres supports anything like that.

[gravatar]
Ned Batchelder 12:16 PM on 12 May 2011

@Wu: Maybe I wasn't clear, but the point here is that putting the database "in memory" didn't help at all, probably because disk-based filesystems try hard to keep everything in memory anyway.

@Evgeny: Your points are about how SQLite's in-process embedded model allows it to be faster. True, but doesn't explain why SQLite in memory is faster than SQLite on disk. Unless SQLite has *worse* disk optimization than other engines. And it's hard to see how skipping SQL optimization can make things faster...

@Sam: the next step is to investigate faster ways to get content into Postgres, yes.

[gravatar]
Floris Bruynooghe 12:18 PM on 12 May 2011

SQLite has no way to disable fsync() calls (as far as I know) other then using the in-memory version. So that's one reason for the speedup. Secondly, though related, for each transaction it needs to write a copy of the database as a journal, make changes and then rename one of the two copies and remove the other (or some dance similar to that, I haven't checked the details recently and my memory is not that good). Again a lot of stuff which I presume it can skip when using an in-memory version.

[gravatar]
Roger 12:41 PM on 12 May 2011

Putting the database on ramdisk will make no difference for database reads due to the OS filesystem cache. It will make a big difference for writes because they require writing to the journal and doing one or more fsync() calls to ensure the journal is persistent (will survive a power failure at that point). See the SQLite FAQ but the principles apply to any database that is durable:

http://www.sqlite.org/faq.html#q19

One other exception to this is Windows XP which uses a 10MB file system cache by default no matter how much RAM you have. A registry edit can fix this.

@Floris: You can disable the fsync calls in SQLite but only if you don't care about durability. See the journal_mode pragma at http://www.sqlite.org/pragma.html#pragma_journal_mode

[gravatar]
Andrew Dalke 2:11 PM on 12 May 2011

The upcoming PostgreSQL 9.1 supports "Unlogged Tables", which are in-memory databases. Here's some timing numbers: http://www.depesz.com/index.php/2011/01/03/waiting-for-9-1-unlogged-tables/ . That would give you another dimension of this problem to explore.

[gravatar]
Rand 8:44 AM on 16 May 2011

How about using Spatialite, sqlite with GIS features ?

Add a comment:

name
email
Ignore this:
not displayed and no spam.
Leave this empty:
www
not searched.
 
Name and either email or www are required.
Don't put anything here:
Leave this empty:
URLs auto-link and some tags are allowed: <a><b><i><p><br><pre>.