Thursday 3 August 2006 — This is over 18 years old. Be careful.
This is one of those debugging tales that reveals multiple layers of helpful magic, all of which combine to make things not work, and keep you from figuring out why.
I was experimenting with a complex query in Django 0.91. I needed to join through a number of tables, and the last table needed to use an IN expression to get results that applied to a number of different users at once. The code was something along the lines of:
idlist = ','.join([str(i) for i in ids])
tlist = things.get_list(
tables=['middle','people'],
where=['middle.thing_id = things.id',
'things.owner_id = people.id',
'people.id in (%s)',
'middle.user_id = %s'],
params=[idlist, my_id]
)
Running the code wasn’t producing all the correct results. Many were missing. Tracing the code and looking at the returned values, I noticed something: all the results applied to only one user, rather than to the set that I was trying to get. That realization led me to the problem.
My intent was that a SQL statement would get executed that had something like this in the middle:
.. AND people.id in (12,31,62) AND ..
The first piece of unhelpful magic was that the parameters were strings, so the ORM helpfully quoted them for me. So the SQL actually had this:
.. AND people.id in ('12,31,62') AND ..
Then MySQL helpfully converted that string into the needed integer for me, so that the SQL executed as if it were:
.. AND people.id in (12) AND ..
So of course the query was returning results for just one user.
The last strange thing (I’m not sure whether to call it helpful magic because I don’t know why anyone would find this helpful) was that the SQL statement tracing provided by the ORM for some reason doesn’t display the quoting applied by the ORM. String parameters are displayed without their quote marks. The SQL trace for the original code displayed:
.. AND people.id in (12,31,62) AND ..
which lulled me into believing that the query being executed was the one I wanted.
The fix was a simple one: use Python string replacement instead of ORM parameter replacement to build the query:
idlist = ','.join([str(i) for i in ids])
tlist = things.get_list(
tables=['middle','people'],
where=['middle.thing_id = things.id',
'things.owner_id = people.id',
'people.id in (%s)' % idlist,
'middle.user_id = %s'],
params=[my_id]
)
This avoided the string quoting, which produced the correct SQL, which produced the correct results.
Note: there may have been a better way to express this. One of the difficulties in doing complex work with an ORM is having to learn two languages: SQL, and the ORM. The simplified syntax provided functionally by the ORM is a good thing when it is slight, but once it gets complicated, I find myself thinking in SQL and back-porting to the ORM.
Comments
I see that Django now has the option of custom Managers so you could probably have put your code into real SQL and avoided that weird SQL-in-python code you ended up with.
Overall my experience of ORM's (just SQLObject / Django ORM) is like yours. They are good for the simple things :
ob = Things.get(1)
ob = Things.getByName('ThatObject')
ob.destroy()
ob.categories.add(Category('New Category'))
ob_list = Things.all()
But for anything more complex I want SQL. Yes, I know that makes it less portable and means I have to deal with quoting but I'd rather deal with porting SQL than trying to figure out the ORM-query language which frankly doesn't fit-my-brain.
I usually start every SQL debugging session with printing the query that is actually sent to the database and then moving up the stack from there. It must have been pretty annoying to think that you were doing that only to find out that the ORM was hiding the real SQL.
from sqlalchemy import sql
results = sql.select(
columns=[middle, people],
whereclause=sql.and_(
(middle.c.thing_id == things.c.id),
(things.c.owner_id == people.c.id),
people.c.id.in_(*idlist),
(middle.c.user_id == my_id),
),
from_obj=[things],
use_labels=True,
).execute()
With the column objects, there's a lot more intelligence about the values passed in. Some raw examples:
>>> print tables.tags.c.type.in_()
tags.type IS NULL
>>> print tables.tags.c.type.in_('test')
tags.type = %s
>>> print tables.tags.c.type.in_('test', 'testing')
tags.type IN (%s, %s)
SQLAlchemy provides some nice tools for doing programmatic, fairly-"pythonic" SQL generation. Since complex queries may be dependent on user-input data, having good tools that can deal with single values, multiple values, is nice. Zope's SQL Methods, while being just a template-language implementation, provide a lot of nice tools for this. ``sqltest idlist column='people.id' type=int multiple optional`` would also generate the right statement, ensure it was quoted correctly, and the 'optional' flag would remove the entire statement if `idlist` was effectively null.
Anyways, someone reminded me of a great quote recently. I forget the source and am too hungry right now to look it up. It goes, "Object-Relational Mapping is the Vietnam of computer science" (meaning: quagmire!).
With or without O-R mapping being in the equation, I do like tools like Z SQL Methods and SQLAlchemy's basic level (non-ORM) tools that take care of things like quoting, type enforcement, formatting (especially with dates!), and binding muck for me.
So i'd say the bugs here are at least three (handling, escaping, and logging in the wrong way). This said, why do we still rely on ORM mappings? I wish somebody shipped a good, solid, simple and cheap oo-db.
for thing, p, m in sandbox.recall(
(things << people) << middle,
lambda t, p, m: p.id in ids and m.user_id = my_id):
handle(thing)
No handling, escaping, or logging bugs, and no possibility of injection. [The << implies LEFT JOIN; if there's some better arrangement for your use-case, you can use >> for RIGHT JOIN or & for INNER JOIN.]
But the thing is that "in" condition just can't be represented in such way (i saw this in Oracle too). So good DB binding should've thrown an exception instead of helpfully converting the list into the string.
As for Django's SQL log quoting those "strings" would be as much an error as not quoting because it's equally not true. To make it true Django should store SQL string with placeholders and a tuple of actual data passed with it.
And as far as the logging goes, it would be more accurate to show the query with placeholders, and a tuple of data, but far less convenient. Debug logging needs to strike a balance between the two.
For myself, even its java, HQL of Hibernate is really cool, a good mix of SQL with object inside. I really hope some python HQL like implementation could happen
Just one more example of why Django removed the magic in the 'magic-removal' branch, which has been merged into the mainline and is in the latest stable Django release (0.95).
Not 'all' the magic has been removed; You still sometimes write code like People.objects.filter(id__in = [1,2,3,4]). But the extended where= stuff you definatly do not want to bother with any more. For estreemly complex queries, there is the 'Q' class, which is fantastic.
myQuery = Q(date=datetime.now()) & Q(id__in=[1,2,3,4]) | Q(name='admin')
matching = Table.objects.filter(myQuery).all()
notmatching = Table.objects.exclude(myQuery).all()
The queries are late binding, even the objects returned by all() have yet to perform the querie (they are just iterable,sliceable objects which query on demand).
In some sence, this is adding MORE magic, of a form, but the end result is code which is much more module, easier to debug, and can give better concise error messages.
Add a comment: