[geeklog-devel] Comment Speed Improvement

Vincent Furia vmf at abtech.org
Sat Mar 27 14:54:13 EST 2004


I've been thinking hard about improving the speed of displaying comments 
by reducing the number of queries and potentially removing the need to 
use recursive functions.

While Niels solution provides a quick fix for groklaw's specific 
problems, it is pretty inefficient when you are looking at a subset of 
comments for a story.  A more elegant solution must be possilbe..

I think I've found that solution, but it will take some work (including 
a database change) to implement.  While I'm fully capable of 
implementing it, I wanted to get some feed-back before jumping in.

The idea is to store the comments in the database using a modified 
preorder tree traversal method.  A brief description (along with a 
comparison to our current method) can be found in this article:

	http://www.sitepoint.com/article/hierarchical-data-database

A better description is this site:

	http://membres.lycos.fr/sqlpro/Tree/SQL_tree.htm

However it is in French (which I don't speak).  But babelfish 
(http://babel.altavista.com) works pretty well on the first half of the 
page.  Plus it has many useful graphics.

The idea is a little confusing at first, but once you wrap your head 
around it you'll think:  "Wow, hey, that's pretty darn cool".  Or maybe 
you'll think: "Mein Gott, das ist wirklich kühl.". ;)

The summary:

- 1 query and 2 updates to insert a new comment
- 1 query to get all the comments (in a logical order for display)
- 1 query to get all the children of a given comment (with or without 
the given comment)
- 1 query to count the number of children a comment has

Basically this will drastically improve the speed of comment displays of 
all types, especially for sites that have very large databases of 
comments (groklaw).  The only expense is adding two extra db queries 
when a comment gets added (including one that could potentially update 
almost every database row).

The only change to the database will be the addition of two integer 
columns: left terminal (lft) and right terminal (rgt).  Optionally we 
could also remove the parent id (pid) column as it would no longer be 
technically necessary.  That being said, I think we should preserve it 
for convenience.

I think the improvement will be worth it, though we'll have to write a 
slick upgrade script to take care of current comments.  A side effect of 
using this method is that we will be able to write a comment function 
that is iterative rather than recursive which will further improve 
performance.

Another note is that this method is database independent.  There are 
some other pretty nice solutions that require stored procedures and I 
think Oracle has something built in.  Obviously those other solutions 
are really going to work well with MySQL.

If the article above doesn't make sense, let me know and I can write a 
brief summary of what is going on in "preorder tree traversal" method.

Please everyone, let me know what you think.  I'd especially like some 
input from a DB guy to make sure I'm not missing something obvious 
(Dwight, you still out there?).

-Vinny



More information about the geeklog-devel mailing list