[geeklog-devel] [Fwd: Re: Geeklog optimalisations]

Vincent Furia vmf at abtech.org
Fri Mar 12 10:36:57 EST 2004


I'm in!  Though if we're going to get serious about upgrading Geeklog 
1.3.x to reduce queries and seriously improve performance for 
high-traffic sites, it might be time to start thinking about branching 
1.3.x off and using the main branch to start implementing these 
performance improvements (and perhaps calling it 1.4.x).

I actually don't think we're too far away from getting Geeklog to decent 
speeds.  As Niels pointed out (and as has been pointed out before I 
think) we only really have one table locking issue.  The database 
queries are something that can be fixed with some intelligent caching 
and better shaped queries.

If we're feeling really brave, switching template systems to a 
pre-compiled template system like Smarty wouldn't be a bad idea.  I 
think that will do a lot to improve performance.

-Vinny

Tony Bibbs wrote:
> This is an FYI.  We'll be discussing this on the development lists over 
> the next few days (I hope).  It's important we help Groklaw as best we 
> can as they are one of our bigger sites and by them pushing the limits 
> of Geeklog we can address their issues and make Geeklog a better product 
> at the same time.
> 
> --Tony
> 
> ---------------------
> 
> Niels,
> 
> I think a bit of background is in order before you can understand how
> Geeklog got where it is.  First, nearly all the code you are referring
> to is legacy code.  It was there before I managed the project and it is
> still there under Dirk's management.  In its infancy, Geeklog was only
> servicing smaller sites so performance was never really an issue and,
> frankly, I was a bit young and dumb when I first got started with
> Geeklog so performance tuning PHP scripts wasn't even a consideration
> and my focus was on the feature set.
> 
> Under Dirk's management, the feature set has continued to grow to the
> point that we have a large userbase and what you are encountering with
> Geeklog is only natural.  Groklaw is clearly one of the biggest sites to
> run Geeklog.  I have posted questions to our mailing lists asking about
> performance issues related to bigger Geeklog sites and got no responses
> back, so your email was a pleasant surprise.
> 
> The long and the short of it is Geeklog has matured to a point where
> bigger sites are using it and we are pushing the performance limits it
> has.  Geeklog's database interaction has always been an issue for me and
> is a large part of why I have chosen to get a new codebase up (i.e.
> Geeklog 2) while the 1.3.x continues.  You are right, we need to address the
> performance issues and given the amount of work you have put into
> troubleshooting Groklaw I think you can play a critical part in that.
> 
> What I would like to do is see us work closely with you to begin
> addressing these issues.  A starting point would be to have a place
> where we can install a development version of Groklaw's database
> and run tests.  Dirk and I don't have access to a database of that
> size, and while we could fudge together some data, using a real-world
> example would sure be nice.  Once we have a test bed, I'd
> be open to suggestions on how we might work on this to resolve your
> immediate issues *and* begin addressing performance tuning as a whole.
> 
> #geeklog is where I dwell (though not always at the keyboard).  If
> possible I'd like to see us discuss this on geeklog-devtalk.  Niels, if
> you could join that list at http://lists.geeklog.net/listinfo we can
> carry this on there.  In the meantime if you can catch Dirk or myself in
> IRC feel free to do so.  FYI I'm out of town this weekend (FWIW I'm GMT
> -6) so I may not seem too responsive until I get back on Sunday.
> 
> Thanks for contacting us, I'm sure we can address these issues.
> 
> --Tony
> 
> 
> 
> Niels Leenheer wrote:
> 
>> Hi guys,
>>
>> First of all. What were you guys thinking? Sorry to be so rude, but I
>> simply had to get that off my chest. I feel better now. I'm okay. Really.
>>
>> As some of you may be aware, Groklaw is using Geeklog. It has turned into
>> quite a busy website, and stories with more than 700 comments are not out
>> of the ordinary. In addition to this, being slashdotted has become normal.
>> This is where the problems started. The server can't handle much more. On
>> busy days the website turns into a crawling slow pile of ..
>>
>> As a regular reader and volunteer of Groklaw I offered to take a look at
>> the Geeklog source code and try to find some places that could benefit
>> from optimisation. After some testing I've noticed that most of the
>> problems are due to load on the database server.
>>
>> The first thing I started working on is the code that generates all the
>> comments. It turns out that for every comment at least two queries are
>> executed. For a story with more than 700 comments this would mean more
>> than 1400 queries to generate the page.
>>
>> I've modified this code extensively and now we use one query to fetch all
>> the user details of all the people involved in posting. One query is used
>> to fetch all the comments that have no parent. One query is used to fetch
>> all the comments that do have parents. And if needed, one query fetches
>> the parent. All this data is then turned into one big nested array, which
>> is passed by reference to the functions that actually print the data.
>> Depending on how many comments there are, this could result in a speed
>> improvement of about 0% - 1000%. As you can imagine, if you only have
>> about 10 comments it would not mean much, but with 500 comments it would
>> reduce the number of queries needed by about 1000. It's a very big
>> improvement.
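
As a rough illustration of the batching described above, here is a minimal Python sketch. It is not the Groklaw patch: the `comments` and `users` tables, their columns, and the convention that `pid = 0` marks a top-level comment are all assumptions made for the example, and it fetches all comments in a single query rather than splitting them into parentless and child queries.

```python
import sqlite3
from collections import defaultdict


def load_comment_tree(conn, story_id):
    """Fetch a story's comments in a fixed number of queries and nest them."""
    # Query 1: every comment for the story, in posting order.
    # (Hypothetical schema: comments(cid, sid, pid, uid, body).)
    comments = conn.execute(
        "SELECT cid, pid, uid, body FROM comments WHERE sid = ? ORDER BY cid",
        (story_id,),
    ).fetchall()

    # Query 2: all authors at once instead of one lookup per comment.
    uids = sorted({uid for _, _, uid, _ in comments})
    users = {}
    if uids:
        marks = ",".join("?" * len(uids))
        users = dict(conn.execute(
            f"SELECT uid, username FROM users WHERE uid IN ({marks})", uids
        ))

    # Group comments under their parent id; pid 0 is assumed to mean "no parent".
    children = defaultdict(list)
    for cid, pid, uid, body in comments:
        children[pid].append({"cid": cid, "author": users.get(uid), "body": body})

    def attach(node):
        # Recursively hang each comment's replies beneath it.
        node["replies"] = [attach(child) for child in children[node["cid"]]]
        return node

    return [attach(node) for node in children[0]]
```

However many comments a story has, this issues two queries; the nesting is done in memory.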
>>
>> One other problem I've identified is table locking of the story table.
>> The statistics are stored in the same table as the actual content of the
>> story. So each time a story is displayed, it will use an UPDATE query and
>> a SELECT query on the same table. With a lot of requests the table is
>> constantly locked by the UPDATE queries and the SELECT queries are
>> waiting. We've disabled the statistics for now, but we are investigating
>> the possibility of moving the statistics to a separate table.
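
The "separate table" idea might look roughly like the Python sketch below. The `stories` and `story_stats` tables and the helper names are hypothetical, and SQLite is used only so the example is self-contained; it shows the schema split, not the locking behaviour of the production database.

```python
import sqlite3

# Hypothetical schema: content and hit counters live in different tables,
# so counting a view never writes to the table that holds the story text.
SCHEMA = """
CREATE TABLE stories (sid INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE story_stats (sid INTEGER PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0);
"""


def record_hit(conn, sid):
    """Count a view without touching the stories table."""
    cur = conn.execute(
        "UPDATE story_stats SET hits = hits + 1 WHERE sid = ?", (sid,)
    )
    if cur.rowcount == 0:
        # First view of this story: create its counter row.
        conn.execute("INSERT INTO story_stats (sid, hits) VALUES (?, 1)", (sid,))


def show_story(conn, sid):
    """Read the story content; the only write goes to story_stats."""
    record_hit(conn, sid)
    return conn.execute(
        "SELECT title, body FROM stories WHERE sid = ?", (sid,)
    ).fetchone()
```

With this split, the UPDATE for each page view and the SELECT for the story content no longer target the same table.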
>>
>> Next is the database layer. The mysql_fetch_array() function has two
>> arguments. The second determines what the function returns: an
>> associative array, a numbered array, or both. By default the function
>> returns both. This is what Geeklog does. In most of the code only the
>> associative array is used. Only in a couple of small instances does the
>> code require a numbered array. What we have done is to instruct the
>> mysql_fetch_array() function to return only an associative array by
>> default. Only when the code requires a numbered array do we request both.
>> This should lower the amount of memory needed by Geeklog.
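
The same idea can be sketched in Python: a wrapper that defaults to name-keyed rows and only builds the doubled-up structure on request. The `fetch_row` helper and its `mode` values are invented for this illustration; they mirror the behaviour described above but are not part of Geeklog or of PHP's API.

```python
import sqlite3


def fetch_row(cursor, mode="assoc"):
    """Return the next row keyed by column name, by position, or both.

    mode is "assoc" (default), "num", or "both"; hypothetical names
    chosen for this sketch.
    """
    row = cursor.fetchone()
    if row is None:
        return None
    names = [col[0] for col in cursor.description]
    result = {}
    if mode in ("num", "both"):
        # Positional keys: 0, 1, 2, ...
        result.update(enumerate(row))
    if mode in ("assoc", "both"):
        # Column-name keys.
        result.update(zip(names, row))
    return result
```

Defaulting to `"assoc"` means each row holds one entry per column rather than two, which is where the memory saving comes from.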
>>
>> The SEC_getUserGroups() function is also quite expensive. It is called
>> throughout the generation of a page and it does not cache the
>> information. We've added a simple cache for the data that is fetched from
>> the database, which eliminates another 30 or so queries.
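
A "simple cache" of this kind could be sketched in Python as below. The `UserGroupCache` class and the `fetch_groups` callable are hypothetical stand-ins for whatever actually queries the database; the point is only that repeated lookups for the same user hit the database once per page.

```python
class UserGroupCache:
    """Remember each user's groups for the lifetime of one page request."""

    def __init__(self, fetch_groups):
        # fetch_groups(uid) is assumed to run the real database query.
        self._fetch_groups = fetch_groups
        self._cache = {}

    def get(self, uid):
        # Only the first lookup per user reaches the database.
        if uid not in self._cache:
            self._cache[uid] = self._fetch_groups(uid)
        return self._cache[uid]
```

A fresh cache per request keeps the data from going stale if group membership changes between page loads.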
>>
>> Next is the index page. The COM_featuredCheck() function is executed
>> every time the frontpage is requested. I've changed the loop that actually
>> displays the stories on the frontpage and included a check to see if
>> there is more than one featured story. If there is, the second story is
>> not displayed as such and the COM_featuredCheck() function is called. This
>> again saves a couple of queries and the end result is the same.
>>
>> We are also using the mycal extension, which I've almost completely
>> rewritten. Mycal uses a query for every day that is displayed; after my
>> modifications it uses only one query. That is a reduction of 27-34
>> queries.
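
Replacing a per-day query with one query for the whole month might look like this Python sketch. The `events` table, its `event_date` column (assumed to hold ISO `YYYY-MM-DD` text), and the function name are assumptions for illustration, not the mycal code.

```python
import calendar
import sqlite3


def events_by_day(conn, year, month):
    """Return {day_of_month: [titles]} for one month using a single query."""
    last_day = calendar.monthrange(year, month)[1]
    start = f"{year:04d}-{month:02d}-01"
    end = f"{year:04d}-{month:02d}-{last_day:02d}"

    # Pre-fill every day so the calendar can render empty days too.
    days = {day: [] for day in range(1, last_day + 1)}

    # One range query instead of one query per displayed day.
    rows = conn.execute(
        "SELECT event_date, title FROM events "
        "WHERE event_date BETWEEN ? AND ? ORDER BY event_date, title",
        (start, end),
    )
    for event_date, title in rows:
        # Characters 8-9 of 'YYYY-MM-DD' are the day of the month.
        days[int(event_date[8:10])].append(title)
    return days
```

The rows are bucketed by day in memory, so the query count stays at one regardless of how many days the calendar shows.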
>>
>> Now back to my first paragraph. I was pretty impressed with how easy it
>> was to get used to the way everything works in Geeklog. It was pretty
>> easy to understand and it looks like it was designed pretty well. But I
>> was also horrified when I saw the enormous number of queries that are
>> used, though I guess Geeklog wasn't really designed with this kind of
>> traffic and these enormous amounts of comments in mind.
>>
>> Most of the changes we've made are not yet running on the production
>> server. Once we've properly tested everything and everything is stable,
>> I'm willing to look at how we can give these changes back to Geeklog. A
>> simple patch between the current version of Geeklog and Groklaw will be
>> difficult, because we are using 1.3.8-1sr4 and it also includes a lot of
>> Groklaw-specific modifications. If you are interested in these
>> modifications, please let me know and we'll work something out.
>>
>> If you want to talk to me about this you can e-mail me. In addition to
>> this, I'll try to visit #geeklog as often as I can.
>>
>> Niels Leenheer
>> -- project manager phpAdsNew
>>
>>
>>
> 
> 


