[geeklog-devel] [Fwd: Re: Geeklog optimalisations]
Tony Bibbs
tony at tonybibbs.com
Fri Mar 12 11:17:38 EST 2004
Yeah, I think we should start only with DB optimizations (SQL tuning,
reducing queries, etc). I also think switching to InnoDB should be
seriously considered. Doing so is something we can do relatively
painlessly via upgrade scripts. I say start small, then go after the
bigger problems. Starting with comments would be good; however, I think
someone should put together a plan to make sure our performance
analysis covers all of the system.
I still say leave the template stuff for 2.0...I don't think filesystem
access is causing that much of a problem. And yes, I need to get
serious about 2.0...lately I have been telling myself to shit or get off
the pot...
--Tony
Vincent Furia wrote:
> I'm in! Though if we're going to get serious about upgrading geeklog
> 1.3.x to reduce queries and seriously improve performance for high
> traffic sites it might be time to start thinking about branching 1.3.x
> off and using the main branch to start implementing these performance
> improvements (and perhaps calling it 1.4.x).
>
> I actually don't think we're too far away from getting Geeklog to decent
> speeds. As Niels pointed out (and as has been pointed out before I
> think) we only really have one table locking issue. The database
> queries are something that can be fixed with some intelligent caching
> and better shaped queries.
>
> If we're feeling really brave, switching template systems to a
> pre-compiled template system like Smarty wouldn't be a bad idea. I
> think that will do a lot to improve performance.
>
> -Vinny
>
> Tony Bibbs wrote:
>
>> This is an FYI. We'll be discussing this on the development lists
>> over the next few days (I hope). It's important we help Groklaw as
>> best we can as they are one of our bigger sites and by them pushing
>> the limits of Geeklog we can address their issues and make Geeklog a
>> better product at the same time.
>>
>> --Tony
>>
>> ---------------------
>>
>> Niels,
>>
>> I think a bit of background is in order before you can understand how
>> Geeklog got where it is. First, nearly all the code you are referring
>> to is legacy code. It was there before I managed the project and it is
>> still there under Dirk's management. In its infancy, Geeklog was only
>> servicing smaller sites so performance was never really an issue and,
>> frankly, I was a bit young and dumb when I first got started with
>> Geeklog so performance tuning PHP scripts wasn't even a consideration
>> and my focus was on the feature set.
>>
>> Under Dirk's management, the feature set has continued to grow to the
>> point that we have a large userbase and what you are encountering with
>> Geeklog is only natural. Groklaw is clearly one of the biggest sites to
>> run Geeklog. I have posted questions to our mailing lists asking about
>> performance issues related to bigger Geeklog sites but got no
>> responses back, so your email was a pleasant surprise.
>>
>> The long and the short of it is Geeklog has matured to a point where
>> bigger sites are using it and we are pushing the performance limits it
>> has. Geeklog's database interaction has always been an issue for me
>> and is a large part of why I have chosen to get a new codebase up
>> (i.e. Geeklog 2) while the 1.3.x continues. You are right, we need to
>> address the
>> performance issues and given the amount of work you have put into
>> troubleshooting Groklaw I think you can play a critical part in that.
>>
>> What I would like to do is see us work closely with you to begin
>> addressing these issues. A starting point would be to have a place
>> where we can install a development version of Groklaw's database
>> somewhere where we can run tests. Dirk and I don't have access to a
>> database of that size and while we could fudge together some data,
>> using a real-world example would sure be nice. Once we have a test bed, I'd
>> be open to suggestions on how we might work on this to resolve your
>> immediate issues *and* begin addressing performance tuning as a whole.
>>
>> #geeklog is where I dwell (though not always at the keyboard). If
>> possible I'd like to see us discuss this on geeklog-devtalk. Niels, if
>> you could join that list at http://lists.geeklog.net/listinfo we can
>> carry this on there. In the meantime if you can catch Dirk or myself in
>> IRC feel free to do so. FYI I'm out of town this weekend (FWIW I'm GMT
>> -6) so I may not seem too responsive until I get back on Sunday.
>>
>> Thanks for contacting us, I'm sure we can address these issues.
>>
>> --Tony
>>
>>
>>
>> Niels Leenheer wrote:
>>
>>> Hi guys,
>>>
>>> First of all. What were you guys thinking? Sorry to be so rude, but I
>>> simply had to get that off my chest. I feel better now. I'm okay.
>>> Really.
>>>
>>> As some of you may be aware, Groklaw is using Geeklog. It has turned
>>> into quite a busy website and stories with more than 700 comments are
>>> not out of the ordinary. In addition to this, being slashdotted has
>>> become normal. This is where the problems started. The server can't
>>> handle much more. On busy days the website turns into a crawling slow
>>> pile of ..
>>>
>>> As a regular reader and volunteer of Groklaw I offered to take a look
>>> at the Geeklog source code and try to find some places that could
>>> benefit from optimisation. After some testing I've noticed that most
>>> of the problems are due to load on the database server.
>>>
>>> The first thing I started working on is the code that generates all
>>> the comments. It turns out that for every comment at least two queries
>>> are executed. For a story with more than 700 comments this would mean
>>> more than 1400 queries to generate the page.
>>>
>>> I've modified this code extensively and now we use one query to fetch
>>> all the user details of all the people involved in posting. One query
>>> is used to fetch all the comments that have no parent. One query to
>>> fetch all the comments that do have parents. And if needed, one query
>>> to fetch the parent. All this data is then turned into one big nested
>>> array, which is passed by reference to the functions that actually
>>> print the data. Depending on how many comments there are this could
>>> result in a speed improvement of about 0% - 1000%. As you can imagine,
>>> if you only have about 10 comments it would not mean much, but with
>>> 500 comments it would reduce the number of queries needed by about
>>> 1000. It's a very big improvement.
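
A minimal Python sketch of the batching idea described above. It is not the Groklaw patch (which is PHP): the table layout, the column names and the convention that top-level comments have pid = 0 are assumptions made for illustration, and it fetches parentless and child comments in one query rather than two. An in-memory SQLite database stands in for MySQL.

```python
import sqlite3
from collections import defaultdict


def load_comment_tree(conn, story_id):
    """Fetch a story's comments and authors in two queries, then nest them in memory."""
    # Query 1: every comment of the story, regardless of nesting depth.
    comments = conn.execute(
        "SELECT cid, pid, uid, body FROM comments WHERE sid = ? ORDER BY cid",
        (story_id,),
    ).fetchall()

    # Query 2: the details of every user who posted, fetched in one go.
    uids = sorted({row[2] for row in comments})
    users = {}
    if uids:
        placeholders = ",".join("?" * len(uids))
        users = dict(conn.execute(
            f"SELECT uid, username FROM users WHERE uid IN ({placeholders})",
            uids,
        ).fetchall())

    # Group comments by parent id, then attach replies recursively.
    children = defaultdict(list)
    for cid, pid, uid, body in comments:
        children[pid].append(
            {"cid": cid, "author": users.get(uid, "?"), "body": body, "replies": []}
        )

    def attach(nodes):
        for node in nodes:
            node["replies"] = children.get(node["cid"], [])
            attach(node["replies"])
        return nodes

    return attach(children.get(0, []))  # assumption: pid 0 marks a top-level comment


def make_comment_demo():
    """Build a tiny hypothetical schema with four comments on story 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE comments (
            cid INTEGER PRIMARY KEY, sid INTEGER, pid INTEGER, uid INTEGER, body TEXT
        );
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO comments VALUES
            (1, 1, 0, 1, 'first'),
            (2, 1, 1, 2, 'reply to first'),
            (3, 1, 0, 2, 'second'),
            (4, 1, 2, 1, 'nested reply');
    """)
    return conn
```

The query count stays at two whether the story has 10 comments or 700, which is where the saving over two queries per comment comes from.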
>>>
>>> One other problem I've identified is table locking of the story
>>> table. The statistics are stored in the same table as the actual
>>> content of the story. So each time a story is displayed, it will use
>>> an UPDATE query and a SELECT query on the same table. With a lot of
>>> requests the table is constantly locked by the UPDATE queries and the
>>> SELECT queries are waiting. We've disabled the statistics for now, but
>>> we are investigating the possibility of moving the statistics to a
>>> separate table.
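
A rough Python sketch of what moving the statistics to a separate table could look like. The table and column names (stories, story_stats, hits) are hypothetical and do not come from the Geeklog schema. SQLite is used only to keep the example runnable; it does not reproduce MySQL's table-level locking, so this shows the schema split, not the locking behaviour.

```python
import sqlite3


def record_view(conn, sid):
    """Bump the hit counter without writing to the table that holds story content."""
    cur = conn.execute(
        "UPDATE story_stats SET hits = hits + 1 WHERE sid = ?", (sid,)
    )
    if cur.rowcount == 0:
        # First view of this story: create its counter row.
        conn.execute("INSERT INTO story_stats (sid, hits) VALUES (?, 1)", (sid,))
    conn.commit()


def make_stats_demo():
    """Build a hypothetical two-table layout: content in one table, counters in another."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stories (sid INTEGER PRIMARY KEY, title TEXT, body TEXT);
        CREATE TABLE story_stats (sid INTEGER PRIMARY KEY, hits INTEGER NOT NULL);
        INSERT INTO stories VALUES (1, 'Hello', 'Story text');
    """)
    return conn
```

With this layout the UPDATE on each page view goes to story_stats, so SELECT queries against stories no longer wait behind it.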
>>>
>>> Next is the database layer. The mysql_fetch_array() function has two
>>> arguments. The second determines what the function returns: either an
>>> associative array, a numbered array or both. By default the function
>>> returns both. This is what Geeklog does. In most of the code only the
>>> associative array is used. Only in a couple of small instances does
>>> the code require a numbered array. What we have done is to instruct
>>> the mysql_fetch_array() function to return only an associative array
>>> by default. Only when the code requires a numbered array do we request
>>> both. This should lower the amount of memory needed by Geeklog.
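
A small Python analogue of the fetch-mode change described above. It is not PHP's mysql_fetch_array(); the fetch_row() helper and the mode names are hypothetical, chosen only to show that returning both numbered and named keys doubles the entries held per row.

```python
import sqlite3

# Hypothetical mode names, loosely mirroring the three result types described above.
ASSOC, NUM, BOTH = "assoc", "num", "both"


def fetch_row(cursor, mode=ASSOC):
    """Return the next row keyed by column name, by position, or by both."""
    row = cursor.fetchone()
    if row is None:
        return None
    result = {}
    if mode in (NUM, BOTH):
        result.update(enumerate(row))  # keys 0, 1, 2, ...
    if mode in (ASSOC, BOTH):
        names = [col[0] for col in cursor.description]
        result.update(zip(names, row))  # keys are the column names
    return result
```

Defaulting to the name-keyed form and asking for both only where positional access is really needed keeps each row at half the size.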
>>>
>>> The SEC_getUserGroups() function is also quite expensive. It is called
>>> throughout the generation of a page and it does not cache the
>>> information. We've added a simple cache for the data that is fetched
>>> from the database which eliminates another 30 or so queries.
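
A minimal Python sketch of a per-request cache of this kind. The GroupLookup class and the fetch_groups callable are hypothetical stand-ins; the real SEC_getUserGroups() is a PHP function and its signature is not shown in this thread.

```python
class GroupLookup:
    """Remember each user's groups so repeated lookups hit the database only once."""

    def __init__(self, fetch_groups):
        self._fetch_groups = fetch_groups  # hypothetical callable that queries the database
        self._cache = {}
        self.queries = 0  # counts real lookups, for illustration only

    def get_user_groups(self, uid):
        if uid not in self._cache:
            self.queries += 1
            self._cache[uid] = self._fetch_groups(uid)
        return self._cache[uid]
```

Calling get_user_groups() thirty times for the same user during one page build then costs a single lookup. The cache should live no longer than the request, otherwise group changes would go unnoticed.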
>>>
>>> Next is the index page. The COM_featuredCheck() function is executed
>>> every time the frontpage is requested. I've changed the loop that
>>> actually displays the stories on the frontpage and included a check to
>>> see if there is more than one featured story. If there is, the second
>>> story is not displayed as such and the COM_featuredCheck() function is
>>> called. This again saves a couple of queries and the end result is the
>>> same.
>>>
>>> We are also using the mycal extension which I've almost completely
>>> rewritten. Mycal uses a query for every day that is displayed and
>>> after my modifications it only uses one query. A reduction of 27-34
>>> queries.
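
A Python sketch of the single-query calendar approach described above. The events table, its columns and the ISO date strings are assumptions for illustration and do not reflect the actual mycal schema.

```python
import calendar
import sqlite3
from datetime import date


def month_events(conn, year, month):
    """Fetch a whole month of events with one range query and group them by day."""
    last_day = calendar.monthrange(year, month)[1]
    start = date(year, month, 1).isoformat()
    end = date(year, month, last_day).isoformat()

    # Pre-fill every day so the calendar can render empty cells too.
    grouped = {day: [] for day in range(1, last_day + 1)}
    rows = conn.execute(
        "SELECT day, title FROM events WHERE day BETWEEN ? AND ? ORDER BY day, title",
        (start, end),
    )
    for day_text, title in rows:
        grouped[date.fromisoformat(day_text).day].append(title)
    return grouped


def make_calendar_demo():
    """Build a hypothetical events table with dates stored as ISO strings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE events (day TEXT, title TEXT);
        INSERT INTO events VALUES
            ('2004-03-12', 'Dev list discussion'),
            ('2004-03-12', 'IRC meeting'),
            ('2004-04-01', 'Next month');
    """)
    return conn
```

The grouping happens in memory after one round trip, instead of one round trip per displayed day.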
>>>
>>> Now back to my first paragraph. I was pretty impressed with how easy
>>> it was to get used to the way everything works in Geeklog. It was
>>> pretty easy to understand and it looks like it was designed pretty
>>> well. But I was also horrified when I saw the enormous number of
>>> queries that are used, but I guess Geeklog wasn't really designed with
>>> this kind of traffic and these enormous amounts of comments in mind.
>>>
>>> Most of the changes we've made are not yet running on the production
>>> server. Once we've properly tested everything and everything is
>>> stable, I'm willing to look at how we can give these changes back to
>>> Geeklog. A simple patch between the current version of Geeklog and
>>> Groklaw will be difficult, because we are using 1.3.8-1sr4 and it also
>>> includes a lot of Groklaw specific modifications. If you are
>>> interested in these modifications, please let me know and we'll work
>>> something out.
>>>
>>> If you want to talk to me about this you can e-mail me. In addition to
>>> this I'll try to visit #geeklog as often as I can.
>>>
>>> Niels Leenheer
>>> -- project manager phpAdsNew
>>>
>>>
>>>
>>
>> _______________________________________________
>> geeklog-devel mailing list
>> geeklog-devel at lists.geeklog.net
>> http://lists.geeklog.net/listinfo/geeklog-devel
>>