[geeklog-devel] [Fwd: Re: Geeklog optimalisations]

Fri Mar 12 11:17:38 EST 2004

Yeah, I think we should start only with DB optimizations (SQL tuning, 
reducing queries, etc).  I also think switching to InnoDB should be 
seriously considered. Doing so is something we can do relatively 
painlessly via upgrade scripts.  I say start small, then go after the 
bigger problems.  Starting with comments would be good; however, I think 
someone should put together a plan to make sure our performance 
analysis covers the whole system.

I still say leave the template stuff for 2.0...I don't think filesystem 
access is causing that much of a problem.  And yes, I need to get 
serious about 2.0...lately I have been telling myself to shit or get off 
the pot...

--Tony

Vincent Furia wrote:
> I'm in!  Though if we're going to get serious about upgrading geeklog 
> 1.3.x to reduce queries and seriously improve performance for high 
> traffic sites it might be time to start thinking about branching 1.3.x 
> off and using the main branch to start implementing these performance 
> improvements (and perhaps calling it 1.4.x).
> 
> I actually don't think we're too far away from getting Geeklog to decent 
> speeds.  As Niels pointed out (and as has been pointed out before I 
> think) we only really have one table locking issue.  The database 
> queries are something that can be fixed with some intelligent caching 
> and better shaped queries.
> 
> If we're feeling really brave, switching template systems to a 
> pre-compiled template system like Smarty wouldn't be a bad idea.  I 
> think that will do A LOT to improve performance.
> 
> -Vinny
> 
> Tony Bibbs wrote:
> 
>> This is an FYI.  We'll be discussing this on the development lists 
>> over the next few days (I hope).  It's important we help Groklaw as 
>> best we can as they are one of our bigger sites and by them pushing 
>> the limits of Geeklog we can address their issues and make Geeklog a 
>> better product at the same time.
>>
>> --Tony
>>
>> ---------------------
>>
>> Niels,
>>
>> I think a bit of background is in order before you can understand how
>> Geeklog got where it is.  First, nearly all the code you are referring
>> to is legacy code.  It was there before I managed the project and it is
>> still there under Dirk's management.  In its infancy, Geeklog was only
>> servicing smaller sites so performance was never really an issue and,
>> frankly, I was a bit young and dumb when I first got started with
>> Geeklog so performance tuning PHP scripts wasn't even a consideration
>> and my focus was on the feature set.
>>
>> Under Dirk's management, the feature set has continued to grow to the
>> point that we have a large userbase and what you are encountering with
>> Geeklog is only natural.  Groklaw is clearly one of the biggest sites to
>> run Geeklog.  I have posted questions to our mailing lists asking about
>> performance issues related to bigger Geeklog sites and got no responses
>> back, so your email was a pleasant surprise.
>>
>> The long and the short of it is Geeklog has matured to a point where
>> bigger sites are using it and we are pushing the performance limits it
>> has.  Geeklog's database interaction has always been an issue for me and
>> is a large part of why I have chosen to get a new codebase up (i.e.
>> Geeklog 2) while 1.3.x continues.  You are right, we need to address the
>> performance issues and given the amount of work you have put into
>> troubleshooting Groklaw I think you can play a critical part in that.
>>
>> What I would like to do is see us work closely with you to begin
>> addressing these issues.  A starting point would be to have a place
>> where we can install a development copy of Groklaw's database and run
>> tests.  Dirk and I don't have access to a database of that size, and
>> while we could fudge together some data, using a real-world example
>> would sure be nice.  Once we have a test bed, I'd be open to
>> suggestions on how we might work on this to resolve your immediate
>> issues *and* begin addressing performance tuning as a whole.
>>
>> #geeklog is where I dwell (though not always at the keyboard).  If
>> possible I'd like to see us discuss this on geeklog-devtalk.  Niels, if
>> you could join that list at http://lists.geeklog.net/listinfo we can
>> carry this on there.  In the meantime if you can catch Dirk or myself in
>> IRC feel free to do so.  FYI I'm out of town this weekend (FWIW I'm GMT
>> -6) so I may not seem too responsive until I get back on Sunday.
>>
>> Thanks for contacting us, I'm sure we can address these issues.
>>
>> --Tony
>>
>>
>>
>> Niels Leenheer wrote:
>>
>>> Hi guys,
>>>
>>> First of all. What were you guys thinking? Sorry to be so rude, but I
>>> simply had to get that off my chest. I feel better now. I'm okay.
>>> Really.
>>>
>>> As some of you may be aware, Groklaw is using Geeklog. It has turned
>>> into quite a busy website, and stories with more than 700 comments are
>>> not out of the ordinary. In addition to this, being slashdotted has
>>> become normal. This is where the problems started. The server can't
>>> handle much more. On busy days the website turns into a crawling slow
>>> pile of ..
>>>
>>> As a regular reader and volunteer of Groklaw I offered to take a look
>>> at the Geeklog source code and try to find some places that could
>>> benefit from optimisation. After some testing I've noticed that most
>>> of the problems are due to load on the database server.
>>>
>>> The first thing I started working on is the code that generates all
>>> the comments. It turns out that for every comment at least two queries
>>> are executed. For a story with more than 700 comments this would mean
>>> more than 1400 queries to generate the page.
>>>
>>> I've modified this code extensively and now we use one query to fetch
>>> all the user details of all the people involved in posting. One query
>>> is used to fetch all the comments that have no parent. One query is
>>> used to fetch all the comments that do have parents. And if needed,
>>> one query fetches the parent. All this data is then turned into one
>>> big nested array, which is passed by reference to the functions that
>>> actually print the data. Depending on how many comments there are this
>>> could result in a speed improvement of anywhere from about 0% to
>>> 1000%. As you can imagine, if you only have about 10 comments it would
>>> not mean much, but with 500 comments it would reduce the number of
>>> queries needed by about 1000. It's a very big improvement.
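For readers who want to see the shape of this batching, here is a minimal sketch. It is not Niels's patch: Python with sqlite3 stands in for Geeklog's PHP and MySQL, the table and column names (comments, users, cid, pid, sid, uid) are assumptions made for illustration, and it fetches all comments in a single query rather than splitting parentless and child comments as described above.

```python
import sqlite3
from collections import defaultdict


def load_comment_tree(conn, story_id):
    """Build a nested comment tree for one story using two queries in total."""
    # Query 1: every comment of the story (hypothetical schema).
    comments = conn.execute(
        "SELECT cid, pid, uid, body FROM comments WHERE sid = ? ORDER BY cid",
        (story_id,),
    ).fetchall()

    # Query 2: all authors at once, instead of one lookup per comment.
    uids = sorted({uid for _, _, uid, _ in comments})
    users = {}
    if uids:
        marks = ",".join("?" * len(uids))
        users = dict(conn.execute(
            f"SELECT uid, username FROM users WHERE uid IN ({marks})", uids))

    # Group comments under their parent id; pid 0 marks a top-level comment.
    children = defaultdict(list)
    for cid, pid, uid, body in comments:
        children[pid].append(
            {"cid": cid, "author": users.get(uid, "?"), "body": body})

    def attach(node):
        # Recursively hang each comment's replies underneath it.
        node["replies"] = [attach(child) for child in children[node["cid"]]]
        return node

    return [attach(node) for node in children[0]]
```

The query count stays at two whether the story has 10 comments or 700; the nesting work moves from the database into memory.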
>>>
>>> One other problem I've identified is table locking of the story
>>> table. The statistics are stored in the same table as the actual
>>> content of the story. So each time a story is displayed, it will use
>>> an UPDATE query and a SELECT query on the same table. With a lot of
>>> requests the table is constantly locked by the UPDATE queries and the
>>> SELECT queries are waiting. We've disabled the statistics for now, but
>>> we are investigating the possibility of moving the statistics to a
>>> separate table.
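A rough sketch of what "moving the statistics to a separate table" could look like. The names stories, story_stats, and hits are hypothetical, and Python with sqlite3 is only a stand-in: SQLite does not lock per table the way described above, so this shows the schema split and the access pattern, not the locking benefit itself.

```python
import sqlite3


def record_hit(conn, sid):
    """Count one page view; only the (hypothetical) story_stats table is written."""
    cur = conn.execute(
        "UPDATE story_stats SET hits = hits + 1 WHERE sid = ?", (sid,))
    if cur.rowcount == 0:
        # First view of this story: create its counter row.
        conn.execute(
            "INSERT INTO story_stats (sid, hits) VALUES (?, 1)", (sid,))


def read_story(conn, sid):
    """Fetch the story content; the stories table is only ever read here."""
    return conn.execute(
        "SELECT title, body FROM stories WHERE sid = ?", (sid,)).fetchone()
```

With this split, displaying a story reads one table and writes another, so the frequent counter updates no longer compete with the content reads for the same table.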
>>>
>>> Next is the database layer. The mysql_fetch_array() function has two
>>> arguments. The second determines what the function returns: an
>>> associative array, a numbered array, or both. By default the function
>>> returns both. This is what Geeklog does. In most of the code only the
>>> associative array is used. Only in a couple of small instances does
>>> the code require a numbered array. What we have done is to instruct
>>> the mysql_fetch_array() function to return only an associative array
>>> by default. Only when the code requires a numbered array do we request
>>> both. This should lower the amount of memory needed by Geeklog.
>>>
>>> The SEC_getUserGroups() function is also quite expensive. It is called
>>> throughout the generation of a page and it does not cache the
>>> information. We've added a simple cache for the data that is fetched
>>> from the database, which eliminates another 30 or so queries.
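The kind of simple cache described here might look like the sketch below. It is written in Python rather than Geeklog's PHP, and GroupCache, groups_for, and the fetch_groups callable are hypothetical names; fetch_groups stands for whatever function actually queries the database.

```python
class GroupCache:
    """Remember each user's groups so the database is asked only once per user."""

    def __init__(self, fetch_groups):
        # fetch_groups is assumed to be the expensive, query-issuing lookup.
        self._fetch = fetch_groups
        self._cache = {}

    def groups_for(self, uid):
        # Only the first request for a given uid reaches the database.
        if uid not in self._cache:
            self._cache[uid] = self._fetch(uid)
        return self._cache[uid]
```

Creating one such cache per page request keeps the data fresh between requests while still collapsing the repeated lookups made during a single page build.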
>>>
>>> Next is the index page. The COM_featuredCheck() function is executed
>>> every time the frontpage is requested. I've changed the loop that
>>> actually displays the stories on the frontpage and included a check to
>>> see if there is more than one featured story. If there is, the second
>>> story is not displayed as such and the COM_featuredCheck() function is
>>> called. This again saves a couple of queries and the end result is the
>>> same.
>>>
>>> We are also using the mycal extension, which I've almost completely
>>> rewritten. Mycal uses a query for every day that is displayed, and
>>> after my modifications it only uses one query. That is a reduction of
>>> 27 to 34 queries.
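As an illustration of the one-query calendar idea (not the actual mycal rewrite), the sketch below fetches a whole date range at once and groups the rows by day in memory. The events table, its day and title columns, and the ISO date strings are assumptions; Python with sqlite3 again stands in for PHP and MySQL.

```python
import sqlite3
from collections import defaultdict


def events_by_day(conn, first_day, last_day):
    """Return {day: [titles]} for the displayed range using a single query."""
    rows = conn.execute(
        "SELECT day, title FROM events "
        "WHERE day BETWEEN ? AND ? ORDER BY day, title",
        (first_day, last_day),
    )
    by_day = defaultdict(list)
    for day, title in rows:
        by_day[day].append(title)
    # Days without events are simply absent from the result.
    return dict(by_day)
```

The calendar renderer can then look up each displayed day in the returned dictionary instead of issuing a query per day.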
>>>
>>> Now back to my first paragraph. I was pretty impressed with how easy
>>> it was to get used to the way everything works in Geeklog. It was
>>> pretty easy to understand and it looks like it was designed pretty
>>> well. I was also horrified when I saw the enormous number of queries
>>> that are used, but I guess Geeklog wasn't really designed with this
>>> kind of traffic and these enormous numbers of comments in mind.
>>>
>>> Most of the changes we've made are not yet running on the production
>>> server. Once we've properly tested everything and everything is
>>> stable, I'm willing to look at how we can give these changes back to
>>> Geeklog. A simple patch between the current version of Geeklog and
>>> Groklaw will be difficult, because we are using 1.3.8-1sr4 and it also
>>> includes a lot of Groklaw specific modifications. If you are
>>> interested in these modifications, please let me know and we'll work
>>> something out.
>>>
>>> If you want to talk to me about this you can e-mail me. In addition to
>>> this I'll try to visit #geeklog as often as I can.
>>>
>>> Niels Leenheer
>>> -- project manager phpAdsNew
>>>
>>>
>>>
>>
>> _______________________________________________
>> geeklog-devel mailing list
>> geeklog-devel at lists.geeklog.net
>> http://lists.geeklog.net/listinfo/geeklog-devel
>>