[geeklog-devel] [Fwd: Re: Geeklog optimalisations]

Fri Mar 12 11:44:58 EST 2004

I agree session management should be considered but, again, this would 
be a lot of work.  Having a user object in the session would be a 
tremendous help.  Switching to PHP4 sessions might be too much work 
(research is needed) but you may be able to emulate PHP4 sessions by 
adding a data field to the current sessions table and then do 
serialization/deserialization from that.  I digress.
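A minimal sketch of emulating sessions with a serialized data column, in Python with SQLite standing in for Geeklog's PHP/MySQL code. The `sessions` table and its `sess_id`, `uid` and `data` columns are hypothetical, and JSON stands in for PHP's serialize()/unserialize().

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    # Hypothetical sessions table with an extra free-form "data" column.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (sess_id TEXT PRIMARY KEY, uid INTEGER, data TEXT)"
    )
    return conn


def save_session_data(conn: sqlite3.Connection, sess_id: str, uid: int, data: dict) -> None:
    # Serialize the whole user object into one column.
    conn.execute(
        "INSERT OR REPLACE INTO sessions (sess_id, uid, data) VALUES (?, ?, ?)",
        (sess_id, uid, json.dumps(data)),
    )


def load_session_data(conn: sqlite3.Connection, sess_id: str) -> dict:
    # One query restores everything cached for this session.
    row = conn.execute(
        "SELECT data FROM sessions WHERE sess_id = ?", (sess_id,)
    ).fetchone()
    return json.loads(row[0]) if row and row[0] else {}


conn = open_db()
save_session_data(conn, "abc123", 2, {"uid": 2, "username": "tony", "groups": [1, 2, 13]})
user = load_session_data(conn, "abc123")
print(user["groups"])  # [1, 2, 13]
```

With the user record and group list cached this way, later page requests can read them back with a single query instead of rebuilding them each time.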

The big issue at hand is addressing a few short-term performance
problems.  I think looking into Niels's customizations is a starting
point.  Blaine is right, we need a fix that won't affect the majority of
users yet gives sites like Groklaw and our big Mac-related sites more
bang for their buck.  I think addressing the comments is a big issue,
particularly considering the different modes you can view the comments in.

--Tony

Blaine Lang wrote:
> I have to agree that Geeklog as it exists is fine for 98% of our users,
> but addressing some of these issues can only make it better.
> 
> We just need to be careful, as a lot of the bugs that we have been addressing
> are permission-based ones and we don't want to introduce more issues that
> will affect the 98% of our users. It is well known that adding new
> code and features often introduces new bugs, so we need to keep that in mind
> when assessing the effort for such a project.
> 
> What about using PHP sessions to cache some of the common information and
> eliminate a lot of the SQL queries?
> Possible areas would be:
>   - User access rights and group membership
>   - $_USER array
> 
> It would be nice for plugins to have access to sessions as well.
> 
> ----- Original Message ----- 
> From: "Tony Bibbs" <tony at tonybibbs.com>
> To: <geeklog-devel at lists.geeklog.net>
> Sent: Friday, March 12, 2004 11:17 AM
> Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
> 
> 
> 
>>Yeah, I think we should start only with DB optimizations (SQL tuning,
>>reducing queries, etc.).  I also think switching to InnoDB should be
>>seriously considered; it is something we can do relatively
>>painlessly via upgrade scripts.  I say start small, go after the bigger
>>problems.  Starting with comments would be good; however, I think
>>someone should put together a plan to make sure our performance
>>analysis covers the whole system.
>>
>>I still say leave the template stuff for 2.0...I don't think filesystem
>>access is causing that much of a problem.  And yes, I need to get
>>serious about 2.0...lately I have been telling myself to shit or get off
>>the pot...
>>
>>--Tony
>>
>>Vincent Furia wrote:
>>
>>>I'm in!  Though if we're going to get serious about upgrading geeklog
>>>1.3.x to reduce queries and seriously improve performance for high
>>>traffic sites it might be time to start thinking about branching 1.3.x
>>>off and using the main branch to start implementing these performance
>>>improvements (and perhaps calling it 1.4.x).
>>>
>>>I actually don't think we're too far away from getting Geeklog to decent
>>>speeds.  As Niels pointed out (and as has been pointed out before I
>>>think) we only really have one table locking issue.  The database
>>>queries are something that can be fixed with some intelligent caching
>>>and better shaped queries.
>>>
>>>If we're feeling really brave, switching template systems to a
>>>pre-compiled template system like Smarty wouldn't be a bad idea.  I
>>>think that will do A LOT to improve performance.
>>>
>>>-Vinny
>>>
>>>Tony Bibbs wrote:
>>>
>>>
>>>>This is an FYI.  We'll be discussing this on the development lists
>>>>over the next few days (I hope).  It's important we help Groklaw as
>>>>best we can as they are one of our bigger sites and by them pushing
>>>>the limits of Geeklog we can address their issues and make Geeklog a
>>>>better product at the same time.
>>>>
>>>>--Tony
>>>>
>>>>---------------------
>>>>
>>>>Niels,
>>>>
>>>>I think a bit of background is in order before you can understand how
>>>>Geeklog got where it is.  First, nearly all the code you are referring
>>>>to is legacy code.  It was there before I managed the project and it is
>>>>still there under Dirk's management.  In its infancy, Geeklog was only
>>>>servicing smaller sites so performance was never really an issue and,
>>>>frankly, I was a bit young and dumb when I first got started with
>>>>Geeklog so performance tuning PHP scripts wasn't even a consideration
>>>>and my focus was on the feature set.
>>>>
>>>>Under Dirk's management, the feature set has continued to grow to the
>>>>point that we have a large userbase and what you are encountering with
>>>>Geeklog is only natural.  Groklaw is clearly one of the biggest sites to
>>>>run Geeklog.  I have posted questions to our mailing lists asking about
>>>>performance issues related to bigger Geeklog sites and got no responses
>>>>back, so your email was a pleasant surprise.
>>>>
>>>>The long and the short of it is that Geeklog has matured to a point where
>>>>bigger sites are using it and pushing its performance limits.
>>>>Geeklog's database interaction has always been an issue for me and is
>>>>a large part of why I have chosen to get a new codebase up (i.e. Geeklog 2)
>>>>while the 1.3.x line continues.  You are right, we need to address the
>>>>performance issues, and given the amount of work you have put into
>>>>troubleshooting Groklaw I think you can play a critical part in that.
>>>>
>>>>What I would like to do is see us work closely with you to begin
>>>>addressing these issues.  A starting point would be to have a place
>>>>where we can install a development copy of Groklaw's database and run
>>>>tests.  Dirk and I don't have access to a database of that size, and
>>>>while we could fudge together some data, using a real-world example
>>>>would sure be nice.  Once we have a test bed, I'd
>>>>be open to suggestions on how we might work on this to resolve your
>>>>immediate issues *and* begin addressing performance tuning as a whole.
>>>>
>>>>#geeklog is where I dwell (though not always at the keyboard).  If
>>>>possible, I'd like to see us discuss this on geeklog-devtalk.  Niels, if
>>>>you could join that list at http://lists.geeklog.net/listinfo we can
>>>>carry this on there.  In the meantime, if you can catch Dirk or me in
>>>>IRC, feel free to do so.  FYI, I'm out of town this weekend (FWIW I'm
>>>>GMT-6), so I may not seem too responsive until I get back on Sunday.
>>>>
>>>>Thanks for contacting us, I'm sure we can address these issues.
>>>>
>>>>--Tony
>>>>
>>>>
>>>>
>>>>Niels Leenheer wrote:
>>>>
>>>>
>>>>>Hi guys,
>>>>>
>>>>>First of all: what were you guys thinking? Sorry to be so rude, but I
>>>>>simply had to get that off my chest. I feel better now. I'm okay. Really.
>>>>>
>>>>>As some of you may be aware, Groklaw is using Geeklog. It has turned
>>>>>into quite a busy website, and stories with more than 700 comments are
>>>>>not out of the ordinary. In addition, being slashdotted has become
>>>>>normal. This is where the problems started: the server can't handle
>>>>>much more. On busy days the website turns into a crawling, slow pile
>>>>>of ..
>>>>>
>>>>>As a regular reader and volunteer of Groklaw, I offered to take a look
>>>>>at the Geeklog source code and try to find some places that could
>>>>>benefit from optimization. After some testing I noticed that most of
>>>>>the problems are due to load on the database server.
>>>>>
>>>>>The first thing I started working on is the code that generates all
>>>>>the comments. It turns out that at least two queries are executed for
>>>>>every comment. For a story with more than 700 comments, this means
>>>>>well over 1,400 queries to generate the page.
>>>>>
>>>>>I've modified this code extensively, and now we use one query to fetch
>>>>>the user details of everyone involved in posting, one query to fetch
>>>>>all the comments that have no parent, one query to fetch all the
>>>>>comments that do have parents and, if needed, one query to fetch the
>>>>>parent. All this data is then turned into one big nested array, which
>>>>>is passed by reference to the functions that actually print the data.
>>>>>Depending on how many comments there are, this could result in a speed
>>>>>improvement of anywhere from 0% to 1000%. As you can imagine, if you
>>>>>only have about 10 comments it would not mean much, but with 500
>>>>>comments it would reduce the number of queries needed by about 1,000.
>>>>>It's a very big improvement.
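A minimal sketch of the batched approach described above, in Python with SQLite standing in for Geeklog's PHP/MySQL layer. The `comments` and `users` schemas are hypothetical; for brevity it loads all of a story's comments in one query (rather than separate queries for top-level comments and replies), fetches every poster with a single `IN (...)` query, and builds the nested tree in memory.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # Hypothetical schema: pid 0 marks a top-level comment.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, sid TEXT,
                               pid INTEGER, uid INTEGER, title TEXT);
        INSERT INTO users VALUES (1, 'anonymous'), (2, 'alice'), (3, 'bob');
        INSERT INTO comments VALUES
            (1, 'story1', 0, 2, 'First'),
            (2, 'story1', 1, 3, 'Reply to first'),
            (3, 'story1', 0, 3, 'Second'),
            (4, 'story1', 2, 1, 'Nested reply');
    """)
    return conn


def load_comment_tree(conn, sid):
    """Return (top-level comments with nested children, number of queries)."""
    # Query 1: every comment of the story, top-level ones and replies alike.
    rows = conn.execute(
        "SELECT cid, pid, uid, title FROM comments WHERE sid = ? ORDER BY cid",
        (sid,),
    ).fetchall()
    if not rows:
        return [], 1
    # Query 2: all posters in one IN (...) lookup instead of one per comment.
    uids = sorted({uid for _cid, _pid, uid, _title in rows})
    marks = ",".join("?" * len(uids))
    users = dict(conn.execute(
        f"SELECT uid, username FROM users WHERE uid IN ({marks})", uids
    ).fetchall())
    # Build the nested structure in memory; pid 0 (or an unknown pid) is top level.
    nodes = {cid: {"cid": cid, "title": title, "user": users.get(uid, "?"),
                   "children": []}
             for cid, _pid, uid, title in rows}
    roots = []
    for cid, pid, _uid, _title in rows:
        parent = nodes.get(pid)
        (parent["children"] if parent is not None else roots).append(nodes[cid])
    return roots, 2


conn = setup()
tree, n_queries = load_comment_tree(conn, "story1")
print(n_queries)                   # 2
print([c["title"] for c in tree])  # ['First', 'Second']
```

The query count stays at two no matter how many comments the story has; the per-comment work moves into the in-memory loop.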
>>>>>
>>>>>One other problem I've identified is table locking of the story table.
>>>>>The statistics are stored in the same table as the actual content of
>>>>>the story, so each time a story is displayed, Geeklog runs both an
>>>>>UPDATE query and a SELECT query on the same table. With a lot of
>>>>>requests, the table is constantly locked by the UPDATE queries and the
>>>>>SELECT queries have to wait. We've disabled the statistics for now, but
>>>>>we are investigating the possibility of moving the statistics to a
>>>>>separate table.
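A minimal sketch of the proposed split, in Python with SQLite. The `stories` and `story_stats` names are hypothetical. The point is that the per-view UPDATE only touches the small statistics table, so on MySQL's MyISAM engine (which locks whole tables) it no longer blocks SELECTs on the story content; SQLite itself locks the whole database file on writes, so here the code only illustrates the schema change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Story content: read on every page view, rarely written.
    CREATE TABLE stories (sid TEXT PRIMARY KEY, title TEXT, body TEXT);
    -- Statistics: written on every page view, kept out of the content table.
    CREATE TABLE story_stats (sid TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0);
    INSERT INTO stories VALUES ('s1', 'Example story', 'Story text');
    INSERT INTO story_stats (sid) VALUES ('s1');
""")


def view_story(conn, sid):
    # The SELECT reads only the content table...
    row = conn.execute(
        "SELECT title, body FROM stories WHERE sid = ?", (sid,)
    ).fetchone()
    # ...and the counter UPDATE writes only the stats table.
    conn.execute("UPDATE story_stats SET hits = hits + 1 WHERE sid = ?", (sid,))
    return row


for _ in range(3):
    view_story(conn, "s1")
print(conn.execute("SELECT hits FROM story_stats WHERE sid = 's1'").fetchone()[0])  # 3
```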
>>>>>
>>>>>Next is the database layer. The mysql_fetch_array() function has two
>>>>>arguments. The second determines what the function returns: an
>>>>>associative array, a numerically indexed array, or both. By default
>>>>>the function returns both, and this is what Geeklog uses. Most of the
>>>>>code only uses the associative array; only in a couple of small
>>>>>instances does the code require a numerically indexed array. What we
>>>>>have done is to instruct mysql_fetch_array() to return only an
>>>>>associative array by default, and to request both only when the code
>>>>>requires a numerically indexed array. This should lower the amount of
>>>>>memory needed by Geeklog.
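In PHP the change amounts to passing `MYSQL_ASSOC` as the second argument of `mysql_fetch_array()` instead of relying on the default `MYSQL_BOTH`. The Python sketch below only mimics the two result shapes with plain dictionaries (it does not model PHP's real memory layout) to show why the default doubles the number of entries per row.

```python
ASSOC, NUM, BOTH = 1, 2, 3  # mimic PHP's MYSQL_ASSOC / MYSQL_NUM / MYSQL_BOTH


def fetch_array(columns, values, result_type=BOTH):
    """Build a PHP-style result row from column names and values."""
    row = {}
    if result_type & NUM:
        # Numeric keys 0..n-1, like $row[0]
        row.update(enumerate(values))
    if result_type & ASSOC:
        # Column-name keys, like $row['title']
        row.update(zip(columns, values))
    return row


cols = ["sid", "title", "hits"]
vals = ["s1", "Example story", 42]
both = fetch_array(cols, vals)          # the default, as Geeklog used it
assoc = fetch_array(cols, vals, ASSOC)  # associative only
print(len(both), len(assoc))  # 6 3
```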
>>>>>
>>>>>The SEC_getUserGroups() function is also quite expensive. It is called
>>>>>throughout the generation of a page and it does not cache the
>>>>>information. We've added a simple cache for the data that is fetched
>>>>>from the database, which eliminates another 30 or so queries.
>>>>>
>>>>>Next is the index page. The COM_featuredCheck() function is executed
>>>>>every time the front page is requested. I've changed the loop that
>>>>>actually displays the stories on the front page and included a check to
>>>>>see if there is more than one featured story. If there is, the second
>>>>>story is not displayed as featured and COM_featuredCheck() is called.
>>>>>This again saves a couple of queries, and the end result is the same.
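A minimal sketch of that loop change, in Python. The story dictionaries and the `featured_check` callback are hypothetical stand-ins for the front-page loop and COM_featuredCheck(); the idea is that the check only runs when the loop actually sees a second featured story.

```python
def render_front_page(stories, featured_check):
    """Return (layout, sid) pairs; call featured_check only if needed."""
    layout = []
    seen_featured = False
    duplicate_found = False
    for story in stories:
        if story["featured"] and not seen_featured:
            seen_featured = True
            layout.append(("featured", story["sid"]))
        else:
            # A second "featured" story is shown as a normal one.
            if story["featured"]:
                duplicate_found = True
            layout.append(("normal", story["sid"]))
    if duplicate_found:
        # Only run the check when more than one story is featured.
        featured_check()
    return layout


calls = []
stories = [
    {"sid": "a", "featured": True},
    {"sid": "b", "featured": False},
    {"sid": "c", "featured": True},
]
print(render_front_page(stories, lambda: calls.append("check")))
# [('featured', 'a'), ('normal', 'b'), ('normal', 'c')]
print(calls)  # ['check']
```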
>>>>>
>>>>>We are also using the mycal extension, which I've almost completely
>>>>>rewritten. Mycal used a query for every day that is displayed; after my
>>>>>modifications it only uses one query. That is a reduction of 27 to 34
>>>>>queries.
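A minimal sketch of replacing one query per day with one query per calendar view, in Python with SQLite. The `events` table and its `day` column are hypothetical (mycal's real schema is not shown in this thread); the single query covers the whole date range and the per-day counts are filled in afterwards.

```python
import sqlite3
from collections import Counter
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, day TEXT)")  # ISO dates
conn.executemany("INSERT INTO events (day) VALUES (?)",
                 [("2004-03-01",), ("2004-03-01",), ("2004-03-12",)])


def events_per_day(conn, first, last):
    # One query for the whole range instead of one query per displayed day.
    rows = conn.execute(
        "SELECT day, COUNT(*) FROM events WHERE day BETWEEN ? AND ? GROUP BY day",
        (first.isoformat(), last.isoformat()),
    ).fetchall()
    counts = Counter(dict(rows))
    # Fill in every day of the view, including days with no events.
    days = (first + timedelta(n) for n in range((last - first).days + 1))
    return {d.isoformat(): counts[d.isoformat()] for d in days}


march = events_per_day(conn, date(2004, 3, 1), date(2004, 3, 31))
print(len(march), march["2004-03-01"], march["2004-03-02"])  # 31 2 0
```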
>>>>>
>>>>>Now back to my first paragraph. I was pretty impressed with how easy
>>>>>it was to get used to the way everything works in Geeklog. It was
>>>>>easy to understand and it looks like it was designed pretty well. But
>>>>>I was also horrified when I saw the enormous number of queries that
>>>>>are used, although I guess Geeklog wasn't really designed with this
>>>>>kind of traffic and these enormous numbers of comments in mind.
>>>>>
>>>>>Most of the changes we've made are not yet running on the production
>>>>>server. Once we've properly tested everything and everything is stable,
>>>>>I'm willing to look at how we can give these changes back to Geeklog.
>>>>>A simple patch between the current version of Geeklog and Groklaw will
>>>>>be difficult, because we are using 1.3.8-1sr4 and it also includes a
>>>>>lot of Groklaw-specific modifications. If you are interested in these
>>>>>modifications, please let me know and we'll work something out.
>>>>>
>>>>>If you want to talk to me about this, you can e-mail me. In addition,
>>>>>I'll try to visit #geeklog as often as I can.
>>>>>
>>>>>Niels Leenheer
>>>>>-- project manager phpAdsNew
>>>>>
>>>>>
>>>>>
>>>>
>>>>_______________________________________________
>>>>geeklog-devel mailing list
>>>>geeklog-devel at lists.geeklog.net
>>>>http://lists.geeklog.net/listinfo/geeklog-devel
>>>>
>>>