[geeklog-devtalk] geeklog-devel digest, Vol 1 #291 - 1 msg

geeklog-devel-request at lists.geeklog.net
Fri Mar 12 13:00:02 EST 2004


Send geeklog-devel mailing list submissions to
geeklog-devel at lists.geeklog.net

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.geeklog.net/listinfo/geeklog-devel
or, via email, send a message with subject or body 'help' to
geeklog-devel-request at lists.geeklog.net

You can reach the person managing the list at
geeklog-devel-admin at lists.geeklog.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of geeklog-devel digest..."


Today's Topics:

1. Re: [Fwd: Re: Geeklog optimalisations] (Tony Bibbs)

--__--__--

Message: 1
Date: Fri, 12 Mar 2004 10:44:58 -0600
From: Tony Bibbs <tony at tonybibbs.com>
To: geeklog-devel at lists.geeklog.net
Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
Reply-To: geeklog-devel at lists.geeklog.net

I agree session management should be considered but, again, this would
be a lot of work. Having a user object in the session would be a
tremendous help. Switching to PHP4 sessions might be too much work
(research is needed) but you may be able to emulate PHP4 sessions by
adding a data field to the current sessions table and then do
serialization/deserialization from that. I digress.
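A minimal sketch of the emulation Tony describes: keep the existing sessions table, add one text column, and store a serialized blob of per-user data (for example the user object) in it. Python's sqlite3 and json stand in for MySQL and PHP's serialize(). The table name `sessions`, the columns `sess_id`, `uid` and `data`, and all function names are hypothetical, not Geeklog's actual schema.

```python
import json
import sqlite3


def create_sessions_schema(conn):
    """Hypothetical sessions table with one added column for serialized data."""
    conn.execute(
        "CREATE TABLE sessions ("
        " sess_id TEXT PRIMARY KEY,"
        " uid INTEGER NOT NULL,"
        " data TEXT NOT NULL DEFAULT '{}')"  # the added serialized-data field
    )


def save_session_data(conn, sess_id, values):
    """Serialize a dict into the data column (json stands in for PHP's serialize())."""
    conn.execute(
        "UPDATE sessions SET data = ? WHERE sess_id = ?",
        (json.dumps(values), sess_id),
    )


def load_session_data(conn, sess_id):
    """Deserialize the data column; return an empty dict if the session is unknown."""
    row = conn.execute(
        "SELECT data FROM sessions WHERE sess_id = ?", (sess_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_sessions_schema(conn)
    conn.execute("INSERT INTO sessions (sess_id, uid) VALUES (?, ?)", ("abc123", 2))
    # Store the user record and group list once; later requests read them back
    # from the session row instead of re-running the user and group queries.
    save_session_data(conn, "abc123", {"user": {"uid": 2, "username": "admin"},
                                       "groups": [1, 2, 13]})
    print(load_session_data(conn, "abc123")["groups"])
```

JSON is used here only as a convenient stand-in; in PHP the serialize()/unserialize() pair would play the same role.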

The big issue at hand is addressing a few short-term performance
problems. I think looking into Niels' customizations is a starting
point. Blaine is right, we need a fix that won't affect the majority of
users yet gives sites like Groklaw and our big Mac-related sites more
bang for their buck. I think addressing the comments is a big issue,
particularly considering the different modes you can view the comments in.

--Tony

Blaine Lang wrote:

> I have to agree that Geeklog as it exists is fine for 98% of our users,
> but addressing some of these issues can only make it better.
>
> We just need to be careful, as a lot of the bugs that we have been addressing
> are permission-based ones and we don't want to introduce more issues that
> will affect the 98% of our users. It is well known that adding new
> code and features often introduces new bugs, so we need to keep that in mind
> when assessing the effort for such a project.
>
> What about using PHP sessions to cache some of the common information and
> eliminate a lot of the SQL queries? Possible areas would be:
> - User access rights and group membership
> - $_USER array
>
> It would be nice for plugins to have access to sessions as well.
>
> ----- Original Message -----
> From: "Tony Bibbs" <tony at tonybibbs.com>
> To: <geeklog-devel at lists.geeklog.net>
> Sent: Friday, March 12, 2004 11:17 AM
> Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
>
>>Yeah, I think we should start only with DB optimizations (SQL tuning,
>>reducing queries, etc.). I also think switching to InnoDB should be
>>seriously considered. Doing so is something we can do relatively
>>painlessly via upgrade scripts. I say start small and go after the bigger
>>problems. Starting with comments would be good; however, I think
>>someone should put together a plan to make sure our performance
>>analysis covers the whole system.
>>
>>I still say leave the template stuff for 2.0...I don't think filesystem
>>access is causing that much of a problem. And yes, I need to get
>>serious about 2.0...lately I have been telling myself to shit or get off
>>the pot...
>>
>>--Tony
>>
>>Vincent Furia wrote:
>>
>>>I'm in! Though if we're going to get serious about upgrading Geeklog
>>>1.3.x to reduce queries and seriously improve performance for
>>>high-traffic sites, it might be time to start thinking about branching
>>>1.3.x off and using the main branch to start implementing these
>>>performance improvements (and perhaps calling it 1.4.x).
>>>
>>>I actually don't think we're too far away from getting Geeklog to decent
>>>speeds. As Niels pointed out (and as has been pointed out before, I
>>>think), we only really have one table locking issue. The database
>>>queries are something that can be fixed with some intelligent caching
>>>and better-shaped queries.
>>>
>>>If we're feeling really brave, switching to a pre-compiled template
>>>system like Smarty wouldn't be a bad idea. I think that would do a lot
>>>to improve performance.
>>>
>>>-Vinny
>>>
>>>Tony Bibbs wrote:
>>>
>>>>This is an FYI. We'll be discussing this on the development lists
>>>>over the next few days (I hope). It's important we help Groklaw as
>>>>best we can, as they are one of our bigger sites, and by them pushing
>>>>the limits of Geeklog we can address their issues and make Geeklog a
>>>>better product at the same time.
>>>>
>>>>--Tony
>>>>
>>>>---------------------
>>>>
>>>>Niels,
>>>>
>>>>I think a bit of background is in order before you can understand how
>>>>Geeklog got where it is. First, nearly all the code you are referring
>>>>to is legacy code. It was there before I managed the project and it is
>>>>still there under Dirk's management. In its infancy, Geeklog was only
>>>>servicing smaller sites, so performance was never really an issue and,
>>>>frankly, I was a bit young and dumb when I first got started with
>>>>Geeklog, so performance tuning PHP scripts wasn't even a consideration
>>>>and my focus was on the feature set.
>>>>
>>>>Under Dirk's management, the feature set has continued to grow to the
>>>>point that we have a large userbase, and what you are encountering with
>>>>Geeklog is only natural. Groklaw is clearly one of the biggest sites to
>>>>run Geeklog. I have posted questions to our mailing lists asking about
>>>>performance issues related to bigger Geeklog sites and got no responses
>>>>back, so your email was a pleasant surprise.
>>>>
>>>>The long and the short of it is that Geeklog has matured to a point
>>>>where bigger sites are using it and they are pushing its performance
>>>>limits. Geeklog's database interaction has always been an issue for me
>>>>and is a large part of why I have chosen to get a new codebase up (i.e.
>>>>Geeklog 2) while the 1.3.x branch continues. You are right, we need to
>>>>address the performance issues, and given the amount of work you have
>>>>put into troubleshooting Groklaw, I think you can play a critical part
>>>>in that.
>>>>
>>>>What I would like to do is see us work closely with you to begin
>>>>addressing these issues. A starting point would be to have a place
>>>>where we can install a development version of Groklaw's database and
>>>>run tests. Dirk and I don't have access to a database of that size,
>>>>and while we could fudge together some data, a real-world example would
>>>>sure be nice. Once we have a test bed, I'd be open to suggestions on how
>>>>we might work on this to resolve your immediate issues *and* begin
>>>>addressing performance tuning as a whole.
>>>>
>>>>#geeklog is where I dwell (though not always at the keyboard). If
>>>>possible, I'd like to see us discuss this on geeklog-devtalk. Niels, if
>>>>you could join that list at http://lists.geeklog.net/listinfo we can
>>>>carry this on there. In the meantime, if you can catch Dirk or me in
>>>>IRC, feel free to do so. FYI, I'm out of town this weekend (FWIW, I'm
>>>>GMT-6), so I may not seem too responsive until I get back on Sunday.
>>>>
>>>>Thanks for contacting us. I'm sure we can address these issues.
>>>>
>>>>--Tony
>>>>
>>>>Niels Leenheer wrote:
>>>>
>>>>>Hi guys,
>>>>>
>>>>>First of all, what were you guys thinking? Sorry to be so rude, but I
>>>>>simply had to get that off my chest. I feel better now. I'm okay. Really.
>>>>>
>>>>>As some of you may be aware, Groklaw is using Geeklog. It has turned
>>>>>into quite a busy website, and stories with more than 700 comments are
>>>>>not out of the ordinary. In addition to this, being slashdotted has
>>>>>become normal. This is where the problems started. The server can't
>>>>>handle much more. On busy days the website turns into a crawling slow
>>>>>pile of ..
>>>>>
>>>>>As a regular reader of and volunteer for Groklaw, I offered to take a
>>>>>look at the Geeklog source code and try to find some places that could
>>>>>benefit from optimization. After some testing I noticed that most of
>>>>>the problems are due to load on the database server.
>>>>>
>>>>>The first thing I started working on is the code that generates all the
>>>>>comments. It turns out that for every comment at least two queries are
>>>>>executed. For a story with more than 700 comments this means almost
>>>>>1500 queries to generate the page.
>>>>>
>>>>>I've modified this code extensively, and now we use one query to fetch
>>>>>the user details of all the people involved in posting, one query to
>>>>>fetch all the comments that have no parent, one query to fetch all the
>>>>>comments that do have parents and, if needed, one query to fetch the
>>>>>parent. All this data is then turned into one big nested array, which
>>>>>is passed by reference to the functions that actually print the data.
>>>>>Depending on how many comments there are, this could result in a speed
>>>>>improvement of about 0% - 1000%. As you can imagine, if you only have
>>>>>about 10 comments it would not mean much; with 500 comments it would
>>>>>reduce the number of queries needed by about 1000. It's a very big
>>>>>improvement.
>>>>>
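Niels' approach can be sketched as follows: fetch the comments and the participating users in a fixed number of queries, assemble a nested structure in memory, and render it recursively, instead of querying once or twice per comment. This is a Python/sqlite3 sketch under assumed names. The `comments` and `users` tables, their columns (`cid`, `sid`, `pid`, `uid`, `title`, `username`), and the convention that `pid = 0` marks a top-level comment are illustrative, not Geeklog's actual schema. The optional query that fetches a parent outside the current set is omitted.

```python
import sqlite3
from collections import defaultdict


def load_comment_tree(conn, sid):
    """Build a nested comment structure for one story with three queries total."""
    # Top-level comments (hypothetical convention: pid = 0 means no parent).
    roots = conn.execute(
        "SELECT cid, pid, uid, title FROM comments"
        " WHERE sid = ? AND pid = 0 ORDER BY cid", (sid,)
    ).fetchall()
    # Every reply in the story, at any depth, in a single query.
    replies = conn.execute(
        "SELECT cid, pid, uid, title FROM comments"
        " WHERE sid = ? AND pid <> 0 ORDER BY cid", (sid,)
    ).fetchall()
    # Details of every user who posted, fetched in one go.
    uids = sorted({row[2] for row in roots + replies})
    users = {}
    if uids:
        placeholders = ",".join("?" * len(uids))
        users = dict(conn.execute(
            f"SELECT uid, username FROM users WHERE uid IN ({placeholders})", uids
        ).fetchall())

    # Index replies by parent id so each node finds its children directly.
    children = defaultdict(list)
    for cid, pid, uid, title in replies:
        children[pid].append((cid, uid, title))

    def build(cid, uid, title):
        return {
            "cid": cid,
            "title": title,
            "author": users.get(uid, "Anonymous"),
            "children": [build(*child) for child in children[cid]],
        }

    return [build(cid, uid, title) for cid, _pid, uid, title in roots]


def render(nodes, depth=0, out=None):
    """Flatten the tree into indented lines; a stand-in for template output."""
    out = [] if out is None else out
    for node in nodes:
        out.append("  " * depth + f"{node['title']} ({node['author']})")
        render(node["children"], depth + 1, out)
    return out


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, sid TEXT, pid INTEGER,
                               uid INTEGER, title TEXT);
        INSERT INTO users VALUES (1, 'Anonymous'), (2, 'pj'), (3, 'niels');
        INSERT INTO comments VALUES
            (1, 'story1', 0, 2, 'First'),
            (2, 'story1', 1, 3, 'Re: First'),
            (3, 'story1', 2, 1, 'Re: Re: First'),
            (4, 'story1', 0, 3, 'Second');
    """)
    print("\n".join(render(load_comment_tree(conn, "story1"))))
```

The query count stays at three however many comments a story has. Because Python passes object references, `render` walks the same nested structure without copying it, which is the effect Niels gets from passing the array by reference in PHP.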

>>>>>One other problem I've identified is table locking of the story table.
>>>>>The statistics are stored in the same table as the actual content of
>>>>>the story, so each time a story is displayed, an UPDATE query and a
>>>>>SELECT query are run on the same table. With a lot of requests the
>>>>>table is constantly locked by the UPDATE queries and the SELECT queries
>>>>>are waiting. We've disabled the statistics for now, but we are
>>>>>investigating the possibility of moving the statistics to a separate
>>>>>table.
>>>>>
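A sketch of the separate-table idea Niels mentions: the per-view UPDATE touches only a small counter table, so the table holding story content is only ever read on a page view. sqlite3 stands in for MySQL here. The table names `stories` and `story_stats`, their columns, and the function names are hypothetical. SQLite serializes writers at the database level, so this shows only the schema split, not MySQL's locking behaviour.

```python
import sqlite3


def create_story_schema(conn):
    conn.executescript("""
        -- Story content: read on every page view, written rarely.
        CREATE TABLE stories (
            sid   TEXT PRIMARY KEY,
            title TEXT NOT NULL,
            body  TEXT NOT NULL
        );
        -- Hypothetical statistics table: written on every page view.
        CREATE TABLE story_stats (
            sid  TEXT PRIMARY KEY REFERENCES stories(sid),
            hits INTEGER NOT NULL DEFAULT 0
        );
    """)


def view_story(conn, sid):
    """Fetch story content, then bump the hit counter in the separate table."""
    story = conn.execute(
        "SELECT title, body FROM stories WHERE sid = ?", (sid,)
    ).fetchone()
    if story is None:
        return None
    # The writes go to story_stats only; the stories table is never updated here.
    conn.execute(
        "INSERT OR IGNORE INTO story_stats (sid, hits) VALUES (?, 0)", (sid,)
    )
    conn.execute("UPDATE story_stats SET hits = hits + 1 WHERE sid = ?", (sid,))
    conn.commit()
    return story
```

Tony's InnoDB suggestion attacks the same contention from another angle, since InnoDB locks individual rows rather than whole tables.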

>>>>>Next is the database layer. The mysql_fetch_array() function has two
>>>>>arguments. The second determines what the function returns: an
>>>>>associative array, a numbered array, or both. By default the function
>>>>>returns both, and this is what Geeklog uses. In most of the code only
>>>>>the associative array is used; only in a few small instances does the
>>>>>code require a numbered array. What we have done is to instruct the
>>>>>mysql_fetch_array() function to return only an associative array by
>>>>>default. Only when the code requires a numbered array do we request
>>>>>both. This should lower the amount of memory needed by Geeklog.
>>>>>
>>>>>The SEC_getUserGroups() function is also quite expensive. It is called
>>>>>throughout the generation of a page and it does not cache the
>>>>>information. We've added a simple cache for the data that is fetched
>>>>>from the database, which eliminates another 30 or so queries.
>>>>>
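The "simple cache" Niels describes amounts to per-request memoization: the first call for a user goes to the database, and later calls during the same page build reuse the result. In this Python sketch, `GroupCache.get_user_groups` is a hypothetical stand-in for SEC_getUserGroups(), `fetch_groups_from_db` is a placeholder for the real queries, and the `db_calls` counter exists only to show how many database round trips happen.

```python
class GroupCache:
    """Per-request cache of group memberships, keyed by user id."""

    def __init__(self, fetch):
        self._fetch = fetch      # function uid -> list of group ids (does the DB work)
        self._cache = {}
        self.db_calls = 0        # instrumentation only

    def get_user_groups(self, uid):
        if uid not in self._cache:
            self.db_calls += 1
            self._cache[uid] = self._fetch(uid)
        # Return a copy so callers cannot mutate the cached list.
        return list(self._cache[uid])


def fetch_groups_from_db(uid):
    # Placeholder for the real SELECTs against the group-assignment tables.
    return [1, 2] if uid == 1 else [2]


if __name__ == "__main__":
    cache = GroupCache(fetch_groups_from_db)
    for _ in range(30):          # e.g. 30 permission checks during one page build
        cache.get_user_groups(1)
    print(cache.db_calls)        # prints 1
```

Because the cache lives only for one request, stale data is not a concern. Caching across requests, as Blaine suggests with sessions, would additionally need invalidation whenever group membership changes.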

>>>>>Next is the index page. The COM_featuredCheck() function is executed
>>>>>every time the frontpage is requested. I've changed the loop that
>>>>>actually displays the stories on the frontpage and included a check to
>>>>>see if there is more than one featured story. If there is, the second
>>>>>story is not displayed as such and the COM_featuredCheck() function is
>>>>>called. This again saves a couple of queries, and the end result is
>>>>>the same.
>>>>>
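One reading of Niels' change, sketched in Python: rather than running the featured-story consistency check on every front-page request, the display loop notices a second featured story itself, renders it as a normal story, and only then triggers the repair. The story dicts, the `featured` flag, and the `featured_check` callback are hypothetical stand-ins for Geeklog's data and COM_featuredCheck(); stories are assumed to arrive in display order with the featured one first.

```python
def render_frontpage(stories, featured_check):
    """Render stories in display order; call featured_check only when needed.

    stories: list of dicts with "title" and "featured" (bool) keys.
    featured_check: callback that repairs the featured flags in the database.
    """
    output = []
    seen_featured = False
    needs_repair = False
    for story in stories:
        if story["featured"] and not seen_featured:
            seen_featured = True
            output.append(f"[FEATURED] {story['title']}")
        else:
            if story["featured"]:
                # A second featured story: show it as a normal one and flag the problem.
                needs_repair = True
            output.append(story["title"])
    if needs_repair:
        featured_check()  # the expensive check runs only when an inconsistency was seen
    return output
```

In the common case of zero or one featured story, no extra work happens at all; the page output is the same either way.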

>>>>>We are also using the mycal extension, which I've almost completely
>>>>>rewritten. Mycal used a query for every day that is displayed; after my
>>>>>modifications it uses only one query, a reduction of 27 to 34 queries.
>>>>>
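The mycal change amounts to replacing one query per displayed day with a single range query whose rows are grouped by day. A Python/sqlite3 sketch; the `events` table, its `event_date` column stored as ISO `YYYY-MM-DD` text, and the function name are assumptions, since the thread does not show mycal's schema.

```python
import sqlite3
from datetime import date, timedelta


def events_by_day(conn, first_day, last_day):
    """Return {date: event_count} for every displayed day, using one query."""
    rows = conn.execute(
        "SELECT event_date, COUNT(*) FROM events"
        " WHERE event_date BETWEEN ? AND ?"
        " GROUP BY event_date",
        (first_day.isoformat(), last_day.isoformat()),
    ).fetchall()
    counts = {date.fromisoformat(day): n for day, n in rows}
    # Fill in days with no events so the calendar grid has an entry for each cell.
    result = {}
    day = first_day
    while day <= last_day:
        result[day] = counts.get(day, 0)
        day += timedelta(days=1)
    return result
```

A calendar grid of four or five weeks covers 28 or 35 days, so a per-day approach issues 28 to 35 queries where this issues one, which appears to be where the 27 to 34 figure comes from.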

>>>>>Now back to my first paragraph. I was pretty impressed with how easy it
>>>>>was to get used to the way everything works in Geeklog. It was pretty
>>>>>easy to understand and it looks like it was designed pretty well. But I
>>>>>was also horrified when I saw the enormous number of queries that are
>>>>>used, though I guess Geeklog wasn't really designed with this kind of
>>>>>traffic and these enormous numbers of comments in mind.
>>>>>
>>>>>Most of the changes we've made are not yet running on the production
>>>>>server. Once we've properly tested everything and everything is stable,
>>>>>I'm willing to look at how we can give these changes back to Geeklog. A
>>>>>simple patch between the current version of Geeklog and Groklaw will be
>>>>>difficult, because we are using 1.3.8-1sr4 and it also includes a lot of
>>>>>Groklaw-specific modifications. If you are interested in these
>>>>>modifications, please let me know and we'll work something out.
>>>>>
>>>>>If you want to talk to me about this, you can e-mail me. In addition to
>>>>>this, I'll try to visit #geeklog as often as I can.
>>>>>
>>>>>Niels Leenheer
>>>>>-- project manager phpAdsNew
>>>>>
>>>>
>>>>_______________________________________________
>>>>geeklog-devel mailing list
>>>>geeklog-devel at lists.geeklog.net
>>>>http://lists.geeklog.net/listinfo/geeklog-devel



--__--__--

_______________________________________________
geeklog-devel mailing list
geeklog-devel at lists.geeklog.net
http://lists.geeklog.net/listinfo/geeklog-devel


End of geeklog-devel Digest


