[geeklog-devtalk] geeklog-devel digest, Vol 1 #290 - 4 msgs

geeklog-devel-request at lists.geeklog.net geeklog-devel-request at lists.geeklog.net
Fri Mar 12 11:32:01 EST 2004


Send geeklog-devel mailing list submissions to
geeklog-devel at lists.geeklog.net

To subscribe or unsubscribe via the World Wide Web, visit
http://lists.geeklog.net/listinfo/geeklog-devel
or, via email, send a message with subject or body 'help' to
geeklog-devel-request at lists.geeklog.net

You can reach the person managing the list at
geeklog-devel-admin at lists.geeklog.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of geeklog-devel digest..."


Today's Topics:

1. [Fwd: Re: Geeklog optimalisations] (Tony Bibbs)
2. Re: [Fwd: Re: Geeklog optimalisations] (Vincent Furia)
3. Re: [Fwd: Re: Geeklog optimalisations] (Tony Bibbs)
4. Re: [Fwd: Re: Geeklog optimalisations] (Blaine Lang)

--__--__--

Message: 1
Date: Fri, 12 Mar 2004 08:40:30 -0600
From: Tony Bibbs <tony at tonybibbs.com>
To: Geeklog <geeklog-devel at lists.geeklog.net>
Subject: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
Reply-To: geeklog-devel at lists.geeklog.net

This is an FYI. We'll be discussing this on the development lists over
the next few days (I hope). It's important we help Groklaw as best we
can as they are one of our bigger sites and by them pushing the limits
of Geeklog we can address their issues and make Geeklog a better product
at the same time.

--Tony

---------------------

Niels,

I think a bit of background is in order before you can understand how
Geeklog got where it is. First, nearly all the code you are referring
to is legacy code. It was there before I managed the project and it is
still there under Dirk's management. In its infancy, Geeklog was only
servicing smaller sites, so performance was never really an issue and,
frankly, I was a bit young and dumb when I first got started with
Geeklog, so performance tuning PHP scripts wasn't even a consideration
and my focus was on the feature set.

Under Dirk's management, the feature set has continued to grow to the
point that we have a large userbase, and what you are encountering with
Geeklog is only natural. Groklaw is clearly one of the biggest sites to
run Geeklog. I have posted questions to our mailing lists asking about
performance issues related to bigger Geeklog sites and got no responses
back, so your email was a pleasant surprise.

The long and the short of it is Geeklog has matured to a point where
bigger sites are using it and we are pushing the performance limits it has.
Geeklog's database interaction has always been an issue for me and is
a large part of why I have chosen to get a new codebase up (i.e. Geeklog 2)
while the 1.3.x line continues. You are right, we need to address the
performance issues, and given the amount of work you have put into
troubleshooting Groklaw I think you can play a critical part in that.

What I would like to do is see us work closely with you to begin
addressing these issues. A starting point would be to have a place
where we can install a development version of Groklaw's database
and run tests. Dirk and I don't have access to a database of that
size, and while we could fudge together some data, using a real-world
example would sure be nice. Once we have a test bed, I'd
be open to suggestions on how we might work on this to resolve your
immediate issues *and* begin addressing performance tuning as a whole.

#geeklog is where I dwell (though not always at the keyboard). If
possible I'd like to see us discuss this on geeklog-devtalk. Niels, if
you could join that list at http://lists.geeklog.net/listinfo we can
carry this on there. In the meantime, if you can catch Dirk or me in
IRC, feel free to do so. FYI I'm out of town this weekend (FWIW I'm GMT
-6), so I may not seem too responsive until I get back on Sunday.

Thanks for contacting us, I'm sure we can address these issues.

--Tony



Niels Leenheer wrote:

> Hi guys,
>
> First of all: what were you guys thinking? Sorry to be so rude, but I simply
> had to get that off my chest. I feel better now. I'm okay. Really.
>
> As some of you may be aware, Groklaw is using Geeklog. It has turned into
> quite a busy website, and stories with more than 700 comments are not out of
> the ordinary. In addition to this, being slashdotted has become normal. This
> is where the problems started. The server can't handle much more. On busy
> days the website turns into a crawling slow pile of ..
>
> As a regular reader and volunteer of Groklaw I offered to take a look at the
> Geeklog source code and try to find some places that could benefit from
> optimisation. After some testing I've noticed that most of the problems
> are due to load on the database server.
>
> The first thing I started working on is the code that generates all the
> comments. It turns out that for every comment at least two queries are
> executed. For a story with more than 700 comments this would mean close to
> 1500 queries to generate the page.
>
> I've modified this code extensively and now we use one query to fetch all
> the user details of all the people involved in posting. One query is used to
> fetch all the comments that have no parent. One query to fetch all the
> comments that do have parents. And if needed, one query to fetch the parent.
> All this data is then turned into one big nested array, which is passed by
> reference to the functions that actually print the data. Depending on how
> many comments there are this could result in a speed improvement of about
> 0% - 1000%. As you can imagine, if you only have about 10 comments it would
> not mean much; with 500 comments it would reduce the number of queries
> needed by about 1000. It's a very big improvement.
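
A minimal sketch of the batching idea described above, not the Groklaw patch itself. It is written in Python against SQLite only so that it runs standalone. The table and column names (`comments`, `users`, `cid`, `pid`, `uid`, `sid`) and the convention that `pid = 0` marks a top-level comment are assumptions made for illustration, and it uses two queries where the description above uses three or four.

```python
import sqlite3
from collections import defaultdict


def load_comment_tree(conn, story_id):
    """Fetch a story's comments and their authors in two queries, then nest them."""
    comments = conn.execute(
        "SELECT cid, pid, uid, body FROM comments WHERE sid = ? ORDER BY cid",
        (story_id,),
    ).fetchall()
    if not comments:
        return []

    # One query for every author involved, instead of one lookup per comment.
    uids = sorted({row[2] for row in comments})
    placeholders = ",".join("?" * len(uids))
    users = dict(
        conn.execute(
            f"SELECT uid, username FROM users WHERE uid IN ({placeholders})", uids
        ).fetchall()
    )

    # Group comments by parent id so the tree is assembled in memory.
    children = defaultdict(list)
    for cid, pid, uid, body in comments:
        children[pid].append(
            {"cid": cid, "author": users.get(uid, "unknown"), "body": body, "replies": []}
        )

    def attach(nodes):
        # Recursively hang each comment's replies under it.
        for node in nodes:
            node["replies"] = attach(children.get(node["cid"], []))
        return nodes

    return attach(children.get(0, []))  # pid 0 = top-level comment (assumed)
```

The number of queries stays constant no matter how many comments the story has, which is where the saving on large threads comes from.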

>
> One other problem I've identified is table locking of the story table. The
> statistics are stored in the same table as the actual content of the story.
> So each time a story is displayed, it will use an UPDATE query and a SELECT
> query on the same table. With a lot of requests the table is constantly
> locked by the UPDATE queries and the SELECT queries are waiting. We've
> disabled the statistics for now, but we are investigating the possibility of
> moving the statistics to a separate table.
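
One possible shape for the "separate table" idea, sketched in Python with SQLite so it runs standalone. The table names `stories` and `story_hits` and the helper names are hypothetical, not Geeklog's schema. SQLite's locking differs from MySQL's table-level locks, so this only illustrates the schema split and the access pattern, not the locking behaviour itself.

```python
import sqlite3

# Story content and hit counters live in separate tables (names are assumed).
SCHEMA = """
CREATE TABLE stories (sid TEXT PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE story_hits (sid TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0);
"""


def record_hit(conn, sid):
    """Bump the counter without ever writing to the stories table."""
    cur = conn.execute("UPDATE story_hits SET hits = hits + 1 WHERE sid = ?", (sid,))
    if cur.rowcount == 0:
        # First view of this story: create its counter row.
        conn.execute("INSERT INTO story_hits (sid, hits) VALUES (?, 1)", (sid,))
    conn.commit()


def read_story(conn, sid):
    """Read the content only; this query never touches the counter table."""
    return conn.execute(
        "SELECT title, body FROM stories WHERE sid = ?", (sid,)
    ).fetchone()


def get_hits(conn, sid):
    """Read the counter separately, for pages that actually display it."""
    row = conn.execute("SELECT hits FROM story_hits WHERE sid = ?", (sid,)).fetchone()
    return row[0] if row else 0
```

With this split, the frequent UPDATE only ever targets the small counter table, so reads of story content no longer queue behind it.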

>
> Next is the database layer. The mysql_fetch_array() function has two
> arguments. The second determines what the function returns: an
> associative array, a numbered array, or both. By default the function returns
> both. This is what Geeklog does. In most of the code only the associative
> array is used. Only in a couple of small instances does the code require a
> numbered array. What we have done is to instruct the mysql_fetch_array()
> function to return only an associative array by default. Only when the code
> requires a numbered array do we request both. This should lower the amount of
> memory needed by Geeklog.
>
> The SEC_getUserGroups() function is also quite expensive. It is called
> throughout the generation of a page and it does not cache the information.
> We've added a simple cache for the data that is fetched from the database,
> which eliminates another 30 or so queries.
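
A rough sketch of what such a per-request cache could look like, in Python with SQLite so it runs standalone. The function `get_user_groups`, the table `group_assignments` and its columns are illustrative assumptions. The real SEC_getUserGroups() is PHP and may do more than this flat lookup.

```python
import sqlite3

# uid -> tuple of group ids. Assumed to live for a single page request;
# a long-running process would need to clear it between requests.
_group_cache = {}


def get_user_groups(conn, uid):
    """Return the group ids for uid, querying the database once per uid."""
    if uid not in _group_cache:
        rows = conn.execute(
            "SELECT grp_id FROM group_assignments WHERE uid = ? ORDER BY grp_id",
            (uid,),
        ).fetchall()
        _group_cache[uid] = tuple(r[0] for r in rows)
    return _group_cache[uid]
```

Every call after the first for a given user is answered from memory, so a page that checks permissions dozens of times pays for one query.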

>
> Next is the index page. The COM_featuredCheck() function is executed every
> time the frontpage is requested. I've changed the loop that actually
> displays the stories on the frontpage and included a check to see if there
> is more than one featured story. If there is, the second story is not
> displayed as such and the COM_featuredCheck() function is called. This again
> saves a couple of queries and the end result is the same.
>
> We are also using the mycal extension which I've almost completely
> rewritten. Mycal uses a query for every day that is displayed and after my
> modifications it only uses one query. That is a reduction of 27 to 34 queries.
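
The calendar change can be pictured roughly as follows: load the whole month in one query and bucket the rows by day in memory. This is a sketch in Python with SQLite, not the rewritten mycal code. The `events` table, its columns, and the `YYYY-MM-DD` text date format are assumptions.

```python
import sqlite3
from collections import defaultdict


def events_by_day(conn, year, month):
    """Load a month's events in one query and bucket them by day of month."""
    prefix = f"{year:04d}-{month:02d}-"
    rows = conn.execute(
        "SELECT event_date, title FROM events "
        "WHERE event_date LIKE ? ORDER BY event_date, title",
        (prefix + "%",),
    ).fetchall()

    buckets = defaultdict(list)
    for event_date, title in rows:
        # Characters 8-9 of 'YYYY-MM-DD' are the day of the month.
        buckets[int(event_date[8:10])].append(title)
    return dict(buckets)
```

The code that renders each day cell then looks up its day number in the returned dictionary instead of issuing its own query.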

>
> Now back to my first paragraph. I was pretty impressed with how easy it was
> to get used to the way everything works in Geeklog. It was pretty easy to
> understand and it looks like it was designed pretty well. But I was also
> horrified when I saw the enormous number of queries that are used, though I
> guess Geeklog wasn't really designed with this kind of traffic and these
> enormous amounts of comments in mind.
>
> Most of the changes we've made are not yet running on the production server.
> Once we've properly tested everything and everything is stable, I'm willing
> to look at how we can give these changes back to Geeklog. A simple patch
> between the current version of Geeklog and Groklaw will be difficult,
> because we are using 1.3.8-1sr4 and it also includes a lot of Groklaw
> specific modifications. If you are interested in these modifications, please
> let me know and we'll work something out.
>
> If you want to talk to me about this you can e-mail. In addition to this
> I'll try to visit #geeklog as often as I can.
>
> Niels Leenheer
> -- project manager phpAdsNew



--__--__--

Message: 2
Date: Fri, 12 Mar 2004 10:36:57 -0500
From: Vincent Furia <vmf at abtech.org>
To: geeklog-devel at lists.geeklog.net
Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
Reply-To: geeklog-devel at lists.geeklog.net

I'm in! Though if we're going to get serious about upgrading Geeklog
1.3.x to reduce queries and seriously improve performance for high
traffic sites, it might be time to start thinking about branching 1.3.x
off and using the main branch to start implementing these performance
improvements (and perhaps calling it 1.4.x).

I actually don't think we're too far away from getting Geeklog to decent
speeds. As Niels pointed out (and as has been pointed out before, I
think) we only really have one table locking issue. The database
queries are something that can be fixed with some intelligent caching
and better shaped queries.

If we're feeling really brave, switching template systems to a
pre-compiled template system like Smarty wouldn't be a bad idea. I
think that will do a lot to improve performance.

-Vinny


--__--__--

Message: 3
Date: Fri, 12 Mar 2004 10:17:38 -0600
From: Tony Bibbs <tony at tonybibbs.com>
To: geeklog-devel at lists.geeklog.net
Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
Reply-To: geeklog-devel at lists.geeklog.net

Yeah, I think we should start only with DB optimizations (SQL tuning,
reducing queries, etc). I also think switching to InnoDB should be
seriously considered. Doing so is something we can do relatively
painlessly via upgrade scripts. I say start small, then go after the
bigger problems. Starting with comments would be good; however, I think
someone should put together a plan for making sure our performance
analysis covers all of the system.

I still say leave the template stuff for 2.0...I don't think filesystem
access is causing that much of a problem. And yes, I need to get
serious about 2.0...lately I have been telling myself to shit or get off
the pot...

--Tony


--__--__--

Message: 4
From: "Blaine Lang" <geeklog at langfamily.ca>
To: <geeklog-devel at lists.geeklog.net>
Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]
Date: Fri, 12 Mar 2004 11:30:40 -0500
Reply-To: geeklog-devel at lists.geeklog.net

I have to agree that Geeklog as it exists is fine for 98% of our users,
but addressing some of these issues can only make it better.

We just need to be careful, as a lot of the bugs that we have been addressing
are permission-based ones and we don't want to introduce more issues that
will affect that 98% of our users. It is well known that adding new
code and features often introduces new bugs, so we need to keep that in mind
when assessing the effort for such a project.

What about using PHP sessions to cache some of the common information and
eliminate a lot of the SQL queries?
Possible areas would be:
- User access rights and group membership
- $_USER array

It would be nice for plugins to have access to sessions as well.

----- Original Message -----
From: "Tony Bibbs" <tony at tonybibbs.com>
To: <geeklog-devel at lists.geeklog.net>
Sent: Friday, March 12, 2004 11:17 AM
Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]



> Yeah, I think we should start only with DB optimizations (SQL tuning,

> reducing queries, etc). I also think switching to InnoDB should be

> seriously considered. Doing so is something we can do relatively

> painlessly via upgrade scripts. I say start small, go after the bigger

> problems. Starting with comments would be good, however, I think

> someone should put together a plan on how to make sure during our

> performance analysis we cover all of the system .

>

> I still say leave the template stuff for 2.0...I don't think filesystem

> access is causing that much a of a problem. And yes, I need to get

> serious about 2.0...latley I have been telling myself to shit or get off

> the pot...

>

> --Tony

>

> Vincent Furia wrote:
> > I'm in! Though if we're going to get serious about upgrading Geeklog
> > 1.3.x to reduce queries and seriously improve performance for high
> > traffic sites, it might be time to start thinking about branching 1.3.x
> > off and using the main branch to start implementing these performance
> > improvements (and perhaps calling it 1.4.x).
> >
> > I actually don't think we're too far away from getting Geeklog to decent
> > speeds. As Niels pointed out (and as has been pointed out before, I
> > think) we only really have one table locking issue. The database
> > queries are something that can be fixed with some intelligent caching
> > and better shaped queries.
> >
> > If we're feeling really brave, switching template systems to a
> > pre-compiled template system like Smarty wouldn't be a bad idea. I
> > think that will do a lot to improve performance.
> >
> > -Vinny
> >
> > Tony Bibbs wrote:
> >
> >> This is an FYI. We'll be discussing this on the development lists
> >> over the next few days (I hope). It's important we help Groklaw as
> >> best we can as they are one of our bigger sites and by them pushing
> >> the limits of Geeklog we can address their issues and make Geeklog a
> >> better product at the same time.
> >>
> >> --Tony
> >>
> >> ---------------------
> >>
> >> Niels,
> >>
> >> I think a bit of background is in order before you can understand how
> >> Geeklog got where it is. First, nearly all the code you are referring
> >> to is legacy code. It was there before I managed the project and it is
> >> still there under Dirk's management. In its infancy, Geeklog was only
> >> servicing smaller sites so performance was never really an issue and,
> >> frankly, I was a bit young and dumb when I first got started with
> >> Geeklog so performance tuning PHP scripts wasn't even a consideration
> >> and my focus was on the feature set.
> >>
> >> Under Dirk's management, the feature set has continued to grow to the
> >> point that we have a large userbase and what you are encountering with
> >> Geeklog is only natural. Groklaw is clearly one of the biggest sites to
> >> run Geeklog. I have posted questions to our mailing lists asking about
> >> performance issues related to bigger Geeklog sites and got no responses
> >> back, so your email was a pleasant surprise.
> >>
> >> The long and the short of it is Geeklog has matured to a point where
> >> bigger sites are using it and we are pushing the performance limits it
> >> has. Geeklog's database interaction has always been an issue for me and
> >> is a large part of why I have chosen to get a new codebase up (i.e.
> >> Geeklog 2) while 1.3.x continues. You are right, we need to address the
> >> performance issues and given the amount of work you have put into
> >> troubleshooting Groklaw I think you can play a critical part in that.
> >>
> >> What I would like to do is see us work closely with you to begin
> >> addressing these issues. A starting point would be to have a place
> >> where we can install a development version of Groklaw's database and
> >> run tests. Dirk and I don't have access to a database of that size and
> >> while we could fudge together some data, using a real world example
> >> would sure be nice. Once we have a test bed, I'd be open to suggestions
> >> on how we might work on this to resolve your immediate issues *and*
> >> begin addressing performance tuning as a whole.
> >>
> >> #geeklog is where I dwell (though not always at the keyboard). If
> >> possible I'd like to see us discuss this on geeklog-devtalk. Niels, if
> >> you could join that list at http://lists.geeklog.net/listinfo we can
> >> carry this on there. In the meantime if you can catch Dirk or myself in
> >> IRC feel free to do so. FYI I'm out of town this weekend (FWIW I'm GMT
> >> -6) so I may not seem too responsive until I get back on Sunday.
> >>
> >> Thanks for contacting us, I'm sure we can address these issues.
> >>
> >> --Tony
> >>
> >> Niels Leenheer wrote:
> >>
> >>> Hi guys,
> >>>
> >>> First of all. What were you guys thinking? Sorry to be so rude, but I
> >>> simply had to get that off my chest. I feel better now. I'm okay.
> >>> Really.
> >>>
> >>> As some of you may be aware, Groklaw is using Geeklog. It has turned
> >>> into quite a busy website and stories with more than 700 comments are
> >>> not out of the ordinary. In addition to this, being slashdotted has
> >>> become normal. This is where the problems started. The server can't
> >>> handle much more. On busy days the website turns into a crawling slow
> >>> pile of ..
> >>>
> >>> As a regular reader and volunteer of Groklaw I offered to take a look
> >>> at the Geeklog source code and try to find some places that could
> >>> benefit from optimisation. After some testing I've noticed that most
> >>> of the problems are due to load on the database server.
> >>>
> >>> The first thing I started working on is the code that generates all
> >>> the comments. It turns out that for every comment at least two queries
> >>> are executed. For a story with more than 700 comments this would mean
> >>> almost 1500 queries to generate the page.
> >>>
> >>> I've modified this code extensively and now we use one query to fetch
> >>> all the user details of all the people involved in posting. One query
> >>> is used to fetch all the comments that have no parent. One query to
> >>> fetch all the comments that do have parents. And if needed, one query
> >>> to fetch the parent. All this data is then turned into one big nested
> >>> array, which is passed by reference to the functions that actually
> >>> print the data. Depending on how many comments there are this could
> >>> result in a speed improvement of about 0% - 1000%. As you can imagine,
> >>> if you only have about 10 comments it would not mean much; with 500
> >>> comments it would reduce the number of queries needed by about 1000.
> >>> It's a very big improvement.
> >>>
> >>> One other problem I've identified is table locking of the story
> >>> table. The statistics are stored in the same table as the actual
> >>> content of the story. So each time a story is displayed, it will use
> >>> an UPDATE query and a SELECT query on the same table. With a lot of
> >>> requests the table is constantly locked by the UPDATE queries and the
> >>> SELECT queries are waiting. We've disabled the statistics for now, but
> >>> we are investigating the possibility of moving the statistics to a
> >>> separate table.
> >>>
> >>> Next is the database layer. The mysql_fetch_array() function has two
> >>> arguments. The second determines what the function returns: either an
> >>> associative array, a numbered array or both. By default the function
> >>> returns both. This is what Geeklog does. In most of the code only the
> >>> associative array is used. Only in a couple of small instances does
> >>> the code require a numbered array. What we have done is to instruct
> >>> the mysql_fetch_array() function to return only an associative array
> >>> by default. Only when the code requires a numbered array do we request
> >>> both. This should lower the amount of memory needed by Geeklog.
> >>>
> >>> The SEC_getUserGroups() function is also quite expensive. It is called
> >>> throughout the generation of a page and it does not cache the
> >>> information. We've added a simple cache for the data that is fetched
> >>> from the database which eliminates another 30 or so queries.
> >>>
> >>> Next is the index page. The COM_featuredCheck() function is executed
> >>> every time the frontpage is requested. I've changed the loop that
> >>> actually displays the stories on the frontpage and included a check to
> >>> see if there is more than one featured story. If there is, the second
> >>> story is not displayed as such and the featuredCheck() function is
> >>> called. This again saves a couple of queries and the end result is the
> >>> same.
> >>>
> >>> We are also using the mycal extension which I've almost completely
> >>> rewritten. Mycal uses a query for every day that is displayed and
> >>> after my modifications it only uses one query. That is a reduction of
> >>> 27-34 queries.
> >>>
> >>> Now back to my first paragraph. I was pretty impressed with how easy
> >>> it was to get used to the way everything works in Geeklog. It was
> >>> pretty easy to understand and it looks like it was designed pretty
> >>> well. But I was also horrified when I saw the enormous number of
> >>> queries that are used, but I guess Geeklog wasn't really designed with
> >>> this kind of traffic and these enormous numbers of comments in mind.
> >>>
> >>> Most of the changes we've made are not yet running on the production
> >>> server. Once we've properly tested everything and everything is
> >>> stable, I'm willing to look at how we can give these changes back to
> >>> Geeklog. A simple patch between the current version of Geeklog and
> >>> Groklaw will be difficult, because we are using 1.3.8-1sr4 and it also
> >>> includes a lot of Groklaw specific modifications. If you are
> >>> interested in these modifications, please let me know and we'll work
> >>> something out.
> >>>
> >>> If you want to talk to me about this you can e-mail. In addition to
> >>> this I'll try to visit #geeklog as often as I can.
> >>>
> >>> Niels Leenheer
> >>> -- project manager phpAdsNew
> >>>



--__--__--

_______________________________________________
geeklog-devel mailing list
geeklog-devel at lists.geeklog.net
http://lists.geeklog.net/listinfo/geeklog-devel


End of geeklog-devel Digest


