[geeklog-devel] [Fwd: Re: Geeklog optimalisations]

Fri Mar 12 11:30:40 EST 2004

I have to agree that Geeklog as it exists is fine for 98% of our users,
but addressing some of these issues can only make it better.

We just need to be careful, as a lot of the bugs we have been addressing
are permission-based ones, and we don't want to introduce more issues that
will affect that 98% of our users. It is well known that adding new code
and features often introduces new bugs, so we need to keep that in mind
when assessing the effort for such a project.

What about using PHP sessions to cache some of the common information and
eliminate a lot of the SQL queries?
Possible areas would be:
  - User access rights and group membership
  - $_USER array

It would be nice for Plugins to have access to SESSIONS as well.
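
A rough sketch of the session idea, in Python rather than PHP just to keep
it short and runnable. Everything named here is an assumption for
illustration: the plain dict standing in for the PHP session, the
load_groups_from_db() stub, the made-up group names and the five minute
max_age. The point is the invalidate step, since stale rights in a session
are exactly the kind of permission bug we want to avoid.

```python
import time

# Counts how often the (pretend) database is hit.
queries = {"count": 0}


def load_groups_from_db(uid):
    """Hypothetical stand-in for the real group-membership query."""
    queries["count"] += 1
    return ["Logged-in Users", "Story Admin"] if uid == 2 else ["All Users"]


def get_user_groups(uid, session, max_age=300, now=None):
    """Return group names, reusing the session copy while it is fresh."""
    now = time.time() if now is None else now
    cached = session.get("groups")
    if cached and cached["uid"] == uid and now - cached["at"] < max_age:
        return cached["names"]
    names = load_groups_from_db(uid)
    session["groups"] = {"uid": uid, "names": names, "at": now}
    return names


def invalidate_groups(session):
    """Call this when an admin changes the user's groups or rights."""
    session.pop("groups", None)
```

The max_age limit is only a safety net. Any code path that edits groups or
rights would still have to call something like invalidate_groups().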

----- Original Message ----- 
From: "Tony Bibbs" <tony at tonybibbs.com>
To: <geeklog-devel at lists.geeklog.net>
Sent: Friday, March 12, 2004 11:17 AM
Subject: Re: [geeklog-devel] [Fwd: Re: Geeklog optimalisations]

> Yeah, I think we should start only with DB optimizations (SQL tuning,
> reducing queries, etc).  I also think switching to InnoDB should be
> seriously considered. Doing so is something we can do relatively
> painlessly via upgrade scripts.  I say start small, then go after the
> bigger problems.  Starting with comments would be good; however, I think
> someone should put together a plan for making sure our performance
> analysis covers all of the system.
>
> I still say leave the template stuff for 2.0...I don't think filesystem
> access is causing that much of a problem.  And yes, I need to get
> serious about 2.0...lately I have been telling myself to shit or get off
> the pot...
>
> --Tony
>
> Vincent Furia wrote:
> > I'm in!  Though if we're going to get serious about upgrading geeklog
> > 1.3.x to reduce queries and seriously improve performance for high
> > traffic sites it might be time to start thinking about branching 1.3.x
> > off and using the main branch to start implementing these performance
> > improvements (and perhaps calling it 1.4.x).
> >
> > I actually don't think we're too far away from getting Geeklog to decent
> > speeds.  As Niels pointed out (and as has been pointed out before I
> > think) we only really have one table locking issue.  The database
> > queries are something that can be fixed with some intelligent caching
> > and better shaped queries.
> >
> > If we're feeling really brave, switching template systems to a
> > pre-compiled template system like Smarty wouldn't be a bad idea.  I
> > think that will do A LOT to improve performance.
> >
> > -Vinny
> >
> > Tony Bibbs wrote:
> >
> >> This is an FYI.  We'll be discussing this on the development lists
> >> over the next few days (I hope).  It's important we help Groklaw as
> >> best we can as they are one of our bigger sites and by them pushing
> >> the limits of Geeklog we can address their issues and make Geeklog a
> >> better product at the same time.
> >>
> >> --Tony
> >>
> >> ---------------------
> >>
> >> Niels,
> >>
> >> I think a bit of background is in order before you can understand how
> >> Geeklog got where it is.  First, nearly all the code you are referring
> >> to is legacy code.  It was there before I managed the project and it is
> > >> still there under Dirk's management.  In its infancy, Geeklog was only
> >> servicing smaller sites so performance was never really an issue and,
> >> frankly, I was a bit young and dumb when I first got started with
> >> Geeklog so performance tuning PHP scripts wasn't even a consideration
> >> and my focus was on the feature set.
> >>
> > >> Under Dirk's management, the feature set has continued to grow to the
> > >> point that we have a large userbase and what you are encountering with
> > >> Geeklog is only natural.  Groklaw is clearly one of the biggest sites
> > >> to run Geeklog.  I have posted questions to our mailing lists asking
> > >> about performance issues related to bigger Geeklog sites and got no
> > >> responses back, so your email was a pleasant surprise.
> >>
> > >> The long and the short of it is Geeklog has matured to a point where
> > >> bigger sites are using it and we are pushing the performance limits it
> > >> has.  Geeklog's database interaction has always been an issue for me
> > >> and is a large part of why I have chosen to get a new codebase up
> > >> (i.e. Geeklog 2) while the 1.3.x continues.  You are right, we need to
> > >> address the performance issues and given the amount of work you have
> > >> put into troubleshooting Groklaw I think you can play a critical part
> > >> in that.
> >>
> > >> What I would like to do is see us work closely with you to begin
> > >> addressing these issues.  A starting point would be to have a place
> > >> where we can install a development version of Groklaw's database and
> > >> run tests.  Dirk and I don't have access to a database of that size,
> > >> and while we could fudge together some data, using a real-world
> > >> example would sure be nice.  Once we have a test bed, I'd be open to
> > >> suggestions on how we might work on this to resolve your immediate
> > >> issues *and* begin addressing performance tuning as a whole.
> >>
> > >> #geeklog is where I dwell (though not always at the keyboard).  If
> > >> possible I'd like to see us discuss this on geeklog-devtalk.  Niels, if
> > >> you could join that list at http://lists.geeklog.net/listinfo we can
> > >> carry this on there.  In the meantime if you can catch Dirk or myself
> > >> in IRC feel free to do so.  FYI I'm out of town this weekend (FWIW I'm
> > >> GMT -6) so I may not seem too responsive until I get back on Sunday.
> >>
> >> Thanks for contacting us, I'm sure we can address these issues.
> >>
> >> --Tony
> >>
> >>
> >>
> >> Niels Leenheer wrote:
> >>
> >>> Hi guys,
> >>>
> > >>> First of all. What were you guys thinking? Sorry to be so rude, but I
> > >>> simply had to get that off my chest. I feel better now. I'm okay.
> > >>> Really.
> >>>
> > >>> As some of you may be aware, Groklaw is using Geeklog. It has turned
> > >>> into quite a busy website and stories with more than 700 comments are
> > >>> not out of the ordinary. In addition to this, being slashdotted has
> > >>> become normal. This is where the problems started. The server can't
> > >>> handle much more. On busy days the website turns into a crawling slow
> > >>> pile of ..
> >>>
> > >>> As a regular reader and volunteer of Groklaw I offered to take a look
> > >>> at the Geeklog source code and try to find some places that could
> > >>> benefit from optimisation. After some testing I've noticed that most
> > >>> of the problems are due to load on the database server.
> >>>
> > >>> The first thing I started working on is the code that generates all
> > >>> the comments. It turns out that for every comment at least two queries
> > >>> are executed. For a story with more than 700 comments this would mean
> > >>> almost 1500 queries to generate the page.
> >>>
> > >>> I've modified this code extensively and now we use one query to fetch
> > >>> all the user details of all the people involved in posting. One query
> > >>> is used to fetch all the comments that have no parent. One query to
> > >>> fetch all the comments that do have parents. And if needed, one query
> > >>> to fetch the parent. All this data is then turned into one big nested
> > >>> array, which is passed by reference to the functions that actually
> > >>> print the data. Depending on how many comments there are this could
> > >>> result in a speed improvement of about 0% - 1000%. As you can imagine,
> > >>> if you only have about 10 comments it would not mean much, but with
> > >>> 500 comments it would reduce the number of queries needed by about
> > >>> 1000. It's a very big improvement.
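
To make the shape of that change concrete, here is a minimal sketch in
Python with an in-memory SQLite database. It is not the Groklaw patch. The
table and column names (comments, users, cid, pid, sid, uid) and the
convention that pid 0 means "no parent" are assumptions, and it simplifies
Niels's three or four queries down to two: one for all comments of a story
and one for all authors involved.

```python
import sqlite3
from collections import defaultdict


def build_comment_tree(db, sid):
    """Fetch a story's comments and their authors with two queries total."""
    comments = db.execute(
        "SELECT cid, pid, uid, body FROM comments WHERE sid = ? ORDER BY cid",
        (sid,),
    ).fetchall()

    # One IN (...) query for every author, instead of one query per comment.
    uids = sorted({row[2] for row in comments})
    users = {}
    if uids:
        marks = ",".join("?" * len(uids))
        users = dict(db.execute(
            f"SELECT uid, username FROM users WHERE uid IN ({marks})", uids
        ))

    # Group comments by parent id, then nest them in memory.
    children = defaultdict(list)
    for cid, pid, uid, body in comments:
        children[pid].append(
            {"cid": cid, "author": users.get(uid, "?"), "body": body}
        )

    def attach(pid):
        nodes = children.get(pid, [])
        for node in nodes:
            node["replies"] = attach(node["cid"])
        return nodes

    return attach(0)  # assumption: pid 0 marks a top-level comment
```

The query count stays at two whether the story has 10 comments or 700,
which is where the saving comes from. The nesting itself is plain
in-memory work.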
> >>>
> > >>> One other problem I've identified is table locking of the story
> > >>> table. The statistics are stored in the same table as the actual
> > >>> content of the story. So each time a story is displayed, it will use
> > >>> an UPDATE query and a SELECT query on the same table. With a lot of
> > >>> requests the table is constantly locked by the UPDATE queries and the
> > >>> SELECT queries are waiting. We've disabled the statistics for now, but
> > >>> we are investigating the possibility of moving the statistics to a
> > >>> separate table.
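
A sketch of what the split could look like, again in Python with SQLite
purely so it runs anywhere. The table names stories and story_stats and
the hits column are assumptions, not the real Geeklog schema. SQLite also
locks differently from MySQL, so this only shows the schema and query
shapes, not the locking gain itself.

```python
import sqlite3

# Content and counters live in separate tables (names are hypothetical).
SCHEMA = """
CREATE TABLE stories (sid TEXT PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE story_stats (sid TEXT PRIMARY KEY, hits INTEGER NOT NULL DEFAULT 0);
"""


def record_hit(db, sid):
    """Bump the counter without touching the stories table."""
    cur = db.execute(
        "UPDATE story_stats SET hits = hits + 1 WHERE sid = ?", (sid,)
    )
    if cur.rowcount == 0:  # first view of this story: no counter row yet
        db.execute("INSERT INTO story_stats (sid, hits) VALUES (?, 1)", (sid,))


def read_story(db, sid):
    """Read-only SELECT on stories; the write goes to story_stats."""
    row = db.execute(
        "SELECT title, body FROM stories WHERE sid = ?", (sid,)
    ).fetchone()
    if row is not None:
        record_hit(db, sid)
    return row


def hits(db, sid):
    """Return the view count for a story, or 0 if it was never viewed."""
    row = db.execute(
        "SELECT hits FROM story_stats WHERE sid = ?", (sid,)
    ).fetchone()
    return row[0] if row else 0
```

With this layout the frequent UPDATE only ever touches the small counter
table, so reads of the story content no longer queue behind it.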
> >>>
> > >>> Next is the database layer. The mysql_fetch_array() function has two
> > >>> arguments. The second determines what the function returns: either an
> > >>> associative array, a numbered array or both. By default the function
> > >>> returns both. This is what Geeklog does. In most of the code only the
> > >>> associative array is used. Only in a couple of small instances does
> > >>> the code require a numbered array. What we have done is to instruct
> > >>> the mysql_fetch_array() function to return only an associative array
> > >>> by default. Only when the code requires a numbered array do we request
> > >>> both. This should lower the amount of memory needed by Geeklog.
> >>>
> > >>> The SEC_getUserGroups() function is also quite expensive. It is called
> > >>> throughout the generation of a page and it does not cache the
> > >>> information. We've added a simple cache for the data that is fetched
> > >>> from the database which eliminates another 30 or so queries.
> >>>
> > >>> Next is the index page. The COM_featuredCheck() function is executed
> > >>> every time the frontpage is requested. I've changed the loop that
> > >>> actually displays the stories on the frontpage and included a check to
> > >>> see if there is more than one featured story. If there is, the second
> > >>> story is not displayed as such and the featuredCheck() function is
> > >>> called. This again saves a couple of queries and the end result is the
> > >>> same.
> >>>
> > >>> We are also using the mycal extension which I've almost completely
> > >>> rewritten. Mycal uses a query for every day that is displayed and
> > >>> after my modifications it only uses one query. That is a reduction of
> > >>> 27-34 queries.
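
The calendar case boils down to one range query plus grouping in memory.
A small sketch in Python with SQLite, assuming a hypothetical events table
whose day column holds ISO dates (YYYY-MM-DD). It covers a single calendar
month only, whereas the real grid may also show days from the neighbouring
months.

```python
import calendar
import sqlite3
from collections import defaultdict


def events_by_day(db, year, month):
    """One range query for the whole month, grouped per day in memory."""
    last_day = calendar.monthrange(year, month)[1]
    start = f"{year:04d}-{month:02d}-01"
    end = f"{year:04d}-{month:02d}-{last_day:02d}"
    rows = db.execute(
        "SELECT day, title FROM events "
        "WHERE day BETWEEN ? AND ? ORDER BY day, title",
        (start, end),
    )
    grouped = defaultdict(list)
    for day, title in rows:
        grouped[int(day[8:10])].append(title)  # characters 8-9 are the day
    # Every day of the month gets an entry, even when it has no events.
    return {d: grouped.get(d, []) for d in range(1, last_day + 1)}
```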
> >>>
> > >>> Now back to my first paragraph. I was pretty impressed with how easy
> > >>> it was to get used to the way everything works in Geeklog. It was
> > >>> pretty easy to understand and it looks like it was designed pretty
> > >>> well. I was also horrified when I saw the enormous number of queries
> > >>> that are used, but I guess Geeklog wasn't really designed with this
> > >>> kind of traffic and these enormous amounts of comments in mind.
> >>>
> > >>> Most of the changes we've made are not yet running on the production
> > >>> server. Once we've properly tested everything and everything is
> > >>> stable, I'm willing to look at how we can give these changes back to
> > >>> Geeklog. A simple patch between the current version of Geeklog and
> > >>> Groklaw will be difficult, because we are using 1.3.8-1sr4 and it also
> > >>> includes a lot of Groklaw specific modifications. If you are
> > >>> interested in these modifications, please let me know and we'll work
> > >>> something out.
> >>>
> > >>> If you want to talk to me about this you can e-mail me. In addition
> > >>> to this I'll try to visit #geeklog as often as I can.
> >>>
> >>> Niels Leenheer
> >>> -- project manager phpAdsNew
> >>>
> >>>
> >>>
> >>
> >> _______________________________________________
> >> geeklog-devel mailing list
> >> geeklog-devel at lists.geeklog.net
> >> http://lists.geeklog.net/listinfo/geeklog-devel
> >>