[geeklog-devel] Re: Input From PJ of Groklaw

Fri Aug 20 11:58:37 EDT 2004

> Rakaz (Niels) is another person than me.

Sorry about that.  We techies get caught up in IRC/system names instead
of real names, so it's an easy mistake to make.  Apologies to you and Niels.

> The database is the most serious performance bottleneck. We have 7?
> webservers talking to a single database and Geeklog does a lot of
> queries per page. We have a few stories with over 1000 comments!

Yeah, this is clearly a problem.  We have recently implemented use of
PHP sessions.  We didn't do much with it other than store important
things like $_USER in it, but now that we have that, we need to
investigate using the single query made to populate $_SESSION from the
database in ways that will prevent the lots of little queries that have
cluttered the code.  Given that, we need to start reviewing the code again.
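
To make the idea concrete, here is a rough sketch of the pattern I have
in mind.  It is in Python rather than PHP purely to keep it short, and
CountingDB, load_session and get_setting are made-up names for
illustration, not Geeklog functions.  The point is only that one query
fills the session and every later lookup reads from it:

```python
class CountingDB:
    """Hypothetical stand-in for a database handle that counts queries."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def fetch_user_settings(self, uid):
        # One round trip that returns everything we know about the user.
        self.query_count += 1
        return dict(self.rows[uid])


def load_session(db, uid, session):
    """Populate the session once; later calls reuse the cached copy."""
    if "user" not in session:
        session["user"] = db.fetch_user_settings(uid)
    return session["user"]


def get_setting(db, uid, session, key, default=None):
    """Read a single setting without issuing a query of its own."""
    return load_session(db, uid, session).get(key, default)
```

With this shape, asking for the theme and then the timezone costs one
query in total instead of two, and the same holds for any further
settings pulled during the request.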

Also, have you considered adding another database server and load
balancing them?  I think MySQL's replication has reached a point where
you could do this reliably.  Obviously it makes things more complex from
the administration side but it might be worth investigating.
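
Roughly, the application side of that could look like the sketch below:
reads are spread across the replicas and everything else goes to the
master.  Again this is Python for illustration only; QueryRouter and the
server names are hypothetical, and a real setup would also have to cope
with replicas lagging behind the master.

```python
import itertools


class QueryRouter:
    """Hypothetical router: SELECTs rotate over replicas, writes hit the master."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # cycle() hands out replicas round-robin; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def pick(self, sql):
        """Return the server that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Pages that must see their own writes immediately (posting a comment and
then showing it, say) would probably need to read from the master too.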

>
> Another issue is that regular expressions can take huge amounts of CPU
> time when stories go large. PJ writes stories that don't always fit in
> 64k, so I enlarged the bodytext field to mediumtext (16M). The regular
> expression match in COM_extractLinks caused a time-out in stories that
> had more than 50 links. (footnotes and back). BTW, we dropped the
> "what's related" box.

Hrm, good idea. Dirk, it might be worth upgrading that field.  Yeah, the
regexes would be a killer on larger stories.  Thanks, we'll review it
and see how we can fix this.
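
One direction we could look at is collecting the links with a parser
that walks the markup instead of matching one big pattern against the
whole body.  The sketch below shows the idea in Python using the
standard library's html.parser; it is not how COM_extractLinks works
today, and LinkCollector and extract_links are names I made up for the
example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs as the parser walks the markup."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor we are currently inside
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every link in the story body, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

A story with hundreds of footnote links goes through the same loop as a
story with five, so the cost should grow with the size of the story
rather than blow up with the number of links.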

> I have been looking for a suitable platform, but couldn't find another
> one that
>  1. was open source
>  2. had a decent security record
>  3. could be installed easily
>  4. had a good management interface
> and I'd hate to convert databases to a new environment.

Well, like I said, we want to be sure that Geeklog grows as the needs of 
the communities grow.  Groklaw is an outlier in terms of size but our 
goal is to support sites that become this successful.  I hope that by 
working together we can get to a point where Geeklog is sleeker, meaner 
and cleaner.  All this reminds me of where Mozilla is with Thunderbird 
and Firefox.  I think Geeklog is at a point where we need to bust out
the code analysis and really streamline things.  Any chance we
might be able to get a copy of the Groklaw DB?  I know that is asking a
lot, but it would help us enormously.

--Tony