[geeklog-devel] Re: Input From PJ of Groklaw

Peter Roozemaal mathfox at xs4all.nl
Fri Aug 20 12:29:36 EDT 2004


Tony Bibbs wrote:

>> The database is the most serious performance bottleneck. We have 7?
>> webservers talking to a single database and Geeklog does a lot of
>> queries per page. We have a few stories with over 1000 comments!
> 
> Yeah, this is clearly a problem.  We have recently implemented use of 
> PHP sessions.  We didn't do much with it other than store important 
> things like $_USER in it, but now that we have that we need to 
> investigate using the single query made to populate $_SESSION from the 
> database in ways that will prevent the lots of little queries that have 
> cluttered the code.  Given that, we need to start reviewing the code again.

PHP session variables create a load on the central file server. We cannot
store them locally on the webserver as the loadbalancer will move
sessions from machine to machine. (rethinks) It might be possible if the
session variable only serves as a cache for database data.
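
To illustrate the cache idea, here is a minimal sketch (in Python rather
than PHP, for brevity). The names `get_user` and `load_from_db` are
hypothetical and not Geeklog functions. The point is only that a session
lost when the loadbalancer switches machines costs one extra query, not
correctness:

```python
from typing import Any, Callable, Dict


def get_user(session: Dict[str, Any], user_id: int,
             load_from_db: Callable[[int], Dict[str, Any]]) -> Dict[str, Any]:
    """Return the user record, treating the session purely as a cache."""
    cached = session.get("user")
    if cached is not None and cached.get("uid") == user_id:
        # Cache hit: this webserver has seen the user before, no query needed.
        return cached
    # Cache miss: e.g. the request landed on a webserver with an empty
    # local session. Fall back to the database and remember the result.
    record = load_from_db(user_id)
    session["user"] = record
    return record
```

A copy cached on one webserver can go stale if the user record changes
through another one, so a real version would probably need some form of
expiry.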

> Also, have you considered adding another database server and load 
> balancing them?  I think MySQL's replication has reached a point where 
> you could do this reliably.  Obviously it makes things more complex from 
> the administration side but it might be worth investigating.

We need a bit of help from your side in separating read-only database
operations from operations that modify the database. The config files
would need two database names, etc.
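
A rough sketch of what I mean, again in Python for brevity. The host
names, the `DB_CONFIG` layout and `pick_database` are all assumptions
made up for this example, not existing Geeklog settings:

```python
# Hypothetical configuration with one entry per role.
DB_CONFIG = {
    "write": {"host": "db-master.example", "name": "geeklog"},
    "read": {"host": "db-replica.example", "name": "geeklog"},
}

# Statements that never modify data and may go to a replica.
READ_ONLY_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def pick_database(sql: str) -> dict:
    """Choose the database entry based on the statement's first keyword."""
    words = sql.split(None, 1)
    verb = words[0].upper() if words else ""
    # Anything not known to be read-only goes to the writable server.
    role = "read" if verb in READ_ONLY_VERBS else "write"
    return DB_CONFIG[role]
```

Looking at the first keyword is crude. A read that must see a row that
was just written should still go to the writable server, because the
replica may lag behind, so the calling code would have to be able to
override the choice.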

>> Another issue is that regular expressions can take huge amounts of CPU
>> time when stories grow large. PJ writes stories that don't always fit in
>> 64k, so I enlarged the bodytext field to mediumtext (16M). The regular
>> expression match in COM_extractLinks caused a time-out in stories that
>> had more than 50 links. (footnotes and back). BTW, we dropped the
>> "what's related" box.
> 
> Hrm, good idea. Dirk, it might be worth upgrading that field.  Yeah, the 
> regexes would be a killer on larger stories.  Thanks, we'll review it 
> and see how we can fix this.
> 
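
One possible direction for the link extraction, sketched in Python with
the standard `html.parser` module: walk the story tag by tag and collect
the anchors, instead of matching one large regular expression against
the whole body. `LinkCollector` and `extract_links` are hypothetical
names, this is not how COM_extractLinks works today, and it is untested
against real Groklaw stories:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(story_html, limit=None):
    """Return the link targets in document order, optionally capped."""
    parser = LinkCollector()
    parser.feed(story_html)
    parser.close()
    return parser.links if limit is None else parser.links[:limit]
```

The optional `limit` would let a "what's related" style box show only the
first few links of a story with many footnotes.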
>> I have been looking for a suitable platform, but couldn't find another
>> one that
>>  1. was open source
>>  2. had a decent security record
>>  3. could be installed easily
>>  4. had a good management interface
>> and I'd hate to convert databases to a new environment.
> 
> Well, like I said, we want to be sure that Geeklog grows as the needs of 
> the communities grow.  Groklaw is an outlier in terms of size but our 
> goal is to support sites that become this successful.  I hope that by 
> working together we can get to a point where Geeklog is sleeker, meaner 
> and cleaner.  All this reminds me of where Mozilla is with Thunderbird 
> and Firefox.  I think Geeklog is at a point where we need to bust out 
> with code analysis and really streamline things.  Any chance we 
> might be able to get a copy of the Groklaw DB?  I know that is asking a 
> lot but it would help us a lot.

I can give you a sanitized version of the database if there is a place
where I can upload 45 Mbyte of compressed data. I need a few hours to
clean it of personal data.

Greetings,
	Peter.
