[Setup] Re: Zope/Plone scalability
Martin Aspeli
optilude at gmx.net
Sun May 11 10:40:09 UTC 2008
This post is generously snipped to avoid more agitation.
> As I asked before, is there some, more recent document, which
> highlights how to structure Plone for a large scale deployment? If
> there is, I'd sure like to read it ...
If you have thousands of users or tens of gigabytes of data, then it
normally pays to get some expert help in to review your architecture.
That goes for any platform, Plone included. The basics are fairly well
known: You have a ZEO server, you have multiple ZEO clients (at least
one per processor core), you have a load balancer in front of those
clients (e.g. pound), and you have Varnish or Squid for caching, with
CacheFu correctly configured in Plone.
> Our main site keels over frequently when it was on our Zope/Zeo
> cluster. On its own box, it only restarts itself 1-5 times per day.
This is still unacceptable - I'd never run a site that behaved like
that. I don't think many people would. Most Plone sites don't do this,
so there must be something in your setup that could be improved or
fixed. I can't be more specific than that, though.
> What do you mean by "instance"? Install Zope (n front ends plus ZEO
> backend). Add Plone site. Repeat x 20.
>
> That's the setup I'm talking about. As far as I can tell, it's the
> most natural way to add different websites (e.g. Plone sites with
> different URLs). It also turns out not to be very performant. (Note:
> there's only one ZEO box, with one data.fs, which for us is <2GB)
I don't think that "a priori" having 20+ small Plone sites in one Zope
instance is a drag on performance, except that you're duplicating some
things (like the catalog) that may add a bit of overhead compared to
having one site that's 20 times as big. I suspect the performance and
stability issues you're seeing have more to do with what's going on
inside one or more of those sites.
This is where you need to learn how to debug things. Your logs will tell
you when something crashes. From your previous traceback, it looks like
the culprit is something in RedirectionTool (which, by the way, is not a
core part of Plone). Have you tried to uninstall this tool temporarily (take a
backup!) to see whether the problem goes away? Have you tried to ask on
the mailing lists what the problem is? Have you tried to get someone
with Python skills to do some debugging on the line that's causing the
error? Have you tried to understand what triggers the error - is it
happening on any 404 page, for example? Or on all pages? Or on pages
when a redirect alias is being invoked?
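As a starting point for that kind of debugging, here is a minimal
sketch that groups the tracebacks in a log by their innermost frame and
exception type, so you can see whether one line of code accounts for
most of the errors. It assumes tracebacks appear unprefixed in the log,
with frame lines in either the standard Python form (File "...", line N,
in func) or the "Module name, line N, in func" form; the function name
summarize_tracebacks is made up for this example, and your log layout
may need a different pattern.

```python
import re
from collections import Counter

# Assumed frame formats: 'File "path", line N, in func' (standard Python)
# or 'Module dotted.name, line N, in func'. Adjust if your log differs.
FRAME_RE = re.compile(
    r'^\s+(?:File "(?P<path>[^"]+)"|Module (?P<module>[^,\s]+))'
    r', line (?P<line>\d+), in (?P<func>\S+)'
)


def summarize_tracebacks(log_text):
    """Count tracebacks by (innermost frame, exception type)."""
    counts = Counter()
    in_traceback = False
    innermost = None
    for line in log_text.splitlines():
        if line.startswith("Traceback ("):
            # A new traceback begins; forget frames from the previous one.
            in_traceback, innermost = True, None
            continue
        if not in_traceback:
            continue
        match = FRAME_RE.match(line)
        if match:
            # Frames are listed outermost first, so the last one seen
            # is the innermost frame, where the error was raised.
            where = match.group("path") or match.group("module")
            innermost = (where, int(match.group("line")), match.group("func"))
        elif line and not line[0].isspace():
            # First unindented line after the frames: "ExcType: message".
            exc_type = line.split(":", 1)[0]
            counts[(innermost, exc_type)] += 1
            in_traceback = False
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for (frame, exc), n in summarize_tracebacks(handle.read()).most_common(10):
            print(n, exc, frame)
```

Run it against the event log of one ZEO client; if a single frame
dominates the output, that is the line to start debugging from.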
> Install IIS/Apache. Add sites x 20, with host header redirection. No
> problems. Add dynamic elements (.NET, modPHP). Still no problems.
That's not a valid comparison. Add 20 advanced content management
systems written in .NET or modPHP, fill them with the same content, and
then come talk to me.
This kind of talk is fairly pointless, though. You're having a whinge.
If you want to have a whinge, go ahead, but then I'll stop wasting my
time trying to figure out what your problem is and give you advice. If
you want to get advice, then you're much more likely to get it if you
adjust your tone to seem less combative.
> (Again, thanks to Raphael Ritz for the ZEO Raid suggestion, which I'm
> trying out on a separate box, though the Subversion tags make my
> SysAdmins hesitant of its use on our production servers.)
From what I understand, ZEO Raid is not yet completely finished. I'd
speak to Christian Theune about it. I know he's very close to having it
finished, but is looking for sponsorship to get it over the final hurdle.
RelStorage will let you store things in Oracle or Postgres and thus use
their scalability features. It may be a more mature option. I know Jarn
are using it currently.
However, as you've been told repeatedly, it's very unlikely, based on
what you've told us, that your problems lie with the ZEO server, and thus
that ZEO Raid or RelStorage would help. It's extremely likely that the
tracebacks you are seeing every second *are* the symptom of the problem,
and those are *not* caused by ZEO server issues. If you had ZEO server
issues, you'd be seeing different messages (related to the ability of
the ZEO client to talk to the ZEO server).
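One rough way to check this against your own logs is to tally lines that
look like client/server connectivity trouble separately from ordinary
tracebacks. The sketch below is only that: the marker strings are
assumptions about what a connectivity failure looks like in a ZEO client
log (ClientDisconnected is the exception ZEO raises when the client has
lost its server), the function name tally_log is invented for the
example, and the exact wording in your logs depends on your ZEO version.

```python
# Assumed markers of ZEO connectivity trouble; adjust to match your logs.
ZEO_MARKERS = ("ClientDisconnected", "Disconnected from storage")


def tally_log(lines, zeo_markers=ZEO_MARKERS):
    """Count connectivity-related lines versus other traceback headers."""
    tally = {"zeo_connection": 0, "tracebacks": 0}
    for line in lines:
        if any(marker in line for marker in zeo_markers):
            tally["zeo_connection"] += 1
        elif line.startswith("Traceback ("):
            tally["tracebacks"] += 1
    return tally


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(tally_log(handle))
```

If the connectivity count stays at zero while the traceback count keeps
climbing, that supports looking at the application code rather than the
storage layer.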
> And, there doesn't seem to be the experience with large deployments.
> Even if we were to add consulting services to our sites, we're not
> certain we'd get a viable deployment that could handle hundreds of
> campus department web sites and hundreds of thousands of pages.
Lots of people run sites that are much bigger than what you've described.
> If you, or Martin, or someone else, based on a stack trace we were
> getting every 200 milliseconds could say -- "Oh, that looks like
> this, you could probably do that" -- I'd have a case to consider if
> hiring to address that problem would be a worthwhile investment.
I did that when you first posted it. The problem is in RedirectionTool.
It tells you which line. Any capable developer will be able to at least
do some debugging starting there. See also my suggestions above.
Martin
--
Author of `Professional Plone Development`, a book for developers who
want to work with Plone. See http://martinaspeli.net/plone-book