Tuesday, October 16, 2007

Performance sprint warmup

While the Copenhagen Plone performance sprint is still two weeks away, I thought I spent some time on profiling during the Naples sprint at this years Plone conference as well.

I concentrated on profiling Zope startup time when Plone is installed in it. While this doesn't matter that much in production environments it is of quite some importance for us poor developers.

On my one year old MacBook Pro Zope would start in about 16 seconds at first. Using Marius Gedminas great profilehooks.py I identified four big time consumers.

1. Importing modules

Not much can be done about that, though some of the CMFPlone migrations seem to be a good candidate to be excluded from initial loading and being defered to a point when they are actually needed. A couple of hundred imports could be safed here I guess.

2. Compiling and registering translation files

In a Plone environment so far all about 620 po files where read and parsed two to three times. I needed to adjust the mo file compilation code in my python-gettext package a bit and hacked PlacelessTranslationService to use lazy message catalogs instead. Those won't parse all the files on Zope startup but instead once they are first accessed. A typical site will probably never use all the languages and parsing and keeping those in memory was a bad choice.

After this was done I was down at 11 seconds. About a 30% speed increase.

3. Reading and compiling page templates

It turned out that right now we have around 140 purely disk based page template files for various ZMI screens. Until now all of them were always read, parsed and compiled on Zope startup; most of them are never used. So I added some lazy loading code into Zope's PageTemplateFile, made it available as an option and enabled it from CMFPlone. As the CMFPlone code is read after some of the templates are already loaded, we can only skip about 110 of them in this way. I'll add the whole thing as a zope.conf option, which will be parsed early enough in the process to get rid of the remaining ones.

Measuring the startup time again, I'm now down to 8 seconds, another 30%.

4. Parsing ZCML files

Parsing hundreds of ZCML files is currently the remaining big block for startup time. AFAICS this uses expat right now. It might be interesting to experiment with different XML parsers like lxml to see if some performance gain can be achieved here. I'll leave the exercise to someone else (Florian had shown interest in it...).

Conclusion

By spending about half a day of work on the topic I could speed up Zope startup time by an overall 50%. Not too shabby I think :) Lets see if there are some weak spots while loading the Plone site TTW the first time as well...

The downside to this all is that it is only available on Plone and Zope trunk. Maybe we can backport some of these improvements to the Plone 3.x line.

7 comments:

  1. Speeding up ZCML would be greatly appreciated. I'm not so sure the problem is with the XML parser, but this can only be answered for sure by careful analysis.

    One low-hanging fruit is to go ahead and disable all the ZCML that you don't need using Jim's new package zc.configuration (http://pypi.python.org/pypi/zc.configuration). This was done for Grok at the Neanderthal sprint: http://svn.zope.org/grok/branches/neanderthal-startupspeed/ and produced a rough 10% speed increase by excluding a few trivial files.

    Nice catch with the PageTemplates, by the way. I wonder, could you forward-port this to zope.pagetemplate's PageTemplateFile, too?

    ReplyDelete
  2. @Philipp:
    I've already looked at zc.configuration, but Plone and Five don't load much unneeded zcml. The whole zope.app.* packages aren't loaded for example.

    I'll try to port the lazy stuff to zope.pagetemplate as well, ones I made it configurable through a zope.conf option.

    The lazy message catalogs and the automatic mo file compilation are on my list for zope.i18n as well...

    ReplyDelete
  3. You rock! Just imagine the amount of time plone developers will save in aggregate.

    ReplyDelete
  4. For those interested in it, I found that five:implements in combination with fat Zope2 objects takes quite a bit of time. Moving two of those from Five's configure.zcml to the real classes made those actions about 75 times faster...

    The lazy page template behavior is now the default in Zope2.

    One more bottleneck I found was hidden in MimetypesRegistry where a 750KiB big XML file with mimetype information was parsed on each startup. I changed the code to create and load a binary format now instead, which safes another whole second on Zope startup...

    Let's see if there's more :)

    ReplyDelete
  5. Great improvements!

    How much time is spent parsing ZCML in Zope 2?

    At a recent Zope 3 sprint Stephan Richter and I attempted to sidestep the ZCML parsing step altogether (using some nasty hacks to pickle the ZCML configuration actions). Our prototype showed a potential ~33% reduction in Zope 3 startup time (from 3 seconds down to 2 seconds on a pretty new Thinkpad).

    My interest in this approach waned (I see little difference between 2 and 3 seconds -- both are either "fast enough" or "too slow"), but Stephan Richter was very enthusiastic and wanted to pursue it. You might want to talk to him.

    I would also like to see the speedup from switching to a different XML parser. lxml and cElementTree come to mind (one has a reputation of being fast, the other comes with Python 2.5). While fast parsing is slower than no parsing at all, it would be much simpler to implement and maintain than ensuring all the ZCML configuration actions are safe to pickle.

    ReplyDelete
  6. @Marius: Interesting work!

    A current Plone 4.0 / Zope 2.11 instance needs 10 seconds to start up of which 2.5 seconds are spent parsing ZCML. This is the biggest block. Second is creating Archetypes schemas (lots of deepcopies of nested classes) and then already time spent on pure importing of Python files.

    I wonder if the time to execute the ZCML could be speed up by turning it into real Python bytecode instead of using the pickle protocol. We had some good results with this while working on a TAL to Python compiler (some prototype work based on Spitfire - http://code.google.com/p/spitfire/).

    ReplyDelete