Saturday, December 13, 2008

Plone trunk versus kcachegrind

David Glick reminded me a while back of a visualization tool for profiling data called kcachegrind. After trying for a while to optimize Plone trunk based on simple Python profile data, I desperately needed a more advanced analysis tool.

To convert the normal Python profiling output into a format kcachegrind can read, you need to run a script called pyprof2calltree over it. The publicly available version seems to be broken in some ways, but David took on the challenge and sent me a fixed version.
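For anyone who wants to try this at home, here is a minimal sketch of producing the kind of profile file that pyprof2calltree converts, using the standard library's cProfile and pstats modules (modern Python, not necessarily what ran Plone trunk at the time). The `render_page` function is a hypothetical stand-in for a Plone request, and the output path is arbitrary. The conversion step itself is not shown, since the exact pyprof2calltree invocation depends on the version you have.

```python
import cProfile
import os
import pstats
import tempfile

# Hypothetical stand-in for rendering a Plone page.
def render_page(n):
    return sum(i * i for i in range(n))

PROFILE_PATH = os.path.join(tempfile.gettempdir(), "render.prof")

# Collect profile data for a batch of "requests".
profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    render_page(10_000)
profiler.disable()

# Write the raw stats to disk; this file is the converter's input.
profiler.dump_stats(PROFILE_PATH)

# The plain-text view that kcachegrind improves on: top entries by cumulative time.
stats = pstats.Stats(PROFILE_PATH)
stats.sort_stats("cumulative").print_stats(5)
```

The text report only lists per-function totals; the point of loading the same data into kcachegrind is to browse the call graph and see which callers are responsible for the cost.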

After getting familiar with the tool and poking at Plone trunk for a bit, I was able to identify a couple more places that I hadn't found in my earlier optimization attempts. So at the end of day two of the Plone Performance Sprint 2008, happening in Bristol right now, I have some notable results. Compared to the results I posted three weeks ago, everything except edit pages has seen another 50% speed increase. Thank you, David!



One of the changes was inside ResourceRegistries and can readily be applied to Plone 3.x as well. It saves about 10ms per request in a default site. All other changes still revolve around optimizing actions handling. How much speed you can gain through rather simple and straightforward changes still amazes me. Stay tuned for more.

Update:

This morning I found another problem: content type icon expressions were being recompiled on every page load. I updated the graph with the new results: we went from 24 requests/s to 27 requests/s ... I like it :)
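The general idea behind a fix like this is to compile each expression once and reuse the compiled result. The sketch below is a hypothetical illustration, not the actual Plone/CMF change: it uses Python's built-in `compile` as a stand-in for the TALES expression compiler and `functools.lru_cache` as the cache; `compile_expression`, `render_icon` and the example expression string are all invented for illustration.

```python
from functools import lru_cache

COMPILE_COUNT = 0  # counts how often the expensive step actually runs

def compile_expression(source):
    """Hypothetical stand-in for an expensive expression compile step."""
    global COMPILE_COUNT
    COMPILE_COUNT += 1
    return compile(source, "<icon-expression>", "eval")

@lru_cache(maxsize=None)
def cached_expression(source):
    # Each distinct expression string is compiled once; later calls reuse it.
    return compile_expression(source)

def render_icon(source, context):
    # Evaluate the (cached) compiled expression against per-request data.
    return eval(cached_expression(source), {}, context)

for _ in range(1000):
    render_icon("portal_url + '/document_icon.gif'",
                {"portal_url": "http://localhost:8080/Plone"})
print(COMPILE_COUNT)  # 1
```

Caching by the expression's source string is safe here because the compiled form depends only on that string; the per-request data is supplied at evaluation time.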

Update 2:

I found a few more places to optimize and took a look at Archetypes edit screens in particular. The updated graph is above; we are now officially three times faster than Plone 3.2 and have reached the magical 30 requests/s mark.
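To put the requests/s numbers into per-request terms, here is a small worked calculation. It assumes requests are issued one at a time, so that throughput is simply the inverse of the time per request; with concurrent requests this simple relationship would not hold.

```python
def ms_per_request(requests_per_second):
    # With one request in flight at a time, time per request = 1 / throughput.
    return 1000.0 / requests_per_second

for rps in (24, 27, 30):
    print(f"{rps} requests/s -> {ms_per_request(rps):.1f} ms per request")

# "Three times faster" at 30 requests/s implies a baseline of about 10 requests/s.
baseline_rps = 30 / 3
print(f"Plone 3.2 baseline: {ms_per_request(baseline_rps):.1f} ms per request")
```

Under that assumption, going from 24 to 27 requests/s shaves roughly 4.6ms off each request (41.7ms down to 37.0ms).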

8 comments:

  1. Cool!

    By the way I'm not trying to be secretive; you can grab the patched pyprof2calltree.py from http://wglick.org/pyprof2calltree.py

  2. Would you be willing to make your code for running the benchmarks public, so we can play along at home and see how different machines compare, etc? :)

  3. This is a simple Plone coredev trunk checkout run with the chameleon.cfg. The benchmark is "ab -n 10 ..." against the various targets. Nothing special at all.

  4. Except I hadn't noticed chameleon.cfg yet, so that's special. :) thanks

  5. Is there a chance that some of those speedups will make it into Plone 3.x? Maybe not the switch to Chameleon, but perhaps the other stuff.

  6. @philikon: As I mentioned, the RR (ResourceRegistries) improvement is easily backportable. All other changes actually require API changes, new features from CMF trunk, or removal of code (Archetypes session support). Our current policy for 3.x seems to forbid the kind of changes that would be required.

    Another thing to note here is that, of course, all these numbers are pretty much meaningless for any real-world site deployment. What they suggest is that you can lower the site rendering time by about 20ms. If your site renders in 500ms, the effect is still there but doesn't look quite as impressive.

  7. Just as a note, Mike (of Spitfire/YouTube) confirms that the new pyprof2calltree makes kcachegrind give him results that match his other tools, so that is probably something that should be contributed back to the project if we can.

  8. To davisagli and limi:

    I have just integrated David's fix and released version 1.1.0 of pyprof2calltree:

    1.1.0 release at pypi

    diff with previous version at bitbucket.org

    However, I still get >= 200% accumulated stats on the dummy XML parsing example in the distribution's README.txt file. I don't really have the time to dig into it further right now, so please feel free to send more feedback and/or patches :)

    Once I have enough validation from users, I will submit the script to the KCachegrind developers for upstream inclusion.
