To convert the normal Python profiling output into a format readable by kcachegrind, you need to run a script called pyprof2calltree over it. The publicly available version seems to be broken in some ways, but David took on the challenge and sent me a fixed version.
After getting familiar with the tool and poking at Plone trunk for a bit, I was able to identify a couple more places that I hadn't found so far in my optimization attempts. So at the end of day two of the Plone Performance Sprint 2008, happening in Bristol right now, I got some notable results. Compared to the last results I posted three weeks ago, everything but edit pages has seen another 50% speed increase. Thank you David!
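For readers who want to try this workflow, here is a minimal sketch of producing a profile file that pyprof2calltree can convert. It uses the standard library's cProfile module; the function names and the file name profile.out are made up for illustration, and the workload is a stand-in, not Plone code.

```python
import cProfile
import pstats


def slow_sum(n):
    """Hypothetical stand-in workload; replace with the code you want to profile."""
    return sum(i * i for i in range(n))


def profile_to_file(path, n=100_000):
    """Run the workload under cProfile and dump the raw stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = slow_sum(n)
    profiler.disable()
    # Writes the binary stats format that pstats (and converters) can read.
    profiler.dump_stats(path)
    return result


if __name__ == "__main__":
    profile_to_file("profile.out")
    # Quick sanity check: print the five most expensive calls by cumulative time.
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)
    # Conversion step, run from a shell (assumes pyprof2calltree is installed):
    #   pyprof2calltree -i profile.out -o callgrind.out
    # The resulting callgrind.out can then be opened in kcachegrind.
```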

One of the changes was inside ResourceRegistries and can be readily applied in Plone 3.x as well. It saves about 10ms per request in a default site. All other changes still revolve around optimized actions handling. How much speed you can gain from rather simple and straightforward changes still amazes me. Stay tuned for more.
Update:
This morning I found another problem: content type icon expressions were being recompiled on every page load. I updated the graph with the new results: we went up from 24 requests/s to 27 requests/s ... I like it :)
Update 2:
I found a few more places to optimize and took a look at Archetypes edit screens in particular. The updated graph is above. We are now officially three times faster than Plone 3.2 and have reached the magical 30 requests/s number.
