I've considered using threads in the past, but only because I wanted a unified cache and my host doesn't allow memcached (and I was bored).
I'm not sure I see the point otherwise, although it's pretty neat anyway. How much overhead do multiple persistent processes really add over a single process with multiple threads, when all your persistent data is stored in a DB anyway?
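For what it's worth, here's roughly what I mean by a unified cache: one dict in one process, shared by every worker thread, so a value fetched by one request is visible to all the others. This is only a sketch. `SharedCache`, `slow_lookup`, and `handle_request` are made-up names, and `slow_lookup` just stands in for a real DB query.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedCache:
    """In-process cache shared by all threads (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self.misses = 0  # how many times we had to compute a value

    def get_or_compute(self, key, compute):
        # Holding the lock while computing keeps this simple and ensures
        # each key is computed once, at the cost of serializing misses.
        with self._lock:
            if key in self._data:
                return self._data[key]
            self.misses += 1
            value = compute(key)
            self._data[key] = value
            return value


def slow_lookup(key):
    # Stand-in for a DB query (assumption, not a real API).
    return key * 2


cache = SharedCache()


def handle_request(key):
    # Every worker thread hits the same cache object.
    return cache.get_or_compute(key, slow_lookup)


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, [1, 2, 1, 2, 1, 2, 3]))
```

With separate persistent processes, each one would hold its own copy of that dict, which is exactly why you'd normally reach for memcached instead.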
As an aside, if you're that concerned about performance, Python isn't that great a choice. Even ignoring the global interpreter lock, CPython is slow as molasses.