Threaded Imageboard Software (23)

1 Name: #!/usr/bin/anonymous : 2008-07-25 21:40 ID:Yr4oROjY

I'd like to know if any of the currently available imageboard software packages support threaded execution. I've just finished making the board front pages and thread res pages regenerate asynchronously with shared object memory, and I'd like to see if anyone knows of a software package out there which already supports this, before I start flaunting it around as a major feature.

If anyone would like to view the large diff which contains the threading code, you can view it here: http://code.google.com/p/pyib-standalone/source/diff?r=18&format=side&path=/trunk/post.py

2 Name: #!/usr/bin/anonymous : 2008-07-26 17:57 ID:Yr4oROjY

what the fuck

fucking multi threads holy fuck

3 Name: Eleo : 2008-07-26 20:50 ID:yeMRohP9

>>1
Can you explain why this is actually useful? Based on what little I know about threads this will allow concurrent processes to take place, but why is this desirable for an imageboard application? Of all the things I've thought of while both writing and using imageboards, "man I wish this were threaded" was never one of them.

4 Name: #!/usr/bin/anonymous : 2008-07-27 00:02 ID:jzU8kroz

>>1-2

samefag ID:Yr4oROjY

>>3

Is right. Multi-threading is an implementation detail and shouldn't be a "major feature" that more than 20 people in the world care about. That of course doesn't mean it's not cool, especially from an academic perspective, but not for marketing purposes.

5 Name: #!/usr/bin/anonymous : 2008-07-27 13:30 ID:Yr4oROjY

>>3

>but why is this desirable for an imageboard application

If the site is small enough, it won't matter. However, if the site has a considerable number of users who are making posts, the time the app spends on database reads followed by templating can add up, because they run sequentially. With the threading, the only part of the process which isn't concurrent is the database reads, which is a limitation of the library. With simple locking I was able to get the app to sidestep that limitation every time it wanted to use the database.
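
A minimal sketch of that arrangement, assuming modern Python 3 rather than the actual pyib code. The names `db_lock`, `fetch_posts`, `render` and `regenerate` are hypothetical, and a plain dict stands in for the database library. Each page gets its own thread, and only the database read is serialized behind the lock:

```python
import threading

# Hypothetical: one lock guarding a database library that is not thread-safe.
db_lock = threading.Lock()

def fetch_posts(db, page):
    # Only one thread at a time may touch the database.
    with db_lock:
        return list(db[page])

def render(page, posts):
    # Stand-in for the templating step; runs outside the lock.
    return f"<h1>{page}</h1>" + "".join(f"<p>{p}</p>" for p in posts)

def regenerate(db, pages):
    output = {}

    def worker(page):
        posts = fetch_posts(db, page)
        output[page] = render(page, posts)  # each thread writes its own key

    threads = [threading.Thread(target=worker, args=(p,)) for p in pages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every page has been regenerated
    return output
```

Under CPython the global interpreter lock keeps the templating step from running on several cores at once, so any gain from a layout like this would mostly come from overlapping waits rather than from parallel CPU work.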

>>4
Hm, I'll have to remember to use a proxy the next time I'm being a amer and bumping my own thread. You win this time, ID system.

6 Name: #!/usr/bin/anonymous : 2008-07-27 14:21 ID:Heaven

>>5
But the time doesn't ‘add up’ except in terms of system load, which you're creating more of. The web server is already handling each request concurrently, and spawning an extra thread just to spit out a template isn't going to improve that.

7 Name: #!/usr/bin/anonymous : 2008-07-27 15:34 ID:Heaven

>>6

>But the time doesn't ‘add up’ except in terms of system load

You're exactly right. This causes more system load in order to achieve the end goal in a shorter amount of time.

You're wrong about this not improving the time. I've already tested it with several users constantly posting, and there was a significant speedup. I'm using FCGI if that matters.

8 Name: dmpk2k!hinhT6kz2E : 2008-07-27 17:56 ID:Heaven

I've considered using threads in the past, but only because I wanted a unified cache, but my host doesn't allow memcached -- and I was bored.

I'm not sure I see the point otherwise, although it's pretty neat anyway. How substantial an overhead is multiple persistent processes over a single process with multiple threads, when all your persistent data is stored in a DB anyway?

As an aside, if you're that concerned about performance, Python isn't that great a choice. Ignoring the global interpreter lock, CPython is slow as molasses.

9 Name: #!/usr/bin/anonymous : 2008-07-27 21:14 ID:Heaven

>>5

>Hm, I'll have to remember to use a proxy the next time I'm being a amer and bumping my own thread. You win this time, ID system.

............................................
.......\\/////........._________
....../ \......./ /..
......|-[0][0]-|...../ sage /..
....0| /\ |0../ _______/...
......| ,_____, |...//................
......\ \-|-/ /__/.................
.......\__ __/.................
...........| |.................

10 Name: #!/usr/bin/anonymous : 2008-07-28 05:18 ID:TZEJpATg

It looks like you are creating threads to perform expensive IO operations. To assemble a web page, this makes no sense. Each request should just use 1 thread. Adding threads will speed up execution up to a point (and the increase will only be small), but it's not scalable. It will take less traffic to use more resources.

What you should be trying to do is something like the .NET model. 1 thread per request. You can specify callbacks when performing expensive IO operations so that the current requesting thread is released and then continued after the IO operation.

This can significantly decrease the load on your thread pool and allow you to serve significantly more requests. This is the kind of performance you want to consider. Fewer resources so that the app can do more (which is incredibly important to web apps as they are multi-user).
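
A rough Python analogue of that idea, using `asyncio` coroutines in place of .NET callbacks (an assumption for illustration, not the .NET API itself). The names `slow_io`, `handle_request`, `serve` and `run` are hypothetical, and `asyncio.sleep` stands in for a database or disk call. Each request is one task, and while it waits on IO the single worker thread is free to pick up other requests:

```python
import asyncio

async def slow_io(name, delay):
    # Stand-in for an expensive IO operation (database, disk, network).
    await asyncio.sleep(delay)
    return f"{name} done"

async def handle_request(request_id):
    # The await releases the worker until the IO completes,
    # then the handler continues from this point.
    return await slow_io(f"request {request_id}", 0.05)

async def serve(n):
    # n requests in flight at once on a single thread.
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

def run(n):
    return asyncio.run(serve(n))
```

Calling `run(3)` handles three requests whose waits overlap instead of queueing one behind another, without starting any extra threads.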

11 Name: #!/usr/bin/anonymous : 2008-07-28 15:24 ID:Heaven

gb2bed Trevor

12 Name: #!/usr/bin/anonymous : 2008-07-28 21:37 ID:Heaven

>>8

>How substantial an overhead is multiple persistent processes over a single process with multiple threads, when all your persistent data is stored in a DB anyway?

From what I'm told DreamHost (who is hosting me) has a search and destroy program for any processes which use up too many resources, and when I had several different people post at the same time, none of the processes were killed. I haven't done any true testing on how much higher the resource usage is, but I'll consider it.

>As an aside, if you're that concerned about performance, Python isn't that great a choice. Ignoring the global interpreter lock, CPython is slow as molasses.

It's not really that I'm so worried about performance, it's that by doing this I can narrow the gap between a faster language and Python. It also makes use of Psyco on repetitive and CPU-intense functions, which narrows the gap even further.
I picked Python because it seemed like the ease of learning/use and great standard library were worth the trade-off in performance.

>>10

>You can specify callbacks when performing expensive IO operations so that the current requesting thread is released and then continued after the IO operation.

I've thought about that, however in this case that wouldn't work. The process is: user makes post -> post is processed and inserted into database -> thread/board is regenerated -> user is redirected to thread or board.
If I were to release the user back to the board before the I/O operations were complete, they would see a stale version of the thread or board. Tell me if I've got something wrong here, because I'm completely open to changing this.

>>11
You must be new here. You're supposed to say that when people appear to be me, not when it's actually me. Thanks for playing, though.

13 Name: #!/usr/bin/anonymous : 2008-07-28 22:00 ID:Heaven

words of advice from IRC:
<MrVacBob> http://en.wikipedia.org/wiki/Varnish_cache#Architecture is this a joke? what kind of modern server has one thread per connection?

14 Name: #!/usr/bin/anonymous : 2008-07-28 23:16 ID:TZEJpATg

>>13

I hope you are posting this to point out the stupidity of the statement.

It's not 1 thread per connection. It's 1 thread per request.

There isn't a good reason to use more than 1 thread to process the information for a request.

To confuse most of you even more, I do sometimes spawn new threads when handling a request. But these are for background tasks. So if a request spawns an email, I don't need to wait for that email to be composed and sent away before the next instructions execute. So I still have 1 thread handling the request at a low priority, but multiple threads to complete the task (and in this case only because I want to execute it later).
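
A minimal sketch of that pattern in Python, assuming a hypothetical `send_email` that just sleeps and records what it would have sent. The request handler starts the background thread and returns its response straight away:

```python
import threading
import time

sent = []  # hypothetical record of delivered mail, for illustration only

def send_email(to, body):
    time.sleep(0.05)          # stand-in for composing and sending the mail
    sent.append((to, body))

def handle_request(to):
    # Hand the email off to a background thread; do not wait for it.
    worker = threading.Thread(target=send_email, args=(to, "Thanks for posting"))
    worker.start()
    # The request itself is still handled by this one thread.
    return "302 redirect", worker
```

The handler returns the thread object only so a caller can `join()` it in a test. A real handler would normally not need it.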

15 Name: dmpk2k!hinhT6kz2E : 2008-07-29 02:02 ID:Heaven

> From what I'm told DreamHost (who is hosting me) has a search and destroy program for any processes which use up too many resources, and when I had several different people post at the same time, none of the processes were killed.

I've been running a set of FastCGI processes -- the captchas and sidebar specifically -- on Dreamhost for a couple years now without noticing any problems. I've been nailed by DH's process killer, but for things like screen and irssi. I suspect it's checking the process tree to find who the parent is.

In any case, this raises the question: if you're worried about the process killer, why keep everything in one process? It makes that process larger (and more likely to be killed?) and it's more catastrophic when it goes.

> But these are for background tasks.

Using background queues (or threads?) is a good idea -- all sizeable websites use them, since you can't grow beyond small without them. o.o-b

But I'm a bit concerned about Python's GIL; where I work we had so many problems with Ruby's utter shit threading that I'm all paranoid about it in other interpreted languages. Have you had any problems with it?

16 Name: dmpk2k!hinhT6kz2E : 2008-07-29 03:09 ID:Heaven

Actually, having thought about it a bit, doing background tasks with threads isn't a hot idea, but you probably don't have much choice with DH.

17 Name: #!/usr/bin/anonymous : 2008-07-29 16:12 ID:4YXGKGpd

>>16

You mean like in the case where your waiting thread gets killed, and then you have no way to resurrect the actions it was supposed to take?

That of course is always a concern so you have to weigh all of the concerns of the task accordingly. Some tasks do require their info to be stored to some persisted queue before being run.
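
One way such a persisted queue could look, sketched with Python's standard `sqlite3` module. The `tasks` table and the functions `open_queue`, `enqueue` and `run_pending` are hypothetical names for illustration. A task is written to disk before anything tries to run it, so a killed worker loses nothing that a later run cannot pick up:

```python
import sqlite3

def open_queue(path=":memory:"):
    # Use a file path in practice so tasks survive a process kill.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn

def enqueue(conn, payload):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))

def run_pending(conn, handler):
    rows = conn.execute(
        "SELECT id, payload FROM tasks WHERE done = 0 ORDER BY id"
    ).fetchall()
    for task_id, payload in rows:
        handler(payload)
        # Mark as done only after the handler has finished.
        with conn:
            conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
    return len(rows)
```

Because a task is marked done only after its handler returns, a crash in the middle means that task runs again next time, so handlers written against a queue like this should tolerate being repeated.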

18 Name: dmpk2k!hinhT6kz2E : 2008-07-29 19:18 ID:Heaven

That's a part of it. The other part isn't something an imageboard really needs to worry about -- if you get enough traffic, keeping long-term threads on the same machine as short-term will be a source of grief.

19 Name: Eleo : 2008-07-30 06:12 ID:yeMRohP9

>>5

>If the site is small enough, it won't matter.

And I guess this brings me to my next point. Most imageboard sites are small. Very small. And they tend to die quickly, because everyone wants to make their own *chan and the fact is that it's hard to get hits and even harder to get people to post and participate. I know because I had such a dream at one point. Take it from me. It's been more than a year for my site which uses my own software, has unique features, and a unique interface, and I'm just now approaching 2000 uniques a day. I still consider my site to be small, although I am grateful that it's a lot less dead than it used to be. Even using Ruby and Rails which are known to be sluggish and hard to scale, my server load is low and my site is quite responsive and I haven't even implemented caching or anything.

That said, whether there is or isn't a benefit (I'm no expert) there aren't going to be many people to reap it. If you were hired by a large chan that needed software that scales better then your endeavor would make more sense, but honestly right now it just doesn't seem practical or truly useful.

You could potentially be using your intellect to do better things.

20 Name: #!/usr/bin/anonymous : 2008-08-07 20:22 ID:DUA1Pufv

Why on earth would you want to run multiple threads in a web application? It's not like the multiuser nature of WWW isn't going to cause the webserver to execute multiple concurrent instances of your webpages given multiple connections.

And anyway, multithreading in a web environment is retarded. Just go with fork, it's the safer way.

21 Name: dmpk2k!hinhT6kz2E : 2008-08-08 18:55 ID:Heaven

Not on Dreamhost it isn't. :/

22 Name: #!/usr/bin/anonymous : 2008-08-09 20:33 ID:Heaven

Don't use Dreamhost. Problem solved, Occam is satisfied.

23 Name: dmpk2k!hinhT6kz2E : 2008-08-10 18:09 ID:Heaven

I can't argue with that...

This thread has been closed. You cannot post in this thread any longer.