Word Clouds / Python Performance (10)

1 Name: #!/usr/bin/anonymous : 2008-09-09 01:22 ID:Uie6ln6O

So, as a way of getting acquainted with Python, I wrote a simple script that generates a word cloud of the user's anime/manga collection. Example: http://i35.tinypic.com/5nr8zn.png

The size of each word is based on how much of that series is in the user's collection.
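
One way to turn those counts into font sizes is a linear mapping from the smallest count to the largest. This is a minimal sketch that assumes a dict of series name to item count. The function name `font_sizes` and the 10 to 72 point range are made up for illustration, and the original script may scale sizes differently.

```python
def font_sizes(counts, min_size=10, max_size=72):
    """Map each series' item count to a font size by linear interpolation.

    counts: dict of series name -> number of items owned (assumed input shape).
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for name, n in counts.items():
        # If every count is the same, avoid dividing by zero and use max_size.
        t = (n - lo) / span if span else 1.0
        sizes[name] = round(min_size + t * (max_size - min_size))
    return sizes


print(font_sizes({"FLCL": 6, "Bleach": 40, "Mushishi": 10}))
# {'FLCL': 10, 'Bleach': 72, 'Mushishi': 17}
```

Applying `math.sqrt` to `t` before scaling would enlarge the smaller series relative to the largest one, in case a single huge series dwarfs everything else.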

It uses the "brute-force" method of positioning words: each word is tried at basically every pixel along an ever-widening spiral around the center word until it fits. The problem is that without Psyco this method is practically useless, because it is sloooow.
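
For reference, a spiral search like the one described looks roughly like this. It is a sketch, not the code from the pastebin link: it uses bounding boxes (x, y, w, h) instead of per-pixel checks, and it walks an Archimedean spiral in fixed angle steps rather than visiting every pixel. The names and parameters (`place_on_spiral`, `step`, `spacing`) are hypothetical.

```python
import math


def overlaps(a, b):
    """True if rectangles a and b, each (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_on_spiral(w, h, placed, canvas_w, canvas_h, step=0.1, spacing=1.0):
    """Walk a spiral outward from the canvas center and return the first
    top-left (x, y) where a w x h box fits without overlapping, or None."""
    cx, cy = canvas_w / 2, canvas_h / 2
    max_r = math.hypot(canvas_w, canvas_h)  # past this radius the box is off-canvas
    theta = 0.0
    while spacing * theta <= max_r:
        r = spacing * theta  # radius grows linearly with the angle
        x = int(cx + r * math.cos(theta) - w / 2)
        y = int(cy + r * math.sin(theta) - h / 2)
        box = (x, y, w, h)
        inside = x >= 0 and y >= 0 and x + w <= canvas_w and y + h <= canvas_h
        # This any() over every placed word, at every step, is the slow part.
        if inside and not any(overlaps(box, p) for p in placed):
            return x, y
        theta += step
    return None


placed = []
for w, h in [(100, 30), (60, 20), (80, 25)]:
    pos = place_on_spiral(w, h, placed, 400, 300)
    if pos is not None:
        placed.append((pos[0], pos[1], w, h))
print(placed[0])  # (150, 135, 100, 30): the first word lands dead center
```

The cost grows with both the length of the spiral walked and the number of words already on the canvas, so speedups have to shorten one or the other.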

So, any tips for better word cloud generation? Or any Python performance tips in general?

Code here, btw: http://ls.pastebin.com/fb63871e

2 Name: #!/usr/bin/anonymous : 2008-09-09 09:24 ID:Heaven

printf("FLCL");

3 Name: #!/usr/bin/anonymous : 2008-09-09 21:20 ID:Heaven

Cool.

4 Name: #!/usr/bin/anonymous : 2008-09-09 21:20 ID:wUtQWWqr

>>2
NameError: name 'printf' is not defined

5 Name: #!/usr/bin/anonymous : 2008-09-19 16:26 ID:VwruKosL

lol >>4

6 Name: #!/usr/bin/anonymous : 2008-09-19 17:22 ID:NUrGzVCV

7 Name: #!/usr/bin/anonymous : 2008-10-09 18:24 ID:ySKHnJsr

A few optimizations I might try would be as follows:

> If you have a multi-core machine, threading the code might improve performance. This could be done in multiple ways, but you might try concurrently searching around a word clockwise and counterclockwise. Whichever finds a working position first terminates the other thread and spawns two new ones with the next word.
> Instead of working based on positions around words, you could store all the words already placed as one large list of corner points, each paired with a two-tuple describing which side of the point is the outside of the shape. With this method you could quickly tell that a word won't fit on a given side by comparing the length of that side to the corresponding dimension of the word. The only drawback is that, although it would be called much less frequently, the overlap-checking routine would be much more complicated. You might try doing overlap checks using word boundaries but crawling along the outline with this method. This would probably save time at the cost of memory.
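
A sketch of the first suggestion, assuming the same bounding-box representation as a plain spiral search: two threads walk the spiral in opposite directions, and whichever finds a spot first sets a shared `threading.Event` so the other gives up. Python threads can't be killed from outside, so "terminates the other thread" has to be cooperative like this. Note that in CPython the global interpreter lock lets only one thread run Python bytecode at a time, so this exact code won't use a second core; a real speedup would need processes (e.g. `multiprocessing`) or native code. The function names are hypothetical.

```python
import math
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def overlaps(a, b):
    """True if rectangles a and b, each (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def spiral_search(w, h, placed, canvas_w, canvas_h, direction, stop, step=0.1):
    """Walk a spiral out from the center; direction +1 or -1 sets the turn.

    Returns a top-left (x, y) where the box fits, or None if the spiral
    leaves the canvas or `stop` gets set by the other search first.
    """
    cx, cy = canvas_w / 2, canvas_h / 2
    max_r = math.hypot(canvas_w, canvas_h)
    theta = 0.0
    while theta <= max_r and not stop.is_set():
        # Radius grows 1 px per radian; direction flips the sense of rotation.
        x = int(cx + theta * math.cos(direction * theta) - w / 2)
        y = int(cy + theta * math.sin(direction * theta) - h / 2)
        box = (x, y, w, h)
        inside = x >= 0 and y >= 0 and x + w <= canvas_w and y + h <= canvas_h
        if inside and not any(overlaps(box, p) for p in placed):
            return x, y
        theta += step
    return None


def place_both_ways(w, h, placed, canvas_w, canvas_h):
    """Search both directions concurrently; return the first hit or None."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = {
            pool.submit(spiral_search, w, h, placed, canvas_w, canvas_h, d, stop)
            for d in (1, -1)
        }
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                pos = future.result()
                if pos is not None:
                    stop.set()  # ask the other direction to give up
                    return pos
    return None
```

If one direction fails outright it returns None, and the loop keeps waiting for the other instead of giving up on the word.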

Also, read PEP 8 (http://www.python.org/dev/peps/pep-0008/)

I haven't actually tried my suggestions, so they may not be faster.

8 Name: #!/usr/bin/anonymous : 2008-10-10 04:14 ID:ySKHnJsr

You should stop trying to place words around the central words after a certain point. Limit the overlap checks to some fixed number of the most recently placed words.
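
One reading of this suggestion, sketched with bounding boxes: keep the placed words in placement order and test a new candidate only against the last `window` of them. `collides_recent` and the default window of 50 are made-up choices. The trade-off is that an older word which still reaches into the candidate's area gets ignored.

```python
def overlaps(a, b):
    """True if rectangles a and b, each (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def collides_recent(box, placed, window=50):
    """Check box only against the `window` most recently placed words.

    `placed` is assumed to be in placement order (center words first).
    Older words are skipped, which saves time but can miss a real overlap.
    """
    recent = placed[max(len(placed) - window, 0):]
    return any(overlaps(box, p) for p in recent)


placed = [(0, 0, 10, 10)] + [(100 + 20 * i, 0, 10, 10) for i in range(5)]
print(collides_recent((0, 0, 5, 5), placed, window=6))  # True
print(collides_recent((0, 0, 5, 5), placed, window=5))  # False: overlap missed
```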

9 Name: dmpk2k!hinhT6kz2E : 2008-10-10 16:47 ID:Heaven

If I understand how the program works, make a list of words sorted by pixel length, then use a binary search to find a good fit.
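
A sketch of that idea with the standard `bisect` module, assuming you already know the free width of a gap and each word's rendered width in pixels (the widths below are invented). `widest_fitting` is a hypothetical helper that returns the widest word that still fits.

```python
from bisect import bisect_right


def widest_fitting(words, widths, gap):
    """Return the widest word whose pixel width is <= gap, or None.

    words and widths are parallel lists sorted by ascending width.
    """
    i = bisect_right(widths, gap)  # number of words with width <= gap
    return words[i - 1] if i else None


measured = [("FLCL", 48), ("Mushishi", 96), ("Bleach", 70), ("Trigun", 60)]
measured.sort(key=lambda pair: pair[1])
words = [name for name, _ in measured]
widths = [width for _, width in measured]
print(widest_fitting(words, widths, 75))  # Bleach
print(widest_fitting(words, widths, 40))  # None
```

Once a word is used, deleting it from both lists keeps them sorted, although `del` on a Python list is linear time.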

10 Name: #!/usr/bin/anonymous : 2008-10-26 00:13 ID:Heaven

This thread inspired me to try my hand at a word cloud generator.

My solution was to define a rectangle centered on the canvas that covered a fraction of the canvas's area. I would generate a random (x,y) point inside the rectangle to serve as the top-left corner of the word I was trying to place. If the word fit and didn't overlap any other words at that point, great. If not, I would pick a new random point and try again. If it still didn't fit after a bunch of tries, I would enlarge the target rectangle a bit and go back to trying random top-left corners.

If it got to the point where the area of the rectangle was equal to the area of the canvas and it still didn't fit after a whole bunch of tries, I would conclude that the word couldn't fit and move on to the next word.
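
A Python sketch of this random-placement scheme. It is not a translation of the linked F# code, and the parameter names and defaults (`start_frac`, `grow`, `tries_per_size`) are made up. The target rectangle keeps the canvas's aspect ratio, so scaling its area by `frac` scales each side by the square root of `frac`.

```python
import random


def overlaps(a, b):
    """True if rectangles a and b, each (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_random(w, h, placed, canvas_w, canvas_h, start_frac=0.25, grow=1.1,
                 tries_per_size=100, rng=random):
    """Try random top-left corners inside a centered target rectangle,
    growing its area by `grow` after each batch of failed tries.
    Returns (x, y), or None once the full canvas has also failed."""
    frac = start_frac  # target rectangle area as a fraction of the canvas area
    while True:
        scale = frac ** 0.5  # side ratio = sqrt(area ratio)
        tw, th = canvas_w * scale, canvas_h * scale
        left, top = (canvas_w - tw) / 2, (canvas_h - th) / 2
        for _ in range(tries_per_size):
            x = int(rng.uniform(left, left + tw))
            y = int(rng.uniform(top, top + th))
            box = (x, y, w, h)
            fits = x + w <= canvas_w and y + h <= canvas_h
            if fits and not any(overlaps(box, p) for p in placed):
                return x, y
        if frac >= 1.0:
            return None  # whole canvas tried; move on to the next word
        frac = min(1.0, frac * grow)


rng = random.Random(42)
placed = []
for w, h in [(100, 30), (60, 20), (80, 25)]:
    pos = place_random(w, h, placed, 400, 300, rng=rng)
    if pos is not None:
        placed.append((pos[0], pos[1], w, h))
print(placed)
```

Unlike the spiral, this only samples positions, so it can give up on a word that would have fit somewhere; in exchange, each word's cost is capped by the number of batches times `tries_per_size`.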

Here's my sloppy F# code:
http://pastebin.com/f738eb53f

This thread has been closed. You cannot post in this thread any longer.