nikita ([info]hukuma) wrote,
@ 2004-06-27 17:40:00
Previous Entry  Add to memories!  Tell a Friend  Next Entry
Current mood: accomplished
Current music:Video Killed the Radio Star - Erasure

more on cliques

This is what I did instead of working today. Since I had most of the code for finding largest cliques, this morning I decided why not automate it and put it on the web? It was something fun to do, it game a chance to learn mod_python, ... and it took most of the day! But now you can go to http://nikita.ca/lj/clique.py and find a large clique that you're part of. It's probably the largest, but I make no guarantees). A clique, BTW, is a group of people who all list each other as friends.

To find the cliques, I used a heuristic search. Starting with you, I find your friend with whom you have the largest number of friends in common and add them to the clique. Then I pick the next person who has the most friends in common with the existing clique, and keep extending it this way until I find a maximal clique. This doesn't always give you the correct answer, so I do a bit of backtracking. At each step, I consider at least the 3 top friends. If there are many people tied for first, I consider them all, since it makes no sense to pick one over the other. So far the number 3 has been a good compromise between runtime and exploring a large enough state space. Of course, I don't know if there are larger cliques I'm not finding. And it's possible that once more people start using it, my web server will melt. (Maybe I'll have to rewrite it in C then...)

The largest clique I've found so far is still the size 15 one from my last entry. I've set up a log to see if any larger ones are found... UPDATE: The current winner has size 24! UPDATE2: Wow, there's a clique of size 137!

</p>

The friends data is cached, so if you add someone to your friends list, the results won't change. UPDATE: Now there's a link to clear the cache for you and all your friends and recompute from scratch.




Page 1 of 4
<<[1] [2] [3] [4] >>

(163 comments) - (Post a new comment)


[info]mpnolan
2004-06-27 06:16 pm UTC (link)
Wow, this is the coolest tool I've seen so far on LiveJournal, and an interesting problem too. Nice job!!

(Reply to this) (Thread)


[info]doniago
2004-06-29 03:47 pm UTC (link)
Cooler than the one which shows everyone who's commented on your LJ and how often?

Just checking. :)

(Reply to this) (Parent)

Thanks! Now I feel better.
[info]sunah
2004-06-27 06:47 pm UTC (link)
1. sunah
2. hukuma
3. rebbyribs
4. paisleychick
5. svenof9
6. jilflirt
7. kragen

And of course, GREAT JOB!

(Reply to this)

Why would C be more efficient for your webserver?
[info]sunah
2004-06-27 06:48 pm UTC (link)
Why would C be more efficient for your webserver?

(Reply to this) (Thread)

Re: Why would C be more efficient for your webserver?
[info]hukuma
2004-06-27 07:00 pm UTC (link)
Typically, rewriting my python code in C makes it several times faster. I think the main issues are that python is interpreted and uses more general-purpose data structures, though I'm also not very good at writing efficient python code.

(Reply to this) (Parent)(Thread)

Re: Why would C be more efficient for your webserver? - [info]joxn, 2004-06-29 07:59 am UTC
Re: Why would C be more efficient for your webserver? - [info]joxn, 2004-06-29 08:12 am UTC
Re: Why would C be more efficient for your webserver? - [info]hukuma, 2004-06-30 12:57 am UTC

[info]catamorphism
2004-06-27 07:03 pm UTC (link)
Just one suggestion -- add a text field so that people can easily copy-and-paste the HTML to display their results into their journal, so it can spread virally! (Unless you don't want it to spread that fast and bog down your server ;-) Otherwise, way cool.

(Reply to this) (Thread)


[info]hukuma
2004-06-27 10:37 pm UTC (link)
Hmm... I'll think about it, but I already had to move it onto another server. This one has more RAM, and I modified it to be a bit more efficient, so no meltdowns yet...

(Reply to this) (Parent)

(no subject) - [info]x_rumour_x, 2004-06-30 01:37 am UTC

[info]ketzie
2004-06-27 07:21 pm UTC (link)
Mine I was smaller than I thought it should be. Then I realized that I haven't friended [info]cypherpunk95. Hee hee.

(Reply to this) (Thread)


[info]hukuma
2004-06-27 10:34 pm UTC (link)
Heh. I guess since neither of you post, there isn't much point in friending each other, except for silly social network purposes like this.

(Reply to this) (Parent)

cool
[info]cyt
2004-06-27 07:34 pm UTC (link)
this is a very interesting meme :)

(Reply to this)


[info]ashley_y
2004-06-27 07:35 pm UTC (link)
You probably ought to cache results also...

(Reply to this) (Thread)


[info]ciphergoth
2004-07-01 12:41 am UTC (link)
Agreed, from my experience with such memes...

(Reply to this) (Parent)


[info]ashley_y
2004-06-27 07:47 pm UTC (link)
I suspect using http://www.livejournal.com/misc/fdata.bml?user=hukuma would be faster...

(Reply to this) (Thread)


[info]hukuma
2004-06-27 10:35 pm UTC (link)
You're right; if nothing else, the time saved by not parsing/re-parsing the XML FOAF data is worth it.

(Reply to this) (Parent)(Thread)

(no subject) - [info]mcfnord, 2004-06-29 05:42 pm UTC
(no subject) - [info]wechsler, 2004-06-29 11:26 am UTC
(no subject) - [info]wechsler, 2004-06-29 11:32 am UTC

[info]naturalborn
2004-06-28 12:16 am UTC (link)
There are two basic heuristics for finding cliques - the first is what you're doing, where you start with one element and grow. The second is where you start with all elements and remove one at a time. The second one works vastly better, because if you start with a single element which isn't part of the true max clique, you're hosed.

(Reply to this) (Thread)


[info]hukuma
2004-06-28 12:28 am UTC (link)
How do you choose which elements to drop? Why won't you get in the same situation where the first element you drop was part of the true max clique?

(Reply to this) (Parent)(Thread)

(no subject) - [info]naturalborn, 2004-06-28 10:57 am UTC
(no subject) - [info]ciphergoth, 2004-07-06 11:09 am UTC
(no subject) - [info]naturalborn, 2004-08-02 08:52 pm UTC
(no subject) - [info]dreamingkat, 2004-07-03 04:23 pm UTC
(no subject) - [info]naturalborn, 2004-07-03 06:43 pm UTC
(no subject) - [info]dreamingkat, 2004-07-05 09:12 am UTC
(no subject) - [info]fiendess, 2004-08-04 09:23 am UTC
(no subject) - [info]hukuma, 2004-08-04 11:22 am UTC

[info]bugtilaheh
2004-06-28 06:56 am UTC (link)
Hmm, interesting. It didn't seem very accurate (considering I thought it would tell me who I comment on more or something. For example: a few friends and I comment on one particular person's journal a lot), but I guess since you explained it, it makes sense that way ("[finding] your friend with whom you have the largest number of friends in common and [adding] them to the clique")...

Mmmmmm... programming = yummy. I haven't programmed in so long. I'm so out of the loop. :(

(Reply to this) (Thread)


[info]hukuma
2004-06-28 09:08 am UTC (link)
A clique is a group of people who all list each other as friends. I don't do any checking of number of comments or anything else, just friend connections.

(Reply to this) (Parent)(Thread)

(no subject) - [info]bugtilaheh, 2004-06-28 03:57 pm UTC
(no subject) - [info]mcfnord, 2004-12-04 07:16 am UTC

[info]hythloday
2004-06-28 08:01 am UTC (link)
I wrote a 'correct' (ie bandwidth-blitzing) version, but I found that I was actually doing a great deal of caching - given that everyone taking it is at least partially-cached, have you thought of hooking the backend up to a database and caching there, and doing it optimally?

(Reply to this) (Thread)


[info]hukuma
2004-06-28 09:17 am UTC (link)
A database would be better than my current caching method, but I'm not sure what you mean by 'optimally'. Even with a DB, I have to do a heuristic search, since it's an NP-hard problem.

(Reply to this) (Parent)(Thread)

(no subject) - [info]hythloday, 2004-06-28 09:28 am UTC
(no subject) - [info]hukuma, 2004-06-28 09:41 am UTC
(no subject) - [info]hythloday, 2004-06-28 10:36 am UTC
(no subject) - [info]decibel45, 2004-06-29 11:54 am UTC
(no subject) - [info]ciphergoth, 2004-07-01 12:41 am UTC

[info]azurelunatic
2004-06-28 02:05 pm UTC (link)
Possible interesting refinement:

I got four cliques of 6 each. I noticed that on first glance, it looked like each person from the first clique mentioned was in two of the other three cliques. So I had to calculate the diversity of my multiple cliques.

Turned out for the 20 other possible clique members who weren't me, I had 10 unique people, for a clique diversity of 50%.

No one with multiple cliques will ever get a clique diversity of 0%, of course -- unless, like [info]neitherday, you run it and find that you have two cliques where a supposedly unique user in each (and the only unique user in each) is run by the same meatware unit. But that's something that only a human could tell you.

(Reply to this)


[info]darcydodo
2004-06-29 02:24 am UTC (link)
That's so funny, I came across your clique-finder through someone entirely unconnected to Bab5 and Berkeley; and then I saw the address and thought, huh, nikitab, I wonder if that's... and then it finished running, and I got to the bottom, and I thought, "It is!"

It works quite nicely, in that it divided my Oxford group and my Berkeley group very succinctly; but on the other hand, the numerous Oxford groups and Berkeley groups are really just permutations of each other.

(Reply to this)


[info]shatterstripes
2004-06-29 10:29 am UTC (link)
I found it interesting to compare the results of your clique-finder with the results of the MindMap that's been going around for a while. I'm not in any one overarching huge clique - I got 18 cliques of 8 people that mostly separated out into two themes.

Have you been noting who's in the most cliques as well as the largest? I think that'd be an interesting statistic.

Also, I found myself compelled to repack the expanded <lj user="blah"> tags back down before posting. It was just a little too... huge... for my tastes otherwise! If you rewire it in C or something, I'd suggest using a little of the immense speed gain that'd be to generate a cut-and-paste version of the text as lj user tags - I doubt anyone has enough noncongruent groups to have the expanded LJ tags break the 64k entry-size limit, but it could certainly happen more easily that way.

(Reply to this) (Thread)


[info]shatterstripes
2004-06-29 10:31 am UTC (link)
How very iiiiinteresting! You monsters must lead such iiiiiinteresting lives!

(sorry, I just noticed I used that word a lot in this comment, and my own entry on my results.)

(Reply to this) (Parent)

(no subject) - [info]hukuma, 2004-06-29 11:35 am UTC
(no subject) - [info]shatterstripes, 2004-07-01 12:37 am UTC
(no subject) - [info]ciphergoth, 2004-07-08 08:38 am UTC
Cliques
[info]claudia_yvr
2004-06-29 11:32 am UTC (link)
Very cool! The ones that came up for me would make for a lively, fun bunch, that's for sure.

Thanks for sharing!

(Reply to this)


[info]rtfirefly
2004-06-29 11:39 am UTC (link)
Ramsey Theory in action!

By changing a line or two, you could get another program that would find the largest independent set amongst one's friends - that is, the largest list of one's friends who have no friends in common other than you.

(Reply to this) (Thread)


[info]aliza250
2004-07-06 04:25 pm UTC (link)
That sounds like fun!

(Reply to this) (Parent)


[info]laurenlall
2004-06-29 01:00 pm UTC (link)
"Total time: 10001 steps
Ran out of time, showing best current guess."

Is there anyway around this? What exactly does this mean? Nifty tool, by the by.

(Reply to this) (Thread)


[info]hukuma
2004-06-29 01:18 pm UTC (link)
This means that I wasn't able to search all possibilities to make sure that the results given show the largest clique. I have a faster version in the works which will be able to look through more steps, but I'm not sure I'll be able to get it running before next week. However, it's very likely that the results you got are correct, since usually the largest clique shows up within the first few hundred steps.

(Reply to this) (Parent)


[info]shaggirl
2004-06-29 01:19 pm UTC (link)
Great code, thanks for sharing! I got 4 cliques of 25 users each, but it wouldn't show me anything beyond that.

(Reply to this) (Thread)


[info]ciphergoth
2004-07-08 08:52 am UTC (link)
Interestingly, calculating your cliques has been the hardest challenge I've thrown my clique finder so far - it took nearly four seconds, the first time it's taken over a second! You are in 10 cliques of size 25. 12 people are in all of them: digitalwave, ingrid_m, lapetite_kiki, mosself, paperbkryter, rosesmove, scribblinlenore, shaggirl, stopawhile, svmadelyn, thdancingferret, velvetglove.

(Reply to this) (Parent)


[info]mahoutsukaichan
2004-06-29 03:05 pm UTC (link)
I was only able to get 6 cliques of size 7, but this is a nifty tool. ^^;;

(Reply to this)


[info]qafhappy
2004-06-29 04:27 pm UTC (link)
I had a clique of 19, but I might have had more, because the search timed out. And I tried to search again, but it only used the cached results.

(Reply to this) (Thread)


[info]hukuma
2004-06-29 04:58 pm UTC (link)
It's not very likely that you had more -- usually it's able to find the largest clique pretty quickly. I have a faster version in the works that will search through more of the cliques, but it probably won't be up till next week.

(Reply to this) (Parent)(Thread)

(no subject) - [info]qafhappy, 2004-06-29 05:12 pm UTC
(no subject) - [info]ciphergoth, 2004-07-08 08:58 am UTC
(no subject) - [info]qafhappy, 2004-07-08 10:02 am UTC

[info]madcaptenor
2004-06-29 06:53 pm UTC (link)
Nice job!

I thought of this problem a few months ago, and actually did some searching to find the largest clique in my own journal; unfortunately I don't remember what it was. Didn't use any heuristics - pure brute force, just taking the cliques of size n and scaling them up to size n+1 - and that took quite a while. Of course, the fact that I was doing it by hand probably didn't help. I should learn how to program more.

But you just know someone's going to come along and claim your tool is "broken" because they found a bigger clique among their friends than your tool does.

(Reply to this) (Thread)


[info]hukuma
2004-06-29 07:04 pm UTC (link)
My new, faster code actually did an exhaustive search of all cliques among your friends, but it had to go through 445 cliques. I would imagine that would have taken quite some time by hand! Actually, the 2nd generation that's up on the web page now finds a bunch more cliques than the tool I was using yesterday.

(Reply to this) (Parent)

(no subject) - [info]ciphergoth, 2004-07-08 08:41 am UTC
slightly silly question - [info]rixim, 2004-10-19 05:31 pm UTC

[info]danielmarko
2004-06-30 12:46 am UTC (link)
I am a member of 15 cliques of size 9

Cool! Thanks!

(Reply to this)


[info]icedrake
2004-06-30 12:51 am UTC (link)
In a random "small world" moment of the day, I've been hearing your name since I started school at UW; primarily as a legend from the CSC mythology. Just wanted to wave at your direction and say "hi! The world is a tiny place after all!"

(Reply to this) (Thread)


[info]monroefashion
2004-07-01 02:33 pm UTC (link)
hiiiiiiii im going to UW next year! dont know why im commenting on this but i just did that thing and its pretty cool, plus i saw the word UW and had to get in on it. lol

jamie

(Reply to this) (Parent)(Thread)

Different UW - [info]eve_the_just, 2005-05-30 02:16 pm UTC

[info]jackola
2004-06-30 01:08 am UTC (link)
I very much like it, but maybe it should list the 5 largest? I mean, it's already done the processing, so maybe show the result?

*jack

(Reply to this) (Thread)


[info]bridgetester
2004-06-30 08:11 am UTC (link)
Seconded

(Reply to this) (Parent)(Thread)

(no subject) - [info]mhw, 2004-06-30 06:07 pm UTC
(no subject) - [info]ciphergoth, 2004-07-01 12:40 am UTC

[info]randomchris
2004-06-30 02:17 am UTC (link)
I think you should change the value of the search button to read "Clique here!" (I've done that in my journal.)

I just like bad puns...

(Reply to this) (Thread)

Hahaha!
[info]hukuma
2004-06-30 02:42 am UTC (link)
Great idea, I love it!

(Reply to this) (Parent)


(163 comments) - (Post a new comment)

Page 1 of 4
<<[1] [2] [3] [4] >>

Create an Account
Forgot your login or password?
Login w/ OpenID
English • Español • Deutsch • Русский…