Subscribe to XML Feed
07 Oct 2009

It all started as a loop in bash

Once upon a time (and quite a few other times), I needed to start a process and watch it’s status to restart it in case of crash (yeah, this kind of things still happens).

It all start with a simple bash script:

until $CMD
do
  echo "crashed! restart in 1 second"
  sleep 1
done

And then I needed to send alert in case of crash, to stop the process on request and to query the status. At this point, I should have moved away from bash … but I didn’t. Worst than that, I stole idea on the web to add an HTTP API … in bash, yes. Why not monit? not even half the fun!

So now there is this big script, with the wonderfull name of “runloop.sh”, hosted on github.

Following are use examples.

  • start
    Start the command line and monitor the process from another background process. The “myAppID” is a simple identifier of the instance of the running program. This create files to log the pid of the app, of the monitor, and also one file with monitor logs. If myapp crash or stop with exit code different from 0, it is restarted.
shell> ./runloop.sh start myAppID /usr/local/myapp/bin/myapp --be-nice
  • status
    This print a message “running” or not, and exit with code 0 if running.
shell> ./runloop.sh status myAppID
  • stop
    Send the kill signal to process identified by myAppID.
shell> ./runloop.sh stop myAppID
  • HTTP:
    Direct browser to http://localhost:9000/myAppID to get status or to stop myAppID.
shell> ./runloop.sh http
View Comments
04 Jun 2009

Ubigraph view of pagerank result

This post is a sequel to the previous post about visualization of random graph through Ubigraph.

This time, I implemented a very simple pagerank algo (functor Pr) to set the size of the vertices.


./socnet.byte -n 320 -ubi


./socnet.byte -n 520 -ubi

View Comments
30 May 2009

Random graph generation example

Have been playing around with the beautiful ocamlgraph library. I also installed xmlrpc-light to draw graph through the free version of the ubigraph server.

I wrote a tool to generate some random graph. The goal is to render graph that have the same connectivity caracteristics as social networks. The generation algorithm is inspired of the forest fire model for graph generation (propagate links to neighboorhood recursively).

The code is online. All is in single file socnet.ml, but I use code from xmlrpc-light library for Ubigraph calls. There is a Builder module to augment the default builder of ocamlgraph with a state; a special UbigraphBuilder to replicate the graph structure for rendering; and finally the algo to generate the graph is in Gen module.

The result is a single binary:


./socnet.byte
  [-n <node-number>]
  [-gv] [-ubi]
  [-fwd <forward-factor>] [-bck <backward-factor>] [-bi <linking-back-factor>]

When creating an edge from a source node to some target node:

  • forward factor is related to the probability to link source node to successors of a target node
  • backward factor is related to the probability to link source node to predecessors of a target node
  • linking back factor is related to probability to create an edge from target node to source node

Following are some screenshots of results, although it is much more interesting to watch the construction steps though Ubigraph renderer.


./socnet.byte -n 12 -bi 0.0 -gv


./socnet.byte -n 12 -gv


./socnet.byte -n 800 -bi 0.0 -ubi


./socnet.byte -n 800 -ubi

View Comments
25 May 2009

Lock-free shared memory: 1 writer, N readers

In recent projects I have been involved with, a lock-free shared memory pattern is used quite often. I would like to describe it here, and if anyone can tell me the name for this pattern, I’ll be very much grateful.

This technic I’m going to describe can be used in the case of 1 writter and N readers. It shares some kind of idea from both a spin-lock and an STM (Software Transactional Memory) in the sense that the writer does not take a lock but write some kind of transaction state of the memory, and reader does not lock neither but keep reading in a loop (spin) until the transaction is coherent.

All that will make more sense with some code (refer to note 1 and 2 for warning about this code; messing with memory model is very easy with lock-free stuffs). Suppose you have a shared data containing 2 values x and y; first step is to add some kind of transaction markers:


// beside of the data ...
// 2 transaction markers v1 and v2 are added to the memory
struct element_t {
  volatile uint v1;
  double x, y;
  volatile uint v2;
} elt;

Next step is simple: as long as v1 is equal to v2, the memory is in coherent state; else we don’t know what the writer is doing and nothing should be read. So writer has to maintain those markers as follow:


// writter thread use marker v1 to start a transaction
elt.v1 ++;
// does some data updates
elt.x = 42;
elt.y = elt.x * 12;
// and set marker v2 as end of transaction
elt.v2 ++;

When the reader access the data, it has to make sure that v1 and v2 are equals and do not change during the time reader access the memory. This can be done like following 3:


// reader thread keep reading the memory ...
// until transaction markers are in "correct" state
uint m1,m2;
double x,y;
do {
  m1 = elt.v1;
  x = elt.x;
  y = elt.y;
  m2 = elt.v2;
} while(m1 != m2);
// at this point, x and y have a correct value

That’s it!
So, what the name of this pattern?

By the way, if you like to read about shared memory concurrency, make sure to read the Cilk++ blog , it’s a gem ;).

1 The marker increment need to be an atomic operation, and you may want to use a memory barrier instruction in some cases.

2 This code rely on operations order. In most of programming languages, order is never enforce by the language definition, and some compiler might try to reorder things (to improve cache or whatever). Even in C, the volatile does not protect the code. Only way I know to check this code: compile and test it! different optimization flag can result in different code. Check the generated code on your compiler/platform to make sure.

3 Of course you need to handle manually how to follow pointer in your data.

View Comments
24 Apr 2009

Adding a timeout to a shell command

Quite a few times one or the other of my automation shell scripts got hang while running some external command (e.g. a remote shell command may hang if network is in weird state). For those cases where a stopped automation script could really hurt the system, I ended up writing a script wrapper that add a timeout to any command.
This is kind of heavy weight implementation: instead of running the command directly, this start a first process sleeping for the duration of the timeout, and execute the command in a background process, and finally return the result of the first finishing process! 3 processes: 1 working, 1 sleeping, 1 waiting ... but it helped me few times!
Gisty code
View Comments
10 Apr 2009

For the lost twitter sheeps

Tweet-tweet here, tweet-tweet there, here tweet, there tweet, ... everybody tweet! And you got to choose who to follow or you'll be quickly overwhelmed!
MrTweet can help you there by proposing you some tweets to follow. How can he achieve that? Through this blog post, let's focus on the subject of recommending twitter users to follow to existing user.
Disclamer: I have no accointance what's so ever with MrTweet, and absolutely no knowledge about it, nada; all the following are random thoughts; oh, and I have no accointance with Twitter neither!
Let say you have access to all the tweets of the world [1], and you can know who's following who on twitter. How would you look for new user to follow? Obviously there are quite a few possibilities ... and some of them are probably yet to be invented. Traditionally, mining this kind of info can be done at 2 levels: (1) looking at the content and recommend to user U1 to follow user U2 if there is any correlation between tweets content of users U1 and U2; (2) second level is to look at the graph on connected communities and recommend user U1 to follow user U2 if they share some links in the graph. On top of this there are many refinements in what can be done to mine such information: focusing and differenciating static data and dynamic aspects of user behaviours, mixing approaches of content and meta data, looking for time-series correlation ... this is a vast domain.
Following are some ideas about different way to implement MrTweet.

The UserRank dictature

Using a Pagerank-like algorithm to explore the graph of followers, you can obtain a global UserRank for all the twitter users.
Why Pagerank? Well, this is probably not the best approach of the problem, as it extract a global knowledge of your data, where in this case we should focus on communities. But heck! I like PageRank, it's simple, and it's a good start to think about the problem. PageRank was the first scientific article that I study thoroughly, and it was  a ha-ha moment: simple to read, easy to understand, easy to implement ... computer science was possible! No need of linear algebra, matrix and eigenvector to understand it (it helps for the proof though :)). [2]
So how to apply Pagerank to "tweeterers"? Well, following is one way to do it.
For each user consider its followers as mark of interest. The algo define PageRank number Pr of user U as the weighted sum of the Pr of all its followers, where the weight is in inverse proportion of the number of people followed by the follower.
That is, with some notations to make it hopefully clearer:
  • Let Pr(U) be the PageRank of user U;
  • Let Fo(U) be the set of all user that U is following, and [Fo(U)] its cardinality;
  • Let Fi(U) be the set of all user that are following U;
  • PageRank is: Pr(Ux) = Sum(Pr(Un)/[Fo(Un)]), for all  Un in Fi(Ux).
Thus Pr is high if the user is followed by users which have a high Pr and are not following everybody! Easy enough but wait ... how to get the Pr of the followers? this is kind of a recursive definition!
The beauty of the PageRank algo is that it propose an simple iterative solution to solve this problem: asume first that everybody as a Pr=Pr_0 of 1/N (N is the number of users, this give everybody a fair starting value), compute the new Pr_1 of each user using Pr_0 assumption, then compute the Pr_2 of each user given your Pr_1 assumption ... and iterate a few more times until the Pr stabilize, usually after few computations only (this is not magic but math ... which sometimes looks alike). If you like matrix and linear algebra, you can write the whole process as a loop which multiply a the vector of all PageRank by a square matrix of all 'follow-relations'.
Now, all we have is a Pr for each user? then what? Who shall I follow? Well, use your network to discover it!
Take all the people you are following, grab all the people they follow themselves, and remove your own sibling from this big set of people; sort them by Pr and you'll get to follow the most "popular" people linked to your network.
We have a UserRank algo to recommend people to follow to any twitter user! Well, all this rely on assumption that being followed give the user some importance. And I don't know if this works, I haven't tried it! Maybe it can be done with google API and the twitter page of the user ;)
Let explore another algo with another kind of idea.

Following the Slope

In fact, what we're looking for with MrTweet, is the same kind of recommendation as Amazon made popular: "users who like this also like that". Using the same meta-data as previously (the graph of followers), we can rephrase it as "user who follow Ux and Uy are also following Uz".
Algo to solve those problems are usually classified as recommender systems (see the NetFlix for importance of those kind of algo). A big problem of those solution for web application is often scalability and dynamic nature. But let's pick one such recommender system and see what can be done for MrTweet problem.
No surprise, let's look at SlopeOne algorithm. The wikipedia page explain it very simply and I surely can't do better. All we need to do is map the data to our problem. Let say that if user U1 follow user U2, this means user U1 rate user U2 with weight 1. Doing so, we'll have only rating of weight 1. This is ok for SlopeOne to give some result though.
The main difference with previous method of UserRank is that this is not a global analysis of all the users but rely more on the network of the user to which you want to do the recommendation.
Wanna real rating value to feed your SlopeOne algo? well, why not use the UserRank you defined previously as the rating value, and compute the SlopeOne prediction based on that? thus you would get as recommendation not the most popular user linked to your network but the most popular user liked by your network! Not so bad! Maybe yes, maybe not ... need to be tested!

More than a graph?

We've been looking at the graph of followers, which give recommendatons based on "neighborhood". Let's open the problem a bit more ...
  • instead of using followers, we can look at the graph of replies
  • why not look at the tweet content? we could check correlation in keyword, or in the URL (even following the URL)?
  • why not cross data from twitter with data from other network? ... and I'm going to read about this just here: http://33bits.org/2009/03/19/de-anonymizing-social-networks :)
I don't know you, but this kind of problem delight me. But after all this thinking and no action, I'm now going to write some real code to do more than think about algo ;).
[1] With N millions users, 7 tweets a day, 140 bytes a tweet in raw format ... storage is not that huge, even for few years. How many twitter users? where does seven come from? who care! engineers only need order of magnitude and units!
[2] Pagerank is closely related to the hub and authorities: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.3875 [3]
[3] I know, all those ref are "old" :)
View Comments
10 Jan 2009

TAOCP vs. Code Complete

Surprising! as of now, in my experiences of software engineer, I met more people that read/know about TAOCP than people who read Code Complete.
It may be a probability accident (I kind of remember that probabilities and small numbers are not good friends). It may also be influenced by my fields of experiences. But assuming this is real (i.e. programmers know more about TOACP), it's doesn't seems right.
I consider TAOCP as a reference to write efficient code; Code Complete is a reference to write good software (eventually being a product maintained by a team). Both are important, but the first one being used way less often!
View Comments
07 Jan 2009

Back to roots ... with physics!

This blog is not completely abandoned ... I'm just buzy lazy those days.
Just to relate my brief come back to roots: doing physics simulation in 3D ;).
I've been playing with Irrlicht as 3D framework. It's good, not to complex. But the API is not always easy (you're forced to use the irrlicht not-automatic counting ref). I haven't found doc about coordinate systems (orientation and angle's units) but it's using common practices in game engine. It's pretty fast, and the simple included demos help a lot to get started (Ogre3D don't come with simple examples).
For the physics engine, I've used Bullet, and it's really pleasant to use! and result are great! I only quickly went through some part of the code, and it's great code, I think I will learn quite a few tricks of this code (e.g. how to avoid conditional branch ('if') to select between 2 unsigned integers ... and it even can be usefull for efficiency on some processor like Cell (maybe related to parallelism and pipelining?). No really, this is beautiful code, and if you just want to play a little bit with physics, using Bullet directly is more fun than PAL (Physic Abstraction Layer).
I'll post the code when/if my Domino Rally simu is done :) ... but without sound it's not good.
View Comments
30 Nov 2008

Terminology: radix, Patricia, TRIE and tree

I took some time to get the terminology: radix tree/trie, patricia tree/trie ... it's all the same, really:
  • A trie is a multi-way tree;
  • I guess radix term replaced the Patricia appellation because it sounds more generic (not specific to information retrieval);
  • And in the case of radix trie, if you use the binary representation of the key, you have an alphabet of 2 and the trie become a 2-ways tree, thus radix tree :P
View Comments

Older Posts

String specific data-structure: set and map 09 Aug 2008 Comments
Generate all pair permutations in (not really) random order (and space efficiently) 07 Aug 2008 Comments
Simulated annealing in OCaml 11 Jun 2008 Comments
SG bus routes ... following 28 May 2008 Comments
SG bus services map 24 May 2008 Comments
screenshot-sgbus-124_inner 24 May 2008 Comments
screenshot-sgbus-124-shortest_inner 24 May 2008 Comments
erlocaml wc map-reduce example 20 May 2008 Comments
Ocamerl - Erlang ... echo-ing in shells 13 May 2008 Comments
Erlang to call an Ocaml seam carving lib 10 May 2008 Comments
Tiny progress on erlocaml 06 May 2008 Comments
me me Euler Pb1: looking at the data (and fun with python) 25 Apr 2008 Comments
ocaml vs. F# for big integer ... surprising performance test! 30 Mar 2008 Comments
ocaml native code 25 Mar 2008 Comments
ocamerl ... no update :( 13 Mar 2008 Comments
For who is this blog? 13 Mar 2008 Comments
Control by fuzzy logic 13 Mar 2008 Comments
Could Erlang and ICE be friends? 19 Feb 2008 Comments
Language typing: no strong preference yet! 17 Jan 2008 Comments
Pythonize your vim 17 Jan 2008 Comments
Coming soon: an interesting post about ocamerl 08 Jan 2008 Comments
Simple game 02 Jan 2008 Comments
ocamerl update: handshake 28 Dec 2007 Comments
Concurrent behaviours 22 Dec 2007 Comments
ocamerl 16 Dec 2007 Comments
Hylomorphism and Euler problem 2 19 Nov 2007 Comments
Euler problem 2 14 Nov 2007 Comments
Hylo again 11 Nov 2007 Comments
Morphing ruby to erlang just for fun 11 Nov 2007 Comments
xkcd draw the time... 05 Nov 2007 Comments
Mobile phone keyboard 26 Oct 2007 Comments
Erlang walk in AVI file 09 Oct 2007 Comments
Erlang/OTP installation from sources 26 Sep 2007 Comments
Different DBMS 07 Sep 2007 Comments
Recommending popular items 06 Sep 2007 Comments
Screenshot: adviserl services 31 Aug 2007 Comments
Screenshot: adviserl 31 Aug 2007 Comments
One more service for adviserl: an HTTP API 31 Aug 2007 Comments
mnesia enter in the playground 23 Aug 2007 Comments
advance backward? 03 Aug 2007 Comments
Sieve again 27 Jul 2007 Comments
delicious adviserl 21 Jul 2007 Comments
Storing data into adviserl 21 Jul 2007 Comments
Advises on delicious tags 21 Jul 2007 Comments
adviserl got a gen_server API 18 Jul 2007 Comments
object AND functional 11 Jul 2007 Comments
Googling to learn collaborative filtering 10 Jul 2007 Comments
Adviserl goes live 06 Jul 2007 Comments
Simple stream implementation in Erlang 07 May 2007 Comments
Programmers everywhere! 28 Apr 2007 Comments
SICP, concurrency and time 28 Apr 2007 Comments
8-queens ... me too! 10 Apr 2007 Comments
Positive integer and recursion 05 Apr 2007 Comments
grinderl is sinan-ized 05 Apr 2007 Comments
Some articles and why they surprised me 31 Mar 2007 Comments
grinderl: less tsung, more capistrano 31 Mar 2007 Comments
List of Erlang data structures 24 Feb 2007 Comments
Simple problem ... looking for simple solution 24 Feb 2007 Comments
GMailReader 15 Feb 2007 Comments
clap clap clap 12 Feb 2007 Comments
Teamwork 12 Feb 2007 Comments
[priv] TaDa - readings 12 Feb 2007 Comments
grinderl: early stage 10 Feb 2007 Comments
STM drive me to the Anarchy 23 Jan 2007 Comments
Erlang: the good, the bad and the chubby 23 Jan 2007 Comments
Music comes from here!?!! 23 Jan 2007 Comments
UX* Design 16 Jan 2007 Comments
Thread, Event, and concurrency model 16 Jan 2007 Comments
meta-musico-post 11 Dec 2006 Comments
To do or not to do? 27 Nov 2006 Comments
Baby steps in the Erlang world! 25 Nov 2006 Comments
Becoming nerd... 15 Nov 2006 Comments
To Zune or not to Zune 14 Nov 2006 Comments
Home Multimedia System 10 Nov 2006 Comments
Pacstatistics 04 Nov 2006 Comments
My next programming language 02 Nov 2006 Comments
Mirror neurons and understanding by imitation 01 Nov 2006 Comments
Blogging is not easy! 31 Oct 2006 Comments
@home 31 Oct 2006 Comments
From how to what 31 Oct 2006 Comments
Re-discovering my music 22 Oct 2006 Comments
CS in learning, learning of CS 22 Oct 2006 Comments
Education 22 Oct 2006 Comments
MacMillan English Dictionary 21 Oct 2006 Comments