haskell-cafe@haskell.org
[Top] [All Lists]

Re: [Haskell-cafe] FGL/Haskell and Hierarchical Clustering/dendograms

Subject: Re: [Haskell-cafe] FGL/Haskell and Hierarchical Clustering/dendograms
From: Nikolas Borrel-Jensen
Date: Tue, 29 Dec 2009 02:15:26 +0100
Thank you very much for your reply! I have been looking at the code, and there are two problems, as I can see. First, trying with the example

t1 :: Tree (Id, Cost)
t1 = Node (4,0)
     [Node (3,2) [Node (1,12) []]
     ,Node (2,3) [Node (5,1) [Node (6,2) [Node (7,2) [] ]]]]

printed as

(4,0)
|
+- (3,2)
|  |
|  `- (1,12)
|
`- (2,3)
   |
   `- (5,1)
      |
      `- (6,2)
         |
         `- (7,2)

your function 'cluster fst snd t1' returns

Many [Many [Many [Many [Many [One (0,[4]),One (1,[5])],One (2,[3])],One (2,[6,7])],One (3,[2])],One (12,[1])]

I can't see how this representation is giving the hierarchical clusters. The example above should resolve into

level 1: [[(2,3),(5,1)],[(6,2)],[(7,2)],[(4,0)], [(3,2)], [(1,12)]]

level 2: [[(2,3),(5,1),(6,2),(7,2)], [(4,0),(3,2)], [(1,12)]]

level 3: [[(2,3),(5,1),(6,2),(7,2),(4,0),(3,2)], [(1,12)]]

level 4 (or (cost) level 12): [[(2,3),(5,1),(6,2),(7,2),(4,0),(3,2),(1,12)]]

By doing it this way, we cluster all nodes connected with edges less than or equal x at (cost) level x. Clearly, we can have level 1: [[(1,1),(2,1)],[(3,1),(4,1)],...] if the edges between [(1,1),(2,1)] and [(3,1),(4,1)] are greater than 1.

Second, I don't think it is trivial to tree-i-fy the root path tree. I have done the function treeifyMST, which surely isn't efficient, since the list encounteredNodes is traversed as many times as the number of nodes (a binary search tree would be more efficient). But more important, the tree isn't correct, since each path is connected at the root of the tree.

Example (LRTree Int): [ [(1,0)],[(5,1),(1,0) ], [(2,2),(1,0)] , [(3,3),(2,2),(1,0)] , [(4,4),(2,2),(1,0)] ] -> [ [(5,1),(1,0) ] , [(3,3),(2,2),(1,0)] , [(4,4),(2,2),(1,0)] ]

In my code, all 3 paths are branching at the root (1,0), but should for the last two paths branch at node (2,2). How should I cope with that in an efficient way?

I wonder if if it is easier to implement it from the ground using the approach given at http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html ?

---------------------------------------------------------------------
--TO DO: now all paths are connected at the root of the tree. Should be patched at the right places inside the tree. The search in the list encounteredNodes is not efficient.
treeifyMST :: LRTree Int -> Tree (Id,Cost)
treeifyMST rootpathtree = 
let
(LP rpt:rpts) = rootpathtree
root = head rpt
revrootpathtree = reverse rootpathtree
in
Node root (constructTree [] revrootpathtree)
where
constructTree :: [Int] -> LRTree Int -> [Tree (Id,Cost)]
constructTree encounteredNodes (LP x:[]) = []
constructTree encounteredNodes (LP x:xs) =  
let 
path1 = x !! 0
path2 = x !! 1
id1 = fst path1
id2 = fst path2
in
case (L.find (==id1) encounteredNodes) of
-- because we have encountered an already processed id, we can skip this sublist
Just _ -> constructTree (id2:encounteredNodes) xs
-- new id, meaning that we have encountered a new path
Nothing -> let
lenpath = length x
revpath = reverse $ take (lenpath-1) x
tree = listToNode revpath
 in
tree:constructTree (id2:encounteredNodes) xs
constructTree _ _ = []

listToNode (p:ps:[]) = Node p [Node ps []]
listToNode (p:ps) = Node p [listToNode ps]
---------------------------------------------------------------------

Nikolas
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@xxxxxxxxxxx
http://www.haskell.org/mailman/listinfo/haskell-cafe
<Prev in Thread] Current Thread [Next in Thread>