
For completeness, you might then do the actual clustering something like:

import Data.Tree
import Data.List
import Data.Function
 ... code from before ...
cluster :: Ord cost
=> (a > b)
> (a > cost)
> Tree a > Cluster (cost,[b])
cluster proj cost t =
 List can't be empty since Tree can't.
let o:os = sortBy (compare `on` fst)
. flatten
. agglom proj cost
$ t
in foldl' cons (One o) os
data Cluster a
= One a
 Many [Cluster a]
deriving(Eq,Ord,Read,Show)
instance Functor Cluster where
fmap f (One a) = One (f a)
fmap f (Many cs) = Many ((fmap . fmap) f cs)
cons :: Cluster a > a > Cluster a
cons c a = Many [c,One a]
{
ghci> let c = cluster fst snd t
ghci> :t c
c :: Cluster (Cost, [Id])
ghci> c
Many [Many [Many [One (0,[4]),One (1,[3,1])],One (3,[2])],One (12,[5])]
ghci> :t fmap snd c
fmap snd c :: Cluster [Id]
ghci> fmap snd c
Many [Many [Many [One [4],One [3,1]],One [2]],One [5]]
ghci> :t fmap fst c
fmap fst c :: Cluster Cost
ghci> fmap fst c
Many [Many [Many [One 0,One 1],One 3],One 12]
}

Matt
On 12/23/09, Matt Morrow <[email protected]> wrote:
> Hi Nikolas,
>
> Interesting problem. I'd do something like the following, where
> the initial spanning tree from you example (retreeified) is:
>
> {
> ghci> :t t
> t :: Tree (Id, Cost)
> g
> ghci> ppT t
> (4,0)
> 
> + (3,1)
>  
>  ` (1,1)
> 
> ` (2,3)
> 
> ` (5,12)
> }
>
> and which results in the tree:
>
> {
> ghci> let s = agglom fst snd t
> ghci> :t s
> s :: Tree (Cost, [Id])
> ghci> ppT s
> (0,[4])
> 
> + (1,[3,1])
> 
> ` (3,[2])
> 
> ` (12,[5])
> }
>
> which can then be flattened/etc as needed by further steps of the algo.
>
> The code for `agglom':
>
> 
> import Data.Tree
> import Data.List
>
> type Id = Int
> type Cost = Int
>
> t :: Tree (Id,Cost)
> t = Node (4,0)
> [Node (3,1) [Node (1,1) []]
> ,Node (2,3) [Node (5,12) []]]
>
> ppT :: Show a => Tree a > IO ()
> ppT = putStrLn . drawTree . fmap show
>
>   Compress the incoming @Tree [email protected] with @[email protected]
> agglom :: Eq cost
> => (a > b)
> > (a > cost)
> > Tree a > Tree (cost,[b])
> agglom proj cost = go
> where accum = accumEq proj cost
> go (Node a []) = Node (cost a,[proj a]) []
> go (Node a ts) = let b = proj a
> c = cost a
> (bs,ss) = accum c ts
> in Node (c,b:bs) (fmap go ss)
>
>   Repeatedly @[email protected], and return a pair
>  whose /first/ element is a list of the projected
>  @[email protected] from those root values along paths from
>  the roots of the trees in the incoming @[Tree a]@
>  which have @[email protected] equal to the third function parameter,
>  and whose /second/ element is the (concatenation of the)
>  list(s) gotten from each of the @[email protected] calls.
> accumEq :: Eq cost
> => (a > b)
> > (a > cost) > cost
> > [Tree a] > ([b],[Tree a])
> accumEq proj cost c = go [] []
> where split ts = splitEq proj cost c ts
> go xs ys [] = (xs,ys)
> go xs ys ts = let (eqs,neqs) = split ts
> in case eqs of
> []> ([],ts)
> _ > let (bs,tss) = unzip eqs
> in go (bs++xs)
> (neqs++ys)
> (concat tss)
>
>   Split the incoming trees into
>  (1) a @[(b,Tree a)]@ of each @[email protected] is the
>  @[email protected] value from an @[email protected] where
>  the @[email protected] of that @[email protected] is equal to
>  the third function parameter, and (2)
>  the members of the incoming @[Tree a]@
>  whose roots' costs are /not/ equal to
>  the third function parameter.
> splitEq :: Eq cost
> => (a > b)
> > (a > cost) > cost
> > [Tree a] > ([(b,[Tree a])],[Tree a])
> splitEq proj cost c = foldl' go ([],[])
> where go (!eqs,!neqs)
> [email protected](Node a ts)
>  c==cost a = ((proj a,ts):eqs,neqs)
>  otherwise = (eqs,t:neqs)
> 
>
> Cheers,
> Matt
>
> On 12/23/09, Nikolas BorrelJensen <[email protected]> wrote:
>> Hi! I have some trouble implementing singlelinkage clustering algorithm
>> by
>> using a minimumspanning tree, so I would appreciate if some of you could
>> give me some advise.
>>
>> I am implementing a singlelinkage clustering algorithm, and my approach
>> is
>> to use minimum spanning trees for that task. I am using the library FGL (
>> http://web.engr.oregonstate.edu/~erwig/fgl/haskell/), and I have managed
>> to
>> compute a minimum spanning tree from an arbitrary fully connected graph
>> with
>> 5 nodes. I get [ [(4,0) ] , [ (3,1) , (4,0) ] , [ (1,1) , (3,1) , (4,0) ]
>> ,
>> [ (2,3) , (4,0) ] , [ (5,12) , (2,3) , (4,0) ] ], which is the root path
>> tree of the minimum spanning tree created by the function msTreeAt.
>>
>> >From that I would create a dendrogram. [ (1,1) , (3,1) , (4,0) ] is
>> telling
>> that node 1,3 and 4 has the same cost, namely cost 1. Therefore these are
>> merged at level 1. At level 1 we now have 3 clusters: (1,3,4), 2 and 5.
>> Now
>> the second lowest should be merged, that is 2 and 4. BUT because 4 is
>> already merged in the cluster (1,3,4), we should merge (1,3,4) and 2 at
>> level 3 (because the cost is 3). Now at level 3 we have 2 clusters,
>> (1,2,3,4) and 5. Now we merge the last one at level 12: (1,2,3,4,5), and
>> we
>> are finished.
>>
>> I have very hard to see, how this could be done efficiently without
>> pointers
>> (as in C). I have thought of just saving the nodes from the start of the
>> root path, and traversing it, but a lot of searching should be done all
>> the
>> time.
>>
>> Can you please give me some advise on that?
>>
>> Kind regards
>>
>> Nikolas BorrelJensen
>> Computer Science
>> University Of Copenhagen
>>
>
_______________________________________________
HaskellCafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskellcafe

