[email protected]
[Top] [All Lists]

Re: [Haskell-cafe] Explicit garbage collection

Subject: Re: [Haskell-cafe] Explicit garbage collection
From: Miguel Mitrofanov
Date: Fri, 8 Jan 2010 01:03:38 +0300
Hmm, interesting. Seems like I have to take a closer look at GHC's garbage collection. Thanks!
On 8 Jan 2010, at 00:53, Edward Kmett wrote:

That is kind of what I thought you were doing.

Retooling to see if we can get any better performance out of collecting, it seems that System.Mem.PerformGC does a foreign call out to performMajorGC to ask for a global collection. But I would hazard that you might be able to get most of the benefit by just asking for the nursery to be cleaned up.
To that end, perhaps it would be worthwhile to consider switching to
something like:
foreign import ccall {-safe-} "performGC" performMinorGC :: IO ()

-- System.Mem imports performMajorGC as 'performGC' but the minor collection appears to be called performGC, hence the rename.
This should let you call for a minor collection, which should be
considerably less time consuming. Especially as if you call it
frequently you'll be dealing with a mostly cleared nursery anyways.
-Edward Kmett

On Thu, Jan 7, 2010 at 4:39 PM, Miguel Mitrofanov <[email protected] > wrote: I liked it too. Seems like I have to show some real code, and my apologies for a long e-mail.
Well, that's what I'm actually trying to do and what I've managed to
achieve so far.
> module Ask where
> import Control.Monad
> import Data.IORef
> import Data.Maybe
> import System.IO.Unsafe -- Yes, I would need that
> import System.Mem
> import System.Mem.Weak

Suppose we're writing a RESTful server, which keeps all state on the client. We want it to send questions to the client and get back the answers. We can also send some state data to the client and we are sure that the client would return them back unchanged. So, that's the type of our server:
> type Server medium = Maybe (medium, String) -> IO (Maybe (medium,
String))
The client starts interaction, sending "Nothing"; when server
finally returns "Nothing", the session stops. I would emulate the
client in GHCi with the following function:
> client :: Show medium => Bool -> Server medium -> IO ()
> client debug server = client' Nothing where
>     client' query =
>         do response <- server query
>            case response of
>              Nothing -> return ()
>              Just (medium, question) ->
>                  do when debug $ putStr "Debug: " >> print medium
>                     putStrLn question
>                     putStr "--> "
>                     answer <- getLine
>                     client' $ Just (medium, answer)

Nothing very interesting. The only thing to note is that the boolean argument allows us to see the state sent by server - for debugging purposes.
But I want something more high-level. The goal is to solve the
Graham's "arc challenge". So the high-level interface is implemented
by this type:
> data Ask a = Finish a | Continue String (String -> Ask a)

A possible application would be:

> test :: Ask ()
> test =
>     do s1 <- ask "Enter the first number"
>        let n1 = read s1
>        s2 <- ask "Enter the second number"
>        let n2 = read s2
>        ask $ show $ n1 + n2
>        return ()

Well, to do that, I need a Monad instance for Ask:

> instance Monad Ask where
>     return = Finish
>     Finish x >>= h = h x
> Continue question k >>= h = Continue question $ \answer -> k answer >>= h
and an "ask" function:

> ask :: String -> Ask String
> ask question = Continue question $ \answer -> return answer

Now, the problem is with the route from high level to the low level. We can do it relatively simply:
> simpleServer :: Ask () -> Server [String]
> simpleServer anything = return . simpleServer' anything . maybe [] (\(medium, answer) -> medium ++ [answer]) where
>     simpleServer' (Finish ()) _ = Nothing
>     simpleServer' (Continue question _) [] = Just ([], question)
>     simpleServer' (Continue _ k) (s : ss) =
>         do (medium, question) <- simpleServer' (k s) ss
>            return (s : medium, question)

And indeed, our test example works:

*Ask> client False $ simpleServer test
Enter the first number
--> 3
Enter the second number
--> 4
7
-->

But, as you've probably guessed already, this "simpleServer" sends way too much information over network:
> nonsense :: Ask ()
> nonsense =
>     do a <- ask "?"
>        b <- ask a
>        c <- ask b
>        d <- ask c
>        e <- ask b -- Note this "b" instead of "d", it's deliberate.
>        return ()

*Ask> client True $ simpleServer nonsense
Debug: []
?
--> a
Debug: ["a"]
a
--> b
Debug: ["a","b"]
b
--> c
Debug: ["a","b","c"]
c
--> d
Debug: ["a","b","c","d"]
b
--> e

All user answers from the very beginning get transmitted.

I want to transmit only those answers that are actually used. I'm gonna write another server, a (relatively) smart one.
A simple boxing type:

> data Box a = Box {unBox :: a}

Now, the server. I use "Nothing" for those user answers that are no longer needed.
> smartServer :: Ask () -> Server [Maybe String]
> smartServer anything = smartServerIO anything . maybe [] (\ (medium, answer) -> medium ++ [Just answer]) where
>     smartServerIO anything query =

Sometimes a value is not used anymore just because it WAS used already. So I'll use a mutable list to keep track of those values that are actually dereferenced:
>         do usedList <- newIORef []

Then, I'll make weak pointers for all user answers:

>            let boxedQuery = map Box query
> weakPtrs <- mapM (\box -> mkWeak box () Nothing) boxedQuery
and actually run my "Ask" value.

>            case findWeakPtrs 0 anything boxedQuery usedList of
>              Finish () -> return Nothing
>              Continue question rest ->

Now, the interesting part. I'd invoke a garbage collector to get rid of those boxes that aren't necessary anymore.
>                  do performGC

Now, weak pointers tell me what boxes are not dereferenced but can be dereferenced in the future, and my mutable list has the numbers of boxes that were dereferenced.
>                     danglingNumbers <- mapM deRefWeak weakPtrs
>                     usedNumbers <- readIORef usedList

I also need to keep this "future" alive, so that usable boxes don't get garbage collected.
>                     case rest undefined of _ -> return ()

And now all I need to do is to filter out those boxes that are not in any of the two lists:
> return $ Just (zipWith3 (isUsed usedNumbers)
[0..] danglingNumbers query, question)
Helper function has an interesting detail:

>     findWeakPtrs _ (Finish ()) _ _ = Finish ()
>     findWeakPtrs _ response [] _ = response

I use unsafePerformIO to track those boxes that get actually dereferenced:
>     findWeakPtrs n (Continue _ k) (s : ss) usedList =
> findWeakPtrs (n+1) (k (unsafePerformIO $ modifyIORef usedList (n :) >> return (fromJust $ unBox s))) ss usedList
A filtering function is not very interesting:

> isUsed usedNumbers index Nothing _ | not (index `elem` usedNumbers) = Nothing
>     isUsed _ _ _ answer = answer

And it works:

*Ask> client True $ smartServer nonsense
Debug: []
?
--> a
Debug: [Just "a"]
a
--> b
Debug: [Nothing,Just "b"]
b
--> c
Debug: [Nothing,Just "b",Just "c"]
c
--> d
Debug: [Nothing,Just "b",Nothing,Nothing]
b
--> e

So, the problem is, I don't think calling performGC every time is a good idea. I'd like to tell explicitly what values are to be garbage collected. It doesn't seem like it's possible with current version of GHC, though.

On 7 Jan 2010, at 23:07, Edward Kmett wrote:

Thats a shame, I was rather fond of this monad. To get that last 'True', you'll really have to rely on garbage collection to approximate reachability for you, leaving the weak reference solution the best you can hope for.
-Edward Kmett

On Thu, Jan 7, 2010 at 12:39 PM, Miguel Mitrofanov <[email protected] > wrote: Damn. Seems like I really need (True, False, True) as a result of "test".

On 7 Jan 2010, at 08:52, Miguel Mitrofanov wrote:

Seems very nice. Thanks.

On 7 Jan 2010, at 08:01, Edward Kmett wrote:

Here is a slightly nicer version using the Codensity monad of STM.

Thanks go to Andrea Vezzosi for figuring out an annoying hanging bug I was having.
-Edward Kmett

{-# LANGUAGE Rank2Types, GeneralizedNewtypeDeriving, DeriveFunctor #-}
module STMOracle
 ( Oracle, Ref
 , newRef, readRef, writeRef, modifyRef, needRef
 ) where

import Control.Applicative
import Control.Monad
import Control.Concurrent.STM

instance Applicative STM where
 pure = return
 (<*>) = ap

newtype Ref s a = Ref (TVar (Maybe a))
newtype Oracle s a = Oracle { unOracle :: forall r. (a -> STM r) -> STM r } deriving (Functor)
instance Monad (Oracle s) where
 return x = Oracle (\k -> k x)
 Oracle m >>= f = Oracle (\k -> m (\a -> unOracle (f a) k))

mkOracle m = Oracle (m >>=)

runOracle :: (forall s. Oracle s a) -> IO a
runOracle t = atomically (unOracle t return)

newRef :: a -> Oracle s (Ref s a)
newRef a = mkOracle $ Ref <$> newTVar (Just a)

readRef :: Ref s a -> Oracle s a
readRef (Ref r) = mkOracle $ do
 m <- readTVar r
 maybe retry return m

writeRef :: a -> Ref s a -> Oracle s a
writeRef a (Ref r) = mkOracle $ do
 writeTVar r (Just a)
 return a

modifyRef :: (a -> a) -> Ref s a -> Oracle s a
modifyRef f r = do
 a <- readRef r
 writeRef (f a) r

needRef :: Ref s a -> Oracle s Bool
needRef (Ref slot) = Oracle $ \k ->
          (writeTVar slot Nothing >> k False)
 `orElse` k True

-- test case : refMaybe b dflt ref = if b then readRef ref else return dflt
refIgnore ref = return "blablabla"
refFst ref = fst `fmap` readRef ref
test = do
  a <- newRef "x"
  b <- newRef 1
  c <- newRef ('z', Just 0)
  -- no performLocalGC required
  x <- needRef a
  y <- needRef b
  z <- needRef c
  u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
                        -- but it won't be known until runtime.
  w <- refIgnore b
  v <- refFst c
  return (x, y, z)



On Wed, Jan 6, 2010 at 10:28 PM, Edward Kmett <[email protected]> wrote: I don't believe you can get quite the semantics you want. However, you can get reasonably close, by building a manual store and backtracking.
{-# LANGUAGE Rank2Types #-}
-- lets define an Oracle that tracks whether or not you might need the reference, by backtracking.
module Oracle
 ( Oracle, Ref
 , newRef, readRef, writeRef, modifyRef, needRef
 ) where

import Control.Applicative
import Control.Arrow (first)
import Control.Monad
import Data.IntMap (IntMap)
import qualified Data.IntMap as M
import Unsafe.Coerce (unsafeCoerce)
import GHC.Prim (Any)

-- we need to track our own worlds, otherwise we'd have to build over ST, change optimistically, and track how to backtrack the state of the Store. Much uglier. -- values are stored as 'Any's for safety, see GHC.Prim for a discussion on the hazards of risking the storage of function types using unsafeCoerce as anything else.
data World s = World { store :: !(IntMap Any), hwm :: !Int }

-- references into our store
newtype Ref s a = Ref Int deriving (Eq)

-- our monad that can 'see the future' ~ StateT (World s) []newtype Oracle s a = Oracle { unOracle :: World s -> [(a, World s)] }
-- we rely on the fact that the list is always non-empty for any
oracle you can run. we are only allowed to backtrack if we thought
we wouldn't need the reference, and wound up needing it, so head
will always succeed.
runOracle :: (forall s. Oracle s a) -> a
runOracle f = fst $ head $ unOracle f $ World M.empty 1


instance Monad (Oracle s) where
 return a = Oracle $ \w -> [(a,w)]
 Oracle m >>= k = Oracle $ \s -> do
     (a,s') <- m s
     unOracle (k a) s'

-- note: you cannot safely define fail here without risking a crash in runOracle -- Similarly, we're not a MonadPlus instance because we always want to succeed eventually.
instance Functor (Oracle s) where
 fmap f (Oracle g) = Oracle $ \w -> first f <$> g w

instance Applicative (Oracle s) where
 pure = return
 (<*>) = ap

-- new ref allocates a fresh slot and inserts the value into the store. the type level brand 's' keeps us safe, and we don't export the Ref constructor.
newRef :: a -> Oracle s (Ref s a)
newRef a = Oracle $ \(World w t) ->
 [(Ref t, World (M.insert t (unsafeCoerce a) w) (t + 1))]

-- readRef is the only thing that ever backtracks, if we try to read a reference we claimed we wouldn't need, then we backtrack to when we decided we didn't need the reference, and continue with its value.
readRef :: Ref s a -> Oracle s a
readRef (Ref slot) = Oracle $ \world ->
maybe [] (\a -> [(unsafeCoerce a, world)]) $ M.lookup slot (store world)
-- note, writeRef dfoesn't 'need' the ref's current value, so
needRef will report False if you writeRef before you read it after
this.
writeRef :: a -> Ref s a -> Oracle s a
writeRef a (Ref slot) = Oracle $ \world ->
[(a, world { store = M.insert slot (unsafeCoerce a) $ store world })]
{-
-- alternate writeRef where writing 'needs' the ref.
writeRef :: a -> Ref s a -> Oracle s a
writeRef a (Ref slot) = Oracle $ \World store v -> do
 (Just _, store') <- return $ updateLookupWithKey replace slot store
 [(a, World store' v)]
 where
 replace _ _ = Just (unsafeCoerce a)
-}

-- modifying a reference of course needs its current value.
modifyRef :: (a -> a) -> Ref s a -> Oracle s a
modifyRef f r = do
 a <- readRef r
 writeRef (f a) r

-- needRef tries to continue executing the world without the element in the store in question. if that fails, then we'll backtrack to here, and try again with the original world, and report that the element was in fact needed.
needRef :: Ref s a -> Oracle s Bool
needRef (Ref slot) = Oracle $ \world ->
 [ (False, world { store = M.delete slot $ store world })
 , (True, world)
 ]

-- test case:
refMaybe b dflt ref = if b then readRef ref else return dflt
refIgnore ref = return "blablabla"
refFst ref = fst <$> readRef ref
test = do
  a <- newRef "x"
  b <- newRef 1
  c <- newRef ('z', Just 0)
  -- no performLocalGC required
  x <- needRef a
  y <- needRef b
  z <- needRef c
  u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
                        -- but it won't be known until runtime.
  w <- refIgnore b
  v <- refFst c
  return (x, y, z)

-- This will disagree with your desired answer, returning:

*Oracle> runOracle test
Loading package syb ... linking ... done.
Loading package array-0.2.0.0 ... linking ... done.
Loading package containers-0.2.0.1 ... linking ... done.
(False,False,True)

rather than (True, False, True), because the oracle is able to see into the future (via backtracking) to see that refMaybe doesn't use the reference after all.
This probably won't suit your needs, but it was a fun little exercise.

-Edward Kmett

On Wed, Jan 6, 2010 at 4:05 PM, Miguel Mitrofanov <[email protected] > wrote:
On 6 Jan 2010, at 23:21, Edward Kmett wrote:

You probably just want to hold onto weak references for your 'isStillNeeded' checks.
That's what I do now. But I want to minimize the network traffic, so
I want referenced values to be garbage collected as soon as possible
- and I couldn't find anything except System.Mem.performIO to do the
job - which is a bit too global for me.
Otherwise the isStillNeeded check itself will keep you from garbage
collecting!
Not necessary. What I'm imagining is that there is essentially only
one way to access the value stored in the reference - with readRef.
So, if there isn't any chance that readRef would be called, the
value can be garbage collected; "isStillNeeded" function only needs
the reference, not the value.
Well, yeah, that's kinda like weak references.


http://cvs.haskell.org/Hugs/pages/libraries/base/System-Mem-Weak.html

-Edward Kmett

On Wed, Jan 6, 2010 at 9:39 AM, Miguel Mitrofanov <[email protected] > wrote:
I'll take a look at them.

I want something like this:

refMaybe b dflt ref = if b then readRef ref else return dflt
refIgnore ref = return "blablabla"
refFst ref =
do
 (v, w) <- readRef ref
 return v
test =
do
 a <- newRef "x"
 b <- newRef 1
 c <- newRef ('z', Just 0)
 performLocalGC -- if necessary
 x <- isStillNeeded a
 y <- isStillNeeded b
 z <- isStillNeeded c
 u <- refMaybe y "t" a -- note that it wouldn't actually read "a",
                       -- but it won't be known until runtime.
 w <- refIgnore b
 v <- refFst c
 return (x, y, z)

so that "run test" returns (True, False, True).


Dan Doel wrote:
On Wednesday 06 January 2010 8:52:10 am Miguel Mitrofanov wrote:
Is there any kind of "ST" monad that allows to know if some STRef is no
longer needed?

The problem is, I want to send some data to an external storage over a
network and get it back later, but I don't want to send unnecessary data.
I've managed to do something like that with weak pointers,
System.Mem.performGC and unsafePerformIO, but it seems to me that invoking
GC every time is an overkill.

Oh, and I'm ready to trade the purity of runST for that, if necessary.

You may be able to use something like Oleg's Lightweight Monadic Regions to get this effect. I suppose it depends somewhat on what qualifies a reference as "no longer needed".
http://www.cs.rutgers.edu/~ccshan/capability/region-io.pdf

I'm not aware of anything out-of-the-box that does what you want, though.
-- Dan
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe



_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe



_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe


_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe

<Prev in Thread] Current Thread [Next in Thread>