Commit 934a4deb authored by Christoph Ruegge's avatar Christoph Ruegge

mention repa's fromList

%% Cell type:markdown id: tags:
# Parallel computed arrays
## parallel and concurrent programming
Both involve more than one thread doing something at the same time, but with different goals.

**Parallel** <br />
- goal: faster calculations
- for deterministic functions
- example: matrix multiplication

**Concurrent** <br />
- goal: structuring of a program
- for non-deterministic tasks
- example: client/server
%% Cell type:markdown id: tags:
## parallel arrays with Repa
- generalized regular arrays with built-in parallel handling.
The basic datatype is `Array`:
```haskell
data Array rep sh e
```
- e : datatype of the array elements
- sh : shape of the array
- rep : type index or representation
*Shapes:*
constructors : <br/>
- `Z` : Zero dimension <br />
- `(:.)` : adds a dimension; left associative
```haskell
Z :. Int :. Int == (Z :. Int) :. Int
```
The dimension types are then defined as
```haskell
type DIM0 = Z
type DIM1 = DIM0 :. Int
...
```
*Representations:*
- manifest (real data)
- delayed (functions that compute elements)
- meta (special and mixed types)

Internally, Repa stores the data in a one-dimensional vector.
### Manifest Representations (real data)
- `U` -- adaptive unboxed vectors
- `V` -- boxed vectors

**Boxed** : array elements are ordinary (lazy) Haskell values, which are evaluated on demand and can even contain bottom (undefined) values.
**Unboxed** : arrays containing only plain, fully evaluated values.
%% Cell type:markdown id: tags:
**fromListUnboxed** <br />
create unboxed array from a list
```haskell
fromListUnboxed :: (Shape sh, Unbox a) => sh -> [a] -> Array U sh a
```
**fromList** <br />
variant that is polymorphic in the output representation (defined in `Data.Array.Repa.Eval`)
```haskell
fromList :: (Target r e, Shape sh) => sh -> [e] -> Array r sh e
```
%% Cell type:code id: tags:
``` haskell
import Data.Array.Repa as R
a = fromListUnboxed Z [1] :: Array U DIM0 Int -- a scalar Int
a
```
%%%% Output: display_data
%% Cell type:code id: tags:
``` haskell
b = fromListUnboxed (Z :. 3) [1,2,3] :: Array U DIM1 Double -- a 3 element vector with doubles
b
```
%%%% Output: display_data
%% Cell type:code id: tags:
``` haskell
c = fromListUnboxed (Z :. 2 :. 4) [1,2..8] :: Array U DIM2 Double -- a (2,4)-matrix with doubles
c
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
**(!)** : <br />
index operator
```haskell
(!) :: (Shape sh, Source r e) => Array r sh e -> sh -> e
```
%% Cell type:code id: tags:
``` haskell
a ! Z
b ! (Z :. 1)
c ! (Z :. 1 :. 2)
```
%%%% Output: display_data
%%%% Output: display_data
%%%% Output: display_data
%% Cell type:markdown id: tags:
**reshape** <br />
change shape of an array (this is cheap, because the internal representation is the same)
```haskell
reshape :: (Shape sh1, Shape sh2, Source r1 e) => sh2 -> Array r1 sh1 e -> Array D sh2 e
```
*Example:* since `reshape` produces a delayed array, we index a single element to force a computation
%% Cell type:code id: tags:
``` haskell
reshape (Z :. 4 :. 2) c ! (Z :. 1 :. 0 :: DIM2)
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
**rank** <br />
number of dimensions
```haskell
rank :: Shape sh => sh -> Int
```
%% Cell type:markdown id: tags:
**size** <br />
number of elements
```haskell
size :: Shape sh => sh -> Int
```
%% Cell type:markdown id: tags:
**extent** <br />
get the shape of an array
```haskell
extent :: (Shape sh, Source r e) => Array r sh e -> sh
```
Since `rank` and `size` operate on shapes, they are usually combined with `extent`:
%% Cell type:code id: tags:
``` haskell
rank (extent c)
size (extent c)
```
%%%% Output: display_data
%%%% Output: display_data
%% Cell type:markdown id: tags:
### Delayed Representations and computations
- `D` -- functions from indices to elements
- `C` -- cursor functions
These arrays are not yet computed. They are evaluated by specialised functions for sequential or parallel computation. This has the advantage of
- being able to choose exactly when and how to compute
- not materialising intermediate arrays (*fusion*)

*Fusion* means that if several operations are applied to some data, the combined operation is executed in a single pass and no intermediate arrays are built (which is fast and saves memory).
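The idea can be sketched with a toy model in plain Haskell (an illustration only, not Repa's actual implementation): a delayed array is just a function from index to element, and mapping over it is function composition, so nothing is materialised until we explicitly compute.

```haskell
-- Toy model of a delayed array: a function from index to element.
type Delayed e = Int -> e

-- A "source" array containing 0, 1, 2, ...
base :: Delayed Int
base i = i

-- Mapping over a delayed array is just function composition;
-- no elements are produced here.
mapD :: (a -> b) -> Delayed a -> Delayed b
mapD f arr = f . arr

-- Only computing materialises the elements, applying the whole
-- fused pipeline once per element.
computeD :: Int -> Delayed e -> [e]
computeD n arr = [arr i | i <- [0 .. n - 1]]

-- Two chained maps collapse into one pass over the data.
fused :: [Int]
fused = computeD 4 (mapD (* 2) (mapD (+ 1) base))  -- [2,4,6,8]
```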
%% Cell type:markdown id: tags:
**fromFunction** <br />
create delayed arrays
```haskell
fromFunction :: sh -> (sh -> a) -> Array D sh a
```
%% Cell type:code id: tags:
``` haskell
d = fromFunction (Z :. 5) (\(Z:.i) -> i :: Int)
d ! (Z:.1)
:ty d
```
%%%% Output: display_data
%%%% Output: display_data
%% Cell type:markdown id: tags:
More often, delayed arrays are created with maps or zips:
```haskell
map :: (Shape sh, Source r a) => (a -> b) -> Array r sh a -> Array D sh b
zipWith :: (Shape sh, Source r1 a, Source r2 b) => (a -> b -> c) -> Array r1 sh a -> Array r2 sh b -> Array D sh c
```
%% Cell type:code id: tags:
``` haskell
e = R.map (+1) d
```
%% Cell type:markdown id: tags:
Evaluation of delayed Arrays:
**computeS** <br />
sequential
```haskell
computeS :: (Load r1 sh e, Target r2 e) => Array r1 sh e -> Array r2 sh e
```
**computeP** <br />
parallel
```haskell
computeP :: (Monad m, Load r1 sh e, Target r2 e) => Array r1 sh e -> m (Array r2 sh e)
```
The monad ensures that parallel computations are not nested (which does not work).
<!-- Simplest option: use the identity monad from `Control.Monad.Identity` -->
*Example of a sequential computation:*
%% Cell type:code id: tags:
``` haskell
computeS e :: Array U DIM1 Int
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
Some folds directly produce computed (unboxed) arrays:
```haskell
foldS :: (Shape sh, Source r a, Unbox a, Elt a) => (a -> a -> a) -> a -> Array r (sh :. Int) a -> Array U sh a
foldP :: (Monad m, Shape sh, Source r a, Unbox a, Elt a) => (a -> a -> a) -> a -> Array r (sh :. Int) a -> m (Array U sh a)
```
Important: for `foldP` the operator must be associative, because the grouping of the parallel reduction is not fixed.
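`foldP` folds chunks of the array in parallel and then combines the partial results, so the operator is applied in an unspecified grouping. The point can be demonstrated in plain Haskell (a toy demonstration, not Repa code): a non-associative operator such as `(-)` gives different results under different groupings, while an associative one does not.

```haskell
xs :: [Double]
xs = [1, 2, 3, 4]

-- Sequential grouping: (((0 - 1) - 2) - 3) - 4 = -10
seqFold :: Double
seqFold = foldl (-) 0 xs

-- "Parallel" grouping: fold two halves separately, then combine.
-- (0 - 1 - 2) - (0 - 3 - 4) = (-3) - (-7) = 4, not -10
parFold :: Double
parFold = foldl (-) 0 (take 2 xs) - foldl (-) 0 (drop 2 xs)

-- For an associative operator like (+), both groupings agree.
seqSum, parSum :: Double
seqSum = foldl (+) 0 xs                                    -- 10
parSum = foldl (+) 0 (take 2 xs) + foldl (+) 0 (drop 2 xs) -- 10
```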
*example: summing each row*
%% Cell type:code id: tags:
``` haskell
foldP (+) 0 c
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
### numeric operations
%% Cell type:markdown id: tags:
*basic elementwise operations*
```haskell
(*^) :: (Num c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c
(+^) :: (Num c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c
(-^) :: (Num c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c
(/^) :: (Fractional c, Shape sh, Source r1 c, Source r2 c) => Array r1 sh c -> Array r2 sh c -> Array D sh c
```
%% Cell type:code id: tags:
``` haskell
computeP (c +^ c) :: IO (Array U DIM2 Double)
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
*matrix operations* from `Data.Array.Repa.Algorithms.Matrix`
%% Cell type:markdown id: tags:
**mmult** <br />
matrix multiplication
```haskell
mmultS :: Array U DIM2 Double -> Array U DIM2 Double -> Array U DIM2 Double
mmultP :: Monad m => Array U DIM2 Double -> Array U DIM2 Double -> m (Array U DIM2 Double)
```
%% Cell type:code id: tags:
``` haskell
import Data.Array.Repa.Algorithms.Matrix
mmultP c c
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
**trace2** <br />
get the trace
```haskell
trace2S :: Array U DIM2 Double -> Double
trace2P :: Monad m => Array U DIM2 Double -> m Double
```
**transpose2** <br />
transpose the matrix
```haskell
transpose2S :: Array U DIM2 Double -> Array U DIM2 Double
transpose2P :: Monad m => Array U DIM2 Double -> m (Array U DIM2 Double)
```
%% Cell type:markdown id: tags:
## Example: Mandelbrot fractal
A point $c$ in the complex plane is in the generalised *Mandelbrot* set if and only if the sequence $(z_n)_{n \in \mathbb{N}}$ with
$$z_0 = 0$$
$$z_{n+1} = f_c(z_n)$$
remains bounded.
For the classical *Mandelbrot* set, $f$ is given by $$f_c(z) = z^2 + c$$
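The implementation below cuts the sequence off at a finite bound; a standard escape-radius argument (sketched here as an addition, not part of the original notes) justifies this: once an iterate leaves the disc of radius $\max(|c|, 2)$, the magnitudes increase strictly, since for $|z_n| > \max(|c|, 2)$
$$|z_{n+1}| = |z_n^2 + c| \ge |z_n|^2 - |c| \ge |z_n|^2 - |z_n| = |z_n|\,(|z_n| - 1) > |z_n|,$$
and the growth factor $|z_n| - 1 > 1$ itself keeps growing, so the sequence diverges. Checking whether some iterate's magnitude exceeds a bound such as the 3 used below is therefore a sound divergence test.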
%% Cell type:code id: tags:
``` haskell
:ext FlexibleContexts
import Data.Array.Repa as R
import Data.Array.Repa.Algorithms.Complex
import Data.List (findIndex)
import Data.Packed.Repa
import Numeric.Container (saveMatrix)
-- returns Just the index of the first iterate whose magnitude exceeds the bound (the sequence escapes), or Nothing if no escape is detected within the iteration depth
isGeneralMandel :: (Complex -> Complex -> Complex) -> Int -> Double -> Complex -> Maybe Int
isGeneralMandel f iteration_depth bound c = findIndex ((>bound) . mag) (take iteration_depth (iterate (f c) 0))
-- checks whether a given point escapes; Nothing means it is (probably) in the Mandelbrot set
isMandel :: Int -> Double -> Complex -> Maybe Int
isMandel = isGeneralMandel (\c z -> z*z + c)
-- grid: values in complex plane
grid :: (Int, Int) -> DIM2
grid (x, y) = (Z :. x) :. y
-- calculates the mesh of points in complex plane
calcView :: (Complex, Complex) -> (Int, Int) -> DIM2 -> Complex
calcView ((left, bottom), (right, top)) (max_x, max_y) (Z :. x :. y) =
    ( (right - left) * fromIntegral x / (fromIntegral max_x - 1) + left
    , (top - bottom) * fromIntegral y / (fromIntegral max_y - 1) + bottom )
size = (1024,1024)
view = ((-1.5, -1.5),(1.5,1.5))
-- parallel computation for the grid
coord <- computeP $ fromFunction (grid size) ( calcView view size) :: IO (Array U DIM2 Complex)
-- parallel computation for the mandelbrot fractal in every point
zmat <- computeP $ R.map ( maybe 0 fromIntegral . isMandel 35 3) coord :: IO (Array U DIM2 Double)
-- write out matrices to visualize with python
xmat = copyS $ R.map fst coord
ymat = copyS $ R.map snd coord
saveMatrix "xmat.txt" "%f" (repaToMatrix xmat)
saveMatrix "ymat.txt" "%f" (repaToMatrix ymat)
zmats = copyS zmat
saveMatrix "zmat.txt" "%f" (repaToMatrix zmats)
-- ../vis.py -x xmat.txt -y ymat.txt -z zmat.txt -t image
```
%%%% Output: display_data
%% Cell type:code id: tags:
``` haskell
import Data.Array.Repa.Algorithms.ColorRamp
import Data.Array.Repa.Algorithms.Pixel
import Data.Array.Repa.IO.BMP (writeImageToBMP)
import Control.Monad.Identity (runIdentity)
(writeImageToBMP "test.bmp" . runIdentity . computeP) $ R.map ( rgb8OfFloat. rampColorHotToCold 0 35 . maybe 0 fromIntegral . isMandel 35 3) coord
```
%%%% Output: display_data
%% Cell type:markdown id: tags:
<img src="../images/mandel.png"></img>
%% Cell type:markdown id: tags:
Running this in the IHaskell notebook does not actually execute in parallel; we need to compile the program.
## actually running the program in parallel
At compile time we need to add the command-line options `-threaded` and `-rtsopts`, which enable multiple threads and the ability to pass runtime options, respectively.
For example, for the mandelbrot program we do
```
ghc -Odph -rtsopts -threaded -O3 mandelbrot.lhs
```
At runtime we specify how many threads the program runs with. This is done with the RTS option `-Nx`, where `x` is the number of threads.
For example, run the program
```
./mandelbrot +RTS -N2 -s -RTS
```
and compare the runtime to `-N1`
```
1 thread: Total time 0.875s ( 0.867s elapsed)
2 threads: Total time 0.964s ( 0.607s elapsed)
```
This is not perfect (about a 1.4x speedup in elapsed time), but good enough for this quick program.
## tool for monitoring parallel processes: threadscope
*threadscope* is a program that helps visualize what happens during a parallel computation.
For this we need to compile with the additional flag `-eventlog`, which enables writing an *eventlog* that threadscope uses to create its report.
For example, for the mandelbrot program we do
```
ghc -Odph -rtsopts -threaded -O3 -eventlog mandelbrot.lhs
./mandelbrot +RTS -N2 -s -l -RTS
```
then there is a file `mandelbrot.eventlog` and we can run threadscope with it:
```
threadscope mandelbrot.eventlog
```
The result is:
<img src="../images/threadscope.png"></img>
Here we can see that most of the time is spent sequentially writing the matrices to disk, while the calculation itself parallelizes well.
%% Cell type:markdown id: tags:
### general way to parallelize your program
To get the most out of parallelization you need to consider several things. A full treatment is beyond our scope here, but some simple advice for working with Repa:
1. Think about which parts of your program are trivially parallelizable (calculations that do not depend on each other).
2. Parallelize them with Repa.
3. Think about which parts of your program you can transform into trivially parallelizable parts. This usually involves splitting up your algorithm.

Especially step 3 can be hard. The next step after it would be to parallelize across several computers. This adds the complexity that the computers don't share memory, so data has to be transferred between the computing nodes. That is a lecture of its own (using, for example, MPI, the Message Passing Interface).
%% Cell type:code id: tags:
``` haskell
```