Matrix multiplication memory schedule generator
[Presentation] [Papers] [Code] [Contacts]

Jean-Guillaume Dumas
[Papers] [Code] [Contacts]

Galet is a brute force search algorithm, similar to the pebble game of Huss-Lederman et al. 1996, for memory schedules of matrix multiplication, and in particular of Strassen-Winograd's algorithms.

A sequence of computations is represented as a directed graph.

A node represents a program variable. The nodes can be classified as initials (when they correspond to inputs), temporaries (for intermediate computations) or finals (results or nodes that we want to keep, such as ready-only inputs).

The arcs represent the operations; they point from the operands to the result.

A pebble represents an allocated memory. We can put pebbles on any nodes, move or remove them according to a set of simple rules shown below.
When a pebble arrives on a node, the computation at the associated variable starts, and can be ``partially'' or ``fully'' executed.
If not specified, it is assumed that the computation is fully executed.

Arcs can be removed, when the corresponding operation has been computed.

These two rules are especially useful for accumulation operations: for example, it is possible to try schedule the multiplication separately from the addition in an otherwise recursive AB+C call; the multiplication-involved arcs would then be removed first and the accumulated part later.
It is also useful if we do not want to fix the way some additions are performed: if U_3 = P_1 + P_6  + P_7 the associativity allows different ways of computing the sum and we let the program explore these possibilities.

At the beginning of the exploration, each initial node has a pebble and we may have a few extra available pebbles.
The program then tries to apply the following rules, in order, on each node. The program stops when every final node has a pebble or when there is no more allowed move:
  1. Computing a result/removing arcs. If a node has a pebble and parents with pebbles, then the operation can be performed and the corresponding arcs removed. The node is then at least partially computed.
  2. Freeing some memory/removing a pebble. If a node is isolated and not final, its pebble is freed. This means that we can reclaim the memory here because this node has been fully computed (no arcs pointing to it) and is no longer in use as an operand (no arcs initiating from it).
  3. Computing in place/moving a pebble. If a node P has a full pebble and a single empty child node S and if other parents of S have pebbles on them, then the pebble on P may be transferred  to S (corresponding arcs are removed). This means an operation has been made in place in the parent P's pebble.
  4. Using more memory/adding a pebble. If parents of an empty node N have pebbles and a free pebble is available, then N can be assigned this pebble and the corresponding arcs are removed. This means that the operation is computed in a new memory location.
  5. Copying some memory/duplicating a pebble. A computed node having a pebble can be duplicated. The arcs pointed to or from the original node are then rearranged between them. This means that a temporary result has been copied into some free place to allow more flexibility.

Jean-Guillaume Dumas
[Presentation] [Code] [Contacts]

For further informations about this work, have a look at:
Jean-Guillaume Dumas
[Source Code]
[Presentation] [Papers] [Contacts]

Latest version: