
A Most Peculiar Matrix Operator

January 18, 2025

I recently picked up LCL and dynamical systems theory again (see On Reasoning about Dynamical Systems) and I thought I would write a little bit about one of the more interesting and peculiar operators I came across - what I call the Matrix Convolution operator.

Matrix Convolution

Firstly, the name Convolution is probably not the best name, since a quick googling of the term "Matrix Convolution" shows that it is already used for quite a few different things. Maybe a better term would be Generalised Application.

However, at the time of naming I was too lazy to google it and when I finally realised my erroneous ways I was too lazy to change it. Also, the name is nice - so it stays.

So I call it the Matrix Convolution operator and it goes a little something like this:

\[(a \star_M b)[n] = \sum_{i=0}^{d(M)} \Bigg( \sum_{\substack{v_1 \in \mathbb{N}^{i+1} \\ |v_1| = n_1}} \cdots \sum_{\substack{v_M \in \mathbb{N}^{i+1} \\ |v_M| = n_M + i \\ v_M(1) = i}} \cdots \sum_{\substack{v_N \in \mathbb{N}^{i+1} \\ |v_N| = n_N}} a[v_1(1), \ldots, v_N(1)] \prod_{j=2}^{i+1} b[v_1(j), \ldots, v_N(j)] \Bigg)\]

where:

  • \(a\), \(b\) are \(N\)-dimensional matrices of equal size
  • \(M \in [0, \ldots, N]\)
  • \(d(M)\) is the size of \(a\) and \(b\) in dimension \(M\)
  • \(n_i\) is the \(i\)'th coordinate of \(n \in \mathbb{N}^N\)
  • \( |v_k| = \sum_{l = 1}^{i+1} v_k(l)\)

So what is this operator and how did it come about?

I defined this operator as part of the semantics of LCL, where it is used to compute function application and evolving boundary conditions.

The idea of the operator is to embed one matrix \(b\) into another matrix \(a\) along dimension \(M\). It is almost a type of projection of \(a\) with respect to \(b\) in dimension \(M\). Note that this is quite different from something like the Kronecker product, since the dimensions of the resulting matrix are equal to those of \(a\) and not the product of the dimensions, i.e. the dimensions of \(b\) are not present in the resulting matrix except in the actual computation of it.

For each entry \(n\) of the resulting matrix, the operator sums from 0 to the size, or degree, of \(a\) and \(b\) (remember they are equal in size) in dimension \(M\). Each term of the sum is then made up of an inner sum of products over the elements of \(a\) and \(b\), where each product consists of exactly one element of \(a\) and \(i\) elements of \(b\).

The inner sequence of sums is meant to iterate over all possible ways of distributing each of the coordinates \(n_k\), viewed simply as a number, across the \(i + 1\) entries of a vector, except for the \(M\)-coordinate, which instead distributes \(n_M + i\) with the first entry fixed to \(i\). A small sketch of this enumeration follows below.
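To make the enumeration concrete, here is a minimal Python helper (the name weak_compositions is mine, purely illustrative) that generates these distributions; it reproduces the vectors of the example in the next section.

def weak_compositions(total, length):
    # All vectors of non-negative integers of the given length that sum to total.
    if length == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in weak_compositions(total - first, length - 1):
            yield (first,) + rest

# The v2 candidates for i = 2 below: distribute n_2 = 2 across i + 1 = 3 entries.
print(list(weak_compositions(2, 3)))
# [(0, 0, 2), (0, 1, 1), (0, 2, 0), (1, 0, 1), (1, 1, 0), (2, 0, 0)]

# The v1 = v_M candidates: fix the first entry to i = 2 and distribute
# n_1 = 1 across the remaining two entries.
print([(2,) + rest for rest in weak_compositions(1, 2)])
# [(2, 0, 1), (2, 1, 0)]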

Example

This might sound a little bit complicated, so to illustrate let us consider the case \(N = 2\), \(n = [ 1 , 2 ]\), \(d(1) = d(2) = 2\) and \(M = 1\).

In this case the inner sums produce, for each \(i\), the following vectors:

┌───────┬───────────────┬────────────────┐
│   i   │      v1       │       v2       │
├───────┼───────────────┼────────────────┤
│   0   │     [0]       │      [2]       │
├───────┼───────────────┼────────────────┤
│   1   │    [1,1]      │     [0,2]      │
│       │               │     [1,1]      │
│       │               │     [2,0]      │
├───────┼───────────────┼────────────────┤
│   2   │   [2,1,0]     │    [0,0,2]     │
│       │               │    [0,1,1]     │
│       │               │    [0,2,0]     │
│       │               │    [1,1,0]     │
│       │               │    [1,0,1]     │
│       │               │    [2,0,0]     │
│       │   [2,0,1]     │    [0,0,2]     │
│       │               │    [0,1,1]     │
│       │               │    [0,2,0]     │
│       │               │    [1,1,0]     │
│       │               │    [1,0,1]     │
│       │               │    [2,0,0]     │
└───────┴───────────────┴────────────────┘

which then translates into the following total sum:

\[\begin{aligned}
(a \star_1 b)[1, 2] ={} & a[0, 2] \\
& + a[1, 0]\,b[1, 2] + a[1, 1]\,b[1, 1] + a[1, 2]\,b[1, 0] \\
& + a[2, 0]\,b[1, 0]\,b[0, 2] + a[2, 0]\,b[1, 1]\,b[0, 1] + a[2, 0]\,b[1, 2]\,b[0, 0] \\
& + a[2, 1]\,b[1, 1]\,b[0, 0] + a[2, 1]\,b[1, 0]\,b[0, 1] + a[2, 2]\,b[1, 0]\,b[0, 0] \\
& + a[2, 0]\,b[0, 0]\,b[1, 2] + a[2, 0]\,b[0, 1]\,b[1, 1] + a[2, 0]\,b[0, 2]\,b[1, 0] \\
& + a[2, 1]\,b[0, 1]\,b[1, 0] + a[2, 1]\,b[0, 0]\,b[1, 1] + a[2, 2]\,b[0, 0]\,b[1, 0]
\end{aligned}\]

Note from the total sum that we are in fact iterating over all possible ways of distributing \([n_1 + i, n_2]\) over products containing exactly one element of \(a\) and either 0, 1 or 2 elements of \(b\), with the first coordinate of the \(a\)-element fixed to \(i\).

The use case

As you can probably see, this has an obvious use when \(a\) and \(b\) are viewed as matrix representations of \(N\)-dimensional polynomials. In that case the operator amounts to partial application, or substitution, of \(b\) into dimension \(M\) of \(a\). So if \(T(a) = f\) and \(T(b) = g\) are the mappings of the matrices into polynomials, then \(T(a \star_1 b) = f(g(x,y),y)\).
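As a quick sanity check of this reading, here is a small sketch using sympy to compute \(f(g(x,y), y)\) directly. The coefficient layout, with entry \([i, j]\) holding the coefficient of \(x^i y^j\), and the example matrices are my own assumptions for illustration.

import sympy as sp

x, y = sp.symbols("x y")

def to_poly(m):
    # Assumed layout: m[i][j] is the coefficient of x**i * y**j.
    return sum(m[i][j] * x**i * y**j
               for i in range(len(m)) for j in range(len(m[i])))

a = [[1, 0, 0], [0, 2, 0], [3, 0, 0]]  # f(x, y) = 1 + 2xy + 3x^2
b = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # g(x, y) = y + x

f, g = to_poly(a), to_poly(b)

# Substituting g into the x-dimension (M = 1) of f and expanding yields the
# polynomial whose coefficient matrix a *_1 b computes entry by entry:
# f(g(x, y), y) = 1 + 3x^2 + 8xy + 5y^2.
print(sp.expand(f.subs(x, g)))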

Of particular interest to me is the case where the boundary conditions of a system of differential equations, governed by a time variable \(t\), are expressed as polynomials, either exactly or by approximation. A polynomial approximation of the evolution of the system with respect to \(t\) can then be computed, and using this we can evolve the system from a starting time \(t_0\) and state \(s_0\) to a future time \(t_1\) and state \(s_1\) by any of a number of algebraic schemes.

The standard approach for computing a further future state at time \(t_2 > t_1\) is to apply the original approximation to the state \(s_1\) to derive the new state \(s_2\).

However, another way would be to evolve the original boundary conditions, using the approximated system, and then compute a new approximation, based on the evolved boundary conditions, with which to derive \(s_2\) from \(s_1\). So, in a sense, we are re-basing the approximation at the future time \(t_1\). A schematic sketch of the two schemes follows below.
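To make the contrast explicit, here is a minimal Python sketch of the two stepping schemes; approximate, evolve_state and evolve_conditions are hypothetical placeholders for the approximation machinery, passed in as parameters, and only the control flow is the point.

def standard_scheme(approximate, evolve_state, bc0, s0, steps):
    # Build one approximation from the initial boundary conditions
    # and reuse it at every step.
    f = approximate(bc0)
    s = s0
    for _ in range(steps):
        s = evolve_state(f, s)
    return s

def rebasing_scheme(approximate, evolve_state, evolve_conditions, bc0, s0, steps):
    # Re-base at every step: evolve the boundary conditions alongside the
    # state and rebuild the approximation from them.
    bc, s = bc0, s0
    for _ in range(steps):
        f = approximate(bc)
        s = evolve_state(f, s)
        bc = evolve_conditions(f, bc)
    return s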

The hope is that this gives a slightly better approximation by controlling some of the approximation noise. In the deterministic case it does, but only with a very small advantage to the re-basing method. So my hopes rest with the stochastic cases, where I hope to prove that this method significantly reduces the build-up of variance noise - alas, there is much work to do before any such proof can be obtained.

Implementation

So this is where it gets really interesting! Whenever you see something like this it screams out for a functional approach à la stream fusion. However, in this case we are dealing with an operator that needs not only substantial memory of the past but also knowledge of the future, at least if you take the standard matrix \( \Leftrightarrow \) stream view, and this breaks the clean functional approach that you can achieve with some of the other, simpler operators.

So here is some Python code describing how I have done it - one advantage of this approach is that it splits neatly into parallel computations along the terms of the outer sum. It does forgo any reuse of computations, but I don't think there is much reuse to be had in any case.

def matrix_convolution(a, b, n, M, N):
    """Compute (a *_M b)[n] for N-dimensional arrays a and b of equal
    shape. Dimensions are numbered 1..N, M is the substitution dimension
    and n is a 0-indexed coordinate tuple of length N."""

    def do_dimension(dim, ii, i, v, k):
        # Base case: every coordinate is distributed. Evaluate the product
        # of one element of a (row 0 of v) and ii - 1 elements of b
        # (rows 1..ii-1). Slot 0 of each row is unused padding.
        if dim == 0:
            r = a[tuple(v[0][1:])]
            for j in range(1, ii):
                r *= b[tuple(v[j][1:])]
            return r

        # Dimension dim fully distributed: move on to the next dimension,
        # whose coordinate budget is n[dim - 2] (unused when dim == 1).
        if i == 0:
            return do_dimension(dim - 1, ii, ii, v, n[dim - 2] if dim > 1 else 0)

        # Closing out a dimension: the remainder k goes to the a-row
        # (i == 1), except in dimension M, where it goes to the last free
        # b-row (i == 2) and the a-row's M-coordinate is pinned to the
        # outer index ii - 1.
        if i == 1 or (i == 2 and dim == M):
            v[i - 1][dim] = k
            if i == 2:
                v[0][M] = ii - 1
            return do_dimension(dim, ii, 0, v, 0)

        # Main recursion: let row i - 1 take j of the remaining budget k
        # and distribute the rest over the lower rows.
        total = 0.0
        for j in range(k + 1):
            v[i - 1][dim] = j
            total += do_dimension(dim, ii, i - 1, v, k - j)
        return total

    ab = 0.0

    # The i = 0 term of the outer sum only survives when n_M = 0, where
    # it contributes a[n] itself.
    if n[M - 1] == 0:
        ab += a[tuple(n)]

    # Outer sum over i = 1 .. d(M), the degree of a and b in dimension M.
    d_M = a.shape[M - 1] - 1
    for i in range(1, d_M + 1):
        # v holds i + 1 coordinate rows: row 0 indexes a, rows 1..i index b.
        v = [[0] * (N + 1) for _ in range(i + 1)]
        ab += do_dimension(N, i + 1, i + 1, v, n[N - 1])

    return ab
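As an illustrative smoke test, with arbitrary inputs chosen only to match the shapes of the worked example, this evaluates the entry \(n = [1, 2]\) from above:

import numpy as np

# Arbitrary 3x3 matrices, i.e. d(1) = d(2) = 2 as in the example.
a = np.arange(9.0).reshape(3, 3)
b = np.ones((3, 3))

# Entry n = [1, 2] of a *_1 b for N = 2 dimensions.
print(matrix_convolution(a, b, (1, 2), M=1, N=2))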

Conclusion

What we have here is a really interesting, and I think novel, matrix operator.

For my immediate use case, it works well to model the generalisation of function application, but I think it goes deeper than that.

Probably not in its current formulation, which is very specific to the case I am solving, but I have a feeling there is a more general operator lurking there - one that would specialise to the case above, but in the general setting would also specialise to the Kronecker product and other similar operators, and perhaps give us some insight into the theory of computation.