
N-dimensional arrays

2019-06-30

Introduction

Ah, the joy of number crunching in an inefficient interpreted language in the heat of the summer. It gave me a severe flashback to my student days and got me thinking about my old love NumPy and whether N-dimensional arrays could also work in JavaScript.

Here’s a little something I’ve been musing over as of late. I’ve been playing around with N-dimensional arrays in TypeScript. These are objects that generalize vectors and matrices and typically store numbers.

const value = ndarray[i][j][k]; // An array of rank 3

The number of dimensions of an array is its rank and its shape is the tuple (N_0, N_1, …, N_{rank-1}). The shape enumerates the number of elements, i.e. the size, along each dimension. The total number of elements in the array is the product of the sizes.
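For instance (the numbers here are just for illustration), a 2-by-3-by-4 array has rank 3 and 24 elements:

const shape = [2, 3, 4];                                   // rank 3
const size = shape.reduce((total, dim) => total * dim, 1); // 2 * 3 * 4 = 24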

N-dimensional arrays are the workhorses of scientific computing, statistics and their bastard children in the enterprise. Anybody who has experience with Matlab, R or NumPy has surely come across them. Perhaps a less widely known fact is that Fortran has excellent support for array processing. Fortran is actually the Nestor of the array business. Perhaps I’ll write a blog post about that sometime, too.

How about JavaScript then? The situation used to be rather bleak, as numerical work is clearly something the language was not designed for. Before EcmaScript2015 there was no way to alter how array indexing works (cf. Python), nor was it possible to opt out from the one-size-fits-all Number objects that are backed by IEEE-754 floats. Today we have Proxy objects and the TypedArray interface, but there’s still room for improvement.

JavaScript engines can be particularly heinous in their behavior when you start with an empty array and insert elements into it dynamically. The JavaScript engine chooses the backing implementation for the array on the fly. That means that your computer may waste CPU cycles copying data from one implementation to another. Worse still, the way the array is represented in memory depends on the insertion order of the elements. If your array is sparse, i.e. it contains empty slots, engines typically resort to implementing the array as a classical hashmap with string keys. Goodbye, performance!
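The exact heuristics vary by engine, but roughly the difference looks like this:

// Skipping indices creates holes; the engine may then fall back to a
// dictionary-like backing store for the whole array.
const sparse: number[] = [];
sparse[999999] = 1.0; // slots 0..999998 are now empty

// Pre-allocating and filling keeps the backing store dense.
const dense = new Array<number>(1000000).fill(0.0);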

In a number crunching context, the desired behavior is to be as deterministic and contiguous as possible. This requires a good understanding of the problem size, which is conveniently encoded in the shape of the array.

Then, memory is allocated once and used for the remainder of the program. This is to minimize the overhead and indeterminism arising from dynamic memory allocation and garbage collection. In addition, the allocate-once approach allows the program to fail fast. You do not want to first run a program for a couple of days to discover that it halted due to memory issues.

A type-safe allocator function

OK, so let’s write some code that takes the above discussion into account. In the process, I hope to give you some ideas for your next number crunching adventure.

Let’s start simply. You allocate a 2-dimensional array, i.e. a matrix, like this:

function allocate(nrow: number, ncol: number, initialValue = 0.0) {

  if (!Number.isInteger(nrow) || nrow <= 0)
    throw new Error("expected a positive integer");

  if (!Number.isInteger(ncol) || ncol <= 0) 
    throw new Error("expected a positive integer");

  return Array(nrow).fill(undefined).map(() => Array(ncol).fill(initialValue));
}
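
For example, a 3-by-4 matrix of zeros is allocated and indexed like this:

const matrix = allocate(3, 4); // 3 rows, 4 columns, all 0.0
matrix[2][3] = 42.0;           // row index first, then column index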

I tend to start with a function like that in my projects where matrices are called for. Then it dawned on me that I can use recursion to allocate arrays of arbitrary shape. An untyped implementation reads:

const allocatorFactory = (initialValue) =>
  function allocate(dim, ...rest) {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error("expected a positive integer");
    }

    if (!rest.length) {
      return Array(dim).fill(initialValue);
    } else {
      const [next, ...dims] = rest;
      return Array(dim).fill(undefined).map(() => allocate(next, ...dims));
    }
  };

const zeros = allocatorFactory(0.0);

You would call zeros with the shape parameters:

const ndarray = zeros(25, 25, 25); // shape is [25,25,25]

Pretty sweet! Notice how we enforce the shape to consist of positive integers in all dimensions.

But how about type-safety? Typing the allocate function turned out to be a bit of a head scratcher. I knew I had to use TypeScript conditional types, introduced in version 2.8. I also got some inspiration from this blog — a veritable tour de force in the calculus of types, albeit rather heavy duty to be sure.

Long story short, the gist of typing the allocate function is to capture the shape as a tuple type with the sizes as literal values. We want TypeScript to see the arguments in the above invocation as [25, 25, 25] and not number[]. Different parameter values then lead to different argument types, leveraging the power of type inference.

The required types turn out to be:


type Tail<T extends any[]> = T extends [any, ...infer U] ? U : never;

interface ShapedArray<Shape extends any[], Value = number> {
  [i: number]: Shape extends [number] ? Value : ShapedArray<Tail<Shape>, Value>;
  length: number;
}

type Allocate<V> = <S extends [number, ...number[]]>(...shape: S) => ShapedArray<S, V>;

Alas, as of version 3.5, the definition of Tail<T> is not yet valid TypeScript syntax. There is an open issue about it. Fortunately, a workaround exists:

type Capture<T extends any[]> = (...args: T) => any;
type Tail<T extends any[]> = Capture<T> extends (( _: any, ...__: infer U) => any) ? U : never;
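
A quick check that the workaround peels off the first element as intended:

type FirstDropped = Tail<[25, 25, 25]>; // [25, 25]
type Nothing = Tail<[25]>;              // []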

As our runtime implementation is recursive, so is our type definition. A ShapedArray is associated with a Shape, a tuple whose length is the rank. A ShapedArray is indexable with a number. The referenced value is either a Value, if the Shape has rank 1, or another ShapedArray whose rank is one less than the current one.
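Unrolled for a concrete shape, the indexing types resolve like this:

type Cube = ShapedArray<[2, 3, 4]>;
// Cube[i]       : ShapedArray<[3, 4]>
// Cube[i][j]    : ShapedArray<[4]>
// Cube[i][j][k] : number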

I do not want to define ShapedArray as an extension of any of the built-in array interfaces like ArrayLike or IterableIterator. That is, I only want to support array access and none of the array methods.

With the type definitions, our code becomes:

const allocatorFactory = <V>(initialValue: V) => {
  const allocate: Allocate<V> = (dim, ...rest) => {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error("expected a positive integer");
    }

    if (!rest.length) {
      return Array(dim).fill(initialValue);
    } else {
      const [next, ...dims] = rest;
      return Array(dim)
        .fill(undefined)
        .map(() => allocate(next, ...dims));
    }
  };

  return allocate;
};
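
As a quick sanity check of the typing (the variable names are just for illustration):

const zeros = allocatorFactory(0.0);
const grid = zeros(25, 25, 25); // a ShapedArray of rank 3

const value: number = grid[1][2][3]; // three indexings reach a number
// const oops: number = grid[1][2]; // type error: this is still a ShapedArray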

To iterate over all the elements we can simply nest for-loops as usual:

const [Nx, Ny] = [ndarray.length, ndarray[0].length];
for (let i = 0; i < Nx; i++) {
  for (let j = 0; j < Ny; j++) {
    //...
  }
}

That’s it! Check out the code in action in the TypeScript playground! Remember to organize the loops so that the innermost loop runs over the rightmost index; that way you iterate over the numerical values in order.
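For a rank-3 array that means the index of the last dimension sits in the innermost loop:

const cube = zeros(25, 25, 25);

for (let i = 0; i < 25; i++) {
  for (let j = 0; j < 25; j++) {
    for (let k = 0; k < 25; k++) {
      // cube[i][j][k]: the rightmost index k varies fastest
    }
  }
}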

Diminishing returns

The present solution is still not perfect. There are a couple of mid-hanging fruits in the typings. We could type ShapedArray to support static out-of-bounds checks and attempt to make all dimensions except the last Readonly. Further, we could explore how to implement a type-safe iterator over an arbitrary ShapedArray.

A second avenue of experimentation has to do with the memory layout. Modern CPUs like hot caches and contiguous memory access. For the most performant implementation possible, one could compute the total number of elements and then allocate one rank-1 array, potentially swapping Array for one of the built-in TypedArray implementations.
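A minimal sketch of that idea (flatZeros is a hypothetical name, not part of the allocator above):

// One contiguous, zero-initialized buffer for the whole array.
function flatZeros(...shape: number[]) {
  const size = shape.reduce((total, dim) => total * dim, 1);
  return { shape, data: new Float64Array(size) };
}

const flat = flatZeros(25, 25, 25); // 15625 doubles in one block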

These type enhancements would allow us to generalize the nested for-loop structure and guard against situations like these:

const ndarray = zeros(1,1);
ndarray[0][5]; // undefined
ndarray[0] = zeros(5); // don't want this

For the out-of-bounds check, we would need to adjust the index expression in the ShapedArray definition from [i: number] to something like [i: Index<Head<Shape>>]. The Index<B> type would be the union of number literals from 0 up to B - 1. However, this would require us to also type the array indices as something other than number, which is not ideal.
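For what it's worth, such an Index type can be sketched with recursive tuple types, though this needs a newer compiler than the 3.5 discussed above:

// Sketch only: builds the union 0 | 1 | ... | B - 1 by accumulating tuple lengths.
type Index<B extends number, Acc extends number[] = []> =
  Acc["length"] extends B ? Acc[number] : Index<B, [...Acc, Acc["length"]]>;

type I3 = Index<3>; // 0 | 1 | 2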

As far as I know, NumPy implements its N-dimensional arrays as one-dimensional contiguous arrays with bindings to native code. This is no doubt possible to some extent in a NodeJS environment. The flip side is that one then has to implement the mapping logic from the indices i, j, k… to actual memory. This is where NumPy and friends really shine.
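The core of that mapping is small. In row-major order, the flat offset of an index tuple can be computed like this (a sketch that pairs with the hypothetical flatZeros above):

// Row-major (C-style) offset: the rightmost index varies fastest.
function offset(shape: number[], indices: number[]) {
  return indices.reduce((acc, index, d) => acc * shape[d] + index, 0);
}

offset([2, 3, 4], [1, 2, 3]); // 1 * (3 * 4) + 2 * 4 + 3 = 23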

The mathematics of storing multidimensional data in one dimension is not that difficult, but the potential upside in JavaScript is less clear, as the language does not admit any way to represent the index tuple in a syntactically nice way. The way to go would be to use the Proxy object to override property access. That adds a level of indirection, and in general I tend to regard things like Proxy and reflection as too magical for my taste. However, a low-hanging fruit is to implement a feature that is present in Python and Fortran, namely support for negative indices.
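A taste of what that could look like for a plain rank-1 array (withNegativeIndices is a made-up helper, not production code):

// Python-style negative indices via a Proxy that intercepts property access.
function withNegativeIndices(arr: number[]) {
  return new Proxy(arr, {
    get(target, prop, receiver) {
      if (typeof prop === "string") {
        const index = Number(prop);
        if (Number.isInteger(index) && index < 0) {
          return target[target.length + index];
        }
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}

const xs = withNegativeIndices([1.0, 2.0, 3.0]);
xs[-1]; // 3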

In conclusion, I feel that these enhancements represent diminishing returns given the present state of JavaScript. At any rate, one would need to set up benchmarks to argue about performance. Ain’t nobody got time for that! The big gains lie anyway in novel technologies like WebAssembly that could let web developers tap into computational resources like GPGPUs.