Introduction
Ah, the joy of number crunching in an inefficient interpreted language in the heat of the summer. But I got a severe flashback from my student days and started to think about my old love NumPy and whether or not N-dimensional arrays could also work in JavaScript.
Here’s a little something I’ve been musing over as of late. I’ve been playing around with N-dimensional arrays in TypeScript. These are objects that generalize vectors and matrices and typically store numbers.
const value = ndarray[i][j][k]; // An array of rank 3
The number of dimensions of an array is its rank, and its shape is the tuple (N0, N1, …, N(rank − 1)). The shape enumerates the number of elements, i.e. the size, along each dimension. The total number of elements in the array is the product of the sizes.
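In code, the rank and size fall straight out of the shape; a tiny sketch of my own:

```typescript
// Shape, rank and size for a rank-3 array of shape [2, 3, 4].
const shape = [2, 3, 4];
const rank = shape.length; // number of dimensions
const size = shape.reduce((product, n) => product * n, 1); // 2 * 3 * 4
console.log(rank, size); // 3 24
```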
N-dimensional arrays are the workhorses of scientific computing, statistics and their bastard children in the enterprise. Anybody who has experience with Matlab, R or NumPy has surely come across them. Perhaps a less widely known fact is that Fortran has excellent support for array processing. Fortran is actually the Nestor of the array business. Perhaps I'll write a blog post about that sometime, too.
How about JavaScript then? The situation used to be rather bleak, as numerical work is clearly something the language was not designed for. Before EcmaScript2015 there was no way to alter how array indexing works (cf. Python), nor was it possible to opt out from the one-size-fits-all Number objects that are backed by IEEE-754 floats. Today we have Proxy objects and the TypedArray interface, but there's still room for improvement.
JavaScript engines can be particularly heinous in their behavior when you start with an empty array and insert elements into it dynamically. The JavaScript engine chooses the backing implementation for the array on the fly. That means that your computer may waste CPU cycles copying data from one implementation to another. Worse still, the way the array is represented in memory depends on the insertion order of the elements. If your array is discontinuous, i.e. it contains empty slots, engines typically resort to implementing the array as a classical hashmap with string keys. Goodbye, performance!
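A quick illustration of how such holes arise (a toy example of mine):

```typescript
// Starting empty and writing far past the end leaves a sparse array.
const sparse: number[] = [];
sparse[1000] = 1.0;
console.log(sparse.length); // 1001
console.log(0 in sparse);   // false: index 0 is a hole, not an undefined-filled slot
```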
In a number crunching context, the desired behavior is to be as deterministic and contiguous as possible. This requires a good understanding of the problem size, which is conveniently encoded in the shape of the array.
Then, memory is allocated once and used for the remainder of the program. This is to minimize the overhead and indeterminism arising from dynamic memory allocation and garbage collection. In addition, the allocate-once approach allows the program to fail fast. You do not want to first run a program for a couple of days to discover that it halted due to memory issues.
A type-safe allocator function
OK, so let’s write some code that takes the above discussion into account. In the process, I hope to give you some ideas for your next number crunching adventure.
Let’s start simply. You allocate a 2-dimensional array, i.e. a matrix, like this:
function allocate(nrow: number, ncol: number, initialValue = 0.0) {
  if (!Number.isInteger(nrow) || nrow <= 0)
    throw new Error("expected a positive integer");
  if (!Number.isInteger(ncol) || ncol <= 0)
    throw new Error("expected a positive integer");
  return Array(nrow).fill(undefined).map(() => Array(ncol).fill(initialValue));
}
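To make the allocation concrete, here is a quick check of my own (repeating the function so the snippet runs standalone):

```typescript
function allocate(nrow: number, ncol: number, initialValue = 0.0) {
  if (!Number.isInteger(nrow) || nrow <= 0)
    throw new Error("expected a positive integer");
  if (!Number.isInteger(ncol) || ncol <= 0)
    throw new Error("expected a positive integer");
  return Array(nrow).fill(undefined).map(() => Array(ncol).fill(initialValue));
}

const matrix = allocate(2, 3, 1.0);
console.log(matrix.length, matrix[0].length); // 2 3
// Each row is a distinct array: fill(undefined).map(...) avoids the classic
// pitfall of every row aliasing one shared inner array.
matrix[0][0] = 42;
console.log(matrix[1][0]); // 1
```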
I tend to start with a function like that in my projects where matrices are called for. Then it dawned on me that I can use recursion to allocate arrays of arbitrary shape. An untyped implementation reads:
const allocatorFactory = (initialValue) => function allocate(dim, ...rest) {
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error("expected a positive integer");
  }
  if (!rest.length) {
    return Array(dim).fill(initialValue);
  } else {
    const [next, ...dims] = rest;
    return Array(dim).fill(undefined).map(() => allocate(next, ...dims));
  }
};
const zeros = allocatorFactory(0.0);
You would call zeros with the shape parameters:
const ndarray = zeros(25, 25, 25); // shape is [25,25,25]
Pretty sweet! Notice how we enforce the shape to consist of positive integers in all dimensions.
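For instance, the validation fires on a fractional size; my own demonstration, repeating the untyped allocator with light type annotations so it runs standalone:

```typescript
const allocatorFactory = (initialValue: number) =>
  function allocate(dim: number, ...rest: number[]): any {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error("expected a positive integer");
    }
    if (!rest.length) {
      return Array(dim).fill(initialValue);
    }
    const [next, ...dims] = rest;
    return Array(dim).fill(undefined).map(() => allocate(next, ...dims));
  };

const zeros = allocatorFactory(0.0);

const ok = zeros(2, 2); // fine
let message = "";
try {
  zeros(2, 2.5);        // fractional size in the second dimension
} catch (e) {
  message = (e as Error).message;
}
console.log(message);   // "expected a positive integer"
```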
But how about type-safety? Typing the allocate function turned out to be a bit of a head scratcher. I knew I had to use TypeScript conditional types introduced in version 2.8. I also got some inspiration from this blog — a veritable tour de force in the calculus of types, albeit rather heavy duty to be sure.
Long story short, the gist of typing the allocate function is to capture the shape as a tuple type with the sizes as literal values. We want TypeScript to see the arguments in the above invocation as [25, 25, 25] and not number[]. Different parameter values lead to different argument types, leveraging the power of type inference.
The required types turn out to be:
type Tail<T> = T extends [any, ...infer U] ? U : never;
interface ShapedArray<Shape extends any[], Value = number> {
  [i: number]: Shape extends [number] ? Value : ShapedArray<Tail<Shape>, Value>;
  length: number;
}
type Allocate<V> = <S extends [number, ...number[]]>(...shape: S) => ShapedArray<S, V>;
Alas, as of version 3.5, the definition of Tail<T> is not yet valid TypeScript syntax. There is an open issue about it. Fortunately, a workaround exists:
type Capture<T extends any[]> = (...args: T) => any;
type Tail<T extends any[]> = Capture<T> extends ((_: any, ...__: infer U) => any) ? U : never;
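To see the workaround in action, a small compile-time check (the concrete tuple is just an example of mine):

```typescript
type Capture<T extends any[]> = (...args: T) => any;
type Tail<T extends any[]> =
  Capture<T> extends ((_: any, ...rest: infer U) => any) ? U : never;

// Tail drops the first element of a tuple type:
// Tail<[string, number, boolean]> is [number, boolean].
const tail: Tail<[string, number, boolean]> = [42, true];
console.log(tail); // [ 42, true ]
```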
As our runtime implementation is recursive, so is our type definition. A ShapedArray is a type associated with a Shape that is a tuple whose length is the rank. A ShapedArray is indexable with a number. The referenced value is either a Value if the Shape is of rank 1, or another ShapedArray whose rank is one less than the current one.
I do not want to define ShapedArray as an extension of any of the builtin array interfaces like ArrayLike or IterableIterator. That is, I only want to support array access and none of the array methods.
With the type definitions, our code becomes:
const allocatorFactory = <V>(initialValue: V) => {
  const allocate: Allocate<V> = (dim, ...rest) => {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error("expected a positive integer");
    }
    if (!rest.length) {
      return Array(dim).fill(initialValue);
    } else {
      const [next, ...dims] = rest;
      return Array(dim)
        .fill(undefined)
        .map(() => allocate(next, ...dims));
    }
  };
  return allocate;
};
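Putting the pieces together, the value type is inferred from the initial value as well. A self-contained sketch, repeating the type definitions; the `as any` casts are mine, to appease stricter modern compilers:

```typescript
type Capture<T extends any[]> = (...args: T) => any;
type Tail<T extends any[]> =
  Capture<T> extends ((_: any, ...rest: infer U) => any) ? U : never;

interface ShapedArray<Shape extends any[], Value = number> {
  [i: number]: Shape extends [number] ? Value : ShapedArray<Tail<Shape>, Value>;
  length: number;
}

type Allocate<V> = <S extends [number, ...number[]]>(...shape: S) => ShapedArray<S, V>;

const allocatorFactory = <V>(initialValue: V) => {
  const allocate: Allocate<V> = (dim, ...rest): any => {
    if (!Number.isInteger(dim) || dim <= 0) {
      throw new Error("expected a positive integer");
    }
    if (!rest.length) {
      return Array(dim).fill(initialValue);
    }
    const [next, ...dims] = rest;
    return Array(dim).fill(undefined).map(() => (allocate as any)(next, ...dims));
  };
  return allocate;
};

// The value type is inferred from the initial value: booleans here.
const falses = allocatorFactory(false);
const grid = falses(2, 3); // typed as ShapedArray<[2, 3], boolean>
console.log(grid.length, grid[1][2]); // 2 false
```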
To iterate over all the elements we can simply nest for-loops as usual:
const Nx = ndarray.length;
const Ny = ndarray[0].length;
for (let i = 0; i < Nx; i++) {
  for (let j = 0; j < Ny; j++) {
    //...
  }
}
That’s it! Remember to organize the loops so that the innermost loop runs over the rightmost index; that way you iterate over the numerical values in order. Check out the code in action in the TypeScript playground!
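For example, filling and summing a rank-3 array with the innermost loop on the rightmost index could look like this (plain nested arrays, my own sketch):

```typescript
// Rank-3 array of shape [2, 3, 4], filled with ones.
const Nx = 2, Ny = 3, Nz = 4;
const a = Array.from({ length: Nx }, () =>
  Array.from({ length: Ny }, () => Array(Nz).fill(1.0)));

let sum = 0;
// Innermost loop runs over the rightmost index, so the values are
// visited in the same order they are laid out.
for (let i = 0; i < Nx; i++)
  for (let j = 0; j < Ny; j++)
    for (let k = 0; k < Nz; k++)
      sum += a[i][j][k];

console.log(sum); // 24
```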
Diminishing returns
The present solution is still not perfect. There are a couple of mid-hanging fruits in the typings. We could type ShapedArray to have support for static out-of-bounds checks and attempt to make all dimensions except the last Readonly. Further, we could explore how to implement a type-safe iterator over an arbitrary ShapedArray.
A second avenue of experimentation has to do with the memory layout. Modern CPUs like hot caches and contiguous memory access. For the most performant implementation possible, one could compute the total number of elements and then allocate one rank-1 array, potentially swapping Array for one of the builtin TypedArray implementations.
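A hedged sketch of what that flat, row-major layout could look like (the helper names get and set are my own):

```typescript
// Flat row-major storage for shape [nrow, ncol] in a Float64Array.
// Element (i, j) lives at offset i * ncol + j.
const nrow = 3, ncol = 4;
const data = new Float64Array(nrow * ncol); // contiguous, zero-initialized

const get = (i: number, j: number) => data[i * ncol + j];
const set = (i: number, j: number, v: number) => { data[i * ncol + j] = v; };

set(2, 1, 42);
console.log(get(2, 1)); // 42
console.log(data[9]);   // 42: offset 2 * 4 + 1
```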
The type enhancements would allow us to generalize the nested for-loop structure and guard against situations like these:
const ndarray = zeros(1, 1);
ndarray[0][5]; // undefined
ndarray[0] = zeros(5); // don't want this
For the out-of-bounds check, we would need to adjust the index expression in the ShapedArray definition from [i: number] to something like [i: Index<Head<Shape>>]. The Index<B> type would be the number literal union from 0 up to B − 1. However, this would require us to also type the array indices to something other than number, which is not ideal.
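For what it's worth, such an Index type can be sketched with recursive conditional types, though this relies on variadic tuple features newer than the TypeScript 3.5 discussed here, so treat it as an assumption about later compilers:

```typescript
// Builds the union 0 | 1 | ... | N-1 by accumulating tuple lengths.
// Note: needs recursive conditional types and variadic tuples,
// i.e. TypeScript newer than the 3.5 discussed in the text.
type Index<N extends number, Acc extends number[] = []> =
  Acc["length"] extends N ? Acc[number] : Index<N, [...Acc, Acc["length"]]>;

const i: Index<3> = 2;      // 0 | 1 | 2 -- fine
// const bad: Index<3> = 3; // compile error: out of bounds
console.log(i);             // 2
```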
As far as I know, NumPy implements its N-dimensional arrays as one-dimensional contiguous arrays with bindings to native code. This is no doubt possible to some extent in a Node.js environment. The flip side is that one has to then implement the mapping logic from the indices i, j, k… to actual memory. This is where NumPy and friends really shine.
The mathematics of storing multidimensional data in one dimension is not that difficult, but the potential upside in JavaScript is not so clear, as the language does not admit any way to represent the index tuple in a syntactically nice way. The way to go would be to use the Proxy object to override property access. This is a level of indirection, and in general I tend to regard things like Proxy and Reflect as too magical for my taste. However, a low-hanging fruit is to implement a feature that is present in Python and Fortran, namely support for negative indices.
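As a toy sketch of that low-hanging fruit (the wrapper name is mine, not from any library):

```typescript
// Wraps an array so that negative indices count from the end, as in Python.
function withNegativeIndices<T>(arr: T[]): T[] {
  return new Proxy(arr, {
    get(target, prop, receiver) {
      // Property keys arrive as strings; detect numeric ones.
      if (typeof prop === "string") {
        const i = Number(prop);
        if (Number.isInteger(i) && i < 0) {
          return target[target.length + i];
        }
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}

const xs = withNegativeIndices([10, 20, 30]);
console.log(xs[-1]); // 30
console.log(xs[0]);  // 10
```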
In conclusion, I feel that these enhancements represent diminishing returns given the present state of JavaScript. At any rate, one would need to set up benchmarks to argue about performance. Ain’t nobody got time for that! The big gains are anyway in novel technologies like WebAssembly that would potentially let the web developer tap into computational resources like GPGPUs.