I have to create the following global matrix
m = malloc (20 * sizeof *m);
One solution could be to allocate a single array and compute the 1D index, but I would prefer the compiler to do it.
That is actually a good idea. Of course the array data should be allocated on the heap (using operator new in C++, or calloc in C), and computing the offset from the various indexes is easy.
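To illustrate how easy the offset computation is, here is a minimal sketch in C. The question does not show the matrix dimensions, so the three-dimensional shape, the double element type, and the names matrix3d, matrix3d_init, matrix3d_index and matrix3d_free are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 3-D matrix stored in one contiguous heap block.
   The dimensions and the element type are assumptions. */
typedef struct {
    size_t d0, d1, d2;  /* extent of each dimension */
    double *data;       /* d0 * d1 * d2 elements, row-major */
} matrix3d;

/* Allocate zero-initialized storage; returns 0 on success, -1 on failure.
   Note: the size product is not checked for overflow in this sketch. */
static int matrix3d_init(matrix3d *m, size_t d0, size_t d1, size_t d2)
{
    m->d0 = d0;
    m->d1 = d1;
    m->d2 = d2;
    m->data = calloc(d0 * d1 * d2, sizeof *m->data);
    return m->data ? 0 : -1;
}

/* Row-major offset of element (i, j, k): the last index varies fastest. */
static size_t matrix3d_index(const matrix3d *m, size_t i, size_t j, size_t k)
{
    assert(i < m->d0 && j < m->d1 && k < m->d2);
    return (i * m->d1 + j) * m->d2 + k;
}

/* Release the storage. */
static void matrix3d_free(matrix3d *m)
{
    free(m->data);
    m->data = NULL;
}
```

An element is then read or written as m.data[matrix3d_index(&m, i, j, k)]. Since the index function is tiny, a compiler will normally inline it, so the cost should be comparable to what it would generate for a built-in multidimensional array.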
You should probably find a good existing matrix library. Some of them might even have optimizations that take advantage of specific hardware (e.g. OpenMP or OpenCL based).
See also this answer for a C approach.
For performance purposes I would prefer contiguous memory for caching.
Such caching considerations only matter for the innermost loops of your computation. See also this.
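As a rough illustration of what that means in practice, the sketch below sums a row-major array with the last index in the innermost loop, so that consecutive iterations touch adjacent memory. The function name sum_row_major and the two-dimensional shape are assumptions chosen for the example, not something taken from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Sum all elements of a rows x cols array stored row-major in one block.
   The inner loop walks the last index, so it reads memory sequentially. */
double sum_row_major(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL || rows * cols == 0);
    double total = 0.0;
    for (size_t i = 0; i < rows; i++) {
        const double *row = a + i * cols;  /* start of row i */
        for (size_t j = 0; j < cols; j++)
            total += row[j];
    }
    return total;
}
```

Swapping the two loops would give the same result but would jump cols elements between consecutive accesses, which is typically the pattern to avoid in a hot inner loop.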