Maulwurf Maulwurf - 18 days ago 5
C++ Question

c++ overloaded parentheses operator slower than direct access to array member

At the moment I am trying to figure out why my naive matrix-matrix-multiplication is slower (0.7 sec) when I use the overloaded parentheses operator (

//first multiplication
). If I don't use them (
//second multiplication
) and the multiplication directly access the class member array
data_
it is about twice as fast (0.35 sec). I use my own matrix class as defined in
Matrix.h
.

Why is there such a significant difference in speed? Is there something wrong with my copy constructor? Is there so much "overhead" in calling the overloaded operator function that it justifies for that kind of performance penalty?

There is one more question / weird behavior: When you exchange the two inner most loops (
x
and
inner
) with each other, then the multiplication gets (of course) really slow, but both multiplications take almost the SAME time (7 sec) now. Why does it take the same time for them in this case, but before there was a ~50% performance difference.

edit: The program is compiled the following way:
g++ -c -std=c++0x -O3 -DNDEBUG


Thank you so much for your help!

My main function looks like this:

#include "Matrix.h"
int main(){
Matrix m1(1024,1024, 2.0);
Matrix m2(1024,1024, 2.5);
Matrix m3(1024,1024);

//first multiplication
for(int y = 0; y < 1024; ++y){
for(int inner = 0; inner < 1024; ++inner){
for(int x = 0; x < 1024; ++x){
m3(y,x) += m1(y, inner) * m2(inner, x);
}
}
}

//second multiplication
for(int y = 0; y < 1024; ++y){
for(int inner = 0; inner < 1024; ++inner){
for(int x = 0; x < 1024; ++x){
m3.data_[y*1024+x] += m1.data_[y*1024+inner]*m2.data_[inner*1024+inner];
}
}
}
}


And here is the part of
Matrix.h
:

class Matrix{

public:
Matrix();
Matrix(int sizeY, int sizeX);
Matrix(int sizeY, int sizeX, double init);
Matrix(const Matrix & orig);
~Matrix(){delete[] data_;}
double & operator() (int y, int x);
double operator() (int y, int x) const;

double * data_;

private:
int sizeX_;
int sizeY_;

}


And here the Implementation of
Matrix.h


Matrix::Matrix()
: sizeX_(0),
sizeY_(0),
data_(nullptr)
{ }

Matrix::Matrix(int sizeY, int sizeX)
: sizeX_(sizeX),
sizeY_(sizeY),
data_(new double[sizeX*sizeY]())
{
assert( sizeX > 0 );
assert( sizeY > 0 );
}

Matrix::Matrix(int sizeY, int sizeX, double init)
: sizeX_(sizeX),
sizeY_(sizeY)
{
assert( sizeX > 0 );
assert( sizeY > 0 );
data_ = new double[sizeX*sizeY];
std::fill(data_, data_+(sizeX_*sizeY_), init);
}

Matrix::Matrix(const Matrix & orig)
: sizeX_(orig.sizeX_),
sizeY_(orig.sizeY_)
{
data_ = new double[orig.sizeY_*orig.sizeX_];
std::copy(orig.data_, orig.data_+(sizeX_*sizeY_), data_);
}


double & Matrix::operator() (int y, int x){
assert( x >= 0 && x < sizeX_);
assert( y >= 0 && y < sizeY_);
return data_[y*sizeX_ + x];
}

double Matrix::operator() (int y, int x) const {
assert( x >= 0 && x < sizeX_);
assert( y >= 0 && y < sizeY_);
return data_[y*sizeX_ + x];
}


EDIT2: Turns out I used the wrong array access for the
//second multiplication
. I changed it to
m3.data_[y*1024+x] += m1.data_[y*1024+inner]*m2.data_[inner*1024+x];
and now both multiplications take the same time.
Thank you very much for your help!

Answer

I think your two versions are not computing the same thing:

In the first you have:

m3(y,x) += m1(y, inner) * m2(inner, x);

But in the second you have

m3.data_[y*1024+x] += m1.data_[y*1024+inner]*m2.data_[inner*1024+inner];

The second one can factor inner out and instead do inner * (1024 + 1) which can be optimized a number of ways that the first can't.

What are the outputs of the two versions? Do they match?

Edit: Another answerer is quite right suggesting that the dimensions in the class not being constant will take some optimizations off the table; in the first version the compiler doesn't know that the size is a power of two so it uses general-purpose multiplication but in the second version it knows that one of the operands is 1024 (not just a constant but a compile time constant) so it can use fast multiplication (left shift by the power of two).

(Apologies for my earlier answer about NDEBUG: I had the page open for a while so didn't see your edit with the compilation line.)

Comments