Maulwurf - 18 days ago 5

C++ Question

At the moment I am trying to figure out why my naive matrix-matrix-multiplication is slower (0.7 sec) when I use the overloaded parentheses operator (

`//first multiplication`

`//second multiplication`

`data_`

`Matrix.h`

Why is there such a significant difference in speed? Is there something wrong with my copy constructor? Is there so much "overhead" in calling the overloaded operator function that it justifies for that kind of performance penalty?

There is one more question / weird behavior: When you exchange the two inner most loops (

`x`

`inner`

edit: The program is compiled the following way:

`g++ -c -std=c++0x -O3 -DNDEBUG`

Thank you so much for your help!

My main function looks like this:

`#include "Matrix.h"`

int main(){

Matrix m1(1024,1024, 2.0);

Matrix m2(1024,1024, 2.5);

Matrix m3(1024,1024);

//first multiplication

for(int y = 0; y < 1024; ++y){

for(int inner = 0; inner < 1024; ++inner){

for(int x = 0; x < 1024; ++x){

m3(y,x) += m1(y, inner) * m2(inner, x);

}

}

}

//second multiplication

for(int y = 0; y < 1024; ++y){

for(int inner = 0; inner < 1024; ++inner){

for(int x = 0; x < 1024; ++x){

m3.data_[y*1024+x] += m1.data_[y*1024+inner]*m2.data_[inner*1024+inner];

}

}

}

}

And here is the part of

`Matrix.h`

`class Matrix{`

public:

Matrix();

Matrix(int sizeY, int sizeX);

Matrix(int sizeY, int sizeX, double init);

Matrix(const Matrix & orig);

~Matrix(){delete[] data_;}

double & operator() (int y, int x);

double operator() (int y, int x) const;

double * data_;

private:

int sizeX_;

int sizeY_;

}

And here the Implementation of

`Matrix.h`

`Matrix::Matrix()`

: sizeX_(0),

sizeY_(0),

data_(nullptr)

{ }

Matrix::Matrix(int sizeY, int sizeX)

: sizeX_(sizeX),

sizeY_(sizeY),

data_(new double[sizeX*sizeY]())

{

assert( sizeX > 0 );

assert( sizeY > 0 );

}

Matrix::Matrix(int sizeY, int sizeX, double init)

: sizeX_(sizeX),

sizeY_(sizeY)

{

assert( sizeX > 0 );

assert( sizeY > 0 );

data_ = new double[sizeX*sizeY];

std::fill(data_, data_+(sizeX_*sizeY_), init);

}

Matrix::Matrix(const Matrix & orig)

: sizeX_(orig.sizeX_),

sizeY_(orig.sizeY_)

{

data_ = new double[orig.sizeY_*orig.sizeX_];

std::copy(orig.data_, orig.data_+(sizeX_*sizeY_), data_);

}

double & Matrix::operator() (int y, int x){

assert( x >= 0 && x < sizeX_);

assert( y >= 0 && y < sizeY_);

return data_[y*sizeX_ + x];

}

double Matrix::operator() (int y, int x) const {

assert( x >= 0 && x < sizeX_);

assert( y >= 0 && y < sizeY_);

return data_[y*sizeX_ + x];

}

`//second multiplication`

`m3.data_[y*1024+x] += m1.data_[y*1024+inner]*m2.data_[inner*1024+x];`

Thank you very much for your help!

Answer

I think your two versions are not computing the same thing:

In the first you have:

m3(y,x) += m1(y, inner) * m2(inner, **x**);

But in the second you have

m3.data_[y*1024+x] += m1.data_[y*1024+inner]*m2.data_[inner*1024+**inner**];

The second one can factor inner out and instead do inner * (1024 + 1) which can be optimized a number of ways that the first can't.

What are the outputs of the two versions? Do they match?

Edit: Another answerer is quite right suggesting that the dimensions in the class not being constant will take some optimizations off the table; in the first version the compiler doesn't know that the size is a power of two so it uses general-purpose multiplication but in the second version it knows that one of the operands is 1024 (not just a constant but a **compile time** constant) so it can use fast multiplication (left shift by the power of two).

(Apologies for my earlier answer about NDEBUG: I had the page open for a while so didn't see your edit with the compilation line.)

Source (Stackoverflow)

Comments