Kumar Kumar - 19 days ago
C++ Question

MKL cblas_dgemm giving me garbage results when debugging in x64

I'm writing a simple program in Visual Studio to multiply two matrices of type double using the cblas_dgemm function from the MKL library. This works perfectly in x86. However, when I switch to x64 mode, I get garbage values. Is there a declaration I have to write or a parameter I need to change when using MKL in x64?
I have pasted the outputs from both the x86 and x64 debugging modes below. Also, I'm not using ILP64.

#include <mkl.h>
#include <iostream>
#include <iomanip>
#include <cstdlib>

#define n1 12
#define n2 20
#define n3 15
#define size1 (sizeof(double) * 12)
#define size2 (sizeof(double) * 20)
#define sizer (sizeof(double) * 15)

using namespace std;

int main() {
    // A is 3x4 (n1 elements), B is 4x5 (n2 elements), the result is 3x5 (n3 elements)
    double* mkl_mat1 = (double*)mkl_malloc(size1, 8);
    double* mkl_mat2 = (double*)mkl_malloc(size2, 8);
    double* mkl_matr = (double*)mkl_malloc(sizer, 8);

    // Fill A with 0.0, 0.1, 0.2, ...
    for (int i = 0; i < n1; i++) {
        double add = ((double)i) / 10;
        mkl_mat1[i] = add;
    }

    // Fill B with 0.0, 1.1, 2.2, ...
    for (int i = 0; i < n2; i++) {
        double add = (((double)(i)) / 10) + i;
        mkl_mat2[i] = add;
    }

    // C = alpha * A * B + beta * C with M = 3, N = 5, K = 4, alpha = 1, beta = 1
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 3, 5, 4, 1, mkl_mat1, 4, mkl_mat2, 5, 1, mkl_matr, 5);

    cout << "matrix A:" << endl;
    for (int i = 0; i < n1; i++) {
        cout << fixed << setprecision(3);
        cout << mkl_mat1[i] << " ";
        if ((i + 1) % 4 == 0) {
            cout << endl;
        }
    }

    cout << endl << "matrix B:" << endl;
    for (int i = 0; i < n2; i++) {
        cout << mkl_mat2[i] << " ";
        if ((i + 1) % 5 == 0) {
            cout << endl;
        }
    }

    cout << endl << "result: " << endl;
    for (int i = 0; i < n3; i++) {
        cout << mkl_matr[i] << " ";
        if ((i + 1) % 5 == 0) {
            cout << endl;
        }
    }

    mkl_free(mkl_mat1);
    mkl_free(mkl_mat2);
    mkl_free(mkl_matr);

    system("pause");

    return 0;
}


This is the output from x86 debugging. The result from the matrix multiplication is correct.

matrix A:
0.000 0.100 0.200 0.300
0.400 0.500 0.600 0.700
0.800 0.900 1.000 1.100

matrix B:
0.000 1.100 2.200 3.300 4.400
5.500 6.600 7.700 8.800 9.900
11.000 12.100 13.200 14.300 15.400
16.500 17.600 18.700 19.800 20.900

result:
7.700 8.360 9.020 9.680 10.340
20.900 23.320 25.740 28.160 30.580
34.100 38.280 42.460 46.640 50.820


This is what I get using x64. As you can see, the result from cblas_dgemm is completely off.

matrix A:
0.000 0.100 0.200 0.300
0.400 0.500 0.600 0.700
0.800 0.900 1.000 1.100

matrix B:
0.000 1.100 2.200 3.300 4.400
5.500 6.600 7.700 8.800 9.900
11.000 12.100 13.200 14.300 15.400
16.500 17.600 18.700 19.800 20.900

result:
93357590358131489749598208.000 2092621572586762403840.000 4987469390756061264329844035377914970112.000 840081208810589537915941265441360790647650203528241571795304448.000 1265439546878571763047336120349965224870150144.000
4291962.185 436361906378867406158334692399598789955786331512676402591277664910164164198641274664924994606214109295966461669812169210474109522783704059707122304404462832655621395047756256406765223308548365248881843421206279880627874460809085834208907100160.000 13231929878462779146578339497339877798921922010371671205674841175110389140714713179395701776807407261768913753318212756050190821210932002094269291681739783953603144153165258692494693046103252723040256.000 28.160 90881598017775964584735574016196848228044002038335559904513002632541601693427142081081275209626819242936591400845048349235711358896439078191053304572463519943645548707404624938627997148332398182386193400701779273442048081920.000
2537893164253223640010795635630559650975318016.000 170998293682572028485580594799506949489271077618286616693147899109522239738912059933878848909596303621717788311103047805626305223755691481550735323302132640120832.000 982135465147371735969556303529184247574036444253662581920872782225358090288015737548432603695740192230049566850171912121221797204870311530664537504528953201237190120366465000992322672490293353741933273323877930430705846845174170063267050815488.000 3669252151527883650882879225154439634779201105036709633261510558550031781881607608946777435557242020978154455651697127672034635122085159876765319981763404894584455323179401068638393473383612903205233978131817062603866165628370944.000 9576598406140674788853463577457446849100658411025734324791073248973248075712095334842922298642349822371447240257432929894715445313114259564166602659002428082685155628516429230074688724891359202918002033473039106048.000

Answer

I fixed the problem! I had to go to

Properties » Configuration properties » C/C++ » Code Generation » Runtime Library

and change it from

-Multi-threaded Debug DLL (/MDd): linking with dynamic Intel MKL libraries

to

-Multi-threaded (/MT): linking with static Intel MKL libraries

The issue was that I was doing dynamic linking instead of static linking. I found this webpage very useful in helping me overcome the issue:

https://software.intel.com/en-us/articles/intel-math-kernel-library-inte...