Daniel A. Thompson Daniel A. Thompson - 2 months ago 28x
Python Question

Tensorflow 0.10 (CUDA) on OSX segfaults on python import

I've been trying to get tensorflow 0.10 up and running on my El Capitan Macbook Pro (Late 2013, GeForce GT 750M), so far without success. I've tried the official tensorflow documentation's instructions and a number of other folks' approaches, including this one and this one.

For reference, I'm trying to use Python3, CUDA 7.5, and tensorflow 0.10 on OSX 10.11.5.

I've gotten CUDA installed and it recognizes my GPU. I can successfully compile the

sample in
. Its output when run is:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GT 750M"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 926 MHz (0.93 GHz)
Memory Clock rate: 2508 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GT 750M
Result = PASS

I've also downloaded the cudnn-7.5 library and header and put those files in the correct locations in

In the python3 interactive REPL, if I type in
import tensorflow
, I get the following output:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.7.5.dylib locally
Segmentation fault: 11

My question is, what do I need to do to successfully import the module without a segfault? In case it helps, I posted a gist of the
output of running that command in the python3 REPL here, and a gist of the diagnostic (crash) report with stacktrace here.


The problem is described in this comment: https://github.com/tensorflow/tensorflow/issues/2940#issuecomment-238952433

"There is a bug with loading libcuda.dylib - the default cuda install creates libcuda.dylib, but tensorflow tries to load libcuda.1.dylib . This fails, resorting to using the LD_LIBRARY_PATH which if NULL crashes. If you copy libcuda.dylib to libcuda.1.dylib it loads fine."

It would be pretty easy to fix the crash for everyone else with a pull request -- ie compile with -c dbg to see exactly which line is trying to use the null value, and adding something like this to the code

if (mystring == NULL) { return; }