Setup Deep Learning on Arch Linux: CUDA, Keras

Jagadeesh Reddy Pyla
Jul 26, 2020

--

1. Setting up CUDA, cuDNN, TensorFlow, and Keras on Arch Linux is straightforward, but there is a lack of decent documentation. This post walks you through the setup in a few simple steps and gives some troubleshooting tips as well.

2. First, DO NOT enable the GPU using either bumblebee or optimus-manager; they are not required. You can install optimus-manager later if you need it, but Keras programs will run fine with optimus-manager left in intel mode.

3. Install cuda and cudnn with the following command:

$ sudo pacman -S cuda cudnn

4. Install nvidia-dkms and nvidia-utils with the following command:

$ sudo pacman -S nvidia-dkms nvidia-utils

5. These commands may take quite a bit of time depending on your network speed. Also make sure there is ample space on your root partition, as these packages consume up to 5–10 GB of disk space.
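Before installing, you can confirm the free space on your root partition with a short standard-library Python sketch (the 10 GB threshold is my suggestion based on the sizes above):

```python
import shutil

def free_gb(path="/"):
    """Free space at `path` in gibibytes."""
    return shutil.disk_usage(path).free / 2**30

if __name__ == "__main__":
    need = 10  # rough upper bound for the cuda/cudnn/nvidia packages
    have = free_gb("/")
    print(f"free: {have:.1f} GiB — {'ok' if have >= need else 'not enough'}")
```

The same information is available from `df -h /` if you prefer the shell.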

6. Please DO NOT change any GRUB configuration as suggested in some blogs. Specifically, DO NOT set nvidia-drm.modeset=1 in /etc/default/grub, and DO NOT add the modules nvidia, nvidia_modeset, nvidia_uvm, and nvidia_drm to your initramfs in /etc/mkinitcpio.conf; these are not required to run Keras models.

7. Also, please DO NOT run $ nvidia-xconfig. If you do, you may run into a blank blinking screen, most probably caused by the bad xorg.conf file it creates. You can delete this file by logging into a virtual console (tty) on startup: press Ctrl+Alt+F2 on Arch Linux, and if F2 does not take you there, try the other function keys with Ctrl+Alt. Once in the tty, log in with your credentials and delete /etc/X11/xorg.conf. Only delete it if you have never edited or created this file yourself and you are sure it was created by the nvidia-xconfig command.

8. General Linux tip: after the above step, you should be rid of the blank blinking cursor screen. In general, if the display/graphics fail to start on boot and you are stuck at a blank screen, you can always restart, edit the GRUB parameters for the OS at the boot menu, and remove the "quiet" entry from GRUB_CMDLINE_LINUX_DEFAULT and GRUB_CMDLINE_LINUX. This ensures you can see the startup logs and debug the problem from there.
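If you make the change permanent in /etc/default/grub rather than at the boot menu, the relevant lines might look like this afterwards (the remaining options here are illustrative, not required):

```
# /etc/default/grub — "quiet" removed so boot logs stay visible
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3"
GRUB_CMDLINE_LINUX=""
```

Remember that edits to this file only take effect after regenerating the config, e.g. with grub-mkconfig -o /boot/grub/grub.cfg.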

9. Install the linux-headers package from the Arch repo for your running kernel version (present under /usr/lib/modules/<version>-arch1-1). You can verify the setup by running the command below, which loads the nvidia module; if it fails, the relevant linux-headers package is most likely missing, so please install it:

$ sudo modprobe nvidia

The latest linux-headers package can be found at https://www.archlinux.org/packages/core/x86_64/linux-headers/, and you can install it with the following command:

$ sudo pacman -S linux-headers
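As a quick sanity check before reaching for modprobe, this small Python sketch (the helper name is mine) confirms that a module directory exists for the kernel you are actually running:

```python
import os
import platform

def headers_present(modules_root="/usr/lib/modules"):
    """True if the running kernel has a module directory under
    `modules_root`; nvidia-dkms builds its module in there."""
    release = platform.release()  # e.g. "5.7.10-arch1-1"
    return os.path.isdir(os.path.join(modules_root, release))

if __name__ == "__main__":
    print("kernel:", platform.release(), "| module dir present:", headers_present())
```

If this prints False, your running kernel and the installed linux-headers package are out of sync; updating and rebooting usually fixes it.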

10. If everything so far is good, go to /opt/cuda, copy the samples folder to your home directory, then run the following commands:

$ cd ~/samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery

The above should give an output similar to the following:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 10.2 / 10.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4040 MBytes (4236312576 bytes)
( 6) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1620 MHz (1.62 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

The Result = PASS line indicates that the GPU is identified and can work with the installed CUDA libraries.
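If you want to script this check, a tiny helper (my own sketch) can gate later steps on the deviceQuery result; feed it the captured stdout of ./deviceQuery:

```python
def device_query_passed(output: str) -> bool:
    """True if a deviceQuery run reported `Result = PASS`."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())
```

For example, you could run the binary with subprocess.run(..., capture_output=True, text=True) and pass its .stdout here.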

11. If there are any errors, update cuda and cudnn. Run the following commands to update:

$ sudo pacman -Syu
$ sudo pacman -S cuda cudnn

12. Install python-tensorflow-opt-cuda; DO NOT install TensorFlow from pip. Run the following command:

$ sudo pacman -S python-tensorflow-opt-cuda

13. Use the following Python commands to check whether the GPU is correctly visible to TensorFlow:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

14. The above two commands at the Python prompt should list the GPU device, similar to the following (I have an integrated Intel GPU and an Nvidia GPU, so you should see the Nvidia GPU listed here; your output may differ based on your system's GPU/CPU hardware):

[name: "/device:CPU:0" device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: <hidden>
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: <hidden>
physical_device_desc: "device: XLA_CPU device"
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3775463424
locality {
bus_id: 1
links {
}
}
incarnation: <hidden>
physical_device_desc: "device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: <hidden>
physical_device_desc: "device: XLA_GPU device"
]
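To check this list in code rather than by eye, a small filter over the returned device entries works (a sketch, names mine); pass it the result of device_lib.list_local_devices():

```python
def gpu_devices(devices):
    """Return the names of the GPU devices in a
    device_lib.list_local_devices() result."""
    return [d.name for d in devices if d.device_type == "GPU"]
```

An empty result means TensorFlow did not pick up your Nvidia GPU, and you should look back through the earlier steps.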

15. You may see errors related to the einsum package. If the command in step 13 does not give the output shown in step 14, and the error says the einsum package is missing, run the following command to install it:

$ yay -S python-opt-einsum

python-opt-einsum is an AUR package, so you need to install yay before installing python-opt-einsum. You can learn what yay is and how to install it at https://www.ostechnix.com/yay-found-yet-another-reliable-aur-helper/ . After installing python-opt-einsum, repeat step 13; this time you should not get any errors, and the output should look like what you see in step 14.
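You can also check whether a package is importable before re-running anything; this small standard-library helper (my naming) avoids triggering the import error directly:

```python
import importlib.util

def module_available(name: str) -> bool:
    """True if `import name` would succeed for a top-level module."""
    return importlib.util.find_spec(name) is not None

# e.g. module_available("opt_einsum") should be True once
# python-opt-einsum is installed
```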

16. One more important point: don't install anything with pip; use only official Arch packages or AUR packages.

17. Almost done. Now you can install Keras directly from GitHub, or use the Keras bundled with python-tensorflow-opt-cuda, which we installed in step 12. I haven't tried the bundled Keras, so I suggest installing Keras from GitHub, because it has many examples you can readily run to verify that your entire installation went through correctly. Run the following commands to install Keras:

$ git clone https://github.com/fchollet/keras
$ cd keras
$ sudo python setup.py install

This may take a while; be patient. After it finishes, you can try running a Keras script, such as this MNIST example:

python examples/mnist_cnn.py

From the output of the above command you can tell whether Keras is using your GPU or your CPU. The output should look similar to the following; observe that each epoch takes around 5–10 seconds on my GeForce GTX 1050 Ti mobile GPU. It could differ for other GPUs, and a more powerful GPU with more memory might have much better running times. If each epoch takes close to a minute, Keras is most probably running the model on your CPU, and the script's log output should show the reason the GPU isn't being loaded. The output of the above script is as follows:

Using TensorFlow backend.
...<Some hidden logs, not important>
2019-12-18 09:30:35.744891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 6 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 104.43GiB/s
2019-12-18 09:30:35.744989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2019-12-18 09:30:35.745052: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-18 09:30:35.745091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2019-12-18 09:30:35.745120: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2019-12-18 09:30:35.745147: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2019-12-18 09:30:35.745175: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2019-12-18 09:30:35.745203: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-18 09:30:35.745408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-18 09:30:35.746200: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-18 09:30:35.746747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2019-12-18 09:30:48.526169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-18 09:30:48.526217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2019-12-18 09:30:48.526233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2019-12-18 09:30:48.542475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-18 09:30:48.543110: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-18 09:30:48.543685: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-12-18 09:30:48.553705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3600 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-12-18 09:30:48.579034: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564b823676d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2019-12-18 09:30:48.579091: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2019-12-18 09:30:49.879124: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-18 09:30:52.014089: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
60000/60000 [==============================] - 17s 277us/step - loss: 0.2599 - accuracy: 0.9205 - val_loss: 0.0583 - val_accuracy: 0.9806
Epoch 2/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0903 - accuracy: 0.9730 - val_loss: 0.0449 - val_accuracy: 0.9858
Epoch 3/12
60000/60000 [==============================] - 8s 137us/step - loss: 0.0668 - accuracy: 0.9801 - val_loss: 0.0323 - val_accuracy: 0.9893
Epoch 4/12
60000/60000 [==============================] - 8s 136us/step - loss: 0.0566 - accuracy: 0.9835 - val_loss: 0.0356 - val_accuracy: 0.9883
Epoch 5/12
60000/60000 [==============================] - 8s 137us/step - loss: 0.0475 - accuracy: 0.9857 - val_loss: 0.0286 - val_accuracy: 0.9905
Epoch 6/12
60000/60000 [==============================] - 8s 138us/step - loss: 0.0414 - accuracy: 0.9874 - val_loss: 0.0317 - val_accuracy: 0.9897
Epoch 7/12
60000/60000 [==============================] - 8s 139us/step - loss: 0.0381 - accuracy: 0.9883 - val_loss: 0.0284 - val_accuracy: 0.9912
Epoch 8/12
60000/60000 [==============================] - 8s 137us/step - loss: 0.0341 - accuracy: 0.9895 - val_loss: 0.0277 - val_accuracy: 0.9909
Epoch 9/12
60000/60000 [==============================] - 8s 137us/step - loss: 0.0319 - accuracy: 0.9902 - val_loss: 0.0261 - val_accuracy: 0.9911
Epoch 10/12
60000/60000 [==============================] - 8s 139us/step - loss: 0.0286 - accuracy: 0.9910 - val_loss: 0.0245 - val_accuracy: 0.9920
Epoch 11/12
60000/60000 [==============================] - 8s 135us/step - loss: 0.0275 - accuracy: 0.9916 - val_loss: 0.0248 - val_accuracy: 0.9922
Epoch 12/12
60000/60000 [==============================] - 8s 139us/step - loss: 0.0257 - accuracy: 0.9920 - val_loss: 0.0254 - val_accuracy: 0.9920
Test loss: 0.02538239802838725
Test accuracy: 0.9919999837875366
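As a rough heuristic for the GPU-vs-CPU question above, you can pull the per-epoch duration out of the Keras progress lines; this sketch's regex matches the " - 8s 137us/step" format shown in the output:

```python
import re

def epoch_seconds(line: str):
    """Extract the whole-epoch duration in seconds from a Keras
    progress line, or None if the line carries no timing."""
    m = re.search(r"\] - (\d+)s ", line)
    return int(m.group(1)) if m else None
```

Epoch times consistently under ~10 seconds on this model suggest the GPU is doing the work; close to a minute per epoch suggests the CPU is.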

18. That is it, we are done. I hope this document helps someone; these steps definitely helped me set up deep learning with Keras on Arch Linux with an Nvidia GPU. All Arch packages were the latest versions at the time of writing. If you have any errors or queries, please post them and I will try to help.

Happy Deep learning!
