Introduction

WSL2 provides a way to run a Linux environment on a Windows machine, making it possible to use Linux tools and applications from within Windows. However, because the GPU belongs to the Windows host and is only exposed to the Linux guest through a virtualization layer, getting CUDA to work under WSL2 can be challenging.

Despite the challenges, it’s important to be able to diagnose CUDA problems under WSL2, because WSL2 offers a number of benefits over a traditional dual-boot setup, including better resource utilization, faster startup times, and the ability to seamlessly switch between Windows and Linux environments.

In addition, as more and more data scientists and developers adopt WSL2 as their primary development environment, understanding how to diagnose and troubleshoot CUDA issues under WSL2 will become increasingly important.

In this article, we’ll provide an overview of CUDA and WSL2, common issues that can arise when trying to use CUDA under WSL2, and some tips and tricks for diagnosing and resolving these issues. We’ll also provide real-world case studies that demonstrate how to diagnose and fix common CUDA problems under WSL2.

Understanding CUDA and WSL2

CUDA is a parallel computing platform and programming model that enables developers to use a GPU for general-purpose computing. It’s widely used in deep learning, scientific simulations, and other computationally-intensive applications.

WSL2 is a new version of Windows Subsystem for Linux that runs a real Linux kernel in a lightweight virtual machine. This makes it possible to run Linux tools and applications natively within Windows, with improved performance and resource utilization compared to the previous version of WSL.

To use CUDA under WSL2, you install the NVIDIA driver on the Windows host and the CUDA toolkit inside the WSL2 environment; the Windows driver exposes the GPU to the guest through libraries mounted under /usr/lib/wsl/lib. This differs from a native Linux installation: in particular, you should not install a Linux NVIDIA driver inside WSL2, as it can overwrite the passthrough libraries. Follow NVIDIA’s CUDA on WSL User Guide rather than the standard Linux installation instructions.
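
A quick way to confirm the passthrough is working (this assumes a standard WSL2 setup, where /usr/lib/wsl/lib is the mount for the Windows driver libraries):

# Inside WSL2: the CUDA driver library should resolve from the Windows mount
ldconfig -p | grep libcuda
# And nvidia-smi should report the Windows driver version
nvidia-smi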

For more detailed information on CUDA and WSL2, refer to NVIDIA’s official guides:

  • NVIDIA CUDA Toolkit Documentation: https://docs.nvidia.com/cuda/index.html
  • NVIDIA CUDA on WSL User Guide: https://docs.nvidia.com/cuda/wsl-user-guide/index.html
  • Microsoft Enable NVIDIA CUDA on WSL: https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl

Common Problems and Solutions

When using CUDA under WSL2, you may encounter a variety of issues, such as:

  • Inability to detect the GPU within the WSL2 environment
  • Incompatibility issues between the NVIDIA driver and the WSL2 kernel
  • Installation failures or errors with the CUDA toolkit

To resolve these issues, some common solutions include:

  • Installing the latest NVIDIA driver on Windows and the latest CUDA toolkit within the WSL2 environment
  • Ensuring that the WSL2 kernel and NVIDIA driver are compatible by checking the NVIDIA driver release notes
  • Following NVIDIA’s CUDA on WSL User Guide to enable GPU access within the WSL2 environment (a few quick sanity checks follow this list)
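
As a quick sanity pass, you can read all three versions directly. Note that wsl --version requires a recent, Store-distributed WSL release; older releases only support wsl --status:

# On the Windows side (PowerShell)
wsl --version      # WSL and Linux kernel versions
# Inside the WSL2 guest
nvidia-smi         # Windows driver version and the highest CUDA version it supports
nvcc --version     # CUDA toolkit version installed inside WSL2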

For more detailed information on common problems and solutions when using CUDA under WSL2, refer to the links mentioned above. These guides provide step-by-step instructions for diagnosing and resolving common issues, as well as troubleshooting tips and best practices for ensuring a smooth CUDA experience under WSL2.

Debugging Tips and Tricks

Debugging CUDA issues under WSL2 can be a challenging process, but there are some tips and tricks you can use to make it easier:

  • You do not install NVIDIA GPU drivers inside WSL2; the guest borrows the GPU driver libraries directly from the Windows host via mounted symlinks. However, the current version is known to have some problems with these symlinks. This problem can be resolved using method 1 or method 2.
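
To check whether you are affected, inspect the driver libraries that WSL2 mounts from Windows (the path assumes a standard setup):

ls -l /usr/lib/wsl/lib/libcuda*
# Healthy setups show libcuda.so and libcuda.so.1 as symlinks to libcuda.so.1.1;
# plain duplicate files here are a symptom of the known symlink problem.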

  • From time to time, it is good to nuke everything and start over. Here’s how to completely uninstall CUDA from Linux:
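
A sketch for apt-based installs on Ubuntu/Debian; adjust the patterns to what you actually installed. Runfile installs instead ship a cuda-uninstaller under /usr/local/cuda-X.Y/bin, where X.Y stands for your toolkit version:

# Purge the CUDA packages and their leftovers
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "nsight*"
sudo apt-get autoremove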

  • Both PyTorch and TensorFlow/JAX are extremely sensitive to the GPU driver version, CUDA version, and cuDNN version. For the latest compatibility information, see the following PyTorch release compatibility matrix:

PyTorch version   Python           Stable CUDA                  Experimental CUDA
2.0               >=3.8, <=3.11    CUDA 11.7, cuDNN 8.5.0.96    CUDA 11.8, cuDNN 8.7.0.84
1.13              >=3.7, <=3.10    CUDA 11.6, cuDNN 8.3.2.44    CUDA 11.7, cuDNN 8.5.0.96
1.12              >=3.7, <=3.10    CUDA 11.3, cuDNN 8.3.2.44    CUDA 11.6, cuDNN 8.3.2.44
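
To see which versions your installed PyTorch build was actually compiled against, a one-liner such as the following works:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())"
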
  • The command torch.cuda.is_available() is not too helpful for getting at the actual errors. To obtain a specific, contextual error message on why CUDA isn’t working, shove a tensor into CUDA and force PyTorch to surface the actual error:
python -c "import torch; torch.zeros(1).cuda()"
  • To obtain PyTorch environment settings, do:
$ python -m torch.utils.collect_env

Collecting environment information...
PyTorch version: 2.0.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.1
Libc version: glibc-2.35

Python version: 3.11.2 (main, Feb  7 2023, 13:52:42) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.7.64
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA TITAN RTX

Nvidia driver version: 531.41
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.5.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.5.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   48 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          48
On-line CPU(s) list:             0-47
Vendor ID:                       AuthenticAMD
Model name:                      AMD Ryzen Threadripper 3960X 24-Core Processor
CPU family:                      23
Model:                           49
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       1
Stepping:                        0
BogoMIPS:                        7600.04
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext perfctr_core ssbd ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat npt nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload umip rdpid
Virtualization:                  AMD-V
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       768 KiB (24 instances)
L1i cache:                       768 KiB (24 instances)
L2 cache:                        12 MiB (24 instances)
L3 cache:                        16 MiB (1 instance)
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.1
[pip3] torchvision==0.15.1
[conda] Could not collect
  • For NVIDIA GPU architectures and their CUDA gencodes, please refer to the following chart (source):
Architecture     Gencodes
Fermi            sm_20
Kepler           sm_30, sm_35, sm_37
Maxwell          sm_50, sm_52, sm_53
Pascal           sm_60, sm_61, sm_62
Volta            sm_70, sm_72 (Xavier)
Turing           sm_75
Ampere           sm_80, sm_86, sm_87 (Orin)
Ada (Lovelace)   sm_89
Hopper           sm_90, sm_90a (Thor)
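
For reference, here is an illustrative nvcc invocation targeting Ampere (my_kernel.cu is a placeholder file name):

# Build SASS for sm_86 and also embed PTX for forward compatibility with newer GPUs
nvcc my_kernel.cu -o my_kernel \
    -gencode arch=compute_86,code=sm_86 \
    -gencode arch=compute_86,code=compute_86
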
  • From time to time, CUDA might not yet support the latest GPU’s compute capability. In that case, you may have to target an older compute capability by setting the following environment variable:
export TORCH_CUDA_ARCH_LIST="8.0"
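
To find the right value, you can query the compute capability PyTorch reports for your GPU:

python -c "import torch; print(torch.cuda.get_device_capability(0))"
# prints a tuple such as (8, 6), which corresponds to TORCH_CUDA_ARCH_LIST="8.6"
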
  • Check the NVIDIA driver and CUDA toolkit versions: Make sure that the Windows NVIDIA driver and the CUDA toolkit installed within the WSL2 environment are compatible with each other and with the WSL2 kernel. Check the release notes for each component to ensure compatibility.

  • Check GPU detection: Use the nvidia-smi command to check whether the GPU is being detected within the WSL2 environment. If it’s not, verify that the NVIDIA driver is installed correctly on the Windows side and that GPU paravirtualization is active inside the guest.
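
A minimal check, assuming a standard WSL2 GPU setup (/dev/dxg is the paravirtualized GPU device node):

ls -l /dev/dxg     # should exist if GPU paravirtualization is active
nvidia-smi         # should list your GPUs with the Windows driver version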

  • Use verbose mode for debugging: When running CUDA applications, enable more detailed error reporting to help diagnose issues. Set the CUDA_VISIBLE_DEVICES environment variable to the index of the GPU you want to use, add any --verbose or debug flag your application supports, and consider setting CUDA_LAUNCH_BLOCKING=1 so kernel launches run synchronously and errors are reported at the offending call.
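
For example, to pin a run to GPU 0 with synchronous kernel launches (train.py is a placeholder for your application):

CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python train.py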

  • Check resource utilization: Use tools such as htop or nvidia-smi to monitor resource utilization, including CPU, memory, and GPU usage. This can help you identify performance bottlenecks and diagnose issues related to resource contention.
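
For example, to watch GPU utilization continuously:

# Refresh the full nvidia-smi view once per second
watch -n 1 nvidia-smi
# Or stream per-GPU utilization with nvidia-smi's built-in monitor
nvidia-smi dmon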

  • Use containerization: Consider using containerization tools such as Docker to manage your CUDA environment. This can help you ensure that the correct versions of the NVIDIA driver and CUDA toolkit are installed, and make it easier to reproduce issues in a controlled environment.
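
A minimal smoke test, assuming Docker with GPU support is configured for WSL2 (the image tag is illustrative; pick one that matches your CUDA version):

docker run --rm --gpus all nvidia/cuda:11.7.1-base-ubuntu20.04 nvidia-smi
# If this prints your GPUs, the container runtime can see the driver passthrough.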

Conclusion

Diagnosing CUDA problems under WSL2 can be challenging, but understanding the basics of CUDA and WSL2, knowing the common issues and their solutions, and applying the debugging tips and tricks above can make the process much more manageable. NVIDIA’s official documentation and guides are excellent resources for learning more and getting detailed instructions. Despite the challenges, using WSL2 as a development environment offers many benefits and can be a productive experience with the right tools and techniques.