Getting Started¶
This guide helps you install cudass and run your first solve. For a detailed overview of concepts and many examples, see the User Guide.
Prerequisites¶
Before installing cudass, ensure you have:
Python 3.8+
PyTorch ≥ 2.0 with CUDA support (pip install torch from pytorch.org)
NVIDIA GPU and a driver compatible with your PyTorch CUDA version
nvidia-cudss-cu12 or nvidia-cudss-cu13 (>=0.6), chosen to match PyTorch’s
torch.version.cuda (override: CUDASS_CUDA_MAJOR=12 or 13). The build will pull in the matching package.
Note
cudass supports CUDA 12.x and 13.x. The matching nvidia-cudss-cu12 or
nvidia-cudss-cu13 is chosen from PyTorch’s torch.version.cuda at
build/install time (override: CUDASS_CUDA_MAJOR=12 or 13). No conda
or system-wide cuDSS is required.
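The version selection described in the note can be sketched in plain Python. The helper name below is hypothetical (not part of the cudass API); it only illustrates how CUDASS_CUDA_MAJOR overrides the major version reported by torch.version.cuda:

```python
import os

def pick_cudss_package(torch_cuda_version: str) -> str:
    """Hypothetical sketch of the package selection: the CUDASS_CUDA_MAJOR
    environment variable overrides the CUDA major version reported by
    PyTorch's torch.version.cuda string (e.g. "12.4")."""
    major = os.environ.get("CUDASS_CUDA_MAJOR") or torch_cuda_version.split(".")[0]
    if major not in ("12", "13"):
        raise ValueError(f"unsupported CUDA major version: {major}")
    return f"nvidia-cudss-cu{major}"

print(pick_cudss_package("12.4"))  # nvidia-cudss-cu12
```
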
Installation¶
From PyPI (recommended)¶
pip install cudass
This builds cudass from the sdist at install time. You need nvcc
(the CUDA toolkit) on your system; the build pulls in Cython and torch, and
selects nvidia-cudss-cu12 or nvidia-cudss-cu13 to match your PyTorch CUDA version.
From source¶
For development or to use the latest code:
git clone https://github.com/MoCA-Technion/cudass.git
cd cudass
pip install torch Cython
# Set CUDA_HOME if nvcc is not on PATH, e.g.:
# export CUDA_HOME=/usr/local/cuda
# or: module load cuda/12.4 (or cuda/13.0 for CUDA 13)
pip install -e .
# nvidia-cudss-cu12 or cu13 is chosen from torch.version.cuda; override:
# CUDASS_CUDA_MAJOR=13 pip install -e .
You need nvcc (CUDA Toolkit) and Cython to build the cuDSS bindings and
the optional sparse_to_dense CUDA kernel. If the latter fails, the package
still installs; the cuSOLVER Dense backend will densify matrices without it.
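As a rough illustration of what “nvcc on PATH, or via CUDA_HOME/CUDA_PATH” means, here is a hypothetical lookup in plain Python. The actual build logic may differ; this is only a sketch of the search order described above:

```python
import os
import shutil
from typing import Optional

def find_nvcc() -> Optional[str]:
    """Hypothetical sketch: look for nvcc on PATH first, then under
    CUDA_HOME or CUDA_PATH. Returns the nvcc path, or None if not found."""
    nvcc = shutil.which("nvcc")
    if nvcc:
        return nvcc
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = os.environ.get(var)
        if root:
            candidate = os.path.join(root, "bin", "nvcc")
            if os.path.exists(candidate):
                return candidate
    return None

print(find_nvcc())  # a path, or None if no CUDA toolkit is visible
```
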
Verifying the installation¶
Check that cudass imports and a minimal solve works:
import torch
from cudass import CUDASparseSolver, MatrixType
# Small 2×2 system: A = [[4,1],[1,3]], b = [1, 2]
# COO: rows [0,0,1,1], cols [0,1,0,1], values [4,1,1,3]
index = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]], device="cuda", dtype=torch.int64)
value = torch.tensor([4.0, 1.0, 1.0, 3.0], device="cuda", dtype=torch.float64)
m, n = 2, 2
b = torch.tensor([1.0, 2.0], device="cuda", dtype=torch.float64)
solver = CUDASparseSolver(matrix_type=MatrixType.SPD, use_cache=True)
solver.update_matrix((index, value, m, n))
x = solver.solve(b)
print(x) # e.g. tensor([0.0909, 0.6364], device='cuda:0')
print(solver.backend_name) # 'cudss' or 'cusolver_dn'
If this runs without errors, you are ready to use cudass.
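As a CPU-only sanity check on the expected output above, the same 2×2 system can be solved by Cramer's rule in plain Python (no GPU, PyTorch, or cudass needed):

```python
# A = [[4, 1], [1, 3]], b = [1, 2] -- the same system as the snippet above
a11, a12, a21, a22 = 4.0, 1.0, 1.0, 3.0
b1, b2 = 1.0, 2.0

det = a11 * a22 - a12 * a21        # 4*3 - 1*1 = 11
x1 = (b1 * a22 - a12 * b2) / det   # (3 - 2) / 11 = 1/11
x2 = (a11 * b2 - b1 * a21) / det   # (8 - 1) / 11 = 7/11
print(round(x1, 4), round(x2, 4))  # 0.0909 0.6364
```

The values match the example output printed by solver.solve(b) above.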
Troubleshooting¶
``RuntimeError: CUDA is not available``
Install PyTorch with CUDA:
Run pip install torch with a CUDA build selected from the PyTorch website. Ensure the NVIDIA driver supports your PyTorch CUDA version.
``nvidia-cudss-cu12 not found`` / ``cudss_bindings not available``
Run:
pip install nvidia-cudss-cu12. If the cuDSS bindings still fail to build, the solver falls back to the cusolver_dn backend. You can force it with
CUDASparseSolver(..., force_backend="cusolver_dn").
``sparse_to_dense kernel not built``
The optional sparse_to_dense kernel needs nvcc and PyTorch at build time. Set CUDA_HOME (or CUDA_PATH) and reinstall (pip install -e .). If the kernel is not built, the cuSOLVER Dense backend densifies via torch.sparse instead; slower, but it works.
Slow first solve or OOM on large matrices
The first solve factors the matrix; later solves with the same sparsity structure are faster (especially with use_cache=True). For very large or ill-conditioned systems, try force_backend="cusolver_dn" or prefer_dense=True; or use a different matrix type if the system is singular or rectangular.
Next steps¶
Overview — Concepts: matrix format, matrix types, backends.
Quick Start — A first example explained step by step.
Basic Usage — SPD, general, symmetric, and multiple RHS.
Rectangular and Singular Systems — Over/underdetermined and singular systems.
Advanced Options — prefer_dense, force_backend, caching, dtypes.