Overview¶
What is cudass?¶
cudass is a high-performance sparse linear solver for PyTorch. It solves systems \(A x = b\) where \(A\) is a sparse matrix on GPU. It supports:
Multiple matrix types: general, symmetric, SPD, rectangular, and singular
Several backends: cuDSS (primary), cuSOLVER Dense (fallback), cuSolverSp (optional)
PyTorch integration: CUDA tensors, float32/float64, and batched RHS
You provide \(A\) in COO (coordinate) sparse format and a right-hand side \(b\); cudass chooses a backend, factorizes \(A\), and returns \(x\).
When to use cudass¶
Use cudass when:
You have sparse \(A\) and want to reuse the factorization for many \(b\)
You need min-norm or least-squares solutions for singular or rectangular systems
You want a single API that switches between cuDSS, cuSOLVER Dense, and cuSolverSp according to the matrix type
For dense matrices or one-off solves, torch.linalg.solve may be simpler. For very large or distributed systems, consider specialized solvers or iterative methods.
Matrix format: COO¶
cudass expects \(A\) as a tuple (index, value, m, n):
``index``: [2, nnz], int64, on CUDA. Row indices in index[0], column indices in index[1]. Duplicate \((i,j)\) entries need not be coalesced; the backend merges them as required.
``value``: [nnz], float32 or float64, on CUDA. The non-zero values.
``m``, ``n``: integers. Number of rows and columns.
Example: the matrix \(A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}\) in COO:
index = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]], device="cuda", dtype=torch.int64)
value = torch.tensor([4.0, 1.0, 1.0, 3.0], device="cuda", dtype=torch.float64)
m, n = 2, 2
You can build (index, value) from torch.sparse_coo_tensor or from your own
structure; ensure index and value are on the same CUDA device and that
index.dtype == torch.int64.
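Although the backend merges duplicate entries for you, it can help to see what that merging amounts to. A minimal sketch in plain Python (index/value lists stand in for the CUDA tensors cudass actually expects):

```python
def coalesce_coo(rows, cols, vals):
    """Merge duplicate (i, j) entries of a COO matrix by summing their values.

    This mirrors what the backend does with duplicates; plain lists stand in
    for the int64/float tensors cudass expects.
    """
    merged = {}
    for i, j, v in zip(rows, cols, vals):
        merged[(i, j)] = merged.get((i, j), 0.0) + v
    # Sort entries row-major for a deterministic layout.
    items = sorted(merged.items())
    rows_out = [i for (i, _j), _v in items]
    cols_out = [j for (_i, j), _v in items]
    vals_out = [v for _ij, v in items]
    return rows_out, cols_out, vals_out

# The 2x2 example matrix above, with entry (0, 0) split into duplicates 3 + 1.
rows, cols, vals = coalesce_coo([0, 0, 0, 1, 1], [0, 0, 1, 0, 1],
                                [3.0, 1.0, 1.0, 1.0, 3.0])
```

After coalescing, the entries match the single-entry-per-position form shown in the example above.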
Matrix types¶
You must pass a ``MatrixType`` when creating the solver. It selects the algorithm and backend. Choose the one that matches your matrix:
| Type | Shape | Description |
|---|---|---|
| GENERAL | \(m=n\) | General non-singular square |
| SYMMETRIC | \(m=n\) | Symmetric (non-singular) |
| SPD | \(m=n\) | Symmetric positive definite |
| | \(m=n\) | General singular, min-norm solution |
| | \(m=n\) | Symmetric singular, min-norm |
| | \(m \neq n\) | Rectangular, full rank, least-squares |
| | \(m \neq n\) | Rectangular, rank-deficient |
Note
Picking the correct MatrixType matters: it affects both accuracy and
performance. Use SPD only when \(A\) is truly positive definite;
otherwise prefer SYMMETRIC or GENERAL.
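Whether a symmetric matrix is truly positive definite can be checked by attempting a Cholesky factorization (in PyTorch, torch.linalg.cholesky serves the same purpose). A dependency-free sketch for small dense matrices:

```python
import math

def is_spd(a, eps=1e-12):
    """Attempt a Cholesky factorization A = L L^T; success implies A is SPD.

    `a` is a symmetric matrix as a list of lists. This is a plain-Python
    stand-in for torch.linalg.cholesky, for illustration only.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= eps:  # non-positive pivot: not positive definite
                    return False
                l[i][i] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return True
```

For example, `is_spd([[4.0, 1.0], [1.0, 3.0]])` is True (the COO example above is SPD), while `is_spd([[1.0, 2.0], [2.0, 1.0]])` is False, so that matrix should be solved with SYMMETRIC rather than SPD.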
Backends¶
cudass uses one of these backends internally:
cuDSS — Primary for general, symmetric, and SPD square systems. Fast when available; requires the cuDSS bindings to be built.
cuSOLVER Dense — Used for singular, rectangular, and as a fallback when cuDSS is missing or returns “not supported”. Densifies \(A\) and uses cuSOLVER (via PyTorch / cuBLAS).
cuSolverSp — Reserved for future use (e.g. OOM fallback).
You usually do not need to pick a backend; the solver chooses one from the MatrixType
and shape. The prefer_dense option (prefer cuSOLVER Dense) and the force_backend option (force a specific backend) override this when
needed.
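The selection rules above can be sketched as follows. The function name, option flags as plain arguments, and the string labels are illustrative assumptions, not the actual cudass internals:

```python
def pick_backend(matrix_type, m, n, cudss_available=True,
                 prefer_dense=False, force_backend=None):
    """Hypothetical sketch of backend selection from matrix type and shape.

    Mirrors the documented rules: cuDSS is primary for non-singular square
    systems; cuSOLVER Dense handles singular/rectangular cases and serves as
    the fallback; the option flags override the default choice.
    """
    if force_backend is not None:      # explicit override always wins
        return force_backend
    if prefer_dense:                   # user prefers the dense path
        return "cusolver_dense"
    square_nonsingular = (m == n) and matrix_type in {"general", "symmetric", "spd"}
    if square_nonsingular and cudss_available:
        return "cudss"
    # Singular, rectangular, or cuDSS unavailable: densify and use cuSOLVER.
    return "cusolver_dense"
```

This is only a model of the dispatch described in this section; the real solver also falls back to cuSOLVER Dense when cuDSS reports "not supported" at run time.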
Right-hand side and solution¶
Single RHS:
b of shape [m]; solve(b) returns x of shape [n].
Multiple RHS:
b of shape [m, k]; solve(b) returns x of shape [n, k].
Since \(A\) has shape \((m, n)\), \(b\) has \(m\) rows and \(x\)
has \(n\) rows. b and x must live on the same CUDA device as the solver,
with dtype (float32 or float64) matching the matrix.
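These shape rules can be captured in a small helper. A sketch in plain Python (standing in for the tensor shape checks) that validates b against an \((m, n)\) matrix and reports the expected shape of x:

```python
def solution_shape(m, n, b_shape):
    """Given A of shape (m, n) and b of shape [m] or [m, k],
    return the shape of x ([n] or [n, k]); raise on a mismatch."""
    if len(b_shape) == 1:
        if b_shape[0] != m:
            raise ValueError(f"b has {b_shape[0]} rows, expected {m}")
        return (n,)           # single RHS -> single solution vector
    if len(b_shape) == 2:
        if b_shape[0] != m:
            raise ValueError(f"b has {b_shape[0]} rows, expected {m}")
        return (n, b_shape[1])  # k stacked RHS -> k stacked solutions
    raise ValueError("b must be 1-D or 2-D")
```

For a rectangular \(3 \times 2\) system, `solution_shape(3, 2, (3, 5))` returns `(2, 5)`: five right-hand sides of length 3 yield five solutions of length 2.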