Overview

What is cudass?

cudass is a high-performance sparse linear solver for PyTorch. It solves linear systems \(A x = b\) where \(A\) is a sparse matrix stored on the GPU. It supports:

Multiple matrix types: general, symmetric, SPD, rectangular, and singular
Several backends: cuDSS (primary), cuSOLVER Dense (fallback), cuSolverSp (optional)
PyTorch integration: tensors on CUDA, float32/float64, and batched RHS

You provide \(A\) in COO (coordinate) sparse format and a right-hand side \(b\); cudass chooses a backend, factorizes \(A\), and returns \(x\).

When to use cudass

Use cudass when:

You have a sparse \(A\) and want to reuse its factorization for many right-hand sides \(b\)
You need min-norm or least-squares solutions for singular or rectangular systems
You want a single API that switches between cuDSS, cuSOLVER Dense, and cuSolverSp according to the matrix type

For dense matrices or one-off solves, torch.linalg.solve may be simpler. For very large or distributed systems, consider specialized solvers or iterative methods.
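
The payoff of reusing a factorization can be illustrated without cudass itself. The following is a minimal sketch that uses PyTorch's dense `torch.linalg.lu_factor` and `torch.linalg.lu_solve` as a stand-in; it does not show the cudass API, and the matrix and right-hand sides are made up for illustration.

```python
import torch

# Dense stand-in for a factorize-once, solve-many workflow (not the cudass API).
# CPU fallback only so the sketch runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.tensor([[4.0, 1.0], [1.0, 3.0]], dtype=torch.float64, device=device)

# Factorize once: this is the expensive step.
LU, pivots = torch.linalg.lu_factor(A)

# Reuse the factors for several right-hand sides; each solve is only
# a pair of triangular substitutions.
rhs_list = [
    torch.tensor([1.0, 2.0], dtype=torch.float64, device=device),
    torch.tensor([0.0, 1.0], dtype=torch.float64, device=device),
]
solutions = []
for b in rhs_list:
    # lu_solve expects a matrix right-hand side, so add and drop a column axis.
    x = torch.linalg.lu_solve(LU, pivots, b.unsqueeze(-1)).squeeze(-1)
    solutions.append(x)

# Residual norms ||A x - b|| as a sanity check.
residuals = [torch.linalg.norm(A @ x - b).item() for x, b in zip(solutions, rhs_list)]
```

For a sparse direct solver the same pattern applies, with the additional benefit that the symbolic analysis of the sparsity pattern is also paid only once.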

Matrix format: COO

cudass expects \(A\) as a tuple (index, value, m, n):

``index``: [2, nnz], int64, on CUDA. Row indices in index[0], column indices in index[1]. Duplicate \((i,j)\) entries do not need to be coalesced beforehand; the backend merges them as required.
``value``: [nnz], float32 or float64, on CUDA. The non-zero values.
``m``, ``n``: integers. Number of rows and columns.

Example: the matrix

\[\begin{split}A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}\end{split}\]

in COO:

import torch

# One column per non-zero entry: row indices first, column indices second.
index = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]], device="cuda", dtype=torch.int64)
value = torch.tensor([4.0, 1.0, 1.0, 3.0], device="cuda", dtype=torch.float64)
m, n = 2, 2

You can build (index, value) from torch.sparse_coo_tensor or from your own structure; ensure index and value are on the same CUDA device and that index.dtype == torch.int64.
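
As a minimal sketch of the `torch.sparse_coo_tensor` route, the snippet below builds the same 2×2 matrix with one deliberately duplicated entry, coalesces it, and extracts `(index, value, m, n)`. The format description says the backend merges duplicates, so coalescing up front should be optional; it is done here only to obtain a canonical, sorted layout. `check_coo` is a hypothetical helper written for this example, not part of cudass.

```python
import torch

# CPU fallback only so the sketch runs without a GPU; cudass itself expects CUDA tensors.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Entry (0, 0) is deliberately split into two duplicates: 1.0 + 3.0 = 4.0.
raw_index = torch.tensor([[0, 0, 0, 1, 1], [0, 0, 1, 0, 1]], dtype=torch.int64, device=device)
raw_value = torch.tensor([1.0, 3.0, 1.0, 1.0, 3.0], dtype=torch.float64, device=device)
sparse = torch.sparse_coo_tensor(raw_index, raw_value, size=(2, 2))

# coalesce() sorts the indices and sums duplicate entries.
sparse = sparse.coalesce()
index = sparse.indices()  # shape [2, nnz], dtype torch.int64
value = sparse.values()   # shape [nnz]
m, n = sparse.shape


def check_coo(index, value, m, n):
    """Hypothetical helper: assert the COO layout described above."""
    assert index.dtype == torch.int64
    assert index.dim() == 2 and index.shape[0] == 2
    assert value.shape == (index.shape[1],)
    assert value.dtype in (torch.float32, torch.float64)
    assert index.device == value.device
    if index.shape[1] > 0:
        assert int(index[0].min()) >= 0 and int(index[0].max()) < m
        assert int(index[1].min()) >= 0 and int(index[1].max()) < n


check_coo(index, value, m, n)
```

Running a check like this before handing the tensors to a solver tends to surface dtype or device mismatches earlier and with clearer messages than a failure inside a backend.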

Matrix types

You must pass a ``MatrixType`` when creating the solver. It selects the algorithm and backend. Choose the one that matches your matrix:

Type	Shape	Description
`GENERAL`	\(m=n\)	General non-singular square
`SYMMETRIC`	\(m=n\)	Symmetric (non-singular)
`SPD`	\(m=n\)	Symmetric positive definite
`GENERAL_SINGULAR`	\(m=n\)	General singular, min-norm solution
`SYMMETRIC_SINGULAR`	\(m=n\)	Symmetric singular, min-norm
`GENERAL_RECTANGULAR`	\(m \neq n\)	Rectangular, full rank, least-squares
`GENERAL_RECTANGULAR_SINGULAR`	\(m \neq n\)	Rectangular, rank-deficient

Note

Picking the correct MatrixType matters: it affects both accuracy and performance. Use SPD only when \(A\) is truly positive definite; otherwise prefer SYMMETRIC or GENERAL.
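
If you are unsure which row of the table applies, a small dense copy of the matrix can be inspected offline. The function below is a hypothetical diagnostic, not part of cudass: it returns the type names from the table as plain strings, and it assumes the matrix is small enough to densify and that PyTorch's default numerical-rank tolerance is acceptable for your data.

```python
import torch


def suggest_matrix_type(A_dense, tol=1e-10):
    """Hypothetical diagnostic: map a small dense copy of A to a type name
    from the table above. Not part of cudass."""
    m, n = A_dense.shape
    # Numerical rank from the singular values (default tolerance).
    rank = int(torch.linalg.matrix_rank(A_dense))
    if m != n:
        full_rank = rank == min(m, n)
        return "GENERAL_RECTANGULAR" if full_rank else "GENERAL_RECTANGULAR_SINGULAR"
    symmetric = torch.allclose(A_dense, A_dense.T, atol=tol, rtol=0.0)
    if rank < n:
        return "SYMMETRIC_SINGULAR" if symmetric else "GENERAL_SINGULAR"
    if symmetric:
        # cholesky_ex reports failure through `info` instead of raising;
        # info == 0 means the factorization succeeded, i.e. A is positive definite.
        _, info = torch.linalg.cholesky_ex(A_dense)
        return "SPD" if int(info) == 0 else "SYMMETRIC"
    return "GENERAL"


A = torch.tensor([[4.0, 1.0], [1.0, 3.0]], dtype=torch.float64)
kind = suggest_matrix_type(A)  # "SPD" for this matrix
```

Both the rank computation and the Cholesky test cost roughly cubic time on the dense copy, so this is a development-time check rather than something to run on production-sized systems.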

Backends

cudass uses one of these backends internally:

cuDSS: Primary backend for general, symmetric, and SPD square systems. Fast when available; requires the cuDSS bindings to be built.
cuSOLVER Dense: Used for singular and rectangular systems, and as a fallback when cuDSS is missing or returns “not supported”. Densifies \(A\) and uses cuSOLVER (via PyTorch / cuBLAS).
cuSolverSp: Reserved for future use (e.g. as an out-of-memory fallback).
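
To make "densifies \(A\)" concrete, here is a minimal sketch of a densify-then-solve path written with plain PyTorch calls. It is an illustration of the idea under the assumption of a square, non-singular matrix, not the code cudass actually runs.

```python
import torch

# CPU fallback only so the sketch runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

index = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]], device=device, dtype=torch.int64)
value = torch.tensor([4.0, 1.0, 1.0, 3.0], device=device, dtype=torch.float64)
m, n = 2, 2
b = torch.tensor([1.0, 2.0], device=device, dtype=torch.float64)

# Densify: scatter the COO entries into an m-by-n array.
# accumulate=True sums duplicate (i, j) entries instead of overwriting them.
A_dense = torch.zeros((m, n), dtype=value.dtype, device=device)
A_dense.index_put_((index[0], index[1]), value, accumulate=True)

# Dense solve for a square, non-singular A.
x = torch.linalg.solve(A_dense, b)
```

The dense copy needs memory for all \(m \cdot n\) entries regardless of how sparse \(A\) is, which is why a dense path is practical only for moderately sized systems.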

You usually do not need to pick a backend; the solver selects one from the MatrixType and the shape of \(A\). The options prefer_dense (prefer cuSOLVER Dense) and force_backend (force a specific backend) override this choice when needed.

Right-hand side and solution

Single RHS: b of shape [m]. solve(b) returns x of shape [n].
Multiple RHS: b of shape [m, k]. solve(b) returns x of shape [n, k].

\(A\) has shape \((m, n)\), so \(b\) has \(m\) rows and \(x\) has \(n\) rows. b and x live on the same CUDA device as the solver and share the matrix's dtype (float32 or float64).
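
The shape conventions are easiest to see on a rectangular system. The sketch below uses a dense pseudoinverse (`torch.linalg.pinv`) as a stand-in for a least-squares solve; this is an assumption made purely for illustration and says nothing about how cudass computes its solutions.

```python
import torch

# CPU fallback only so the sketch runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Overdetermined system with m = 3 rows and n = 2 columns, full column rank.
A = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=torch.float64, device=device)

b_single = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64, device=device)  # shape [m]
b_multi = torch.stack([b_single, 2.0 * b_single], dim=1)                      # shape [m, k], k = 2

# pinv(A) @ b is the least-squares solution (and the min-norm one if A is rank-deficient).
A_pinv = torch.linalg.pinv(A)   # shape [n, m]
x_single = A_pinv @ b_single    # shape [n]
x_multi = A_pinv @ b_multi      # shape [n, k]
```

Each column of `x_multi` is the solution for the corresponding column of `b_multi`, so a batched right-hand side is equivalent to solving for each column separately against the same matrix.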