.. _user_guide_overview:

Overview
--------

What is cudass?
~~~~~~~~~~~~~~~

**cudass** is a high-performance sparse linear solver for `PyTorch <https://pytorch.org>`_.
It solves systems :math:`A x = b` where :math:`A` is a sparse matrix on the GPU. It
supports:

* **Multiple matrix types**: general, symmetric, SPD, rectangular, and singular
* **Several backends**: cuDSS (primary), cuSOLVER Dense (fallback), cuSolverSp (optional)
* **PyTorch integration**: tensors on CUDA, ``float32``/``float64``, and batched RHS

You provide :math:`A` in **COO (coordinate) sparse format** and a right-hand side
:math:`b`; cudass chooses a backend, factorizes :math:`A`, and returns :math:`x`.

When to use cudass
~~~~~~~~~~~~~~~~~~

Use cudass when:

* You have a **sparse** :math:`A` and want to reuse its factorization for many :math:`b`
* You need **min-norm** or **least-squares** solutions for singular or rectangular systems
* You want a **single API** that switches between cuDSS, cuSOLVER Dense, and cuSolverSp
  according to the matrix type

For dense matrices or one-off solves, ``torch.linalg.solve`` may be simpler. For very
large or distributed systems, consider specialized solvers or iterative methods.

Matrix format: COO
~~~~~~~~~~~~~~~~~~

cudass expects :math:`A` as a tuple ``(index, value, m, n)``:

* ``index``: shape ``[2, nnz]``, ``int64``, on CUDA. Row indices in ``index[0]``,
  column indices in ``index[1]``. Duplicate :math:`(i, j)` entries need not be
  coalesced; the backend merges them as required.
* ``value``: shape ``[nnz]``, ``float32`` or ``float64``, on CUDA. The non-zero values.
* ``m``, ``n``: integers. The number of rows and columns.

Example: the matrix

.. math::

   A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}

in COO:

.. code-block:: python

   index = torch.tensor([[0, 0, 1, 1],
                         [0, 1, 0, 1]], device="cuda", dtype=torch.int64)
   value = torch.tensor([4.0, 1.0, 1.0, 3.0], device="cuda", dtype=torch.float64)
   m, n = 2, 2

You can build ``(index, value)`` from ``torch.sparse_coo_tensor`` or from your own data
structure; ensure that ``index`` and ``value`` are on the same CUDA device and that
``index.dtype == torch.int64``.
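For example, here is a minimal sketch of that conversion using only standard PyTorch
APIs (``torch.sparse_coo_tensor``, ``coalesce``, ``indices``, ``values``); the variable
names are arbitrary:

.. code-block:: python

   import torch

   # Build A as a PyTorch sparse COO tensor.
   A = torch.sparse_coo_tensor(
       indices=torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]]),
       values=torch.tensor([4.0, 1.0, 1.0, 3.0], dtype=torch.float64),
       size=(2, 2),
   ).cuda()

   # Coalescing is not required by cudass (the backend merges duplicates),
   # but PyTorch requires it before calling .indices()/.values(), and it
   # yields sorted, duplicate-free entries.
   A = A.coalesce()

   # Extract the (index, value, m, n) tuple that cudass expects.
   index = A.indices()   # [2, nnz], int64, on CUDA
   value = A.values()    # [nnz], float64, on CUDA
   m, n = A.shape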
Matrix types
~~~~~~~~~~~~

You must pass a ``MatrixType`` when creating the solver; it selects the algorithm and
backend. Choose the one that matches your matrix:

+------------------------------------+------------------+--------------------------------------------------+
| Type                               | Shape            | Description                                      |
+====================================+==================+==================================================+
| ``GENERAL``                        | :math:`m=n`      | General non-singular square                      |
+------------------------------------+------------------+--------------------------------------------------+
| ``SYMMETRIC``                      | :math:`m=n`      | Symmetric (non-singular)                         |
+------------------------------------+------------------+--------------------------------------------------+
| ``SPD``                            | :math:`m=n`      | Symmetric positive definite                      |
+------------------------------------+------------------+--------------------------------------------------+
| ``GENERAL_SINGULAR``               | :math:`m=n`      | General singular, min-norm solution              |
+------------------------------------+------------------+--------------------------------------------------+
| ``SYMMETRIC_SINGULAR``             | :math:`m=n`      | Symmetric singular, min-norm solution            |
+------------------------------------+------------------+--------------------------------------------------+
| ``GENERAL_RECTANGULAR``            | :math:`m \neq n` | Rectangular, full rank, least-squares            |
+------------------------------------+------------------+--------------------------------------------------+
| ``GENERAL_RECTANGULAR_SINGULAR``   | :math:`m \neq n` | Rectangular, rank-deficient                      |
+------------------------------------+------------------+--------------------------------------------------+

.. note::

   Picking the correct ``MatrixType`` matters: it affects both accuracy and
   performance. Use ``SPD`` only when :math:`A` is truly positive definite; otherwise
   prefer ``SYMMETRIC`` or ``GENERAL``.

Backends
~~~~~~~~

cudass uses one of these backends internally:

* **cuDSS**: the primary backend for general, symmetric, and SPD square systems. Fast
  when available; requires the cuDSS bindings to be built.
* **cuSOLVER Dense**: used for singular and rectangular systems, and as a fallback when
  cuDSS is missing or returns "not supported". Densifies :math:`A` and solves with
  cuSOLVER (via PyTorch / cuBLAS).
* **cuSolverSp**: reserved for future use (e.g. an out-of-memory fallback).

You usually do not need to pick a backend; the solver chooses one from the
``MatrixType`` and the matrix shape. The options :ref:`prefer_dense` and
:ref:`force_backend` override this choice when needed.

Right-hand side and solution
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* **Single RHS**: ``b`` of shape ``[m]``; ``solve(b)`` returns ``x`` of shape ``[n]``.
* **Multiple RHS**: ``b`` of shape ``[m, k]``; ``solve(b)`` returns ``x`` of shape
  ``[n, k]``.

:math:`A` has shape :math:`(m, n)`, so :math:`b` has :math:`m` rows and :math:`x` has
:math:`n` rows. ``b`` and ``x`` must live on the same CUDA device as the solver and use
``float32`` or ``float64`` to match the matrix.
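Putting the pieces together, here is a hedged end-to-end sketch. The ``cudass.Solver``
constructor below is an assumed entry point (its exact name and signature may differ);
``MatrixType`` and ``solve`` behave as described above:

.. code-block:: python

   import torch
   import cudass  # import name assumed

   # The SPD matrix from the COO example above.
   index = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]],
                        device="cuda", dtype=torch.int64)
   value = torch.tensor([4.0, 1.0, 1.0, 3.0],
                        device="cuda", dtype=torch.float64)
   m, n = 2, 2

   # Hypothetical constructor: a MatrixType must be passed when creating
   # the solver; the signature shown here is an assumption.
   solver = cudass.Solver((index, value, m, n), cudass.MatrixType.SPD)

   # Single RHS: b has shape [m], x has shape [n].
   b = torch.tensor([1.0, 2.0], device="cuda", dtype=torch.float64)
   x = solver.solve(b)

   # Batched RHS: B has shape [m, k], X has shape [n, k].
   B = torch.rand(m, 5, device="cuda", dtype=torch.float64)
   X = solver.solve(B)

Reusing one solver for many right-hand sides amortizes the factorization cost, which is
the main reason to prefer cudass over a one-off dense solve.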