We always want to use a suitable data-storage format when representing operators and states, even if this is not always the same format. Even if there are only two different formats, this poses a large problem for code duplication in the library; functions like add and matmul take two inputs and return one output, giving eight different type specialisations that need to be written and maintained for full coverage, or every mathematical function turns into something nightmarish like

def add(left, right, out=None):
   if isinstance(left, CSR):
      if isinstance(right, CSR):
         if out is None or isinstance(out, CSR):
            return add_csr(left, right, out)
         elif isinstance(out, Dense):
            return add_csr_csr_dense(left, right, out)
            raise TypeError
      elif isinstance(right, Dense):
         # ...
      # ...
   # ...
      raise TypeError

This is obviously completely unsustainable for even two different types, let alone more than that, and offers very little user customisation.

Instead, we want a unified, centrally controlled system where the developer simply calls add(left, right), and the correct specialisation is determined.

Why Not Just Use NumPy?

NumPy is a fantastic tool for representing numerical data, but it is limited to dense matrices, while many operators in quantum mechanics are often much more suited to a sparse representation.

For cases which are well-described by dense matrices, the data-layer type Dense is very similar to a NumPy array underneath (and in fact can be directly viewed as one using its as_ndarray() method), but is guaranteed to hold exactly two dimensions, of which one is stored contiguously. These additional internal guarantees help speed in the tightest loops, and the type can be constructed very quickly from an ndarray that is already in the correct format.

For the large number of cases where the underlying data is much sparser, we use the type, which is a form of compressed sparse row matrix very similar to SciPy’s scipy.sparse.csr_matrix. There are a few reasons for not wanting to use SciPy’s implementation:

  1. Instantiation of csr_matrix is very slow.

  2. csr_matrix can use different types as integer indices in its index arrays, but this can make it more difficult to interface with C code underneath.

  3. QuTiP has many parts where very low-level C access is required, and having to always deal with Python types means that we must often hold the GIL and pay non-trivial overhead penalties when accessing Python attributes.

  4. We need to add enough specialised routines that it’s not such a big deal if we replicate some functionality as well.

Older versions of QuTiP used to reduce these issues by using a fast_csr_matrix type which derived from csr_matrix and overrode its __init__() method to remove the slow index-checking code and ensured that only data of the correct types was stored. In C-level code, a secondary struct CSR_Matrix was defined, which led to various parts of the code have several entry points, depending on how many of the arguments had been converted to the structure representation, and there was still a lot of overhead in converting back to Python-level code at the end.

The new CSR type stores data in conceptually the same manner as SciPy, but is defined purely at the Cython level. This means that it pays almost no overhead when switching between Python and C access, and code working with the types need not hold the GIL. Further, the internal storage makes similar guarantees to the Dense format about the data storage, simplifying mathematical code within QuTiP. It can also be viewed as a SciPy object when it needs to be used from within Python.

Previous versions of QuTiP also only supported the fast_csr_matrix type as the backing data store. There are many cases where this is a deeply unsuitable type: in small systems, sparse matrices require large overheads and stymie data caching, while even in large systems many operations produce outputs which are nearly 100% dense such as time-evolution operators and matrix exponentials. For optimal control applications, the majority of the time spent was just in dealing with the sparse overheads. Allowing multiple types to represent data lets us use the right tool for each job, but it does mean that further care is taken to ensure that all the mathematical parts of the library can function without needing to produce an exponential number of new mathematical functions whenever a type or new operation is added.

Further, even if we were to use NumPy and SciPy objects, we would still be faced with the problem of handling multiple dispatch. As soon as QuTiP needed to add any new functionality that was not already a function in scipy.sparse or scipy.linalg, particularly one that takes two matrices as arguments, we would have had to implement the same dispatch system anyway.