Dispatch Operations: Dispatcher

The user most commonly interacts with the data layer by calling mathematical operations on Qobj instances, which verify that the operation makes mathematical sense (e.g. the Hilbert space dimensions are equal) and then call the data-layer dispatchers to handle the particular storage formats in use.

The multiple-dispatch system implemented by Dispatcher lets every mathematical operation in QuTiP accept any combination of concrete input types, even if no developer or user has written a function for that specific combination.

A basic example of adding two matrices together, and controlling the output type using the dispatch mechanism:

>>> from qutip import data
>>> import scipy.sparse
>>> import numpy as np
>>> a = data.CSR(scipy.sparse.csr_matrix(np.random.rand(5, 5)))
>>> b = data.Dense(np.random.rand(5, 5))
>>> data.add(a, b)
Dense(shape=(5, 5), fortran=True)
>>> data.add(a, b, out=data.CSR)
CSR(shape=(5, 5), nnz=25)

Using key-lookup syntax to get a callable object which reports whether it is a concrete, “built-in” function (direct) or an indirect one formed by converting the inputs and outputs with to:

>>> data.add[data.CSR, data.Dense]
<indirect specialisation (CSR, Dense, Dense) of add>
>>> data.add[data.CSR, data.CSR, data.CSR]
<direct specialisation (CSR, CSR, CSR) of add>

Basic Usage

A Dispatcher provides a single mathematical function for all combinations of types known by to, regardless of whether a specialisation has been defined for the particular input data types. In the first example above, the operator add knows only two specialisations directly: it can do CSR + CSR -> CSR and Dense + Dense -> Dense, but it is still able to produce the correct result when asked to do CSR + Dense -> CSR and similar. The type of the output can be specified, but does not need to be; the Dispatcher will choose a suitable output type if one is not given.

The objects qutip.data.add, qutip.data.pow and qutip.data.matmul are examples of dispatchers in the data layer. Respectively, these have the signatures

data.add(left: Data, right: Data, scale: complex = 1) -> Data
data.pow(matrix: Data, n: integer) -> Data
data.matmul(left: Data, right: Data) -> Data

These are callable objects, so the basic usage is simply to call them.
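
For example, a brief sketch reusing the matrices a (CSR) and b (Dense) created in the first example above; the exact return types depend on the dispatch rules:

>>> x = data.matmul(a, b)                # output type chosen automatically
>>> y = data.add(a, b, scale=2)          # uses the optional scale argument
>>> z = data.pow(a, 3, out=data.Dense)   # request a Dense result explicitly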

Just like to, key-lookup syntax can be used to get a single callable object representing a single specialisation. The callable object has an attribute direct, which is True if no type conversions would need to take place, and False if at least one would have to happen. Just as in a regular call, the type of the output may or may not be specified, but the types of the inputs must always be given.

>>> data.pow[data.CSR]
<direct specialisation (CSR, CSR) of pow>
>>> data.pow[data.CSR].direct
True
>>> data.pow[data.CSR, data.Dense].direct
False

The returned object is callable with the same signature as the dispatcher (except the out keyword argument is no longer there), and requires that the inputs match the types stated.
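
For example, a short sketch reusing the CSR matrix a from above:

>>> pow_csr = data.pow[data.CSR]   # direct specialisation
>>> result = pow_csr(a, 2)         # the input must already be CSR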

Adding New Specialisations

New specialisations can be added to a pre-existing dispatcher with the qutip.data.Dispatcher.add_specialisations() method. This is very similar in form to qutip.data.to.add_conversions(); it takes lists of tuples, where the first elements of the tuple define the types in the specialisation, and the last is the specialised function itself.

For example, a user might need to multiply Dense @ CSR frequently and get a Dense output. Currently, there is no direct specialisation for this:

>>> data.matmul[Dense, CSR, Dense]
<indirect specialisation (Dense, CSR, Dense) of matmul>

The user may then choose to define their own specialisation to handle this case efficiently:

>>> def matmul_1(left: Dense, right: CSR) -> Dense:
...     # [...]
...     return out
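
A minimal sketch of what matmul_1 might look like, purely for illustration (this version just densifies both operands with to_array() and so gains nothing over the automatic conversion; a real specialisation would exploit the sparse structure of right):

>>> def matmul_1(left, right):
...     # Illustrative only: form the product with NumPy on dense arrays and
...     # wrap the result.  A real kernel would iterate over the CSR structure
...     # of `right` rather than densifying it.
...     return data.Dense(left.to_array() @ right.to_array())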

They would give this to matmul by calling

>>> data.matmul.add_specialisations([
...     (Dense, CSR, Dense, matmul_1),
... ])

Now we find

>>> data.matmul[Dense, CSR, Dense]
<direct specialisation (Dense, CSR, Dense) of matmul>

Additionally, the whole lookup table will be rebuilt taking this new specialisation into account, which means the indirect specialisation matmul(Dense, CSR) -> CSR will now make use of this new method, because it has a low conversion weight.

Adding New Types

Now let’s say the user wants to add a new data type, NewDataType, across all of QuTiP. The only action they must take is to tell qutip.data.to about this new type. Let’s say they define it like this:

>>> class NewDataType:
...     # [...]
>>> def new_from_dense(matrix: data.Dense) -> NewDataType:
...     # [...]
>>> def dense_from_new(matrix: NewDataType) -> data.Dense:
...     # [...]
>>> data.to.add_conversions([
...     (NewDataType, data.Dense, new_from_dense),
...     (data.Dense, NewDataType, dense_from_new),
... ])

As we saw in the previous section, this is enough to define conversions between all pairs of types. What’s more, it is also enough to define every operation in the data layer:

>>> data.matmul[NewDataType, data.CSR]
<indirect specialisation (NewDataType, CSR, CSR) of matmul>

All of the data layer will now work seamlessly with the new type, even though this is actually achieved by conversion to and from a known data type. There was no need to call anything other than qutip.data.to.add_conversions(). Internally, this is achieved by qutip.data.Dispatcher.__init__() storing a reference to itself in to, and to calling rebuild_lookup() as part of add_conversions().

Now the user only needs to add the specialisations that they actually need for the bottleneck parts of their application, and leave the dispatcher to handle all other minor components by automatic conversion. As in the previous subsection, they do this by calling add_specialisations() on the relevant operations.
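
For example, if multiplying two NewDataType matrices is a hot spot, only that kernel needs to be written (matmul_newdatatype below is a hypothetical user-written function):

>>> def matmul_newdatatype(left: NewDataType, right: NewDataType) -> NewDataType:
...     # [...]
>>> data.matmul.add_specialisations([
...     (NewDataType, NewDataType, NewDataType, matmul_newdatatype),
... ])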

Creating a New Dispatcher

For most user-defined functions which operate on qutip.Qobj.data, it is completely sufficient to call data.to(desired_type, input_data) on entry to the function; this guarantees that the function is always working with the type of data it supports.
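
For example, a hedged sketch of such a function (purity_like is a hypothetical name; trace_dense and matmul_dense are existing Dense-only functions in the data layer):

>>> def purity_like(matrix):
...     # Convert whatever storage format arrives into Dense on entry, so the
...     # rest of the function only ever handles one concrete type.
...     matrix = data.to(data.Dense, matrix)
...     return data.trace_dense(data.matmul_dense(matrix, matrix))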

However, in some cases they may want to support dispatched operations in the same way that we do within the library code. For this reason, the data layer exports Dispatcher as a public symbol. The minimal amount of work that needs to be done is to call the initialiser, and then call add_specialisations(). For example, let’s say the user has defined two specialisations for their simple new function add_square:

>>> def add_square_csr(left, right):
...     return data.add_csr(left, data.matmul_csr(right, right))
...
>>> def add_square_dense(left, right):
...     return data.add_dense(left, data.matmul_dense(right, right))
...

(Ignore for now that this would be better achieved by just using the dispatchers add and matmul directly.) Now they create the dispatcher simply by doing

>>> add_square = data.Dispatcher(add_square_csr, inputs=('left', 'right'), name='add_square', out=True)
>>> add_square.add_specialisations([
...     (data.CSR, data.CSR, data.CSR, add_square_csr),
...     (data.Dense, data.Dense, data.Dense, add_square_dense),
... ])

This is enough for Dispatcher to have extracted the signature and for every combination of known types to now have a specialisation, whether direct or indirect. Note that the inputs argument does not provide the signature; it tells the dispatcher which arguments are data-layer types that it should dispatch on, e.g. for pow as described above inputs = ('matrix',), but the signature is (matrix, n) -> out. See that the specialisations are now complete:

>>> add_square
<dispatcher: add_square(left, right)>
>>> add_square[data.Dense, data.CSR, data.CSR]
<indirect specialisation (Dense, CSR, CSR) of add_square>
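
The new dispatcher is then called like any built-in one, with mixed input types and an optional requested output type (a short sketch reusing a and b from the first example):

>>> result = add_square(a, b, out=data.Dense)   # CSR and Dense inputs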

In the initialisation, the function add_square_csr is passed as an example from which Dispatcher extracts the call signature, the module name and the docstring (if it exists). It is not actually added as a specialisation until add_square.add_specialisations is called afterwards.

If desired, the user can set or override the docstring for the resulting dispatcher by directly writing to the __doc__ attribute of the object. We always do this within the library.
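
For example:

>>> add_square.__doc__ = "Compute `left + right @ right` for data-layer matrices."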

Note

Within the Cython components of the library, we manually construct the signature and pass it into the qutip.data.Dispatcher constructor because Cython-compiled functions do not embed their signature in a manner in which inspect.signature can extract it (even with the cython.embedsignature directive). We also use this to cut out some arguments in the call signatures which would not work with the dispatch mechanism (like out parameters).

Other Features

The Dispatcher can operate on a function with any call signature (except ones which use *args or **kwargs), even if not all of the arguments are data-layer types. At definition, the creator of the Dispatcher says which input arguments are meant to be dispatched on, and whether the output should be dispatched on, and all other arguments are passed through like normal.
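
As a hedged sketch, here is a dispatcher with one dispatched argument and one ordinary Python argument that is passed through untouched (scale_csr and the scale dispatcher are illustrative only, not part of QuTiP):

>>> def scale_csr(matrix, factor):
...     # Only `matrix` is dispatched on; `factor` is forwarded unchanged.
...     return data.mul_csr(matrix, factor)
...
>>> scale = data.Dispatcher(scale_csr, inputs=('matrix',), name='scale', out=True)
>>> scale.add_specialisations([
...     (data.CSR, data.CSR, scale_csr),
... ])
>>> scaled = scale(b, 2.5, out=data.Dense)   # Dense input handled by conversion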

How It Works

Calling a Dispatcher happens in five stages:

  1. bind parameters to the call signature

  2. resolve all dispatch rules into the best known specialisation

  3. convert dispatch parameters to match the chosen specialisation

  4. call the specialisation

  5. if necessary, convert the output to the correct type

First, the arguments have to be bound from the generic signature (*args, **kwargs) to match the actual call signature of the underlying functions so that the arguments which need dispatching are identified. This is done by the internal class _bind, which is constructed from the signature of the specialisations when the Dispatcher is instantiated.

Once we have the arguments, we have to resolve all the dispatch rules to know which specialisation we should use. In the most common scenario, the only active dispatch rules will be either global rules defined in the QuTiP options, or instance rules defined on this particular Dispatcher; there will not be any local rules passed in the function call. If this is the case, the Dispatcher will already have a pre-built lookup table, and choosing the correct specialisation is simply a hash lookup on the tuple of input types. This is linear in the number of dispatched parameters, whereas the naïve if/elif/else structure would go as \(\mathcal O(n^m)\), where \(n\) is the number of known data-layer types and \(m\) is the number of dispatch parameters.
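
As a rough conceptual sketch (not QuTiP's actual internals), the pre-built table behaves like a dictionary keyed by tuples of concrete types, so the dispatch step at call time is a single lookup:

>>> # Conceptual illustration only, reusing a (CSR) and b (Dense) from above.
>>> lookup = {
...     (data.CSR, data.CSR, data.CSR): data.add_csr,
...     (data.CSR, data.Dense, data.Dense):
...         lambda l, r: data.add_dense(data.to(data.Dense, l), r),
... }
>>> chosen = lookup[(data.CSR, data.Dense, data.Dense)]
>>> result = chosen(a, b)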

The next most common case would be that the only local rule is to fix the output type. In this case, the secondary lookup table is used to avoid having to compute a proper set of weights.

The final case for resolving rules is the most general one: we have been given some proper dispatch rules that we have to handle at call time to resolve which specialisation we must choose. Here, we have no choice but to recalculate the correct specialisation, which comes with some (hopefully small) runtime cost.

The last three steps of the dispatch operation are simple. Now that we have the specialisation we are going to use, we simply use to to convert the given inputs to the correct types, we call the function, and we return the answer (possibly converting it to the requested output type too).

Building the lookup table is a relatively simple process, but asymptotically it has very poor complexity. This is why as much as possible is done ahead of time, at library initialisation or when a new type is added. To build the tables, we have to compare every possible set of inputs to every known specialisation, and then choose the specialisation which has the least “weight” given the defined rules.

Implementation Details

The backing specialisations can be found in qutip.data.Dispatcher._specialisations, and the complete lookup table is in qutip.data.Dispatcher._lookup. These are marked as private because modifying them directly will almost certainly cause the dispatcher to stop working.

Only one specialisation needs to be defined for a dispatcher to work with all data types known by to. We achieve this because to guarantees that all possible conversions between data types will exist, so Dispatcher can always convert its inputs into those which will match one of its known specialisations.

Within the initialisation of the data layer, we use a “magic” _defer keyword argument to add_specialisations() to break a circular dependency. This is because the “type” modules qutip.data.csr and qutip.data.dense depend on some mathematical modules (e.g. add and matmul) to provide the __add__ and similar methods on the types. For ease of development we want the dispatchers to be defined in the same modules that all the specialisations are (though this is not at all necessary), but the dispatchers require to to be populated with the types before specialisations can be added. The _defer keyword here just defers the building of the lookup table until an explicit call to rebuild_lookup(), breaking the cycle. The user will never need to do this, because by the time they receive the Dispatcher object, to is already initialised to a minimum degree.

Efficiency Notes

The specialisations returned by the __getitem__ lookups are not significantly faster than just calling the dispatcher directly, because the bulk of the heavy lifting is done when add_specialisations() or rebuild_lookup() is called. On call, the generic signature (*args, **kwargs) has to be bound to the actual signature of the underlying operation, regardless of whether the specialisation has already been found. At the Cython level there is short-circuit access to the call machinery in the specialisations themselves, but this cannot be safely exposed outside of the qutip.data.Dispatcher class.