Dispatch Operations: Dispatcher
The user most commonly interacts with the data layer by calling mathematical
operations on Qobj
instances, which verify that the operation makes
mathematical sense (e.g. the Hilbert space dimensions are equal) and then call
the data-layer dispatchers to handle the particular storage formats in use.
The multiple-dispatch system implemented by Dispatcher lets all mathematical
operations in QuTiP work with every possible combination of concrete input
types, even if no developer or user has actually written a function for that
specific combination.
A basic example of adding two matrices together, and controlling the output type using the dispatch mechanism:
>>> import scipy.sparse
>>> import numpy as np
>>> from qutip import data
>>> a = data.CSR(scipy.sparse.csr_matrix(np.random.rand(5, 5)))
>>> b = data.Dense(np.random.rand(5, 5))
>>> data.add(a, b)
Dense(shape=(5, 5), fortran=True)
>>> data.add(a, b, out=data.CSR)
CSR(shape=(5, 5), nnz=25)
Using key-lookup syntax to get a callable object which says whether it is a
concrete, “built-in” function (direct) or an ersatz one formed by converting
the inputs and outputs with to:
>>> data.add[data.CSR, data.Dense]
<indirect specialisation (CSR, Dense, Dense) of add>
>>> data.add[data.CSR, data.CSR, data.CSR]
<direct specialisation (CSR, CSR, CSR) of add>
Basic Usage
A Dispatcher provides a single mathematical function for all combinations of
types known by to, regardless of whether a specialisation has been defined for
those particular input data types. In the first example above, the operator
add currently knows only two specialisations; it knows how to add
CSR + CSR -> CSR and Dense + Dense -> Dense directly, but it is still able to
produce the correct result when asked to do CSR + Dense -> CSR and similar.
The type of the output can be, but does not need to be, specified. The
Dispatcher will choose a suitable output type if one is not given.
The objects qutip.data.add, qutip.data.pow and qutip.data.matmul are examples
of dispatchers in the data layer. Respectively, these have the signatures
data.add(left: Data, right: Data, scale: complex = 1) -> Data
data.pow(matrix: Data, n: integer) -> Data
data.matmul(left: Data, right: Data) -> Data
These are callable objects, so the most basic usage is simply to call them.
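For instance, reusing the a (CSR) and b (Dense) matrices from the first example
above, an illustrative session might look like this; the output types are
chosen by the dispatcher unless out is given:
>>> x = data.add(a, b, scale=2)   # computes a + 2*b, dispatched on (CSR, Dense)
>>> y = data.pow(a, 3)            # third matrix power of a
>>> z = data.matmul(a, b)         # matrix product a @ b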
Just like to, key-lookup syntax can be used to get a single callable object
representing a single specialisation. The callable object has an attribute
direct which is True if no type conversions would need to take place, and
False if at least one would have to happen. Just like in the regular call,
you can either specify or not specify the type of the output, but the types
of the inputs must always be given.
>>> data.pow[data.CSR]
<direct specialisation (CSR, CSR) of pow>
>>> data.pow[data.CSR].direct
True
>>> data.pow[data.CSR, data.Dense].direct
False
The returned object is callable with the same signature as the dispatcher
(except the out
keyword argument is no longer there), and requires that the
inputs match the types stated.
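For instance (reusing the CSR matrix a from the first example), a retrieved
specialisation is called in exactly the same way, but the inputs must really
have the stated types:
>>> pow_csr = data.pow[data.CSR]
>>> cube = pow_csr(a, 3)   # a must be CSR; a Dense input would not be converted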
Adding New Specialisations
New specialisations can be added to a pre-existing dispatcher with the
qutip.data.Dispatcher.add_specialisations() method. This is very similar in
form to qutip.data.to.add_conversions(); it takes a list of tuples, where the
leading elements of each tuple define the types in the specialisation and the
last element is the specialised function itself.
For example, a user might need to multiply Dense @ CSR
frequently and get a
Dense
output. Currently, there is no direct specialisation for this:
>>> data.matmul[Dense, CSR, Dense]
<indirect specialisation (Dense, CSR, Dense) of matmul>
The user may then choose to define their own specialisation to handle this case efficiently:
>>> def matmul_1(left: Dense, right: CSR) -> Dense:
... # [...]
... return out
They would give this to matmul
by calling
>>> data.matmul.add_specialisations([
... (Dense, CSR, Dense, matmul_1),
... ])
Now we find
>>> data.matmul[Dense, CSR, Dense]
<direct specialisation (Dense, CSR, Dense) of matmul>
Additionally, the whole lookup table will be rebuilt taking this new
specialisation into account, which means the indirect specialisation
matmul(Dense, CSR) -> CSR
will now make use of this new method, because it
has a low conversion weight.
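For reference, a minimal (deliberately naive) body for the matmul_1 sketched
above could simply reuse an existing kernel after a conversion; a genuinely
useful specialisation would operate on the CSR structure directly rather than
converting:
>>> def matmul_1(left: Dense, right: CSR) -> Dense:
...     # Naive placeholder: convert the CSR operand and call the dense kernel.
...     return data.matmul_dense(left, data.to(Dense, right))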
Adding New Types
Now let’s say the user wants to add a new NewDataType type that works all
across QuTiP.
The only action they must take is to tell qutip.data.to
about this new
type. Let’s say they define it like this:
>>> class NewDataType:
... # [...]
>>> def new_from_dense(matrix: data.Dense) -> NewDataType:
... # [...]
>>> def dense_from_new(matrix: NewDataType) -> data.Dense:
... # [...]
>>> data.to.add_conversions([
... (NewDataType, data.Dense, new_from_dense),
... (data.Dense, NewDataType, dense_from_new),
... ])
As we saw in the previous section, this is enough to define conversions between all pairs of types. What’s more, this is also enough to define all operations in the data layer as well:
>>> data.matmul[NewDataType, data.CSR]
<indirect specialisation (NewDataType, CSR, CSR) of matmul>
All of the data layer will now work seamlessly with the new type, even though
this is actually achieved by conversion to and from a known data type. There
was no need to call anything other than qutip.data.to.add_conversions().
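For instance (assuming the elided bodies above have actually been filled in),
conversions to types other than Dense also work automatically, chaining
through the registered conversions:
>>> new = new_from_dense(data.Dense(np.eye(3)))
>>> csr_copy = data.to(data.CSR, new)   # internally goes NewDataType -> Dense -> CSR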
Internally, this is achieved by qutip.data.Dispatcher.__init__() storing a
reference to itself in to, and to calling rebuild_lookup() as part of
add_conversions().
Now the user only needs to add the specialisations that they actually need
for the bottleneck parts of their application, and leave the dispatcher to
handle all other minor components by automatic conversion. As in the previous
subsection, they do this by calling
add_specialisations()
on the relevant operations.
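As a sketch (the kernel body here is purely illustrative), registering such a
specialisation for the hypothetical NewDataType might look like this:
>>> def matmul_new_csr(left: NewDataType, right: data.CSR) -> NewDataType:
...     # A real kernel would work on NewDataType's storage directly; for
...     # illustration we simply bounce through Dense using the conversions above.
...     dense_result = data.matmul_dense(dense_from_new(left), data.to(data.Dense, right))
...     return new_from_dense(dense_result)
...
>>> data.matmul.add_specialisations([
...     (NewDataType, data.CSR, NewDataType, matmul_new_csr),
... ])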
Creating a New Dispatcher
For most user-defined functions which operate on qutip.Qobj.data, it is
sufficient to simply call data.to(desired_type, input_data) on entry to the
function, which guarantees that they are always working with the type of data
they support.
However, in some cases they may want to support dispatched operations in the
same way that we do within the library code. For this reason, the data layer
exports Dispatcher
as a public symbol. The minimal amount of
work that needs to be done is to call the initialiser, and then call
add_specialisations(). For example, let’s say the
user has defined two specialisations for their simple new function
add_square:
>>> def add_square_csr(left, right):
... return data.add_csr(left, data.matmul_csr(right, right))
...
>>> def add_square_dense(left, right):
... return data.add_dense(left, data.matmul_dense(right, right))
...
(Ignore for now that this would be better achieved by just using the dispatchers
add
and matmul
directly.) Now they create
the dispatcher simply by doing
>>> add_square = data.Dispatcher(add_square_csr, inputs=('left', 'right'), name='add_square', out=True)
>>> add_square.add_specialisations([
... (data.CSR, data.CSR, data.CSR, add_square_csr),
... (data.Dense, data.Dense, data.Dense, add_square_dense),
... ])
This is enough for Dispatcher
to have extracted the signature
and satisfied all of the specialisations. Note that the inputs
argument
does not provide the signature; it tells the dispatcher which arguments are
data-layer types it should dispatch on, e.g. for pow as defined above
inputs = ('matrix',), but the signature is (matrix, n) -> out. We can see that
the specialisations are now complete:
>>> add_square
<dispatcher: add_square(left, right)>
>>> add_square[data.Dense, data.CSR, data.CSR]
<indirect specialisation (Dense, CSR, CSR) of add_square>
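Calling the new dispatcher now works for any combination of known types,
converting as necessary. For example (assuming the out keyword behaves as in
the data.add example at the top of this page):
>>> a = data.CSR(scipy.sparse.identity(4, format='csr'))
>>> b = data.Dense(np.random.rand(4, 4))
>>> result = add_square(a, b)                  # mixed types are converted automatically
>>> result = add_square(a, b, out=data.Dense)  # force the output type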
In the initialisation, the function add_square_csr
is passed as an example
from which Dispatcher
extracts the call signature, the module
name and the docstring (if it exists). It is not actually added as a
specialisation until add_square.add_specialisations
is called afterwards.
If desired, the user can set or override the docstring for the resulting
dispatcher by directly writing to the __doc__
attribute of the object. We
always do this within the library.
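For example (the docstring text here is purely illustrative):
>>> add_square.__doc__ = """\
... Compute left + right @ right, dispatching on the data-layer types of both
... arguments.
... """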
Note
Within the Cython components of the library, we manually construct the
signature and pass it into the qutip.data.Dispatcher
constructor
because Cython-compiled functions do not embed their signature in a manner in
which inspect.signature
can extract it (even with the
cython.embedsignature
directive). We also use this to cut out some
arguments in the call signatures which would not work with the dispatch
mechanism (like out
parameters).
Other Features
The Dispatcher can operate on a function with any call signature (except ones
which use *args or **kwargs), even if not all of the arguments are data-layer
types. At definition, the creator of the Dispatcher says which input
arguments are meant to be dispatched on, and whether the output should be
dispatched on; all other arguments are passed through unchanged.
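A sketch of what this looks like (the function and its body are made up for
illustration; add_csr and matmul_csr are the same specialised kernels used in
the add_square example above, and it is assumed here that add_csr accepts the
same scale argument as the dispatched data.add):
>>> def scaled_square_csr(matrix, factor=1.0):
...     # 'factor' is an ordinary scalar argument: it is not dispatched on and is
...     # passed straight through to whichever specialisation is chosen.
...     return data.add_csr(matrix, data.matmul_csr(matrix, matrix), scale=factor)
...
>>> scaled_square = data.Dispatcher(scaled_square_csr, inputs=('matrix',),
...                                 name='scaled_square', out=True)
>>> scaled_square.add_specialisations([
...     (data.CSR, data.CSR, scaled_square_csr),
... ])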
How It Works
Calling a Dispatcher
happens in five stages:
1. bind parameters to the call signature
2. resolve all dispatch rules into the best known specialisation
3. convert dispatch parameters to match the chosen specialisation
4. call the specialisation
5. if necessary, convert the output to the correct type
First, the arguments have to be bound from the generic signature
(*args, **kwargs) to match the actual call signature of the underlying
functions so that the arguments which need dispatching are identified. This
is done by the internal class _bind, which is constructed from the signature
of the specialisations when the Dispatcher is instantiated.
Once we have the arguments, we have to resolve all the dispatch rules to know
which specialisation we should use. In the most common scenario, the only
active dispatch rules will be either global rules defined in the QuTiP
options, or instance rules defined on this particular Dispatcher; there will
not be any local rules passed in the function call. If this is the case, the
Dispatcher will already have a pre-built lookup table, and choosing the
correct specialisation is simply a hash lookup on the types. This has linear
complexity in the number of dispatch parameters, whereas the naïve
if/elif/else structure would go as \(\mathcal O(n^m)\), where \(n\) is the
number of known data-layer types, and \(m\) is the number of dispatch
parameters.
The next most common case would be that the only local rule is to fix the output type. In this case, the secondary lookup table is used to avoid having to compute a proper set of weights.
The final case for resolving rules is the most general one: we have been given some proper dispatch rules that we have to handle at call time to resolve which specialisation we must choose. Here, we have no choice but to recalculate the correct specialisation, which comes with some (hopefully small) runtime cost.
The last three steps of the dispatch operation are simple. Now that we have
the specialisation we are going to use, we simply use to to convert the given
inputs to the correct types, we call the function, and we return the answer
(possibly converting it too).
Building the lookup table is a relatively simple process, but asymptotically it has very poor complexity. This is why as much as possible is done ahead of time, at library initialisation or when a new type is added. To build the tables, we have to compare every possible set of inputs to every known specialisation, and then choose the specialisation which has the least “weight” given the defined rules.
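As a toy illustration of this idea (this is not QuTiP’s actual implementation,
just a sketch of the principle): suppose there are two known types, a table of
pairwise conversion weights, and two direct specialisations. For every
possible (left, right, out) combination we pick the specialisation with the
smallest total conversion weight:
>>> import itertools
>>> types = ['CSR', 'Dense']
>>> weight = {('CSR', 'CSR'): 0, ('Dense', 'Dense'): 0,
...           ('CSR', 'Dense'): 1, ('Dense', 'CSR'): 1}
>>> specialisations = [('CSR', 'CSR', 'CSR'), ('Dense', 'Dense', 'Dense')]
>>> lookup = {}
>>> for left, right, out in itertools.product(types, repeat=3):
...     lookup[left, right, out] = min(
...         specialisations,
...         key=lambda s: weight[left, s[0]] + weight[right, s[1]] + weight[s[2], out])
...
>>> lookup['CSR', 'Dense', 'CSR']
('CSR', 'CSR', 'CSR')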
Implementation Details
The backing specialisations can be found in
qutip.Dispatcher._specialisations
, and the complete lookup table is in
qutip.Dispatcher._lookup
. These are marked as private, because messing
around with them will almost certainly cause the dispatcher to stop working.
Only one specialisation needs to be defined for a dispatcher to work with all
data types known by to. We achieve this because to guarantees that all
possible conversions between data types will exist, so Dispatcher can always
convert its inputs into those which will match one of its known
specialisations.
Within the initialisation of the data layer, we use a “magic” _defer keyword
argument to add_specialisations() to break a circular dependency. This is
because the “type” modules qutip.data.csr and qutip.data.dense depend on some
mathematical modules (e.g. add and matmul) to provide the __add__
and similar methods on the types. For ease of development we want the
dispatchers to be defined in the same modules that all the specialisations are
(though this is not at all necessary), but the dispatchers require to to be
populated with the types before specialisations can be added. The _defer
keyword here just defers the building of the lookup table until an explicit
call to rebuild_lookup(), breaking the cycle. The user will never need to do
this, because by the time they receive the Dispatcher object, to is already
initialised to a minimum degree.
Efficiency Notes
The specialisations returned by the __getitem__ lookups are not significantly
faster than just calling the dispatcher directly, because the bulk of the
heavy lifting is done when add_specialisations() or rebuild_lookup() is
called. On call, the generic signature (*args, **kwargs) has to be bound to
the actual signature of the underlying operation, regardless of whether the
specialisation has already been found. At the Cython level there is
short-circuit access to the call machinery in the specialisations themselves,
but this cannot be safely exposed outside of the qutip.data.Dispatcher class.
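In other words, pre-fetching a specialisation from Python is mostly a
convenience rather than an optimisation; a pattern like the following should
not be expected to be meaningfully faster than calling data.add itself:
>>> a = data.CSR(scipy.sparse.identity(5, format='csr'))
>>> add_csr_csr = data.add[data.CSR, data.CSR]
>>> result = add_csr_csr(a, a)   # comparable cost to data.add(a, a)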