Development

This page documents the development of the package, including its architecture and the design decisions behind it.

Architecture

The parser is split into two components: a Rust core that implements all the lexing logic and the vast majority of the parsing logic; and a Python interpreter for a small “bytecode” domain-specific language that the Rust component outputs. The aim is that as little as possible should be done in Python space, because Python interpretation is slow.

The Rust components produce a small Python interface using PyO3. For the most part, the only API surface between the Rust and Python components is a small DSL, bound using PyO3. Custom iterator objects, which iterate over this DSL, are returned directly from Rust. The Python components assume that the bytecode describes a completely valid program; Rust is responsible for raising exceptions for an invalid OpenQASM 2 file, or for a file that cannot be converted to Qiskit format for any reason.
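To make the division of labour concrete, here is a minimal sketch of what the Python side of such a design can look like. The opcode names, the Instruction tuple and the dictionary used as output are all hypothetical stand-ins chosen for illustration; they are not the package's actual bytecode or its Qiskit-building code. The point is only that the interpreter loop consumes an iterator and performs no validation of its own.

```python
from enum import Enum, auto
from typing import Iterable, NamedTuple, Tuple


class OpCode(Enum):
    # Hypothetical opcodes; the real DSL is defined by the Rust component.
    DECLARE_QREG = auto()
    GATE = auto()
    MEASURE = auto()


class Instruction(NamedTuple):
    opcode: OpCode
    operands: Tuple


def interpret(bytecode: Iterable[Instruction]) -> dict:
    """Build a toy 'circuit' from a stream of already-validated instructions."""
    state = {"qregs": {}, "operations": []}
    for opcode, operands in bytecode:
        if opcode is OpCode.DECLARE_QREG:
            name, size = operands
            state["qregs"][name] = size
        elif opcode is OpCode.GATE:
            name, qubits = operands
            state["operations"].append((name, tuple(qubits), ()))
        elif opcode is OpCode.MEASURE:
            qubit, clbit = operands
            state["operations"].append(("measure", (qubit,), (clbit,)))
        else:
            # The producer is trusted, so this indicates a bug rather than bad user input.
            raise AssertionError(f"unhandled opcode: {opcode}")
    return state
```

Because the producer is trusted, the loop contains no checks on user input; an unknown opcode is treated as an internal bug.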

Both lexing and parsing are done in Rust by hand. This is largely because I wanted to do this for my own experience; I do know about nom. The lexer uses a simple LL(1) tokenisation strategy for all symbols, numbers and comments, and only changes behaviour when lexing a text-like symbol. Here, the entire identifier-like symbol is read in, and the lexer then decides whether to emit a relevant keyword token or an arbitrary identifier token depending on the content. Technically, I suppose that means it has arbitrary lookahead at that point.
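The keyword-or-identifier decision can be sketched in a few lines. The real lexer is written in Rust; the Python below is only an illustration of the strategy, and the keyword set, the Token type and the simplified rules for numbers and identifiers are assumptions made for the example.

```python
from typing import Iterator, NamedTuple

# Hypothetical, incomplete keyword set used only for this sketch.
KEYWORDS = {"OPENQASM", "include", "qreg", "creg", "gate", "measure", "barrier"}


class Token(NamedTuple):
    kind: str
    text: str


def lex(source: str) -> Iterator[Token]:
    pos = 0
    while pos < len(source):
        char = source[pos]
        if char.isspace():
            pos += 1
        elif char.isalpha() or char == "_":
            # Read the whole identifier-like word first, then classify it.
            end = pos
            while end < len(source) and (source[end].isalnum() or source[end] == "_"):
                end += 1
            word = source[pos:end]
            yield Token("keyword" if word in KEYWORDS else "identifier", word)
            pos = end
        elif char.isdigit():
            # Integers only, to keep the sketch short.
            end = pos
            while end < len(source) and source[end].isdigit():
                end += 1
            yield Token("integer", source[pos:end])
            pos = end
        else:
            # Every other character is treated as a one-character symbol.
            yield Token("symbol", char)
            pos += 1
```

Note that a word such as qregs is classified as an identifier, because the decision is made on the complete word rather than on a matching prefix.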

The parser is principally a hand-written LL(1) recursive-descent parser. The exception to this is in expression contexts; here, a short-lived (also LL(1)) operator-precedence parser is spawned using the same token stream to parse a single expression.
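As an illustration of the expression case, the sketch below shows a small operator-precedence (precedence-climbing) parser that needs only one token of lookahead. It is not the package's Rust implementation: the ExprParser class, the operator table and the choice to evaluate directly to a float are all assumptions made to keep the example short.

```python
import operator

# Binary operators as (precedence, function); higher binds tighter.
# All operators here are left-associative.
BINARY = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}


class ExprParser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # The single token of lookahead.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_atom(self):
        token = self.next()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token == "(":
            value = self.parse_expr(0)
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token == "-":
            # Unary minus binds tighter than any binary operator in this table.
            return -self.parse_expr(3)
        return float(token)

    def parse_expr(self, min_prec=0):
        lhs = self.parse_atom()
        while True:
            token = self.peek()
            if token not in BINARY or BINARY[token][0] < min_prec:
                return lhs
            prec, func = BINARY[token]
            self.next()
            # Parsing the right-hand side at prec + 1 gives left-associativity.
            rhs = self.parse_expr(prec + 1)
            lhs = func(lhs, rhs)
```

In a design like the one described, the statement parser would hand over its token stream when it reaches an expression and resume from wherever the expression parser stopped.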

Gate definitions are handled by storing the bytecode for that gate inside a Qiskit Gate object, and having its _define() method contain the very stripped-down version of the bytecode interpreter needed to evaluate this subset of the code. This lets us build the gate objects lazily; when we place calls to these gates into the circuit, we don’t need to evaluate their definitions until the user actually asks for them. Unfortunately, the PyO3 types in the bytecode interpreter aren’t inherently pickleable, so if a gate needs to be pickled, we have to eagerly create the definition and throw away the bytecode at that point.
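The lazy-definition pattern can be sketched without Qiskit. In the example below, LazyGate is a hypothetical stand-in for the real Gate subclass, and a Python generator stands in for the Rust-backed bytecode object (generators, like the objects described above, cannot be pickled). It is an illustration under those assumptions, not the package's code.

```python
from typing import Iterator, List, Optional, Tuple

Instruction = Tuple[str, Tuple[int, ...]]


class LazyGate:
    """Hypothetical gate that defers building its definition until it is needed."""

    def __init__(self, name: str, bytecode: Iterator[Instruction]):
        self.name = name
        self._bytecode: Optional[Iterator[Instruction]] = bytecode
        self._definition: Optional[List[Instruction]] = None

    def _define(self) -> None:
        # A toy 'interpreter': consume the stored bytecode exactly once.
        self._definition = [(op, tuple(qubits)) for op, qubits in self._bytecode]
        self._bytecode = None

    @property
    def definition(self) -> List[Instruction]:
        if self._definition is None:
            self._define()
        return self._definition

    def __getstate__(self):
        # Pickling forces the definition to exist and drops the unpicklable bytecode.
        return {"name": self.name, "_bytecode": None, "_definition": self.definition}

    def __setstate__(self, state):
        self.__dict__.update(state)


def bell_bytecode() -> Iterator[Instruction]:
    # Hypothetical bytecode stream for a two-qubit gate body.
    yield ("h", (0,))
    yield ("cx", (0, 1))
```

Since pickle uses the value returned by __getstate__ as the object's state, the generator itself is never handed to pickle; only the already-evaluated definition is serialised.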

Testing

The vast majority of the tests only use loads(). This is mostly deliberate. Rust has to be responsible for opening the files when using load(), so we can’t use any of Python’s in-memory file-like objects such as io.StringIO. When we want to test load(), we have to have an actual file on disk. We could parametrise by putting every single test case in a separate file, and reading the whole file into a string first for the loads() test. I don’t like this form, though, because it makes the actual test hard to read; the OpenQASM code ends up in a different place to the Qiskit generating code, making it hard to quickly verify what is happening.

Instead, I mostly use loads() as the test. The implementation in Rust is generic over both, and both are immediately abstracted into a single impl BufRead in the lexer, so the two paths should behave almost identically. The tests of the examples in the arXiv paper are parametrised over both as a check, but all the rest only use loads() to make the tests more readable.
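The reasoning above can be mirrored in a small Python sketch: two entry points that differ only in how they obtain a reader, and a shared core that only ever sees that reader. The functions toy_loads(), toy_load() and the statement-splitting core are hypothetical stand-ins, not the package's API; they exist only to show why a check over both entry points needs a real file while the string form does not.

```python
import io
import os
import tempfile
from typing import List, TextIO


def _statements(reader: TextIO) -> List[str]:
    """Shared core: it only ever sees a text reader, never a path or a string."""
    text = reader.read()
    return [part.strip() for part in text.split(";") if part.strip()]


def toy_loads(string: str) -> List[str]:
    # String entry point: wrap the text in an in-memory reader.
    return _statements(io.StringIO(string))


def toy_load(path: str) -> List[str]:
    # File entry point: the callee opens the file itself, so it needs a real path.
    with open(path, encoding="utf-8") as reader:
        return _statements(reader)


def check_both_entry_points(program: str) -> List[str]:
    """Write the program to a temporary file and compare both entry points."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "program.qasm")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(program)
        from_file = toy_load(path)
    from_string = toy_loads(program)
    assert from_file == from_string
    return from_string
```

With the program text kept inline in the test, as in check_both_entry_points(), the source and the expected result stay side by side, which is the readability argument made above.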

Coverage

Code coverage metrics for both the Python and Rust components can be generated with the tox environment coverage. After this run, one can also generate a set of HTML pages graphically illustrating the coverage by running the coverage-html environment, for example with

tox -e coverage,coverage-html

These environments have some additional non-Python dependencies that must be installed separately. These are the llvm-tools-preview component for rustup, Mozilla’s grcov tool for aggregating coverage data from instrumented Rust code, and the lcov package.

  • llvm-tools-preview can be installed using rustup by running

    rustup component add llvm-tools-preview
    
  • grcov is most easily installed by running

    cargo install grcov
    
  • The lcov package (which provides the binaries lcov and genhtml) is likely available through your system package manager, if on Linux or Mac. For example, on Ubuntu it can be installed with

    sudo apt install lcov
    

    and on Mac via Homebrew it can be installed with

    brew install lcov
    

After the coverage-html environment has been successfully executed, the generated HTML coverage report can be viewed by opening the file coverage/index.html. The raw coverage information file (in LCOV format) will be coverage.info in the repository root.

Note

Running the coverage tox environment causes the compiled Rust code in the working directory for editable installs to be recompiled and instrumented for profiling data. You might want to manually rebuild the Rust extension module with

python setup.py build_rust --inplace [--release]

after using the coverage job; otherwise, every use of the compiled module will continue to generate its own coverage data.