Exploratory research note. This work is being shaped toward a publishable paper, but the results described here should be read as “work under validation.” GitHub/Codeberg link coming once I have had a strong coffee and cleaned up the repo.
Kalman filtering is fascinating and has a wide range of applications. How can it be applied to QML models to provide some sort of “self-estimation” of state? That question drove a rather extensive exploration of the topic. In this post I will describe the current state of Self-Estimating Quantum Neural Networks (SEQNN).
SEQNN is an attempt to treat the parameters of a variational quantum model as latent states to be estimated sequentially, rather than as weights managed by a global optimizer.
The current estimator implementation is distributed in the sense that each qubit keeps its own local extended Kalman filter (EKF) state, while overlapping two-qubit measurement factors pass Jacobian/Fisher-derived updates into the relevant qubit filters.
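Most variational quantum circuit training is framed as optimization: define a loss over the parameters theta, then update theta with Adam, gradient descent, SPSA, or another optimizer.
SEQNN reframes the same problem as sequential estimation: the parameters are a latent state, each batch of finite-shot measurements is an observation, and the estimator carries both a state estimate and its covariance forward in time.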
The important difference is not only “which update rule is used.” Rather, when the model is written as an estimator, uncertainty, identifiability, calibration, and measurement attribution become terms we can reason with and build our algorithms around. Is that helpful? Let's see what the outcome is here.
| Stage | Status |
|---|---|
| Stage 1: local RY static recovery | complete |
| Stage 2: local RY drift tracking | complete |
| Stage 3: coupled/correlated local RY drift | implemented, not a topology-win claim |
| Stage 4: local RY+RZ with XYZ observations | complete and calibrated |
| Stage 5: entangled generative SEQNN | active, publication-focused validation mostly underway |
| Stage 6: augmented noise-state EKF | not started |
The four-qubit overlapping RY+CNOT chain is now the strongest Stage 5 model. The 50-seed core run passed, and the first 20-seed drift_sigma x n_shots sensitivity grid passed the initial publication check: calibrated in all nine grid points, better than Adam/GD/SPSA tracking in all nine, and the outright tracking winner in eight of nine. The one exception is the high-drift, low-shot edge case drift_sigma=0.02, n_shots=50, where the centralized EKF is barely ahead.
For the current experiments, the quantum system is a finite register of qubits
$$ \mathcal{H}_Q = \bigotimes_{i=0}^{n-1} \mathbb{C}^2 . $$
The unknown parameter vector is a latent state
$$ x_t = \theta_t \in [0,\pi]^n . $$
In the drift experiments, the latent state follows a bounded random walk,
$$ \theta_{t+1} = \Pi_{[0,\pi]^n}\left(\theta_t + \eta_t\right), \qquad \eta_t \sim \mathcal{N}(0,Q). $$
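A minimal sketch of this drift model, under two assumptions that are not stated in the text: the process covariance is isotropic, Q = drift_sigma^2 I, and the projection onto the box is done by elementwise clipping. The function name `simulate_drift` is hypothetical.

```python
import numpy as np

def simulate_drift(theta0, drift_sigma, n_steps, rng):
    """Bounded random walk: theta_{t+1} = clip(theta_t + eta_t, 0, pi)."""
    theta = np.asarray(theta0, dtype=float)
    path = [theta.copy()]
    for _ in range(n_steps):
        # eta_t ~ N(0, Q) with Q = drift_sigma^2 * I (assumed isotropic)
        eta = rng.normal(0.0, drift_sigma, size=theta.shape)
        # Euclidean projection onto the box [0, pi]^n is elementwise clipping
        theta = np.clip(theta + eta, 0.0, np.pi)
        path.append(theta.copy())
    return np.stack(path)  # shape (n_steps + 1, n)

rng = np.random.default_rng(0)
path = simulate_drift([0.5, 1.0, 1.5, 2.0], drift_sigma=0.02, n_steps=200, rng=rng)
```

Each row of `path` is one latent state theta_t; the observation model is then evaluated on these rows to produce the sampled measurement stream.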
The circuit defines an observation map
$$ h(\theta_t) = \mathbb{E}[z_t \mid \theta_t], $$
and finite-shot sampling gives a measurement
$$ z_t = h(\theta_t) + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0,R_t) $$
as the Gaussian approximation used by the EKF. For Bernoulli or multinomial shot counts, R_t is induced by the same probabilities used to sample the observations. In the two-marginal RY+CNOT factor used in Stage 5, for example,
$$ R_t = \frac{1}{N} \begin{bmatrix} p_0(1-p_0) & p_{00}-p_0p_1 \\ p_{00}-p_0p_1 & p_1(1-p_1) \end{bmatrix} + \epsilon I , $$
where N is the shot count, p_0 and p_1 are the two marginal readout probabilities, and p_{00} is the joint probability of outcome 00.
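The covariance above can be sketched directly from the three probabilities. The function name `factor_noise_cov` and the default value of `eps` are assumptions for illustration; the text does not give the value used for the diagonal term.

```python
import numpy as np

def factor_noise_cov(p0, p1, p00, n_shots, eps=1e-9):
    """Shot-noise covariance of the two marginal frequencies in one factor.

    p0, p1 : marginal probabilities of reading 0 on each qubit
    p00    : joint probability of outcome 00
    eps    : small diagonal term (value assumed here)
    """
    cov01 = p00 - p0 * p1  # covariance of the two "read 0" indicators
    R = np.array([[p0 * (1.0 - p0), cov01],
                  [cov01, p1 * (1.0 - p1)]]) / n_shots
    return R + eps * np.eye(2)

R = factor_noise_cov(p0=0.7, p1=0.6, p00=0.5, n_shots=100)
```

The off-diagonal entry is nonzero whenever the two readouts are correlated, which is the case for the entangled factor; dropping it would treat the two marginals as independent measurements.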
Each EKF update has the usual predict/update form, with F_t and H_t the Jacobians of f and h evaluated at the current estimate:
$$ \hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1}), \qquad P_{t|t-1} = F_t P_{t-1|t-1} F_t^\top + Q_t . $$
The innovation is
$$ \nu_t = z_t - h(\hat{x}_{t|t-1}), $$
with innovation covariance
$$ S_t = H_t P_{t|t-1} H_t^\top + R_t . $$
The Kalman gain and posterior update are
$$ K_t = P_{t|t-1} H_t^\top S_t^{-1}, $$
$$ \hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\nu_t, \qquad P_{t|t} = (I-K_tH_t)P_{t|t-1}. $$
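The predict/update equations above translate almost line for line into NumPy. This is a generic textbook sketch, not the project's code: the function name `ekf_step` and its argument layout are assumptions, and the single-qubit RY example at the bottom is only there to exercise it.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF predict/update step. F and H return Jacobians of f and h."""
    # Predict
    x_pred = f(x)
    F_t = F(x)
    P_pred = F_t @ P @ F_t.T + Q
    # Update
    H_t = H(x_pred)
    nu = z - h(x_pred)                       # innovation
    S = H_t @ P_pred @ H_t.T + R             # innovation covariance
    # K = P H^T S^{-1}, computed with a solve (P_pred and S are symmetric)
    K = np.linalg.solve(S, H_t @ P_pred).T
    x_post = x_pred + K @ nu
    P_post = (np.eye(len(x)) - K @ H_t) @ P_pred
    return x_post, P_post, nu, S

# Example: one RY qubit with h(theta) = cos^2(theta / 2) and random-walk dynamics.
def h_ry(x):
    return np.cos(x / 2) ** 2

def H_ry(x):
    return np.diag(-0.5 * np.sin(x))         # d/dtheta cos^2(theta/2) = -sin(theta)/2

def f_id(x):
    return x

def F_id(x):
    return np.eye(len(x))

x0, P0 = np.array([1.0]), np.array([[0.1]])
z = h_ry(np.array([1.2]))                    # noiseless observation from theta = 1.2
p = h_ry(x0)[0]
R = np.array([[p * (1 - p) / 100]])          # Bernoulli shot noise, 100 shots
Q = np.array([[1e-4]])
x1, P1, nu, S = ekf_step(x0, P0, z, f_id, F_id, h_ry, H_ry, Q, R)
```

Returning the innovation and its covariance alongside the posterior makes it easy to compute calibration diagnostics such as the normalized innovation squared.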
In a supervised classifier this can look like a covariance-aware optimizer. In the generative experiments, however, the measurements are actually sampled from a known latent quantum model.
The project moved through increasingly demanding observation models. The key pattern is: do not add parameters, entanglement, or topology until the observation equation says what can actually be observed.
| Stage/model | Observation structure | What it tests | Claim status |
|---|---|---|---|
| Local RY static | one local parameter per wire, local Z-basis probability | parameter recovery in an identifiable one-parameter model | complete |
| Local RY drift | same observation, but moving latent state | whether sequential Kalman tracking matters | complete |
| Coupled/correlated local RY drift | local observations with correlated latent dynamics | whether neighbor information can matter in principle | implemented, not a topology claim |
| Local RY+RZ with XYZ readout | two local parameters per wire, three local bases | richer local identifiability | complete and calibrated |
| Disjoint entangled RY+CNOT blocks | two-qubit CNOT blocks, but independent blocks | block attribution under entanglement | useful negative/diagnostic slice |
| Overlapping entangled RY+CNOT chain | nearest-neighbor factors share qubit parameters | distributed attribution through measurement factors | current Stage 5 publication target |
The minimal entangled two-qubit factor is
$$ RY(\theta_i) \otimes RY(\theta_j) \quad\text{followed by}\quad CNOT(i,j). $$
For one two-qubit factor, the joint Z-basis probabilities are
$$ \begin{aligned} p_{00} &= c_i c_j, \\ p_{01} &= c_i s_j, \\ p_{10} &= s_i s_j, \\ p_{11} &= s_i c_j, \end{aligned} $$
where
$$ c_i = \cos^2(\theta_i/2), \qquad s_i = \sin^2(\theta_i/2). $$
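These closed-form probabilities can be checked against a small statevector simulation. The sketch assumes the factor starts in |00>, that qubit i is the CNOT control and qubit j the target, and that outcomes are ordered 00, 01, 10, 11 with q_i as the first bit. The helper names are hypothetical.

```python
import numpy as np

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with the first qubit as control, second as target
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def factor_probs_statevector(theta_i, theta_j):
    psi0 = np.array([1.0, 0.0, 0.0, 0.0])    # |00> (assumed initial state)
    psi = CNOT @ np.kron(ry(theta_i), ry(theta_j)) @ psi0
    return psi ** 2                          # amplitudes are real here

def factor_probs_closed_form(theta_i, theta_j):
    ci, cj = np.cos(theta_i / 2) ** 2, np.cos(theta_j / 2) ** 2
    si, sj = 1.0 - ci, 1.0 - cj
    return np.array([ci * cj, ci * sj, si * sj, si * cj])  # p00, p01, p10, p11

agree = np.allclose(factor_probs_statevector(0.3, 1.1),
                    factor_probs_closed_form(0.3, 1.1))
```

The CNOT only swaps the 10 and 11 amplitudes, which is why p_10 and p_11 exchange their product-state values while p_00 and p_01 are unchanged.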
The EKF observation uses two marginals from the same four-outcome sample,
$$ h_{ij}(\theta_i,\theta_j) = \begin{bmatrix} P(q_i=0) \\ P(q_j=0) \end{bmatrix}. $$
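As a worked step, the two marginals follow directly from the joint probabilities above. This assumes, as in those formulas, that qubit i is the CNOT control and qubit j the target; the symbol H_{ij} for the factor Jacobian is introduced here for illustration.

$$ P(q_i=0) = p_{00} + p_{01} = c_i, \qquad P(q_j=0) = p_{00} + p_{10} = c_i c_j + s_i s_j = \tfrac{1}{2}\left(1 + \cos\theta_i\cos\theta_j\right). $$

$$ H_{ij} = \frac{\partial h_{ij}}{\partial(\theta_i,\theta_j)} = -\frac{1}{2} \begin{bmatrix} \sin\theta_i & 0 \\ \sin\theta_i\cos\theta_j & \cos\theta_i\sin\theta_j \end{bmatrix}. $$

The second column vanishes when cos(theta_i) = 0 or sin(theta_j) = 0, so near those points this factor alone carries little local information about theta_j.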
The overlapping chain uses nearest-neighbor factors
$$ (0,1),\quad (1,2),\quad (2,3) $$
so the full observation map is a stack of factor observations:
$$ h(\theta) = \begin{bmatrix} h_{01}(\theta_0,\theta_1) \\ h_{12}(\theta_1,\theta_2) \\ h_{23}(\theta_2,\theta_3) \end{bmatrix}. $$
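A short sketch of the stacked map and its Jacobian, assuming each factor is evaluated as its own two-qubit observation exactly as in the h_{ij} formula above (so each block depends only on its two parameters). The names `FACTORS`, `h_chain`, and `jacobian_chain` are hypothetical.

```python
import numpy as np

FACTORS = [(0, 1), (1, 2), (2, 3)]  # nearest-neighbor factors of the chain

def h_chain(theta):
    """Stack [P(q_i=0), P(q_j=0)] for every factor (i, j)."""
    c = np.cos(theta / 2) ** 2
    s = 1.0 - c
    rows = []
    for i, j in FACTORS:
        rows += [c[i], c[i] * c[j] + s[i] * s[j]]
    return np.array(rows)

def jacobian_chain(theta):
    """Analytic Jacobian of h_chain, shape (2 * n_factors, n_qubits)."""
    H = np.zeros((2 * len(FACTORS), len(theta)))
    for k, (i, j) in enumerate(FACTORS):
        H[2 * k, i] = -0.5 * np.sin(theta[i])                          # control marginal
        H[2 * k + 1, i] = -0.5 * np.sin(theta[i]) * np.cos(theta[j])   # target marginal
        H[2 * k + 1, j] = -0.5 * np.cos(theta[i]) * np.sin(theta[j])
    return H

theta = np.array([0.4, 1.0, 1.7, 2.3])
H = jacobian_chain(theta)
```

The columns for theta_1 and theta_2 have nonzero entries in two factor blocks each, which is the sparsity pattern that makes residual attribution a shared problem.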
This matters because the middle parameters appear in more than one measurement factor. The observation model is no longer simply “one node, one private measurement.” The estimator must decide how to attribute residuals across shared parameters.
| Model | Role | State/covariance structure | What it explains |
|---|---|---|---|
| Adam | optimizer baseline | point estimate, no posterior covariance | whether a standard adaptive optimizer tracks the same sampled streams well |
| Gradient descent | optimizer baseline | point estimate, no posterior covariance | whether a tuned first-order method is enough |
| SPSA | optimizer baseline | point estimate, stochastic perturbation gradient | whether low-measurement stochastic optimization is competitive |
| Independent SEQNN-EKF | negative control | one scalar EKF per qubit, no measurement-factor coupling | shows that naive per-node filtering fails when observations are entangled/coupled |
| SEQNN-EKF centralized | reference/control | one full EKF over all parameters | shows what a non-distributed covariance-aware estimator can do |
| SEQNN-EKF Fisher factors | promoted Stage 5 estimator | one scalar EKF state per qubit, updated through overlapping two-qubit factors | tests whether distributed local filters can use entangled measurement information without becoming a full centralized EKF |
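The key contrast is between the three SEQNN variants. The independent SEQNN-EKF is too local for the overlapping entangled chain. The centralized SEQNN-EKF is the non-distributed reference, with one full covariance over all parameters. The Fisher-factor SEQNN-EKF sits between them: it keeps one scalar filter per qubit but updates those filters through the overlapping two-qubit factors.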
The disjoint two-qubit block experiments were useful but not sufficient. They showed that the block EKF can be calibrated, but they did not produce a clean SEQNN advantage against tuned optimizer baselines. Shared drift alone was also not enough when the estimator remained block-independent.
The overlapping chain changes the structure. Adjacent factors share qubit-owned parameters:
$$ \theta_1 \text{ appears in } h_{01} \text{ and } h_{12}, \qquad \theta_2 \text{ appears in } h_{12} \text{ and } h_{23}. $$
That gives the estimator a principled path for cross-node attribution through the measurement model itself. Fisher-factor SEQNN uses each factor's local Jacobian and covariance to update the two qubit filters touched by that factor. The implementation keeps the local scalar covariance state at each qubit and records the cross-covariance that would have existed in a full factor update as a diagnostic.
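One possible reading of that description, written as a sketch rather than as the project's implementation: for a factor (i, j), build a 2x2 prior from the two local scalar variances, run a standard EKF update with the factor Jacobian and shot-noise covariance, write back only the diagonal, and return the off-diagonal term as the diagnostic. The function name `factor_update`, the `eps` value, and the final clipping to [0, pi] are all assumptions. The Kalman-gain form used here is algebraically equivalent to adding the factor Fisher information H^T R^{-1} H to the prior precision.

```python
import numpy as np

def factor_update(theta, var, z, i, j, n_shots, eps=1e-9):
    """Update per-qubit estimates theta[i], theta[j] and variances in place.

    Returns the posterior cross-covariance that a full 2x2 update would keep.
    """
    ti, tj = theta[i], theta[j]
    ci, cj = np.cos(ti / 2) ** 2, np.cos(tj / 2) ** 2
    si, sj = 1.0 - ci, 1.0 - cj
    p0, p1, p00 = ci, ci * cj + si * sj, ci * cj
    h = np.array([p0, p1])
    H = -0.5 * np.array([[np.sin(ti), 0.0],
                         [np.sin(ti) * np.cos(tj), np.cos(ti) * np.sin(tj)]])
    R = np.array([[p0 * (1 - p0), p00 - p0 * p1],
                  [p00 - p0 * p1, p1 * (1 - p1)]]) / n_shots + eps * np.eye(2)
    P = np.diag([var[i], var[j]])            # local prior: scalar variances only
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T          # K = P H^T S^{-1}
    dx = K @ (z - h)
    P_post = (np.eye(2) - K @ H) @ P
    theta[i], theta[j] = np.clip([ti + dx[0], tj + dx[1]], 0.0, np.pi)  # assumed projection
    var[i], var[j] = P_post[0, 0], P_post[1, 1]
    return P_post[0, 1]                      # diagnostic only, not stored

theta_hat = np.full(4, 1.0)
var = np.full(4, 0.1)
cross = factor_update(theta_hat, var, z=np.array([0.7, 0.6]), i=0, j=1, n_shots=100)
```

Sweeping this over the factors (0,1), (1,2), (2,3) lets the middle qubits receive two updates per time step, one from each factor that touches them, without ever forming the full 4x4 covariance.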
This is the central Stage 5 idea: use the factor graph induced by the entangled observation model, not a generic communication topology, to decide where information should flow.
The current four-qubit overlapping RY+CNOT chain result supports a narrow but meaningful claim for SEQNN-EKF Fisher factors: the 50-seed core run passed, and the first 20-seed drift_sigma x n_shots sensitivity grid passed the first validation check.
This is evidence for entangled generative tracking with a distributed Fisher-factor estimator. It is not yet evidence for a broad topology-diffusion claim.
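The remaining work is about strengthening the evidence, not changing the core idea. The open items are the high-drift, low-shot edge case drift_sigma=0.02, n_shots=50; the 3×3 drift/shot grid at 50 seeds; and the RY+CNOT chain.
The claim being built toward is not a broad topology-diffusion result. Rather: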
In an identifiable overlapping entangled generative circuit, a distributed SEQNN estimator using Fisher/Jacobian-derived measurement-factor updates can track drifting quantum parameters from finite-shot observations, while preserving meaningful calibration diagnostics and outperforming standard optimizer baselines on the same sampled streams in the current validation regime.