
Ramble Meter
This post is very close to being completely finished. Not that rambly at all.

SEQNN Stage 5: Entangled Generative Estimation

Exploratory research note. This work is being shaped toward a publishable paper claim, but the results described here should be read as “work under validation”.

Introduction

Kalman filters are fascinating. How can they be applied to QML models so that the model maintains a “self-estimate” of its own state for use in parameter updates? That question drove a rather extensive exploration of the topic. In this post I describe the current position of Self-Estimating Quantum Neural Networks (SEQNN).

SEQNN is an attempt to treat the parameters of a variational quantum model as latent states to be estimated sequentially, rather than as ordinary weights to be pushed around by a global optimizer.

The current work is near the end of Stage 5 in FUTURE_WORK.md. Earlier stages established generative Kalman filtering on identifiable local circuits. Stage 5 asks the harder question: can a SEQNN-style estimator remain useful when the measurements come from an entangled circuit, so that one observed residual may contain information about parameters owned by more than one node?

The current promoted estimator is SEQNN-EKF Fisher factors. It is distributed in the sense that each qubit keeps its own local EKF state, while overlapping two-qubit measurement factors pass Jacobian/Fisher-derived updates into the relevant qubit filters. It is not an innovation-diffusion or topology claim yet.

The idea

Most variational quantum circuit training is framed as optimization:

  • choose parameters theta,
  • evaluate a circuit loss,
  • update theta with Adam, gradient descent, SPSA, or another optimizer.

SEQNN reframes the same problem as sequential estimation:

  • the unknown parameters are a latent state,
  • the quantum circuit is an observation model,
  • shot samples are noisy measurements,
  • each node maintains both an estimate and an uncertainty.

The important difference is not only “which update rule is used.” The claim being explored is structural: when the model is written as an estimator, uncertainty, identifiability, calibration, and measurement attribution become first-class objects instead of after-the-fact diagnostics.

Roadmap position

| Stage | Status |
|---|---|
| Stage 1: local RY static recovery | complete |
| Stage 2: local RY drift tracking | complete |
| Stage 3: coupled/correlated local RY drift | implemented, not a topology-win claim |
| Stage 4: local RY+RZ with XYZ observations | complete and calibrated |
| Stage 5: entangled generative SEQNN | active, publication-focused validation mostly underway |
| Stage 6: augmented noise-state EKF | not started |

The practical position is this: the four-qubit overlapping RY+CNOT chain is now the strongest Stage 5 model. The 50-seed core run passed, and the first 20-seed drift_sigma x n_shots sensitivity grid passed the initial publication check: calibrated in all nine grid points, better than Adam/GD/SPSA tracking in all nine, and the outright tracking winner in eight of nine. The one exception is the high-drift, low-shot edge case drift_sigma=0.02, n_shots=50, where the centralized EKF is barely ahead.

The generative formulation

For the current experiments, the quantum system is a finite register of qubits

$$ \mathcal{H}_Q = \bigotimes_{i=0}^{n-1} \mathbb{C}^2 . $$

The unknown parameter vector is a latent state

$$ x_t = \theta_t \in [0,\pi]^n . $$

In the drift experiments, the latent state follows a bounded random walk,

$$ \theta_{t+1} = \Pi_{[0,\pi]^n}\left(\theta_t + \eta_t\right), \qquad \eta_t \sim \mathcal{N}(0,Q). $$
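The bounded random walk above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's code: it assumes an isotropic Q = drift_sigma^2 I (reading drift_sigma as a per-step standard deviation), and the function name `drift_step` is hypothetical. Clipping each coordinate is exactly the Euclidean projection onto the box [0, pi]^n.

```python
import numpy as np

def drift_step(theta, drift_sigma, rng):
    """One bounded random-walk step: add Gaussian drift, then project onto [0, pi]^n."""
    # Assumption: Q = drift_sigma**2 * I, i.e. independent drift per parameter.
    eta = rng.normal(0.0, drift_sigma, size=theta.shape)
    # For a box constraint, coordinate-wise clipping is the Euclidean projection.
    return np.clip(theta + eta, 0.0, np.pi)

rng = np.random.default_rng(0)
theta = np.full(4, np.pi / 2)        # four parameters, started mid-range
trajectory = [theta]
for _ in range(200):
    theta = drift_step(theta, drift_sigma=0.02, rng=rng)
    trajectory.append(theta)
trajectory = np.stack(trajectory)    # shape (201, 4)
```

The projection makes the dynamics mildly non-Gaussian near the boundaries, which is one reason the Gaussian EKF model is an approximation rather than an exact description.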

The circuit defines an observation map

$$ h(\theta_t) = \mathbb{E}[z_t \mid \theta_t], $$

and finite-shot sampling gives a measurement

$$ z_t = h(\theta_t) + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0,R_t) $$

as the Gaussian approximation used by the EKF. For Bernoulli or multinomial shot counts, R_t is induced by the same probabilities used to sample the observations. In the two-marginal RY+CNOT factor used in Stage 5, for example,

$$ R_t = \frac{1}{N} \begin{bmatrix} p_0(1-p_0) & p_{00}-p_0p_1 \\ p_{00}-p_0p_1 & p_1(1-p_1) \end{bmatrix} + \epsilon I , $$

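where N is the shot count, p_0 and p_1 are the two marginal readout probabilities (the probability that each of the two qubits reads 0), and p_{00} is the joint probability of outcome 00. The \epsilon I term reads as a small diagonal regularizer that keeps R_t invertible when a probability approaches 0 or 1; this scalar \epsilon is distinct from the noise vector \epsilon_t above.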
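As a sanity check on the covariance formula, the sketch below samples four-outcome shot counts, forms the two marginal frequencies, and builds R_t from the same probabilities. The function names, the outcome ordering (00, 01, 10, 11), the example probability vector, and the default `eps` value are all assumptions made for illustration.

```python
import numpy as np

def sample_marginals(p, n_shots, rng):
    """Draw n_shots from outcomes (00, 01, 10, 11) and return [freq(q_i=0), freq(q_j=0)]."""
    counts = rng.multinomial(n_shots, p)
    z0 = (counts[0] + counts[1]) / n_shots   # first qubit reads 0: outcomes 00 and 01
    z1 = (counts[0] + counts[2]) / n_shots   # second qubit reads 0: outcomes 00 and 10
    return np.array([z0, z1])

def marginal_cov(p, n_shots, eps=1e-9):
    """Covariance of the two marginal frequencies, plus a small diagonal regularizer."""
    p0 = p[0] + p[1]
    p1 = p[0] + p[2]
    off = p[0] - p0 * p1                     # Cov of the two indicators: p00 - p0 * p1
    R = np.array([[p0 * (1 - p0), off],
                  [off, p1 * (1 - p1)]]) / n_shots
    return R + eps * np.eye(2)

p = np.array([0.4, 0.1, 0.2, 0.3])           # hypothetical joint probabilities
R = marginal_cov(p, n_shots=100, eps=0.0)
rng = np.random.default_rng(1)
zs = np.array([sample_marginals(p, 100, rng) for _ in range(20000)])
R_empirical = np.cov(zs.T)                   # should be close to R
```

Because both marginals come from the same shots, the off-diagonal term is generally nonzero; treating the two readouts as independent would misstate the measurement noise.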

EKF update in the SEQNN view

Each EKF update has the usual predict/update form:

$$ \hat{x}_{t|t-1} = f(\hat{x}_{t-1|t-1}), \qquad P_{t|t-1} = F_t P_{t-1|t-1} F_t^\top + Q_t . $$

The innovation is

$$ \nu_t = z_t - h(\hat{x}_{t|t-1}), $$

with innovation covariance

$$ S_t = H_t P_{t|t-1} H_t^\top + R_t . $$

The Kalman gain and posterior update are

$$ K_t = P_{t|t-1} H_t^\top S_t^{-1}, $$

$$ \hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\nu_t, \qquad P_{t|t} = (I-K_tH_t)P_{t|t-1}. $$
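The predict/update equations above translate directly into code. The following is a generic sketch, not the SEQNN implementation: it assumes identity dynamics (f(x) = x, so F_t = I, matching a random walk and ignoring the projection), and `ekf_step` and its argument names are hypothetical. The usage lines apply it to a single local RY wire with observation cos^2(theta/2).

```python
import numpy as np

def ekf_step(x, P, z, h, jac_h, R, Q):
    """One EKF predict/update step. Assumes identity dynamics f(x) = x, so F_t = I."""
    x_pred = x                               # predict: random-walk mean is unchanged
    P_pred = P + Q                           # F P F^T + Q with F = I
    H = jac_h(x_pred)                        # observation Jacobian at the predicted state
    nu = z - h(x_pred)                       # innovation
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T     # K = P H^T S^{-1}, using symmetry of S and P
    x_post = x_pred + K @ nu
    P_post = (np.eye(len(x)) - K @ H) @ P_pred
    return x_post, P_post, nu, S

# Usage: one local RY wire, observing P(0) = cos^2(theta / 2).
h = lambda th: np.array([np.cos(th[0] / 2) ** 2])
jac_h = lambda th: np.array([[-0.5 * np.sin(th[0])]])   # d/dtheta cos^2(theta/2)
x0, P0 = np.array([1.0]), np.array([[0.1]])
z = h(np.array([1.2]))                       # noise-free measurement from theta = 1.2
x1, P1, nu, S = ekf_step(x0, P0, z, h, jac_h,
                         R=np.array([[1e-3]]), Q=np.array([[1e-4]]))
```

Returning the innovation and its covariance alongside the posterior is a deliberate choice here: those two quantities are what innovation-based calibration diagnostics are typically built from.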

In a supervised classifier this can look like a covariance-aware optimizer. In the generative experiments, however, the measurements are actually sampled from a known latent quantum model. That is why the generative line is the cleaner foundation for publication claims about estimation, uncertainty, and calibration.

Model ladder

The project deliberately moved through increasingly demanding observation models. The key pattern is: do not add parameters, entanglement, or topology until the observation equation says what can actually be observed.

| Stage/model | Observation structure | What it tests | Claim status |
|---|---|---|---|
| Local RY static | one local parameter per wire, local Z-basis probability | parameter recovery in an identifiable one-parameter model | complete |
| Local RY drift | same observation, but moving latent state | whether sequential Kalman tracking matters | complete |
| Coupled/correlated local RY drift | local observations with correlated latent dynamics | whether neighbor information can matter in principle | implemented, not a topology claim |
| Local RY+RZ with XYZ readout | two local parameters per wire, three local bases | richer local identifiability | complete and calibrated |
| Disjoint entangled RY+CNOT blocks | two-qubit CNOT blocks, but independent blocks | block attribution under entanglement | useful negative/diagnostic slice |
| Overlapping entangled RY+CNOT chain | nearest-neighbor factors share qubit parameters | distributed attribution through measurement factors | current Stage 5 publication target |

Diagram: model progression

[Diagram: Stage 1 (local RY static recovery) → Stage 2 (local RY drift tracking) → Stage 4 (local RY+RZ, XYZ readout, identifiable phi) → Stage 5a (disjoint CNOT blocks, diagnostic slice; disjoint blocks were not enough) → current Stage 5 target (overlapping RY+CNOT chain over theta0, theta1, theta2, theta3; per-qubit EKFs plus Fisher-factor measurement updates).]

The Stage 5 observation model

The minimal entangled two-qubit factor is

$$ RY(\theta_i) \otimes RY(\theta_j) \quad\text{followed by}\quad CNOT(i,j). $$

For one two-qubit factor, the joint Z-basis probabilities are

$$ \begin{aligned} p_{00} &= c_i c_j, \\ p_{01} &= c_i s_j, \\ p_{10} &= s_i s_j, \\ p_{11} &= s_i c_j, \end{aligned} $$

where

$$ c_i = \cos^2(\theta_i/2), \qquad s_i = \sin^2(\theta_i/2). $$

The EKF observation uses two marginals from the same four-outcome sample,

$$ h_{ij}(\theta_i,\theta_j) = \begin{bmatrix} P(q_i=0) \\ P(q_j=0) \end{bmatrix}. $$
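As a worked step (added for reference; it uses only the joint probabilities and the definitions of c_i and s_i given above), summing the joint table gives the two marginals in closed form:

$$ P(q_i=0) = p_{00}+p_{01} = c_i = \tfrac{1}{2}\left(1+\cos\theta_i\right), \qquad P(q_j=0) = p_{00}+p_{10} = c_ic_j+s_is_j = \tfrac{1}{2}\left(1+\cos\theta_i\cos\theta_j\right). $$

The factor Jacobian that an EKF linearization would use is then

$$ H_{ij} = \frac{\partial h_{ij}}{\partial(\theta_i,\theta_j)} = -\frac{1}{2} \begin{bmatrix} \sin\theta_i & 0 \\ \sin\theta_i\cos\theta_j & \cos\theta_i\sin\theta_j \end{bmatrix}, \qquad \det H_{ij} = \tfrac{1}{4}\sin\theta_i\cos\theta_i\sin\theta_j . $$

Taken on its own, a single factor therefore loses local rank when \theta_i \in \{0, \pi/2, \pi\} or \theta_j \in \{0, \pi\}. At \theta_i = \pi/2, for instance, the second marginal equals 1/2 for every \theta_j, so that factor alone carries no information about \theta_j.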

The overlapping chain uses nearest-neighbor factors

$$ (0,1),\quad (1,2),\quad (2,3) $$

so the full observation map is a stack of factor observations:

$$ h(\theta) = \begin{bmatrix} h_{01}(\theta_0,\theta_1) \\ h_{12}(\theta_1,\theta_2) \\ h_{23}(\theta_2,\theta_3) \end{bmatrix}. $$

This matters because the middle parameters appear in more than one measurement factor. The observation model is no longer simply “one node, one private measurement.” The estimator must decide how to attribute residuals across shared parameters.
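The stacked observation map and its Jacobian can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the closed-form marginals P(q_i=0) = (1 + cos theta_i)/2 and P(q_j=0) = (1 + cos theta_i cos theta_j)/2, which follow from summing the joint probabilities, and the names `h_chain` and `jac_chain` plus the example parameter values are hypothetical. The sparsity pattern of the Jacobian makes the sharing visible: the columns for theta_1 and theta_2 have entries in two factor blocks each.

```python
import numpy as np

FACTORS = [(0, 1), (1, 2), (2, 3)]   # nearest-neighbour factors of the four-qubit chain

def h_factor(ti, tj):
    """Marginals [P(q_i=0), P(q_j=0)] for RY(ti), RY(tj) followed by CNOT(i, j)."""
    return np.array([0.5 * (1 + np.cos(ti)),
                     0.5 * (1 + np.cos(ti) * np.cos(tj))])

def jac_factor(ti, tj):
    """2x2 Jacobian of h_factor with respect to (ti, tj)."""
    return np.array([[-0.5 * np.sin(ti), 0.0],
                     [-0.5 * np.sin(ti) * np.cos(tj), -0.5 * np.cos(ti) * np.sin(tj)]])

def h_chain(theta):
    """Stack the factor observations into one 6-vector."""
    return np.concatenate([h_factor(theta[i], theta[j]) for i, j in FACTORS])

def jac_chain(theta):
    """6x4 Jacobian: each factor fills a 2x2 block in the columns of its two qubits."""
    H = np.zeros((2 * len(FACTORS), len(theta)))
    for k, (i, j) in enumerate(FACTORS):
        H[2 * k:2 * k + 2, [i, j]] = jac_factor(theta[i], theta[j])
    return H

theta = np.array([0.7, 1.1, 2.0, 2.6])   # hypothetical parameter values
H = jac_chain(theta)
rank = np.linalg.matrix_rank(H)          # full column rank means locally identifiable here
```

A full-column-rank Jacobian at a given point is only a local check; it says nothing about points where a control angle sits at pi/2 or a boundary.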

Models compared in Stage 5

| Model | Role | State/covariance structure | What it tells us |
|---|---|---|---|
| Adam | optimizer baseline | point estimate, no posterior covariance | whether a standard adaptive optimizer tracks the same sampled streams well |
| Gradient descent | optimizer baseline | point estimate, no posterior covariance | whether a tuned first-order method is enough |
| SPSA | optimizer baseline | point estimate, stochastic perturbation gradient | whether low-measurement stochastic optimization is competitive |
| Independent SEQNN-EKF | negative control | one scalar EKF per qubit, no measurement-factor coupling | shows that naive per-node filtering fails when observations are entangled/coupled |
| SEQNN-EKF centralized | reference/control | one full EKF over all parameters | shows what a non-distributed covariance-aware estimator can do |
| SEQNN-EKF Fisher factors | promoted Stage 5 estimator | one scalar EKF state per qubit, updated through overlapping two-qubit factors | tests whether distributed local filters can use entangled measurement information without becoming a full centralized EKF |

The key contrast is between the three SEQNN variants:

  • Independent SEQNN-EKF is too local for the overlapping entangled chain.
  • Centralized EKF has the cleanest covariance story, but it is not the architecture being promoted.
  • Fisher factors are the current compromise: local node ownership plus factor-level updates derived from the observation Jacobian and Fisher information.

Diagram: the three SEQNN estimator structures

[Diagram, three panels. Independent SEQNN-EKF: one local filter per qubit (EKF0, EKF1, EKF2, EKF3); good control, wrong attribution for entangled observations. Fisher factors: local filters on qubits 0, 1, 2, 3 plus overlapping factors; promoted distributed estimator for the current Stage 5 claim. Centralized EKF: one full state and covariance over theta0, theta1, theta2, theta3; strong reference/control, not the promoted architecture.]

Why Fisher factors are the current focus

The disjoint two-qubit block experiments were useful but not sufficient. They showed that the block EKF can be calibrated, but they did not produce a clean SEQNN advantage against tuned optimizer baselines. Shared drift alone was also not enough when the estimator remained block-independent.

The overlapping chain changes the structure. Adjacent factors share qubit-owned parameters:

$$ \theta_1 \text{ appears in } h_{01} \text{ and } h_{12}, \qquad \theta_2 \text{ appears in } h_{12} \text{ and } h_{23}. $$

That gives the estimator a principled path for cross-node attribution through the measurement model itself. Fisher-factor SEQNN uses each factor's local Jacobian and covariance to update the two qubit filters touched by that factor. The implementation keeps the local scalar covariance state at each qubit and records the cross-covariance that would have existed in a full factor update as a diagnostic.

This is the central Stage 5 idea: use the factor graph induced by the entangled observation model, not a generic communication topology, to decide where information should flow.
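One possible reading of that description, written out as code. This is a sketch under explicit assumptions and should not be taken as the project's implementation: it assumes factors are processed sequentially in chain order, that each factor update is an information-form EKF update on a diagonal two-qubit prior, and that only the diagonal of the factor posterior is written back to the qubit filters. All function names and numeric values are hypothetical.

```python
import numpy as np

FACTORS = [(0, 1), (1, 2), (2, 3)]   # assumed processing order: along the chain

def h_factor(ti, tj):
    """Marginals [P(q_i=0), P(q_j=0)] for RY(ti), RY(tj) followed by CNOT(i, j)."""
    return np.array([0.5 * (1 + np.cos(ti)), 0.5 * (1 + np.cos(ti) * np.cos(tj))])

def jac_factor(ti, tj):
    return np.array([[-0.5 * np.sin(ti), 0.0],
                     [-0.5 * np.sin(ti) * np.cos(tj), -0.5 * np.cos(ti) * np.sin(tj)]])

def noise_cov(ti, tj, n_shots, eps=1e-9):
    """Shot-noise covariance of the two marginals, evaluated at the current estimate."""
    p0, p1 = h_factor(ti, tj)
    p00 = np.cos(ti / 2) ** 2 * np.cos(tj / 2) ** 2
    off = p00 - p0 * p1
    return np.array([[p0 * (1 - p0), off], [off, p1 * (1 - p1)]]) / n_shots + eps * np.eye(2)

def fisher_factor_update(theta, var, z_by_factor, n_shots):
    """Update per-qubit means/variances factor by factor; return cross terms as diagnostics."""
    theta, var = theta.copy(), var.copy()
    cross = {}
    for (i, j), z in zip(FACTORS, z_by_factor):
        x = theta[[i, j]]
        H = jac_factor(*x)
        R_inv = np.linalg.inv(noise_cov(*x, n_shots))
        fisher = H.T @ R_inv @ H                          # factor Fisher information
        post_cov = np.linalg.inv(np.diag(1.0 / var[[i, j]]) + fisher)
        x_new = x + post_cov @ H.T @ R_inv @ (z - h_factor(*x))
        theta[[i, j]] = np.clip(x_new, 0.0, np.pi)        # stay inside [0, pi]
        var[[i, j]] = np.diag(post_cov)                   # keep only the scalar variances
        cross[(i, j)] = post_cov[0, 1]                    # dropped cross-covariance, logged
    return theta, var, cross

true_theta = np.array([0.7, 1.1, 2.0, 2.6])               # hypothetical static truth
z_exact = [h_factor(true_theta[i], true_theta[j]) for i, j in FACTORS]
theta0, var0 = true_theta + 0.15, np.full(4, 0.1)
theta, var = theta0, var0
for _ in range(10):
    theta, var, cross = fisher_factor_update(theta, var, z_exact, n_shots=1000)
```

In this sketch the middle qubits are touched twice per sweep, once by each factor they belong to, which is how information crosses node boundaries without a global covariance matrix. Discarding the off-diagonal term can make the local variances optimistic, which is presumably why logging it as a diagnostic is useful.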

Current Stage 5 evidence

The current four-qubit overlapping RY+CNOT chain result supports a narrow but meaningful claim:

  • the observation model is identifiable for the configured chain;
  • the promoted estimator is SEQNN-EKF Fisher factors;
  • the 50-seed core run passed;
  • the 20-seed drift_sigma x n_shots sensitivity grid passed the first validation check;
  • Fisher factors were calibrated at all nine sensitivity grid points;
  • Fisher factors beat Adam/GD/SPSA on tracking in all nine grid points;
  • Fisher factors were the outright tracking winner in eight of nine grid points;
  • the one exception was the high-drift, low-shot edge case, where centralized EKF barely won.

This is evidence for entangled generative tracking with a distributed Fisher-factor estimator. It is not yet evidence for a broad topology-diffusion claim.

What remains for the Stage 5 publication path

The remaining work is about strengthening the evidence package, not changing the core idea.

  1. Decide whether to run a 50-seed sensitivity confirmation.
    • Cheap option: only rerun drift_sigma=0.02, n_shots=50.
    • Strong option: rerun the full 3×3 drift/shot grid at 50 seeds.
  2. Add the scaling slice.
    • Run a six- or eight-qubit overlapping RY+CNOT chain.
    • Keep the same Fisher-factor estimator.
    • Keep the same Adam/GD/SPSA baselines.
    • Keep the same digest structure.
  3. Keep centralized EKF as a reference/control.
    • It is useful for comparison.
    • It is not the architecture being promoted.
    • It helps show where distributed Fisher factors remain stable or attractive as the chain grows.

What we are not claiming yet

This is important for the eventual paper framing.

  • Not a quantum advantage claim.
  • Not a hardware claim; these are controlled generative simulations.
  • Not an arbitrary-entangled-circuit claim.
  • Not a topology diffusion claim.
  • Not an innovation diffusion claim.
  • Not a noise-state estimation claim.
  • Not a claim that every SEQNN variant works; the independent EKF is intentionally a negative control in Stage 5.

The narrower claim is stronger:

In an identifiable overlapping entangled generative circuit, a distributed SEQNN estimator using Fisher/Jacobian-derived measurement-factor updates can track drifting quantum parameters from finite-shot observations, while preserving meaningful calibration diagnostics and outperforming standard optimizer baselines on the same sampled streams in the current validation regime.

Publication framing

A good paper story is emerging:

  1. Start with the problem: VQC training usually treats circuit parameters as optimizer variables, not latent states.
  2. Introduce SEQNN: local EKF estimators attached to quantum circuit parameters.
  3. Distinguish supervised training from true generative estimation.
  4. Show the model ladder from local identifiable circuits to richer local observations.
  5. Use Stage 5 as the main contribution: entangled generative tracking through an overlapping measurement-factor graph.
  6. Compare against Adam, GD, SPSA, independent SEQNN-EKF, and centralized EKF.
  7. Be explicit that Fisher factors are the promoted distributed architecture.
  8. Leave topology diffusion and augmented noise-state EKF as future work.

The paper claim should be written as exploratory but concrete. The work is not finished because the scaling slice is still needed and the 50-seed sensitivity decision remains open. But Stage 5 now has a defensible center: the overlapping RY+CNOT chain shows why SEQNN is more than “EKF as another optimizer.” It is a way to make the measurement model, the uncertainty, and the information flow part of the learning architecture itself.

Suggested abstract-style summary

Self-Estimating Quantum Neural Networks frame variational quantum circuit training as sequential Bayesian estimation rather than global optimization. In the current generative benchmark line, circuit parameters are latent states, finite-shot measurements are observations, and each node maintains an Extended Kalman Filter over its local parameter uncertainty. We report exploratory Stage 5 results on an overlapping entangled RY+CNOT chain, where adjacent two-qubit measurement factors share qubit-owned parameters. A distributed Fisher-factor SEQNN estimator updates local qubit filters using Jacobian/Fisher information from the relevant measurement factors. This estimator is compared against Adam, gradient descent, SPSA, an independent per-node EKF control, and a centralized EKF reference on identical sampled streams. The current evidence supports the Fisher-factor estimator as the publication-focused Stage 5 architecture, while explicitly deferring broader topology-diffusion and augmented noise-state claims to future work.

projects/quantum/seqnn.1779340190.txt.gz · Last modified: 2026/05/21 07:09