by Glaive Team
Hello everyone, it is time for our first quarterly report of 2026! This year has already started in a very promising direction and we’re very excited to share what we’ve been up to.
During the last quarter, we wrote about generalised tensors (Videla & Gavranović 2026), an architectural choice underpinning our machine learning framework, as well as autodiff through function types (Hedges 2026a) which re-conceptualises differentiation using additive containers.
We spent a focused week evaluating the best way to leverage our research in the Lean theorem prover. Our current plan is to deploy lenses in a small subset of tactics, ideally one that we already understand well. We identified the Natural Numbers Game as the appropriate middle ground between “real enough” theorem proving and a small, well-defined scope for our experiments. We expect this project to bear fruit in Q2 2026 with changes to the interactive editor experience, where tactics are endowed with “types” (informed by the container structure of proofs) that help identify which tactics are compatible with the current goal.
Our work on the Poly language and compiler is continuing on schedule. We recently released the first public details in two blog posts (Hedges 2026b, Hedges 2026c).
As we reported previously, the most difficult challenge with Poly is giving a logically sound syntax to the sequencing product of polynomial functors, which describes sequential application of tactics, with the second having access to the result of the first. We initially solved this using a core calculus based on System L (a focussed sequent calculus, also known as $\mu\tilde{\mu}$-calculus), building on Zanzi’s work during Glaive’s previous grant on safeguarded AI from ARIA. This leaves open the challenge of giving an intuitive surface syntax, which we solved by developing an imperative-style syntax inspired by Lean tactics themselves. The language has a stack and primitive imperative “push” and “pop” commands, representing a stack of unsolved goals that depend on each other sequentially. Technically speaking, this uses the result of Spivak and Srinivasan that Poly is a so-called linearly distributive isomix category (Spivak & Srinivasan 2024).
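To give a feel for the goal-stack idea, here is a small Python sketch of our own (not actual Poly syntax or semantics): a stack of unsolved goals where solving a goal may push new subgoals, and later goals can see the results of earlier ones, mirroring the dependency built into the sequencing product. All names here are illustrative.

```python
# Illustrative toy only: a stack of unsolved goals with push/pop, where
# solving a goal may push subgoals, and tactics applied later can see the
# results of goals solved earlier.

class GoalStack:
    """A stack of unsolved goals that depend on each other sequentially."""

    def __init__(self, initial_goal):
        self.stack = [initial_goal]
        self.results = {}  # results of solved goals, visible to later tactics

    def push(self, goal):
        self.stack.append(goal)

    def pop(self):
        return self.stack.pop()

    def solve_top(self, tactic):
        """Apply a tactic to the top goal; the tactic sees earlier results."""
        goal = self.pop()
        result, subgoals = tactic(goal, self.results)
        self.results[goal] = result
        for g in reversed(subgoals):  # push so the first subgoal ends up on top
            self.push(g)
        return result

# Toy tactic: "a+b" splits into subgoals "a" and "b"; atomic goals are closed
# with a record of which earlier results were visible.
def split(goal, results):
    if goal == "a+b":
        return ("split", ["a", "b"])
    return (f"done({goal}, seen={sorted(results)})", [])

gs = GoalStack("a+b")
gs.solve_top(split)  # pushes "b" then "a"; "a" is now on top
gs.solve_top(split)  # closes "a"
gs.solve_top(split)  # closes "b", with access to the result of "a"
```

The point of the sketch is only the shape of the control flow: sequentially dependent goals live on a stack, and popping them in order threads earlier results into later tactics.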
As part of our experimental compiler methodology, we released a research paper on the theory of typechecking System L (Hedges & Mihejevs 2025), currently undergoing peer review, with at least one sequel paper planned.
We are aiming to be ready to give a public demonstration of Poly in time for the London meeting and the Applied Category Theory conference in Tallinn, and to make an “early access” public release later in summer. More specifically, our goal by this point is that Poly will be able to embed a very simple toy kernel proof system, such as a propositional natural deduction system over a small term language, and structured tactics for it. This will complement our parallel work on implementing structured tactics directly in Lean, and later these strands will be unified with the Poly compiler backend targeting Lean. (We also plan to exploit Lean’s advanced language server and powerful reflection capabilities as much as possible in the compiler backend.)
Besides a lot of compiler engineering, the main conceptual work still required for the demo concerns recursive datatypes, which are fixpoints of polynomial endofunctors on the category of polynomial endofunctors. While we have a working prototype based on a construction in (Ghani & Hancock 2014), see also (Pradic & Price 2026), it is not especially well understood, and the implementation requires overriding the termination checker and using unrestricted recursion.
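As a point of reference one level down, at the level of ordinary types, the least fixpoint of the polynomial functor $P(X) = 1 + X \times X$ is the type of unlabelled binary trees, which in Lean is the familiar inductive declaration:

```lean
-- Least fixpoint μX. 1 + X × X: unlabelled binary trees.
-- Lean accepts this because P is a polynomial (strictly positive) functor.
-- The challenge described above is the analogous construction one level up,
-- for endofunctors on the category of polynomial functors itself.
inductive BTree where
  | leaf : BTree                   -- the "1" summand
  | node : BTree → BTree → BTree   -- the "X × X" summand
```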
Besides the compiler backend to Lean, the most obvious feature we plan to omit in the first release is parametric polymorphism (known as generics in many languages). Another major feature on our future roadmap is algebraic effects based on (Grodin & Spivak 2024), something we are already exploiting elsewhere in our work on structure generation to handle probability and sampling in a differentiable program.
To enable LLMs to take full advantage of Polylang and typed tactics, an important problem needs to be solved: ensuring the LLM output is well-typed according to the tactic type. There are two existing approaches to this problem of structured decoding, and both fall short in different ways.
The most widespread approach, such as (Mündler et al 2025) or the Claude documentation, enforces types externally: it does not change what the model wants to say, it just prevents the model from saying certain things. This forces the model to produce output to which it may have assigned low probability, causing unpredictable behaviour. The more theoretical approaches, such as (Vákár & Smeding 2022) or (Abadi et al 2019), enforce types internally by introducing methods for differentiating through structured data. For instance, given a program $f : X \to A + B$ (modelling, say, that we need to solve either the tactic $A$ or $B$), they prescribe how gradients are computed in the presence of such a choice. However, these methods prescribe how to differentiate through structured data, but not with respect to it. This means they require a partitioning of $X$ into $X_A$ and $X_B$, factoring $f$ through $X \xrightarrow{\cong} X_A + X_B \xrightarrow{f_A + f_B} A + B$. In other words, which tactic is chosen must be specified at compile time, relinquishing the model’s ability to learn the choice.
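To make the compile-time restriction concrete, here is a small Python sketch of our own (the functions and the partition predicate are purely illustrative): the partition of $X$ is a fixed predicate, so which summand of $A + B$ is taken is fully determined by the input, and there is nothing for a model to learn about the choice.

```python
# Illustrative only: the factoring f = (f_A + f_B) ∘ partition required by
# "differentiate-through" approaches. The partition of X into X_A and X_B is
# fixed ahead of time, so the branch taken is a function of x alone.

def in_X_A(x):
    # The partition X ≅ X_A + X_B, decided at compile time, not learned.
    return x >= 0.0

def f_A(x):
    # The branch landing in A.
    return ("A", x * x)

def f_B(x):
    # The branch landing in B.
    return ("B", -x)

def f(x):
    # f factors through the fixed partition: no learnable choice remains.
    return f_A(x) if in_X_A(x) else f_B(x)
```

Gradients can then be taken through `f_A` and `f_B` separately, but the boundary drawn by `in_X_A` is outside the reach of learning.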
In this project we have made significant progress on solving this problem at a fundamental level. We have developed and implemented a framework for learning dependent pairs, a generic structure underlying the generation of all kinds of tactics, trees, and interaction protocols more generally. In this method, the model first selects a constructor (i.e. Leaf vs Node for trees, or different tactic forms), then generates the content appropriate to that constructor, with possibly different output types per branch. The constructor selection is done probabilistically: the model produces a distribution, and awaits a response from the environment signalling either which constructor is sampled (during inference) or which constructor matches the ground truth example (during training). Then only the content for that constructor is produced, and only the corresponding gradient is received at that branch.
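The shape of one training step can be sketched in plain Python. This is a toy of our own making, not the actual implementation: a two-constructor model keeps a distribution over constructors plus one content parameter per branch, the environment supplies the ground-truth constructor during training, and the content gradient flows only to the selected branch.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

class DependentPairModel:
    """Toy dependent-pair generator: constructor choice, then branch content."""

    def __init__(self):
        self.ctor_logits = [0.0, 0.0]    # distribution over {Leaf, Node}
        self.branch_params = [0.0, 0.0]  # one content parameter per branch

    def generate(self, env_choice=None):
        probs = softmax(self.ctor_logits)
        if env_choice is None:
            # Inference: the environment samples a constructor.
            k = random.choices([0, 1], weights=probs)[0]
        else:
            # Training: the environment signals the ground-truth constructor.
            k = env_choice
        content = self.branch_params[k]  # content specific to that constructor
        return k, content, probs

    def train_step(self, true_ctor, true_content, lr=0.5):
        k, content, probs = self.generate(env_choice=true_ctor)
        # Cross-entropy gradient on the constructor distribution.
        for i in range(2):
            grad = probs[i] - (1.0 if i == true_ctor else 0.0)
            self.ctor_logits[i] -= lr * grad
        # Squared-error gradient flows ONLY to the selected branch;
        # the other branch's parameter is untouched this step.
        self.branch_params[k] -= lr * 2.0 * (content - true_content)

model = DependentPairModel()
for _ in range(200):
    model.train_step(true_ctor=1, true_content=3.0)
```

After training on examples that always use constructor 1 with content 3.0, the constructor distribution concentrates on branch 1 and only that branch’s parameter moves, which is exactly the “gradient only at the chosen constructor” behaviour described above.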
This framework generalises to arbitrary inductive types by iterating this process, and our next step is integrating it with a toy tactic problem, such as the Natural Numbers Game, in which we can validate the ability of the model to learn specific tactics while being guided by types.