8 December 2025


by Glaive Team

Glaive Research Report Q4 2025

Hello everyone, we wanted to share a progress update on our AI for Math project, Structured Representation of Tactics for Machine-Assisted Theorem Proving.

This project’s goal is to create a high-level language that natively represents proof contexts and goals as a specific kind of type. In this language, tactics are “functions” between proof states (called “dependent lenses”). The hypothesis is that this will enable programmers and machine learning models to compose tactics using types, much as ordinary functions are composed in ordinary statically typed programming.
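To make the idea concrete, here is a minimal sketch of dependent lenses in Lean. This is our own illustration of the standard definition from the polynomial functor literature, not the Polylang implementation, and all names in it are ours.

```lean
-- A sketch for illustration: a polynomial functor is a type of positions
-- together with a family of directions, and a dependent lens is a forward
-- map on positions plus a backward map on directions.
structure Poly where
  pos : Type
  dir : pos → Type

structure Lens (p q : Poly) where
  onPos : p.pos → q.pos
  onDir : (a : p.pos) → q.dir (onPos a) → p.dir a

-- Lenses compose like ordinary functions, so whether two "tactics" can be
-- chained is settled by the type checker, with nothing executed.
def Lens.comp {p q r : Poly} (f : Lens p q) (g : Lens q r) : Lens p r where
  onPos := fun a => g.onPos (f.onPos a)
  onDir := fun a d => f.onDir a (g.onDir (f.onPos a) d)
```

Sequential application of tactics needs more than plain composition (see the substitution product under Polylang below), but even this much shows the payoff: compatibility becomes a typing question.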

This is not the case today: all tactics in Lean share the same type, so evaluating whether two tactics can be composed in general requires actually running them. This is an important issue, as the recently released AlphaProof paper shows. Its section 5.1.2 details the challenges of executing tactics as they are currently implemented in Lean, most notably the need to compile often-used tactics to C and to monitor internal resource consumption in order to meet the high-throughput demands of reinforcement learning training. Many of these runtime workarounds become unnecessary under our approach, since the high-level algebraic language can often tell us that a particular combination of tactics need not be run at all because their types are incompatible.
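For contrast, here is what composition looks like in Lean today. Both steps below have the same opaque type, TacticM Unit, so nothing in the types says whether chaining them makes sense for a given goal (the two tactic names are just common examples we picked).

```lean
import Lean

open Lean Elab Tactic in
-- Both steps have type `TacticM Unit`; the only way to find out whether
-- this pairing makes sense for a given goal is to run it.
def introThenTrivial : TacticM Unit := do
  evalTactic (← `(tactic| intro h))  -- only meaningful on a ∀/implication goal
  evalTactic (← `(tactic| trivial))
```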

What follows is not meant to be a detailed research document, but rather a quick overview of the three angles from which our work is progressing, to give you a sense of what we are working on and why.

Polylang

Polylang is the working name of both the tactic description language and the compiler for it that we are developing. While in most languages terms denote functions between sets, Polylang is designed so that its terms denote natural transformations between polynomial functors. The compiler frontend is currently roughly 50% complete for minimal functionality: scope checking and pattern elaboration are done, type checking is in progress, and parsing is still to do. We also have a backend interpreter for testing purposes. Once the frontend is complete, the remaining step is a backend targeting Lean tactics. The most difficult part of the design so far has been the sequencing/substitution product of polynomials, which is needed for sequential application of tactics; from the perspective of Polylang it appears as an unusual control flow operator, making our type system “semi-classical”. Once complete, this compiler will become our testbed for research into both the structure of tactics and structured code generation.
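For the mathematically inclined, the product in question is the standard substitution (composition) product from the polynomial functor literature; what follows is our summary of that definition, not anything Polylang-specific. Writing a polynomial as $p = \sum_{a \in p(1)} y^{p[a]}$, where $p(1)$ is the set of positions and $p[a]$ the set of directions at $a$,

$$p \triangleleft q \;=\; \sum_{a \in p(1)} \prod_{d \in p[a]} q.$$

A position of $p \triangleleft q$ is a position of $p$ together with a choice of a $q$-position for each of its directions: run one tactic, then choose a follow-up for every goal it produces.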

Tactics

While the work on Polylang progresses, we have already validated our primary ideas by encoding them in the dependently typed programming language Idris. This environment gives us the tools to embed different calculi, such as sequent calculus and intuitionistic logic, and to study the structure surrounding theorem provers for them.
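To give a flavour of what such an embedding looks like, here is a rendering of the general shape in Lean (we use Lean for consistency with the rest of this post; our actual development is in Idris, and the names are illustrative): formulas of a small intuitionistic logic together with a derivability judgment indexed by a context.

```lean
-- Formulas of a small intuitionistic propositional logic.
inductive Form where
  | atom : Nat → Form
  | imp  : Form → Form → Form
  | and  : Form → Form → Form

-- Derivability, indexed by a context of hypotheses. A theorem prover for
-- this calculus manipulates exactly these indexed judgments.
inductive Proof : List Form → Form → Prop where
  | ax   {Γ a}   : a ∈ Γ → Proof Γ a
  | impI {Γ a b} : Proof (a :: Γ) b → Proof Γ (.imp a b)
  | impE {Γ a b} : Proof Γ (.imp a b) → Proof Γ a → Proof Γ b
  | andI {Γ a b} : Proof Γ a → Proof Γ b → Proof Γ (.and a b)
```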

Interestingly, capabilities for those proof systems are modelled with Copara over the sequencing product, a feature that was discovered in Polylang as well. Capabilities here are akin to algebraic effects, where a notion of tag lets us track what sort of behaviour the program is meant to exhibit. What is more challenging is that we do not have any distributivity laws between Copara and the relevant monads that represent goal manipulation in a theorem prover. The upcoming work is focused on translating more of the proof state into the language (in particular, elaboration and unification rules) in such a way that the entire specification of the solver can be represented in $\mathbf{Poly}$, on the assumption that this expressive power gives both human and artificial agents the ability to choose at what level they wish to constrain reasoning.
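To illustrate just the tagging idea (and only that: the sketch below is not the Copara construction, and every name in it is hypothetical), computations can carry an index recording the capabilities they may use, and sequencing accumulates those indices, so the type alone says what a composite tactic is allowed to do.

```lean
-- Capabilities a proof-search step may exercise (names are ours).
inductive Cap where
  | unifyVars | elabTerm | backtrack

-- A computation indexed by its capabilities; the index is phantom here,
-- used purely for tracking.
def Tac (caps : List Cap) (σ α : Type) : Type :=
  σ → Option (α × σ)

-- Sequencing accumulates tags, so the type of a composite tactic records
-- everything it may do.
def Tac.seq {c₁ c₂ : List Cap} {σ α β : Type}
    (t : Tac c₁ σ α) (u : α → Tac c₂ σ β) : Tac (c₁ ++ c₂) σ β :=
  fun s =>
    match t s with
    | some (a, s') => u a s'
    | none => none
```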

Structured encoding and decoding

To enable LLMs to take full advantage of Polylang, a tight interaction loop between the model and the compiler is required, in which the generation of code is both conditioned on the existing proof state and constrained by it. Instead of generating token sequences based only on the surface textual representation of code, exposing the internal terms elaborated by the compiler gives the model direct access to a great deal of type and structural information that it would otherwise have to infer by itself. Likewise, because goal types are made explicit in Polylang (instead of all sharing the type TacticM Unit, as is currently the case in Lean), we can validate whether an LLM’s output is well-typed without running the underlying generated tactic. However, efficiently encoding such structured information into a transformer-based model, and correctly decoding its output into a well-typed algebraic data type, are still active areas of research in deep learning.
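The validation step reduces to ordinary type checking. A toy sketch of the principle (with hypothetical names; Polylang's goal types are far richer than this): once each tactic carries explicit source and target goal types, compatibility is a decidable equality test rather than a tactic run.

```lean
-- Goal types made explicit, with decidable equality.
inductive GoalTy where
  | atom : Nat → GoalTy
  | conj : GoalTy → GoalTy → GoalTy
  deriving DecidableEq

structure TypedTac where
  src : GoalTy  -- goal the tactic expects
  tgt : GoalTy  -- goal it leaves behind

-- Checking a generated composition is a pure test; nothing is executed.
def compose? (t u : TypedTac) : Option TypedTac :=
  if t.tgt = u.src then some ⟨t.src, u.tgt⟩ else none
```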

In this subproject we have so far made significant progress on the encoding side. We extended and formalised Algebraic Positional Encoding, from operating on groups to operating on arbitrary inductive types, such as those describing the internal expression types of Lean. We implemented a prototype transformer that consumes such a structured datatype and faithfully encodes its structure in a vector representation. Our plan for the future is to work both on decoding schemes, which would allow us to generate arbitrary inductive types with a transformer-based model in a type-safe manner, and on implementing and prototyping our existing encoding schemes on small datasets.
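Stripped of all the transformer machinery, the shape of such a structural encoder looks as follows (a sketch with hypothetical names; in the real model the carrier V is a vector space and the combiners are learned maps): one combiner per constructor, so the encoding is a fold that mirrors the tree structure of the term rather than its flattened token sequence.

```lean
-- A toy inductive type standing in for an internal expression type.
inductive Tree where
  | leaf : Nat → Tree
  | node : Tree → Tree → Tree

-- One combiner per constructor of the type being encoded.
structure Encoder (V : Type) where
  leaf : Nat → V    -- embed a leaf
  node : V → V → V  -- combine child representations

-- Encoding is a fold over the term, so structure is preserved by design.
def encode {V : Type} (E : Encoder V) : Tree → V
  | .leaf n   => E.leaf n
  | .node l r => E.node (encode E l) (encode E r)
```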

tags: theorem-proving - machine-learning - polynomial-functors