The mathematics behind the encoding layer

Datasent is built on a small number of precise mathematical ideas. This page is where we publish our work — the papers that describe the full framework, the open problems we're working on, and the technical writing that explains how specific parts of the system work and why.

Papers

The underlying research powering our technology

A Mathematical Framework for Structured Information Encoding

Discover how Datasent’s Polynomial Tokens™ allow teams to analyze sensitive data without moving it, unlocking insights safely and efficiently.

A Trusted Setup for Bandwidth-Minimal, Lossless Data Tokenization

Presents the trusted setup protocol for lossless tokenization, which enables exact reconstruction while transmitting only minimal residual information. Covers deterministic canonicalization, integer-lossless representation, the residual-only transmission protocol, threshold and custodian reconstruction, and integrity verification. Demonstrates that raw data never needs to cross organizational or network boundaries.

Privacy-First Data

How Datasent approaches data

Datasent replaces raw data movement with a structured encoding layer. Sender and receiver agree on a shared model basis upfront, allowing data to be represented through its structure rather than transferred in full.
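
As a rough illustration of the idea, the sketch below assumes the shared model basis is a small integer polynomial agreed on in advance, and that only the residual (the difference between the data and the model's prediction) is transmitted. The coefficients `SHARED_COEFFS` and the functions `predict`, `encode`, and `decode` are hypothetical names chosen for this example; they are not Datasent's API and do not describe its actual encoding.

```python
from typing import List, Sequence

# Hypothetical shared basis: integer polynomial coefficients that sender and
# receiver agree on upfront. Here the model is 3 + 2*i + 1*i**2.
SHARED_COEFFS = (3, 2, 1)


def predict(coeffs: Sequence[int], i: int) -> int:
    """Evaluate the shared integer polynomial at index i (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * i + c
    return acc


def encode(data: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Sender side: keep only what the shared model fails to predict."""
    return [x - predict(coeffs, i) for i, x in enumerate(data)]


def decode(residuals: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Receiver side: rebuild the data from the shared model plus residuals."""
    return [r + predict(coeffs, i) for i, r in enumerate(residuals)]


data = [3, 6, 12, 18, 27]
residuals = encode(data, SHARED_COEFFS)
print(residuals)  # [0, 0, 1, 0, 0]

# Integer arithmetic only, so the round trip is exact.
assert decode(residuals, SHARED_COEFFS) == data
```

Because all arithmetic here is on integers, reconstruction is exact. The better the shared model fits the data, the closer the residuals are to zero and the less there is to send.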

Open problems

The questions we're still working on

Datasent is a complete, working system, but there are problems we know matter and have not fully solved. We publish them here because being honest about what remains open is more useful than pretending everything is settled.

1. Formal complexity analysis for multi-dimensional canonical forms

The primary mathematical analysis treats canonical data as a matrix partitioned along its first axis. Extending the formal analysis — complexity bounds, compression conditions, operator compatibility — to multi-dimensional canonical forms, including the image and video decompositions the system already handles in practice, is an evolving research direction.

2. Operator compatibility beyond linear maps

3. Optimal basis selection under distribution shift

4. Zero-knowledge proof integration

5. End-to-end empirical evaluation

Working on something related? We'd like to hear from you.

We're interested in conversations with data infrastructure engineers, ML researchers, and applied mathematicians working on problems that overlap with ours — whether that's lossless compression, efficient ML data pipelines, structured representations, trusted computation, or cryptographic verification of data properties.