Published on July 17, 2025 1:49 AM GMT
Recently, I read the paper What is a statistical model?, and this post contains my thoughts downstream of the concepts in the paper. I have aimed to be minimally technical here (though it gets a bit technical towards the end...), so go read the paper if you want more precise ideas.
The fundamental challenge when we model the world is representing real-world concepts as mathematical objects. The choice of representation is not 'real', so it should not affect the predictions made by our model. This is essentially the map-territory problem. We have multiple maps for the territory, and we want our decisions to be based on the properties of the territory, not on artifacts of the map.
McCullagh's paper partially addresses this challenge, using category theory as a tool to keep track of various maps of a given territory. One key insight is that, if we want to assign probabilities to events based on the territory and not the map, we need our assignment of probabilities to be functorial. This includes and generalizes the concept of invariance & equivariance under group actions.
My primary goal with this article is to explain what it means to assign probabilities functorially, more intuitively than rigorously. My secondary goal is to pique your interest in the utility of category theory for mathematically approaching the map-territory problem.
Warm-Up: Changing units of measurement
Consider mathematically representing the temperature of an object. Three natural representations are:
- The number of degrees Celsius
- The number of degrees Fahrenheit
- The number of kelvins
Between any pair of these representations, we also have a conversion formula, such as
$$T_{F} = \frac{9}{5} T_{C} + 32$$
If you are trying to use a temperature to make a prediction or decision, that prediction or decision should not depend on the scale you use to measure temperature. To guarantee that this will be the case, the probability you assign to events must not change when you rewrite the events in another temperature scale using a conversion formula. For example, if I assign probability p to the event (The temperature will be above 0 °C), then I must also assign probability p to the event (The temperature will be above 32 °F). To do anything else would be logically/mathematically inconsistent.
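A small Python sketch makes the requirement concrete. It assumes, purely for illustration, that my belief about tomorrow's temperature is a normal distribution with mean 2 °C and standard deviation 3 °C. These are made-up numbers, and `c_to_f` is a helper name of my own. Rewriting both the belief and the event in °F should leave the probability unchanged, up to floating-point rounding.

```python
from statistics import NormalDist

def c_to_f(t_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return t_c * 9 / 5 + 32

# Hypothetical belief about tomorrow's temperature, written in degrees Celsius.
belief_c = NormalDist(mu=2.0, sigma=3.0)

# The same belief rewritten in degrees Fahrenheit. An affine change of units maps a
# normal distribution to a normal one: the mean is converted, the spread scales by 9/5.
belief_f = NormalDist(mu=c_to_f(belief_c.mean), sigma=belief_c.stdev * 9 / 5)

# One event, described in each representation.
p_above_0_c = 1 - belief_c.cdf(0.0)            # "above 0 °C"
p_above_32_f = 1 - belief_f.cdf(c_to_f(0.0))   # "above 32 °F"

print(f"P(above 0 °C)  = {p_above_0_c:.6f}")
print(f"P(above 32 °F) = {p_above_32_f:.6f}")
```

Any other belief would do. The point is that the event and the distribution are converted by the same arrow, so the probability cannot change.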
This example easily generalizes to any quantitative measurement with units. The consistency requirement can be stated, using some jargon, as follows: the probability measure of events must be invariant under change of basis. Functoriality is a broad generalization of this concept.
Categories of Representations
To formalize this idea, I will first introduce a mathematical structure to keep track of all the various ways one can represent a real-world concept. The three temperature scales and the various conversions between them can be represented as a directed graph:
This graph satisfies some important properties.
1. Composition: Given two arrows A→B and B→C, there is an arrow A→C given by composing the two arrows.
2. Associativity: Composing arrows is associative: when composing three arrows in a row, it does not matter which adjacent pair you compose first.
3. Identity: For every node in the graph, there is an identity arrow A→A that does nothing. We typically don't draw these arrows, to avoid cluttering the graph.
In the temperature example, composition means that if I can convert a temperature from Celsius to Kelvin, and also from Kelvin to Fahrenheit, then I must be able to convert a temperature from Celsius to Fahrenheit (and this conversion must agree with the result of first converting to Kelvin, then to Fahrenheit).
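The composition and identity requirements can be checked directly in code. Here is a minimal Python sketch with hypothetical helper names. Because of floating-point rounding, "agree" is checked up to a small tolerance.

```python
import math

# Conversion formulas between temperature scales: the arrows of the graph.
def c_to_k(t: float) -> float:
    return t + 273.15

def k_to_f(t: float) -> float:
    return (t - 273.15) * 9 / 5 + 32

def c_to_f(t: float) -> float:
    return t * 9 / 5 + 32

def identity(t: float) -> float:
    return t

def compose(g, f):
    """The arrow 'first do f, then do g' (written g ∘ f)."""
    return lambda t: g(f(t))

c_to_f_via_k = compose(k_to_f, c_to_k)

for t in (-40.0, 0.0, 21.5, 100.0):
    # Composition: going Celsius -> Kelvin -> Fahrenheit agrees with the direct arrow.
    assert math.isclose(c_to_f_via_k(t), c_to_f(t), abs_tol=1e-9)
    # Identity: composing with the do-nothing arrow changes nothing.
    assert compose(c_to_f, identity)(t) == c_to_f(t)
    assert compose(identity, c_to_f)(t) == c_to_f(t)
```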
A directed graph with these properties is called a category. To specify a category, one must specify the nodes of the graph, usually called the objects of the category, and one must specify the arrows of the graph, usually called the morphisms of the category.
To package up all the possible ways of representing a temperature, we can define the "category of temperature", whose objects are the various representations of temperature and whose morphisms are the ways of converting between those representations. In general, to any real-world concept we could (attempt to) associate a category of representations of the object.
In the map-territory framework, each object in a category is a map, and the category is like an atlas[1], containing many maps and the relationships between them.
Representations with Differing Information
As drawn above, the category of temperature is woefully incomplete. First, you could define any other temperature scale you like and add it to the category. Setting that aside, our three representations of temperature all assume we can measure temperature to infinite precision. We should distinguish temperature in °C, temperature in °C to 1 decimal place, and temperature in °C to 2 decimal places as three different representations of temperature. There are conversion maps going in one direction: we can convert °C to 2 decimals to °C to 1 decimal by simply dropping the last digit. However, we can't convert the other way, as °C to 1 decimal contains less information than °C to 2 decimals.
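Here is a minimal Python sketch of the "drop the last digit" arrow. It uses the standard `decimal` module so that digits are handled exactly, and the helper `truncate` is hypothetical. The first check shows that the arrow loses information, because two different readings collapse into one. The second shows that dropping digits one step at a time composes consistently with dropping them all at once.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(x: Decimal, places: int) -> Decimal:
    """Keep `places` decimal digits by dropping the rest (rounds toward zero)."""
    return x.quantize(Decimal(1).scaleb(-places), rounding=ROUND_DOWN)

# Two different readings at 2-decimal precision...
a, b = Decimal("21.37"), Decimal("21.34")
# ...become the same reading at 1-decimal precision, so this arrow has no inverse.
print(truncate(a, 1), truncate(b, 1))  # 21.3 21.3

# Dropping digits in two steps agrees with dropping them all at once.
x = Decimal("21.3789")
assert truncate(truncate(x, 2), 1) == truncate(x, 1)
```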
Let's add these representations to our category:
The (...) here denotes infinitely many objects, one for each number of decimal places you could measure temperature to. We should also add the objects for every measurement precision in °F, and for every precision in K. There are also many arrows that I have not drawn in the graph. For example, the composition rule means that we should have an arrow from °F to °C to 2 decimals, given by composing the arrow from °F to °C with the arrow from °C to °C to 2 decimals.
When two representations contain different amounts of information, it is reasonable to assign different probabilities to the same event described in two different representations. Let's see this with a different example.
Suppose I want to represent my location. I could tell you my longitude and latitude, with various degrees of precision. I could tell you what country I am in, or what city, or my postal code. These are all representations with various levels of information. Here is a small subset of the category of locations:
If you assign probability to the event Kaleb lives in Ontario, then you can convert that event from the Province representation of location to the Country representation, where it becomes Kaleb lives in Canada. As being in Ontario implies being in Canada, being in Canada is at least as likely as being in Ontario:
$$P(\text{Kaleb lives in Canada}) \geq P(\text{Kaleb lives in Ontario})$$
This is a compatibility condition that your assigned probabilities must satisfy to be consistent with the structure of the category of locations.
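Here is a minimal Python sketch of that compatibility condition. The province-to-country table and the probabilities are made up for illustration, with two US states standing in for provinces, and `push_forward` is a hypothetical helper. To describe beliefs in the coarser Country representation, we add up the probabilities of every province that maps to the same country. This can only make P(Canada) at least as large as P(Ontario).

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical arrow Province -> Country (US states stand in for "provinces").
PROVINCE_TO_COUNTRY = {
    "Ontario": "Canada",
    "Quebec": "Canada",
    "British Columbia": "Canada",
    "California": "USA",
    "Texas": "USA",
}

# Made-up beliefs about where Kaleb lives, in the Province representation.
p_province = {
    "Ontario": Fraction(1, 2),
    "Quebec": Fraction(1, 10),
    "British Columbia": Fraction(1, 10),
    "California": Fraction(1, 5),
    "Texas": Fraction(1, 10),
}

def push_forward(p, arrow):
    """Rewrite beliefs in a coarser representation: each coarse value receives the
    total probability of all fine values that the arrow maps onto it."""
    out = defaultdict(Fraction)
    for fine, prob in p.items():
        out[arrow[fine]] += prob
    return dict(out)

p_country = push_forward(p_province, PROVINCE_TO_COUNTRY)
print(p_country["Canada"], p_country["USA"])  # 7/10 3/10

# The compatibility condition: P(Canada) >= P(Ontario).
assert p_country["Canada"] >= p_province["Ontario"]
```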
Functors and Functoriality
The compatibility conditions in the previous sections are both examples of the requirement that assignment of probabilities is functorial. Functorial is the adjective form of functor, and a functor is a kind of mapping between categories.
First, an informal definition:
A functor between two categories is a pair of two things:
1. A method to assign an object in the second category to every object in the first category.
2. A method that sends every conversion equation a→b between two objects in the first category to a conversion equation between the two objects in the second category that part 1 assigns to a and b.
Now a formal definition:
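A functor is two maps, one map for the objects of the category and one map for the morphisms. Suppose $\mathcal{C}$ and $\mathcal{D}$ are two categories, with objects $\mathrm{Ob}(\mathcal{C}), \mathrm{Ob}(\mathcal{D})$ and morphisms $\mathrm{Mor}(\mathcal{C}), \mathrm{Mor}(\mathcal{D})$. Then a functor $F: \mathcal{C} \to \mathcal{D}$ is two maps:
1. $F_0: \mathrm{Ob}(\mathcal{C}) \to \mathrm{Ob}(\mathcal{D})$
2. $F_1: \mathrm{Mor}(\mathcal{C}) \to \mathrm{Mor}(\mathcal{D})$, sending each arrow $f: a \to b$ to an arrow $F_1(f): F_0(a) \to F_0(b)$
(Both maps will be denoted $F$ from now on.) These maps must send identity arrows to identity arrows, $F(\mathrm{id}_a) = \mathrm{id}_{F(a)}$. They must also satisfy the following: for every subset of $\mathcal{C}$ that forms a commuting triangle
$$a \xrightarrow{f} b \xrightarrow{g} c, \qquad a \xrightarrow{h} c$$
(commuting means that $h = g \circ f$, i.e. doing the arrow $h$ is equal to doing $f$ and then doing $g$), applying $F$ to all objects and arrows in the triangle forms a commuting triangle in $\mathcal{D}$:
$$F(h) = F(g) \circ F(f).$$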
Example:
Define the category of intervals to be the category with:
- Objects given by every interval inside the real numbers.
- Morphisms given by an inclusion map from any interval to every other interval that fully contains it. For example, there is an arrow $(0, 1) \to (0, 2)$, but there is no arrow from $(0, 2)$ to $(0, 1)$.
Then we can define a functor LiquidWater from the category of temperature described before to this category of intervals:
- To every object in the category of temperature, which is a way of measuring temperatures, we assign the interval (freezing point of water, boiling point of water). For example, LiquidWater(°C) = (0, 100), LiquidWater(°F) = (32, 212) and LiquidWater(K) = (273.15, 373.15).
- To every conversion map $f$ between temperature scales, we assign the conversion map between intervals given by applying $f$ to both endpoints of the interval.
The fact that LiquidWater is a functor means that the concept of "temperature interval where water is a liquid" changes in a consistent way when you change your representation of the concept "temperature".
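Here is a Python sketch of LiquidWater. It assumes intervals are stored as (low, high) endpoint pairs, and all names are hypothetical. It checks that each conversion carries one scale's interval onto the other's. It also checks that, for the commuting triangle °C → K → °F versus °C → °F, the image of the direct arrow agrees with the composite of the images, up to floating-point tolerance.

```python
import math

# Arrows in the category of temperature: conversion maps between scales.
CONVERSIONS = {
    ("C", "F"): lambda t: t * 9 / 5 + 32,
    ("C", "K"): lambda t: t + 273.15,
    ("K", "F"): lambda t: (t - 273.15) * 9 / 5 + 32,
}

# LiquidWater on objects: (freezing point, boiling point) of water in each scale.
LIQUID_WATER = {"C": (0.0, 100.0), "F": (32.0, 212.0), "K": (273.15, 373.15)}

def liquid_water_on_arrow(convert):
    """LiquidWater on arrows: apply the conversion to both endpoints of an interval."""
    return lambda interval: (convert(interval[0]), convert(interval[1]))

def intervals_close(i, j, tol=1e-9):
    return all(math.isclose(x, y, abs_tol=tol) for x, y in zip(i, j))

# Each arrow a -> b carries LiquidWater(a) onto LiquidWater(b).
for (src, dst), convert in CONVERSIONS.items():
    assert intervals_close(liquid_water_on_arrow(convert)(LIQUID_WATER[src]), LIQUID_WATER[dst])

# Commuting triangle C -> K -> F versus C -> F: the image of the direct arrow
# agrees with the composite of the images.
direct = liquid_water_on_arrow(CONVERSIONS[("C", "F")])(LIQUID_WATER["C"])
via_k = liquid_water_on_arrow(CONVERSIONS[("K", "F")])(
    liquid_water_on_arrow(CONVERSIONS[("C", "K")])(LIQUID_WATER["C"])
)
assert intervals_close(direct, via_k)
```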
Technical Example: Tensors
This example is intended to be more familiar to ML folks. Given any vector space $V$, we can define a category $\mathcal{C}_V$ with
- Objects: all vector spaces that are isomorphic to $V$
- Morphisms: all linear isomorphisms
Fix some other vector space $W$. Then we have a functor from $\mathcal{C}_V$ to the category $\mathcal{C}_{V \otimes W}$. It maps an object $V'$ to $V' \otimes W$, and it maps a linear isomorphism $f: V' \to V''$ to the linear isomorphism $f \otimes \mathrm{id}_W: V' \otimes W \to V'' \otimes W$.
The fact that this is a functor implies the oft-stated maxim that a tensor is something that transforms like a tensor. 'Transforms like a tensor' means that the tensor product changes in a consistent way when you change your representation of the vector space (that is, change basis by applying a linear isomorphism).
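For readers who like to see the matrices: in coordinates, the linear map $f \otimes \mathrm{id}_W$ is the Kronecker product of the matrix of $f$ with an identity matrix. The Python sketch below checks the two functor laws for this assignment. It uses plain lists with no libraries, and the particular matrices and the dimension of $W$ are arbitrary choices of mine.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Kronecker product: the matrix of a ⊗ b in the product basis."""
    return [[a[i][j] * b[k][q] for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

DIM_W = 3  # dimension of the fixed space W (arbitrary choice)

def tensor_with_w(m):
    """The functor on arrows: a change of basis f becomes f ⊗ id_W."""
    return kron(m, identity(DIM_W))

# Two arbitrary invertible changes of basis on a 2-dimensional V.
f = [[1, 2], [0, 1]]
g = [[2, 1], [1, 1]]

# The composite arrow "f, then g" is the matrix product g @ f.
assert tensor_with_w(matmul(g, f)) == matmul(tensor_with_w(g), tensor_with_w(f))
# Identity arrows are sent to identity arrows.
assert tensor_with_w(identity(2)) == identity(2 * DIM_W)
```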
Events and Probabilities as Functors
For Bayesian modelling, I propose that an event that you assign probability to is a functor. Specifically, it is a functor from our category of interest to a subcategory of the category of measurable sets. Defining this subcategory properly requires a lot more work than I want to do here[2].
For $E$ to be a functor from a category $\mathcal{C}$ to sets means the following. To every object $c$ in $\mathcal{C}$, we are assigning a (measurable) set $E(c)$, and to every map $f: c \to c'$ we are assigning a map $E(f): E(c) \to E(c')$. When thinking of $\mathcal{C}$ as an abstract concept, and $c$ as one way of representing that concept, $E(c)$ is the event described using the representation $c$.
For example, let $\mathcal{C}$ be the category of locations from a few sections ago, with objects given by Country, Province and City. Then the event representing Kaleb is in Ontario could be defined by:
- E(Country) = {Canada}
- E(Province) = {Ontario}
- E(City) = {all cities in Ontario}
Whereas the event representing Kaleb is in Toronto could be defined by:
- E(Country) = {Canada}
- E(Province) = {Ontario}
- E(City) = {Toronto}
These events assign the same set to the Country and Province representations of location. This captures the fact that Kaleb is in Ontario and Kaleb is in Toronto are indistinguishable events when you are representing location using countries or provinces.
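Here is a minimal Python sketch of these two events, with a made-up three-city fragment of the category of locations. It assumes that E(f) is just the conversion map f restricted to E(c). Under that assumption, the functor condition amounts to f sending E(c) into E(c'). The helper names are hypothetical.

```python
# A made-up fragment of the category of locations: arrows City -> Province -> Country.
CITY_TO_PROVINCE = {"Toronto": "Ontario", "Ottawa": "Ontario", "Montreal": "Quebec"}
PROVINCE_TO_COUNTRY = {"Ontario": "Canada", "Quebec": "Canada"}
ARROWS = [("City", "Province", CITY_TO_PROVINCE), ("Province", "Country", PROVINCE_TO_COUNTRY)]

def image(subset, arrow):
    """Apply a conversion map to every element of a set."""
    return {arrow[x] for x in subset}

def is_event_functor(event):
    """Check that every arrow f: c -> c' sends E(c) into E(c'), so E(f) is well defined."""
    return all(image(event[src], arrow) <= event[dst] for src, dst, arrow in ARROWS)

# An event assigns one set to each representation of location.
in_ontario = {
    "City": {c for c, p in CITY_TO_PROVINCE.items() if p == "Ontario"},
    "Province": {"Ontario"},
    "Country": {"Canada"},
}
in_toronto = {"City": {"Toronto"}, "Province": {"Ontario"}, "Country": {"Canada"}}

print(is_event_functor(in_ontario), is_event_functor(in_toronto))  # True True
# The two events can only be told apart in the City representation.
print([rep for rep in in_ontario if in_ontario[rep] != in_toronto[rep]])  # ['City']
```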
Now let $E(\mathcal{C})$ denote the image category of the category $\mathcal{C}$ under an event functor $E$. That is:
- the objects of $E(\mathcal{C})$ are $\{E(c) : c \text{ an object of } \mathcal{C}\}$,
- the morphisms of $E(\mathcal{C})$ are $E(f)$, for every morphism $f$ in $\mathcal{C}$.
Let $\mathbf{I}$ be the category with
- Objects: Every real number between 0 and 1 (inclusive)[3].
- Morphisms: Add an arrow $p \to q$ whenever $p \le q$.
An assignment of probabilities to the event $E$ is another functor $P: E(\mathcal{C}) \to \mathbf{I}$. That is, for every representation $c$ of the concept $\mathcal{C}$, you have the set $E(c)$ and a number $P(E(c))$, which is the probability of the event $E$ occurring when written in representation $c$.
The functoriality condition, that $E$ and $P$ send commutative triangles to commutative triangles, enforces mathematically that the assignment of probabilities is consistent with changes to the representation of the underlying concept $\mathcal{C}$. As an aphorism: Functors are properties of the entire atlas, not properties of any one map.
An arrow $f: c \to c'$ means that $c$ contains at least as much information as $c'$, and functoriality implies that $P(E(c)) \le P(E(c'))$. If $c$ and $c'$ contain the same information, you'll also have an arrow $c' \to c$. That arrow enforces the other inequality and therefore implies $P(E(c)) = P(E(c'))$.
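The inequality can be checked mechanically. Because the target category of probabilities has an arrow $p \to q$ exactly when $p \le q$, a probability assignment is functorial exactly when probability never decreases along an arrow. Below is a Python sketch with a hypothetical three-object category and made-up numbers for one event, say "the temperature lies in [20.00, 20.05) °C". Truncating to 1 decimal turns this into the coarser event "the reading is 20.0", which may receive more probability. Since °C and °F convert losslessly, arrows go both ways between them, so their probabilities must be equal.

```python
# A hypothetical fragment of the category of temperature. An arrow (a, b) means that
# representation a carries at least as much information as representation b.
# Celsius and Fahrenheit convert losslessly, so arrows go both ways between them.
ARROWS = [
    ("°C", "°F"),
    ("°F", "°C"),
    ("°C", "°C to 1 decimal"),
    ("°F", "°C to 1 decimal"),
]

def violations(prob):
    """Arrows along which probability decreases, i.e. where P fails to be a functor."""
    return [(a, b) for a, b in ARROWS if prob[a] > prob[b]]

# Made-up probabilities for one event, written in each representation.
consistent = {"°C": 0.3, "°F": 0.3, "°C to 1 decimal": 0.5}
inconsistent = {"°C": 0.3, "°F": 0.35, "°C to 1 decimal": 0.5}

print(violations(consistent))    # []
print(violations(inconsistent))  # [('°F', '°C')]
```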
Example: Probabilities of First Digits
So far, all the examples I've provided of functoriality requirements are a bit obvious and don't justify the fancy terminology. Here is a famous example which is a bit more surprising. For a classical discussion of this example, see Section 3.3.2 of Berger's textbook on Bayesian Analysis.
Suppose I have a random collection of various tables, filled with numbers from a variety of sources. Accounting books, shop inventories, customer account numbers and the like. Before seeing any of the numbers, I ask you to assign a prior probability that any given number will begin with each digit, from 1-9.
As there are 9 possibilities and the numbers are random, one might assign the probabilities P(1)=P(2)=...=P(9) = 1/9; a 1/9 chance of seeing each digit. However, this assignment of probabilities is not functorial.
This is because the scale of the numbers is arbitrary, so it should not affect the probabilities we assign to first digits. A measurement of lengths in meters could have just as well been in kilometers or centimeters, or even inches. Crop yields in bushels/acre could have been in kilograms/hectare. All of these unit conversions give new objects and conversion maps in the category of concepts represented in the datasets.
Changing units means multiplying the numbers in the table by an arbitrary factor c. If we plot our numbers on a log scale, multiplying by c is the same as shifting all the numbers over by log(c) on the scale. Shifting by a whole power of 10 never changes a first digit, so only a number's position on the log scale modulo powers of 10 matters. Therefore, for our prior probabilities to be functorial, i.e. to not depend on the arbitrary choice of scale for the data, we must assign a uniform distribution to our numbers on a log scale (modulo powers of 10).
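Converting this uniform distribution back to a non-log scale, we can compute our functorial prior. A number has first digit d exactly when its position on the log scale, modulo powers of 10, lies between $\log_{10} d$ and $\log_{10}(d+1)$. The prior is therefore
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, \dots, 9.$$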
This distribution appears empirically in real world data, and is known as Benford's Law.
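Here is a Python sketch comparing the two candidate priors under a change of units. To keep it deterministic, it replaces random draws with evenly spaced points. That is an assumption of convenience, and the helper names are mine. Under the log-uniform prior, the frequency of a leading 1 stays near $\log_{10} 2 \approx 0.301$ for every scale factor c. Under the "each digit 1/9" prior, it jumps from about 0.111 (c = 1) to about 0.556 (c = 2) and about 0.222 (c = 5).

```python
import math

def benford(d: int) -> float:
    """Functorial prior for the first digit: P(d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

def leading_digit(y: float) -> int:
    """First significant digit of a positive number, read off its scientific notation."""
    return int(f"{y:e}"[0])

def digit_frequencies(samples):
    """Empirical P(first digit = d) for d = 1..9."""
    counts = [0] * 10
    for y in samples:
        counts[leading_digit(y)] += 1
    return [counts[d] / len(samples) for d in range(1, 10)]

N = 20_000
# Evenly spaced points in [0, 1): a deterministic stand-in for uniform random draws.
u = [(i + 0.5) / N for i in range(N)]
log_uniform = [10 ** x for x in u]       # uniform on a log scale over [1, 10)
linear_uniform = [1 + 9 * x for x in u]  # uniform on a linear scale: P(d) = 1/9

for c in (1.0, 2.0, 5.0):  # arbitrary changes of units
    log_p1 = digit_frequencies([c * y for y in log_uniform])[0]
    lin_p1 = digit_frequencies([c * y for y in linear_uniform])[0]
    print(f"c = {c}: P(1) log-uniform = {log_p1:.3f}, linear-uniform = {lin_p1:.3f}")
```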
Summary and Advice
The section on events and probabilities got a bit more technical than I wanted. Let me summarize my points using only words. I have tried to explain that:
- There are many maps of a given territory, and these can be organized using the mathematical framework called category theory.
- Something is functorial if it changes in a logically consistent way when you change your underlying representation ("map") of a concept ("territory").
- For your predictions about reality to be logically consistent, your assignment of probabilities must be functorial.
To assign probabilities functorially, here are two rules to follow:
- If you can convert between two representations of a concept without losing information, then for every event, you must assign the same probability when describing the event using either representation.
- If converting from representation 1 to representation 2 makes you lose some information, then you must assign at least as much probability to an event written in representation 2 as in representation 1.
When assigning a probability to an event, try describing the event in different ways, and make sure your assigned probability transforms according to the above rules!
I have taken up this categorical formulation of Bayesian analysis as a side-project to my PhD research, but I haven't yet had the time to say anything about Bayes' theorem and conditional probabilities. Hopefully that is to come!
- ^
Unfortunately, "atlas" is already a technical term in math, for a related concept.
- ^
I am in the process of writing this down formally.
- ^