Published on July 17, 2025 1:49 AM GMT

Recently, I read the paper "What is a statistical model?", and this post contains my thoughts downstream of the concepts in the paper. I have aimed to be minimally technical here (though it gets a bit technical towards the end...), so go read the paper if you want more precise ideas.

The fundamental challenge when we model the world is representing real-world concepts as mathematical objects. The choice of representation is not 'real', so it should not affect the predictions made by our model. This is essentially the map-territory problem. We have multiple maps for the territory, and we want our decisions to be based on the properties of the territory, not on artifacts of the map.

McCullagh's paper partially addresses this challenge, using category theory as a tool to keep track of various maps of a given territory. One key insight is that, if we want to assign probabilities to events based on the territory and not the map, we need our assignment of probabilities to be functorial. This includes and generalizes the concept of invariance & equivariance under group actions.

My primary goal with this article is to explain what it means to assign probabilities functorially, more intuitively than rigorously. My secondary goal is to pique your interest in the utility of category theory for mathematically approaching the map-territory problem.

Warm-Up: Changing units of measurement

Consider mathematically representing the temperature of an object. Three natural representations are:
- The number of degrees Celsius
- The number of degrees Fahrenheit
- The number of Kelvin
Between any pair of these representations, we also have a conversion formula, such as

$(\text{Temperature in Fahrenheit}) = 9 / 5 \cdot (\text{Temperature in Celsius}) + 32$

If you are trying to use a temperature to make a prediction or decision, that prediction or decision should not depend on the scale you use to measure temperature. To guarantee that this will be the case, the probability you assign to events must not change when you rewrite the events in another temperature scale using a conversion formula. For example, if I assign probability p to the event (The temperature will be above 0 °C), then I must also assign probability p to the event (The temperature will be above 32 °F). To do anything else would be logically/mathematically inconsistent.
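To make the requirement concrete, here is a minimal sketch in Python. The forecast samples are made up for illustration, and `probability_above` is a hypothetical helper, not something from the paper. It checks that the empirical probability of "above 0 °C" equals the probability of "above 32 °F" once both the samples and the threshold are converted.

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return 9 / 5 * temp_c + 32


def probability_above(samples: list[float], threshold: float) -> float:
    """Empirical probability that a sample exceeds the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


# Hypothetical forecast samples in degrees Celsius (made up for illustration).
samples_c = [-6.5, -3.0, -0.5, 0.5, 1.0, 2.5, 4.0, 7.5]
# The same forecast, rewritten in degrees Fahrenheit.
samples_f = [celsius_to_fahrenheit(t) for t in samples_c]

# The same event, described in each scale.
p_celsius = probability_above(samples_c, 0.0)
p_fahrenheit = probability_above(samples_f, celsius_to_fahrenheit(0.0))

print(p_celsius, p_fahrenheit)
```

Both numbers agree because the conversion is strictly increasing, so it never moves a sample from one side of the threshold to the other.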

This example easily generalizes to any quantitative measurement with units. Using some jargon, the consistency requirement can be stated as: the probability measure of events must be invariant under change of basis. Functoriality is a broad generalization of this concept.

Categories of Representations

To formalize this idea, I will first introduce a mathematical structure to keep track of all the various ways one can represent a real-world concept. The three temperature scales and the various conversions between them can be represented as a directed graph:

This graph satisfies some important properties.
1. Composition: Given two arrows A→B and B→C, there is an arrow A→C given by composing the two arrows.
2. Associativity: Composing arrows is associative: when composing three arrows, it does not matter which pair you compose first.
3. Identity: For every node in the graph, there is an identity arrow A→A that does nothing. We typically don't draw these arrows, to avoid cluttering the graph.

In the temperature example, composition means that if I can convert a temperature from Celsius to Kelvin, and also from Kelvin to Fahrenheit, then I must be able to convert a temperature from Celsius to Fahrenheit (and this conversion must agree with the result of first converting to Kelvin, then to Fahrenheit).
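As a rough sketch of the composition and identity properties, the conversions can be written as Python functions and composed. The helper names `compose`, `identity` and `arrows_agree` are my own, and agreement is only checked at a few test points up to floating-point error, so this is an illustration rather than a proof.

```python
import math


def celsius_to_kelvin(t: float) -> float:
    return t + 273.15


def kelvin_to_fahrenheit(t: float) -> float:
    return (t - 273.15) * 9 / 5 + 32


def celsius_to_fahrenheit(t: float) -> float:
    return 9 / 5 * t + 32


def identity(t: float) -> float:
    """The identity arrow: leaves the temperature unchanged."""
    return t


def compose(second, first):
    """Return the composite arrow: apply `first`, then `second`."""
    return lambda t: second(first(t))


def arrows_agree(f, g, points) -> bool:
    """Check that two arrows give the same result at every test point."""
    return all(math.isclose(f(t), g(t), abs_tol=1e-9) for t in points)


test_points = [-40.0, 0.0, 21.5, 100.0]

# Celsius -> Kelvin -> Fahrenheit must agree with the direct Celsius -> Fahrenheit arrow.
via_kelvin = compose(kelvin_to_fahrenheit, celsius_to_kelvin)
composition_ok = arrows_agree(via_kelvin, celsius_to_fahrenheit, test_points)

# Composing with the identity arrow changes nothing.
identity_ok = arrows_agree(
    compose(celsius_to_fahrenheit, identity), celsius_to_fahrenheit, test_points
)

print(composition_ok, identity_ok)
```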

A directed graph with these properties is called a category. To specify a category, one must specify the nodes of the graph, usually called the objects of the category, and one must specify the arrows of the graph, usually called the morphisms of the category.

To package up all the possible ways of representing a temperature, we can define the "category of temperature", whose objects are the various representations of temperature and whose morphisms are the ways of converting between those representations. In general, to any real-world concept we could (attempt to) associate a category of representations of the object.

In the map-territory framework, each object in a category is a map, and the category is like an atlas^[1], containing many maps and the relationships between them.

Representations with Differing Information

As drawn above, the category of temperature is woefully incomplete. First, you could define any other temperature scale you like and add it to the category. Setting that aside, our three representations of temperature all assume we can measure temperature to infinite precision. We should distinguish temperature in °C, temperature in °C to 1 decimal place and temperature in °C to two decimal places as three different representations of temperature. There are conversion maps going in one direction; we can convert °C to 2 decimals to °C to 1 decimal by simply dropping the last digit. However, we can't convert the other way, as °C to 1 decimal contains less information than °C to 2 decimals.

Let's add these representations to our category:

The (...) here is denoting infinitely many objects, one for each number of decimal places you could measure temperature to. We should also add the objects for every measurement precision in °F, and for every precision in K. There are also many arrows that I have not drawn in the graph -- for example the composition rule means that we should have an arrow from °F to °C to 2 decimals by composing the arrow from °F to °C with the arrows from °C to °C to 2 decimals.
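Here is a small sketch of these one-way arrows, using Python's `decimal` module. I am assuming "dropping the last digit" means truncation, and `drop_to_places` is a hypothetical helper name. It shows that two different readings at 2 decimals collapse to one reading at 1 decimal (so the arrow cannot be reversed), and that the composite arrow from °F to °C to 2 decimals is well defined.

```python
from decimal import Decimal, ROUND_DOWN


def drop_to_places(temp: Decimal, places: int) -> Decimal:
    """Forget precision by truncating to `places` decimal places."""
    quantum = Decimal(10) ** -places  # places=1 gives Decimal('0.1')
    return temp.quantize(quantum, rounding=ROUND_DOWN)


def fahrenheit_to_celsius(temp_f: Decimal) -> Decimal:
    """Conversion arrow from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Two distinct readings at 2 decimals map to the same reading at 1 decimal,
# so there is no way to define an arrow going back.
coarse_a = drop_to_places(Decimal("21.43"), 1)
coarse_b = drop_to_places(Decimal("21.48"), 1)

# Composite arrow: Fahrenheit -> Celsius -> Celsius to 2 decimals.
exact_celsius = fahrenheit_to_celsius(Decimal("70.5"))
celsius_2dp = drop_to_places(exact_celsius, 2)

# Going to 2 decimals and then to 1 decimal agrees with going straight to 1 decimal.
consistent = drop_to_places(celsius_2dp, 1) == drop_to_places(exact_celsius, 1)

print(coarse_a, coarse_b, celsius_2dp, consistent)
```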

When two representations contain different amounts of information, it is reasonable to assign different probabilities to the same event described in two different representations. Let's see this with a different example.

Suppose I want to represent my location. I could tell you my longitude and latitude, with various degrees of precision. I could tell you what country I am in, or what city, or my postal code. These are all representations with various levels of information. Here is a small subset of the category of locations:

If you assign probability $P (Ontario)$ to the event Kaleb lives in Ontario, then you can convert that event from Province to Country representation of location, where it becomes Kaleb lives in Canada. As being in Ontario implies being in Canada, being in Canada is at least as likely as being in Ontario:

$P (Ontario) \leq P (Canada) .$

This is a compatibility condition that your assigned probabilities must satisfy to be consistent with the structure of the category of locations.
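One way to satisfy this condition automatically is to assign probabilities at the most detailed representation and push them along the conversion maps. The sketch below does this with made-up numbers; the cities, the prior and the helper `push_forward` are all hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical prior over cities (numbers are made up for illustration).
p_city = {
    "Toronto": Fraction(3, 10),
    "Ottawa": Fraction(1, 10),
    "Montreal": Fraction(2, 10),
    "Seattle": Fraction(4, 10),
}

# Conversion maps between representations of location.
city_to_province = {
    "Toronto": "Ontario",
    "Ottawa": "Ontario",
    "Montreal": "Quebec",
    "Seattle": "Washington",  # a US state, playing the role of a province here
}
province_to_country = {"Ontario": "Canada", "Quebec": "Canada", "Washington": "USA"}


def push_forward(prob, conversion):
    """Sum the probability of every fine-grained value that maps to each coarse value."""
    coarse_prob = {}
    for fine_value, p in prob.items():
        coarse_value = conversion[fine_value]
        coarse_prob[coarse_value] = coarse_prob.get(coarse_value, Fraction(0)) + p
    return coarse_prob


p_province = push_forward(p_city, city_to_province)
p_country = push_forward(p_province, province_to_country)

print(p_province["Ontario"], p_country["Canada"])
```

Because each country's probability is a sum that includes its provinces, the inequality between Ontario and Canada holds by construction.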

Functors and Functoriality

The compatibility conditions in the previous sections are both examples of the requirement that assignment of probabilities is functorial. Functorial is the adjective form of functor, and a functor is a kind of mapping between categories.

First, an informal definition:
A functor $F$ between two categories is a pair of two things:
1. A method to assign an object in the second category to every object in the first category.
2. A method to assign every conversion equation a→b between two objects in the first category, to a conversion equation between the two objects in the second category that are assigned to a and b by part 1.

Now a formal definition:
A functor is two maps, one map for the objects of the category and one map for the morphisms. Suppose $A$ and $B$ are two categories, with objects $A^{o b j}, B^{o b j}$ and morphisms $A^{m o r}, B^{m o r}$ . Then a functor $F : A \to B$ is two maps:
1. $F^{o b j} : A^{o b j} \to B^{o b j}$
2. $F^{m o r} : A^{m o r} \to B^{m o r}$
(Both maps will be denoted $F$ from now on) such that for every triangle of arrows $g : a \to b$, $f : b \to c$ and $h : a \to c$ in $A$ that commutes, applying $F$ to all objects and arrows in the triangle forms a commuting triangle in $B .$

(Commuting means that $h = f \circ g$ , i.e. doing the $h$ arrow is equal to doing $g$ and then doing $f$ .)

Example:
Define the category of intervals to be the category with:
- Objects: intervals $(a, b)$, such as $(-1, 1)$ or $(100, 200)$.
- Morphisms: maps between intervals, such as a map $(0, 1) \to (0, 2)$.

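Then we can define a functor LiquidWater from the category of temperatures described before to this category of intervals. On objects,

$\mathrm{LiquidWater}(°C) = (0, 100), \quad \mathrm{LiquidWater}(°F) = (32, 212),$

and a conversion map $f$ between two temperature scales is sent to the map between the corresponding intervals given by applying $f$.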

The fact that LiquidWater is a functor means that the concept of "temperature interval where water is a liquid" changes in a consistent way when you change your representation of the concept "temperature".
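A quick sanity check of this consistency, as a sketch: I am assuming here that LiquidWater sends a conversion map to "apply that conversion to the endpoints of the interval", and the names `liquid_water` and `apply_to_interval` are hypothetical. Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def c_to_f(t):
    """Conversion arrow from Celsius to Fahrenheit."""
    return Fraction(9, 5) * t + 32


def f_to_c(t):
    """Conversion arrow from Fahrenheit to Celsius."""
    return (t - 32) * Fraction(5, 9)


# LiquidWater on objects: each temperature scale is sent to an interval.
liquid_water = {
    "C": (Fraction(0), Fraction(100)),
    "F": (Fraction(32), Fraction(212)),
}


def apply_to_interval(conversion, interval):
    """LiquidWater on arrows (assumed): convert both endpoints of the interval."""
    low, high = interval
    return (conversion(low), conversion(high))


# Converting the Celsius interval lands exactly on the Fahrenheit interval, and back.
forward_ok = apply_to_interval(c_to_f, liquid_water["C"]) == liquid_water["F"]
backward_ok = apply_to_interval(f_to_c, liquid_water["F"]) == liquid_water["C"]

# Composites are preserved: C -> F -> C acts as the identity on the interval.
round_trip = apply_to_interval(f_to_c, apply_to_interval(c_to_f, liquid_water["C"]))
round_trip_ok = round_trip == liquid_water["C"]

print(forward_ok, backward_ok, round_trip_ok)
```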

Technical Example: Tensors
This example is intended to be more familiar to ML folks. Given any vector space $V$ , we can define a category $\underline{V}$ with
- Objects: vector spaces $V^{'}$ isomorphic to $V$.
- Morphisms: linear isomorphisms $L : V^{'} \to V^{''}$ between them.

Fix some other vector space $W$ . Then we have a functor from $\underline{V}$ to the category $\underline{V \otimes W}$ , defined by mapping an object $V^{'}$ to $V^{'} \otimes W$ , and a linear isomorphism $L : V^{'} \to V^{''}$ to the linear isomorphism

$L \otimes 1 : V^{'} \otimes W \to V^{''} \otimes W .$

The fact that this is a functor implies the oft-stated maxim that a tensor is something that transforms like a tensor. 'Transforms like a tensor' means that the tensor product changes in a consistent way when you change your representation of the vector space $V$ (that is, change basis by applying a linear isomorphism).
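For a concrete check, here is a sketch that represents $L \otimes 1$ as a Kronecker product of matrices. The matrices and vectors are arbitrary small integer examples I picked, and `matmul`, `kron` and `identity_matrix` are hand-rolled helpers so the block has no dependencies (NumPy's `numpy.kron` computes the same product). It checks that composites are preserved and that $(L \otimes 1)(v \otimes w) = (L v) \otimes w$.

```python
def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_entry * b_entry for a_entry in row_a for b_entry in row_b]
        for row_a in a
        for row_b in b
    ]


def identity_matrix(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# Two changes of basis on a 2-dimensional V (both invertible), and W of dimension 3.
L1 = [[2, 1], [1, 1]]
L2 = [[1, 3], [0, 1]]
I_W = identity_matrix(3)

# Functoriality: (L2 L1) (x) 1 equals (L2 (x) 1)(L1 (x) 1).
composite_preserved = kron(matmul(L2, L1), I_W) == matmul(kron(L2, I_W), kron(L1, I_W))

# "Transforms like a tensor": (L (x) 1)(v (x) w) equals (L v) (x) w.
v = [[1], [2]]        # column vector in V
w = [[3], [4], [5]]   # column vector in W
transforms_like_tensor = matmul(kron(L1, I_W), kron(v, w)) == kron(matmul(L1, v), w)

print(composite_preserved, transforms_like_tensor)
```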

Events and Probabilities as Functors

For Bayesian modelling, I propose that an event that you assign probability to is a functor. Specifically, it is a functor $E$ from our category of interest to a subcategory of the category of measurable sets. Defining this subcategory properly requires a lot more work than I want to do here^[2].

For $E$ to be a functor from a category $C$ to sets means that to every object $c$ in $C$ , we are assigning a (measurable) set $E (c)$ , and to every map $c \to c^{'}$ we are assigning a map $E (c) \to E (c^{'})$ . When thinking of $C$ as an abstract concept, and $c$ as one way of representing that concept, $E (c)$ is the event $E$ described using the representation $c$ .

For example, if $C$ is the category of locations from a few sections ago, with objects given by Country, Province and City, the event $E$ representing Kaleb is in Ontario could be defined by:

E(Country) = {Canada}
E(Province) = {Ontario}
E(City) = {all cities in Ontario}

Whereas the event representing Kaleb is in Toronto could be defined by

E(Country) = {Canada}
E(Province) = {Ontario}
E(City) = {Toronto}

The fact that these events assign the same set to the Country and Province representations of location captures the fact that Kaleb is in Ontario and Kaleb is in Toronto are indistinguishable events when you are representing location using countries or provinces.

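Now let $E (C)$ denote the image category of the category $C$ under an event functor $E$ . That is:
- The objects of $E (C)$ are the sets $E (c)$, for each object $c$ in $C$.
- The morphisms of $E (C)$ are the maps $E (c) \to E (c^{'})$, for each morphism $c \to c^{'}$ in $C$.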

Let $\underline{[0, 1]}$ be the category with
- Objects: Every real number between 0 and 1 (inclusive)^[3].
- Morphisms: Add an arrow $x \to y$ whenever $x \leq y .$

An assignment of probabilities to the event $E$ is another functor $P : E (C) \to \underline{[0, 1]}$ . That is, for every representation $c$ of the concept $C,$ you have the set $E (c)$ , and a number $P (E (c)) \in [0, 1]$ , which is the probability of event $E$ occurring when written in representation $c .$

The functoriality condition, that $E$ and $P$ send commutative squares to commutative squares, is a mathematical expression that enforces that the assignment of probabilities is consistent with changes to the representation of the underlying concept C. As an aphorism: Functors are properties of the entire atlas, not properties of any one map.

An arrow $c \to c^{'}$ means that $c$ contains at least as much information as $c^{'},$ and functoriality implies that $P (E (c)) \leq P (E (c^{'}))$ , because arrows in the target category point from smaller numbers to larger ones (this is the same inequality as $P (Ontario) \leq P (Canada)$ from earlier). If $c$ and $c^{'}$ contain the same information, you'll also have an arrow $c^{'} \to c$ , enforcing the other inequality and therefore implying $P (E (c)) = P (E (c^{'}))$ .
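The condition can be turned into a simple check. In this sketch, a category is reduced to a list of arrows between representation names, and an assignment of probabilities is a dictionary giving $P (E (c))$ for each representation $c$. An arrow $x \to y$ exists in the target category only when $x \leq y$, so an arrow $c \to c^{'}$ is violated exactly when the probability at $c$ exceeds the probability at $c^{'}$. The function name and all the numbers are hypothetical.

```python
def functoriality_violations(arrows, prob):
    """Return the arrows (src, dst) whose probabilities are not ordered prob[src] <= prob[dst]."""
    return [(src, dst) for src, dst in arrows if prob[src] > prob[dst]]


# Location: arrows go from more informative to less informative representations.
location_arrows = [("City", "Province"), ("Province", "Country"), ("City", "Country")]

# Hypothetical probabilities for one event, written in each representation.
consistent = {"City": 0.4, "Province": 0.4, "Country": 0.6}
inconsistent = {"City": 0.4, "Province": 0.4, "Country": 0.3}

# Temperature scales carry the same information, so there are arrows both ways,
# which forces the two probabilities to be equal.
temperature_arrows = [("Celsius", "Fahrenheit"), ("Fahrenheit", "Celsius")]
equal_probs = {"Celsius": 0.7, "Fahrenheit": 0.7}
unequal_probs = {"Celsius": 0.7, "Fahrenheit": 0.6}

print(functoriality_violations(location_arrows, consistent))
print(functoriality_violations(location_arrows, inconsistent))
print(functoriality_violations(temperature_arrows, equal_probs))
print(functoriality_violations(temperature_arrows, unequal_probs))
```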

Example: Probabilities of First Digits

So far, all the examples I've provided of functoriality requirements are a bit obvious and don't justify the fancy terminology. Here is a famous example which is a bit more surprising. For a classical discussion of this example, see Section 3.3.2 of Berger's textbook on Bayesian Analysis.

Suppose I have a random collection of various tables, filled with numbers from a variety of sources. Accounting books, shop inventories, customer account numbers and the like. Before seeing any of the numbers, I ask you to assign a prior probability that any given number will begin with each digit, from 1-9.

As there are 9 possibilities and the numbers are random, one might assign the probabilities P(1)=P(2)=...=P(9) = 1/9; a 1/9 chance of seeing each digit. However, this assignment of probabilities is not functorial.

This is because the scale of the numbers is arbitrary and doesn't affect the first digit. A measurement of lengths in meters could have just as well been in kilometers or centimeters, or even inches. Crop yields in bushels/acre could have been in kilograms/hectare. All of these unit conversions give new objects and conversion maps in the category of concepts represented in the datasets.

Changing units means multiplying numbers in the table by an arbitrary factor c. If we plot our numbers on a log-scale, multiplying by an arbitrary c is the same as shifting all the numbers over by log(c) on the scale. Therefore, for our prior probabilities to be functorial, i.e. to not depend on the arbitrary choice of scale for the data, we must assign a uniform distribution to our numbers on a log-scale.

Converting this uniform distribution back to a non-log scale, we can compute that our functorial prior should be

$P (d) = \log_{10} (1 + 1 / d) .$

This distribution appears empirically in real world data, and is known as Benford's Law.
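To spell out the computation: a number has first digit $d$ exactly when the fractional part of its base-10 logarithm lies between $\log_{10} d$ and $\log_{10} (d + 1)$, and under a uniform distribution on the log-scale that interval has probability $\log_{10} (d + 1) - \log_{10} d = \log_{10} (1 + 1 / d)$. The sketch below checks this numerically. It uses an evenly spaced grid instead of random samples so the result is reproducible; the grid size and the conversion factors are arbitrary choices of mine.

```python
import math


def benford(d: int) -> float:
    """Benford probability that the first digit is d."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading decimal digit of a positive number, read off scientific notation."""
    return int(f"{x:.15e}"[0])


def digit_frequencies(values):
    """Fraction of values starting with each digit 1..9."""
    counts = [0] * 10
    for v in values:
        counts[first_digit(v)] += 1
    return {d: counts[d] / len(values) for d in range(1, 10)}


n = 100_000

# Numbers evenly spaced on a log-scale over one decade, [1, 10).
log_uniform = [10 ** (k / n) for k in range(n)]
freq_log = digit_frequencies(log_uniform)
# Change of units, e.g. inches to centimetres.
freq_log_rescaled = digit_frequencies([2.54 * x for x in log_uniform])

# Numbers evenly spaced on a linear scale over [1, 10): first digits are uniform.
linear = [1 + 9 * k / n for k in range(n)]
freq_linear = digit_frequencies(linear)
freq_linear_rescaled = digit_frequencies([2 * x for x in linear])

error_vs_benford = max(abs(freq_log[d] - benford(d)) for d in range(1, 10))
shift_after_rescaling = max(abs(freq_log[d] - freq_log_rescaled[d]) for d in range(1, 10))

print(error_vs_benford, shift_after_rescaling)
print(freq_linear[1], freq_linear_rescaled[1])
```

The log-uniform numbers keep their first-digit frequencies after rescaling, while the numbers with uniform first digits do not: doubling them makes a leading 1 far more common than 1/9.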

Summary and Advice

The section on events and probabilities got a bit more technical than I wanted. Let me summarize my points using only words. I have tried to explain that:

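- Category theory is a tool for keeping track of the many representations of a real-world concept and the conversions between them.
- For our probabilities to reflect the territory rather than the map, the assignment of probabilities to events should be functorial.

To assign probabilities functorially, here are two rules to follow:

- If two descriptions of an event contain the same information, assign them the same probability.
- If one description contains less information than another, the event in the less informative description must be at least as likely as in the more informative one.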

When assigning a probability to an event, try describing the event in different ways, and make sure your assigned probability transforms according to the above rules!

I have taken up this categorical formulation of Bayesian analysis as a side-project to my PhD research, but I haven't yet had the time to say anything about Bayes' theorem and conditional probabilities. Hopefully that is to come!

^[1] Unfortunately atlas is already a technical term in math, for a related concept.
^[2] I am in the process of writing this down formally.
^[3] Some would argue exclusive.
