More thoughts about optimal control simplification

Created Friday 17 April 2026

Some more thoughts about optimal control problems inspired by the CodinGame 2026 winter challenge.

In the challenge, I control multiple robot-snakes who need to eat as many "power sources" as possible. Most of the time, the snakes are independent - and then most of the time each snake's head can just move wherever it wants and the exact shape of the snake is irrelevant. How can we use these simplifications without hurting ourselves too much – that is catching the situations where the simplification is not applicable and taking a more detailed view in those cases?

I want to look at this question through the lenses of a "projection" and an "approximation". A projection reduces a "bigger" optimal control problem to a smaller one - and we know how to solve small optimal control problems using the Bellman equation. An "approximation" reuses our ability to solve one optimal control problem to approach the solution of a different but similar one. But first, we need the notation.

Optimal control problem

An optimal control porblem will be described by:

- the set of possible states of the system
- the initial state
- a set-valued function that for each determines the set of all possible controls
- the transition function, so that the state for the control
- the reward function. The goal is to maximize

Projection

A projection of an optimal control problem into an optimal control problem is a pair of functions and such that

The meaning behind these long formulas is trivial: there is a mapping of states of into the states of , such that if I am able to solve , then the corresponding controls in are also optimal for . An example of a projection would be the optimal control problems:

: snake-bot that can pass through itself and is not allowed to fall down
: character-with-a-jet-pack

Here is how GodinGame describes the rules for the snake-bot:

The game is played on a grid.
Each player controls a team of snakebots. On each turn, the snakebots move simultaneously according to the players' commands.
🗺️ Map
The grid is seen from the side, and is made up of platforms. Platforms are impassable cells.
On this grid you may find parts of a snakebot's body and power sources.
🐍 Snakebot
Snakebots are multiple adjacent cell-sized body parts. The first cell being their head.
Snakebots are affected by gravity. Meaning at least one body part must be above something solid or they will fall.
Other snakebots are considered solid, as well as platforms and power sources .
↪️ Movement
Snakebots are perpetually moving, and will on each turn head in the last direction they were facing unless given a new direction to turn by the player.
The starting direction is up.
When moving, a snakebot will advance their head in the direction it is facing, and the rest of its body parts will follow.
[...]
Once movement and removals are resolved, the snakebots all ~~fall downwards until a body part lands on something solid~~.

In our modification of the rules (the "not allowed to fall down" part), if snakebot has to fall, it "is removed" (dies) instead.

For "character-with-a-jet-pack" I imagine the following rules:

The game is played on a grid.
The grid is seen from the side, and is made up of platforms. Platforms are impassable cells.
The player controls a character. On each turn the character can move in any of the cardinal directions except the one it just came from.
Each turn of movement consumes one unit of jetpack charge. When the character stands on a platform, the jetpack instantly recharges fully. If the jetpack charge reaches zero, however, the character is removed from the game.
The amount of the "full charge" of the jetpack is a parameter of the game at this point.

With a little effort, one can see that a "character-with-a-jetpack" who follows the head of a snakebot is equivalent to the whole snakebot. In other words, the exact shape of the snakebot does not matter, the only thing that matters is the segment of the snakebot that is supported by a platform - and about that all I want to know is how far away it is from the tail of the snake.

Approximation

The snakebots in the actual CodinGame challenge differ from the snakebots in the above "Projection" section. The main difference is that they cannot pass through themselves. (Other differences, like the ability to fall down I consider less significant). However, most of the time, the snake is not even trying to pass through itself – so, maybe, we can use the "transparent" snake from the prevois section for most of our optimal control problem?

Mathematically, we again have two optimal control problems: and . We say that is approximated by if

, ,
and there is a "small" set such that

The definition is symmetric, but here presumably is "easier to solve" and is the problem we are actually interested in.

The difficulty here is that even if the set of exceptions is small, it has the potential to affect the value function and the optimal controls for states far away from . We cannot write

even though we would really want to! It would be not too hard to find the value function for a small set of exceptional states if we knew it everywhere else...

So what can we do?

Final part of the optimal path is easy under some conditions

In the example with a snakebot approximated by a "transparent" snakebot, controlling the "real" snakebot is always harder than controlling the "transparent" one:

by induction.

This means that if for some starting state

we find an optimal control

for

such that the whole optimal trajectory is non-exceptional

then the same trajectory is feasible for

and

is optimal for

If our starting state does not have this property, we can build our decision tree as usual for an optimal control problem until we do hit a state with this property. From that point on we don't need to keep solving the full problem but can reuse the solution to the (presumably easier) instead.

Maybe subregions of state-space can be reused?

The next best thing I can think of (and I'm not sure if this idea works) is to identify subregions of the state-space where we could reuse the solution approaches to though not itself. So, let's say we identify a subset and we formulate an optimal control problem such that

the state space
transition function

reward function

Here

is a boundary condition reward: ideally,

but we would need to solve for that somehow.

In other words, is the same as as long as we are inside the region , and once we leave we assign ourselves a reward depending on our final state . I visualise as the problem of navigating a snakebot inside a room in our snakebot example.

Now, assuming that we can solve using the same methods as solving (or using some other methods that make it easier than ) and assuming that the optimal trajectory is non-exceptional (does not include state-control pairs from the set ), and also that we still have the relation , then we can reuse the optimal trajectories from for our controls within region .

In a sense, traversing becomes a single step in our optimal control problem: tell me where you want to exit and I will tell you the optimal control and the cumulative reward of (optimally) getting there through .

Diversity of paths matters, but how?

For snakebots, it is not unusual that there are multiple optimal or close-to-optimal ways to get from state to a desired state . Intuitively, the more ways there are for problem to navigate from to , the more likely is that one of those ways will be suitable for . Is there some way to formalize and use this fact to actually solve ? I don't know.

How do navigation apps do it?

When I ask OsmAnd for public-transport navigation between cities, it often chokes – presumable on the amount of options for getting around. Yet, for a human, the navigation is not hard in this case: find a train that goes from one city to the other, then find a way to get to and from the train stations. This is similar to the "subregion" approach I discuss above – I am sure that apps like OsmAnd already use it. How do they implement it?

In place of conclusion

When starting this blog, I promised that I will have more questions than answers. Here we go.

equation-001.png		395b
equation-001.tex		15b
equation-002.png		323b
equation-002.tex		7b
equation-003.png		475b
equation-003.tex		19b
equation-004.png		512b
equation-004.tex		21b
equation-005.png		462b
equation-005.tex		19b
equation-006.png		522b
equation-006.tex		28b
equation-007.png		1.45kb
equation-007.tex		147b
equation-008.png		238b
equation-008.tex		3b
equation-009.png		878b
equation-009.tex		64b
equation-010.png		865b
equation-010.tex		64b
equation-011.png		496b
equation-011.tex		25b
equation-012.png		684b
equation-012.tex		42b
equation-013.png		1.03kb
equation-013.tex		83b
equation-014.png		1.08kb
equation-014.tex		68b
equation-015.png		953b
equation-015.tex		62b
equation-016.png		1.43kb
equation-016.tex		128b
equation-017.png		296b
equation-017.tex		9b
equation-018.png		283b
equation-018.tex		9b
equation-019.png		283b
equation-019.tex		9b
equation-020.png		296b
equation-020.tex		9b
equation-021.png		296b
equation-021.tex		9b
equation-022.png		296b
equation-022.tex		9b
equation-023.png		283b
equation-023.tex		9b
equation-024.png		878b
equation-024.tex		64b
equation-025.png		865b
equation-025.tex		64b
equation-026.png		296b
equation-026.tex		9b
equation-027.png		283b
equation-027.tex		9b
equation-028.png		431b
equation-028.tex		17b
equation-029.png		476b
equation-029.tex		21b
equation-030.png		376b
equation-030.tex		27b
equation-031.png		451b
equation-031.tex		22b
equation-032.png		930b
equation-032.tex		60b
equation-033.png		926b
equation-033.tex		60b
equation-034.png		283b
equation-034.tex		9b
equation-035.png		296b
equation-035.tex		9b
equation-036.png		239b
equation-036.tex		1b
equation-037.png		239b
equation-037.tex		1b
equation-038.png		1.29kb
equation-038.tex		96b
equation-039.png		601b
equation-039.tex		26b
equation-040.png		238b
equation-040.tex		3b
equation-041.png		283b
equation-041.tex		9b
equation-042.png		775b
equation-042.tex		47b
equation-043.png		319b
equation-043.tex		9b
equation-044.png		296b
equation-044.tex		9b
equation-045.png		551b
equation-045.tex		27b
equation-046.png		319b
equation-046.tex		9b
equation-047.png		296b
equation-047.tex		9b
equation-048.png		296b
equation-048.tex		9b
equation-049.png		283b
equation-049.tex		9b
equation-050.png		283b
equation-050.tex		9b
equation-051.png		283b
equation-051.tex		9b
equation-052.png		341b
equation-052.tex		13b
equation-053.png		460b
equation-053.tex		20b
equation-054.png		917b
equation-054.tex		40b
equation-055.png		1.94kb
equation-055.tex		131b
equation-056.png		2.13kb
equation-056.tex		155b
equation-057.png		362b
equation-057.tex		4b
equation-058.png		616b
equation-058.tex		17b
equation-059.png		460b
equation-059.tex		20b
equation-060.png		283b
equation-060.tex		9b
equation-061.png		231b
equation-061.tex		1b
equation-062.png		231b
equation-062.tex		1b
equation-063.png		362b
equation-063.tex		4b
equation-064.png		200b
equation-064.tex		1b
equation-065.png		460b
equation-065.tex		20b
equation-066.png		283b
equation-066.tex		9b
equation-067.png		296b
equation-067.tex		9b
equation-068.png		460b
equation-068.tex		20b
equation-069.png		239b
equation-069.tex		1b
equation-070.png		601b
equation-070.tex		26b
equation-071.png		460b
equation-071.tex		20b
equation-072.png		231b
equation-072.tex		1b
equation-073.png		231b
equation-073.tex		1b
equation-074.png		231b
equation-074.tex		1b
equation-075.png		231b
equation-075.tex		1b
equation-076.png		238b
equation-076.tex		3b
equation-077.png		328b
equation-077.tex		10b
equation-078.png		283b
equation-078.tex		9b
equation-079.png		238b
equation-079.tex		3b
equation-080.png		328b
equation-080.tex		10b
equation-081.png		296b
equation-081.tex		9b
equation-082.png		296b
equation-082.tex		9b
equation.png		238b
equation.tex		1b