There are no Dedekind cuts or Cauchy sequences on digital computers, so the fact that the analytical equations map to algorithms at all is very non-obvious.
Discretizing e.g. time or space is perhaps a bigger issue, but these issues are usually well understood and mitigated by, e.g., advanced numerical integration schemes, discrete-continuous formulations, or just cranking up the discretization resolution.
Analytical tools for discrete formulations are usually a lot less developed and don't as easily admit closed-form solutions.
https://en.wikipedia.org/wiki/Finite_difference
I'm not sure about applications of real numbers outside of calculus, and how to replace them there.
I am unsure of the next course of action, whether software will survive another 5 years, or what my career will look like in the future. Seems like I am engaged in the ice trade and they are about to invent the refrigerator.
I'm most interested in how the equation can be implemented step by step in an ML library - worked examples would be very helpful.
Thank you!
Instead of people "hacking" university education to turn universities into purely flavor-of-the-month job training centers, the real hack would be something that really drills down on the fundamentals. CS, math, physics, and philosophy, to get an all-around education in approaching problems from fundamentals, would I think be the optimal school experience.
The way I like to look at it is that I'm engaged in the ice trade and they are about to invent everything else, which will end mine and every other current trade. Which leaves me with two practical options: a) deep despair, or b) becoming a jack of all trades, master of none, but oftentimes better than a master of one. The jacks can, for now, capitalize on the thing that the Machines currently lack, which is agency.
So in my specific case I stopped thinking about mathematics as "how to interpret a sequence of symbols."
But instead I decided to start thinking about it as "the symbols tell me about the multidimensional topological coordinate space that I need to inhabit."
So now when I look at an equation (or whatever) my first step is "OK, how do I turn this into a topology so that I can explore the topological space the way that a number would?"
Kind of like if you were to extend Nagel's "What is it like to be a bat?" but instead of being a bat you're a number.
Read here: http://incompleteideas.net/book/the-book-2nd.html
https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutto...
This will have a gentler learning curve. After this you can move on to more advanced material.
The other resource I will recommend is everything by Bertsekas. In this context, his books on dynamic programming and neuro-dynamic programming.
Happy reading.
It includes both mathematical formulas and PyTorch code.
I found it a bit more practical than the Sutton & Barto book, which is a classic but doesn't cover some of the more modern methods used in deep reinforcement learning.
Once the fundamental concepts are understood, what problem is being solved and where the key difficulties are, only then will the equations start to make sense. If you start out with the math, you're making your life unnecessarily hard.
Also, not universally true but directionally true as a rule of thumb: the more equations a text contains, the less likely it is that the author themselves has truly grasped the subject. People who really grasp a subject can usually explain it well in plain language.
This is because they work assuming you know a model of the data. Most real-world RL is model-free RL. Or, like in LLMs, "model is known but too big to practically use" RL.
Apart from the resources you use (good ones in other comments already), try to get the initial mental model of the whole field right, that is important since everything you read can then fit in the right place of that mental model. I will try to give one below.
- the absolute core raison d'etre of RL as a separate field: the quality of data you train on only improves as your algorithm improves. As opposed to other ML where you have all your data beforehand.
- first, basic Bellman equation solving (code-wise this is just solving a system of linear equations; see the sketch after this list)
- an algo you will come across called policy iteration (code-wise, a bunch of for loops...)
- here you will be able to see how different parts of the algo become impossible in different setups, and what approximations can be done for each of them (and this is where the first neural network - called "function approximator" in RL literature - comes into play). Here you can recognise approximate versions of the bellman equation.
- here you learn DDPG, SAC algos. Crucial. Called "actor critic" in parlance.
- you also notice problems of this approach that arise because a) you don't have much high-quality data and b) learning recursively with neural networks is very unstable; this motivates stuff like PPO.
- then you can take a step back, look at deep RL, and re-cast everything in normal ML terms. For example, techniques like TD learning (the term you would have used so far) can be re-cast as simply "data augmentation", which you do in ML all the time.
- at this point you should get in the weeds of actually engineering real RL algos at scale. Stuff like Atari benchmarks. You will find that in reality, the algos as learnt are more or less a template and you need lots of problem-specific detailing to actually make it work. And you will also learn engineering tricks that are crucial. This is mostly computer science stuff (increasing throughput on GPU etc. - but correctly! without changing the model assumptions)
- learn goal conditioned RL, imitation learning, some model based RL like alphazero/dreamer after all of the above. You will be able to easily understand it in the overall context at this point. First two are used in robotics quite a bit. You can run a few small robotics benchmarks at this point.
- learn stuff like HRL, offline RL as extras since they are not that practically relevant yet.
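To make the Bellman-solving and policy-iteration steps concrete, here is a minimal tabular sketch (the synthetic MDP and all names here are mine, purely illustrative) of policy evaluation as a linear solve plus the greedy-improvement loop:

```python
import numpy as np

# Synthetic tabular MDP: P[a, s, s'] = P(s' | s, a), r[a, s] = r(s, a).
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
r = rng.standard_normal((n_actions, n_states))

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = r_pi, a linear system.
    P_pi = P[policy, np.arange(n_states)]
    r_pi = r[policy, np.arange(n_states)]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: greedy one-step lookahead (the Bellman max).
    new_policy = (r + gamma * P @ V).argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
```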
More often than not this is duplicated work (mathematically speaking) and there is a lot to be gained by sharing advances in either field by running it through a "translation". This has happened many times historically - a lot of the "we met at a cafe and worked it out on a napkin" inventions are exactly that.
Math proficiency helps a lot with that. The level of abstraction you deal with is naturally high.
Recently, the problem of actually knowing every field well enough, even just cursorily, to make connections has become easier with AI. Modern LLMs do approximate retrieval and still need a planner plus a verifier; the mathematician can be that.
This is somewhat adjacent to what Terry Tao spoke about, and the setup is sort of what AlphaEvolve does.
You get that impression because such advances are high impact and rare (because they are difficult). Most advances come as a sequence of field-specific assumption, field-specific empirical observation, field-specific theorem, and so on. We only see the advances that are actually made, leading to an observation bias.
One of the values that a math grad brings is debugging and fixing these ML models when training fails. Many would not have an idea of how to even begin debugging why the trained model is not working so well, let alone how to explore fixes.
That's very much a matter of style. An equation is often the plainest way of expressing something.
Give a physicist an equation from a completely unrelated field of mathematics and it will make zero sense to them because they lack the context. And vice versa. The only people who can readily read and understand your equations are those who already understand the subject and have learned all the context around the math.
Therefore it's pointless to try to start with the math when you're foreign to a field. It simply won't make any sense without the context.
Why didn't the training converge?
Validation/test errors are great, but why is performance in the wild so poor?
Why is the model converging so soon?
Why is this all zero?
Why is this NaN?
Model performance is not great; do I need to move to something more complicated, or am I doing something wrong?
Did the nature of the upstream data change?
Sometimes this feature is missing; how should I deal with this?
The training set and the data on which the model will be deployed are different. How do I address this problem?
The labelers labelled only the instances that are easy to label, not chosen uniformly from the data. How do I train with such skewed label selection?
I need to update the model with a few thousand new data points, but not train from scratch. How do I do it?
The model is too large; which doubles can I replace with float32?
So on and so forth. Many times models are given up on prematurely because the expertise to investigate lackluster performance does not exist in the team.
Machine learning feels recent, but one of its core mathematical ideas dates back to 1952, when Richard Bellman published a seminal paper titled “On the Theory of Dynamic Programming” [6, 7], laying the foundation for optimal control and what we now call reinforcement learning.
Later in the 1950s, Bellman extended his work to continuous-time systems, turning the optimality condition into a PDE. He then found that this PDE was structurally identical to a result in physics published a century earlier, in the 1840s: the Hamilton-Jacobi equation.
Once that structure is visible, several topics line up naturally. In this post I want to turn our attention to two applications of Bellman's work: continuous-time reinforcement learning, and how the training of generative models (diffusion models) can be interpreted through stochastic optimal control.
Bellman originally formulated dynamic programming in discrete time in the early 1950s [6, 7]. Consider a Markov decision process with state space $\mathcal X$, action space $\mathcal A$, transition kernel $P(\cdot\mid x,a)$, reward function $r(x,a)$, and discount factor $\gamma\in(0,1)$. A policy $\pi$ maps each state to a distribution over actions. If the state evolves as a controlled Markov chain
$$ X_{n+1}\sim P(\cdot\mid X_n,a_n), $$
then the objective is
$$ J(\pi):=\mathbb E\left[\sum_{n=0}^\infty \gamma^n r(X_n,a_n)\right],\qquad a_n\sim \pi(\cdot\mid X_n), $$
and the value function is defined as:
$$ V(x):=\sup_\pi \mathbb E\left[\sum_{n=0}^\infty \gamma^n r(X_n,a_n)\,\middle|\,X_0=x\right]. $$
Under mild conditions, $V$ satisfies the Bellman equation
$$ V(x)=\max_{a\in\mathcal A}\left\{r(x,a)+\gamma\,\mathbb E \left[V(X_{n+1})\mid X_n=x,a_n=a\right]\right\}.\tag{Bellman equation} $$
This says: choose the action that maximizes immediate reward plus continuation value. Continuous time keeps the same local logic, but now the time step has length $h$ and we send $h\downarrow 0$.
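Before passing to continuous time, it may help to see the discrete equation as an algorithm. A minimal value-iteration sketch on a synthetic tabular MDP (sizes and names here are purely illustrative):

```python
import numpy as np

# Synthetic tabular MDP: P[a, x, x'] = P(x' | x, a), r[a, x] = r(x, a).
n_x, n_a, gamma = 5, 3, 0.95
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_x), size=(n_a, n_x))
r = rng.standard_normal((n_a, n_x))

V = np.zeros(n_x)
for _ in range(10_000):
    # Bellman update: V(x) = max_a { r(x, a) + gamma * E[V(X') | x, a] }.
    V_new = (r + gamma * P @ V).max(axis=0)
    if np.abs(V_new - V).max() < 1e-10:
        break
    V = V_new
```

Iterating the Bellman operator like this converges because the operator is a $\gamma$-contraction in the sup norm.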
To isolate the main idea, first ignore noise and consider the non-autonomous deterministic control system
$$ \dot X_s = f(s,X_s,a_s),\qquad X_t=x, $$
with finite-horizon value function
$$ V(t,x):=\sup_{a_\cdot}\left[\int_t^T r(s,X_s,a_s)\,ds+g(X_T)\,\middle|\,X_t=x\right]. $$
Theorem (HJB, deterministic non-autonomous). For $V\in C^1$, the value function satisfies
$$ -\partial_t V(t,x) = H\bigl(t,x,\nabla_x V(t,x)\bigr), \tag{HJB} $$
where the Hamiltonian is $H(t,x,p):=\sup_{a\in\mathcal A}\left\{r(t,x,a)+p^\top f(t,x,a)\right\}$.
Proof. Fix $(t,x)$ and $h>0$. The dynamic programming principle gives
$$ V(t,x)=\sup_{a_\cdot}\left[\int_t^{t+h} r(s,X_s,a_s)\,ds + V(t+h,X_{t+h})\right]. $$
To first order in $h$, it is enough to optimize over constant actions $a$ on $[t,t+h]$. For smooth $V$ and deterministic dynamics:
$$ V(t+h,X_{t+h})=V(t,x)+h\,\partial_t V(t,x)+h\,\nabla_x V(t,x)^\top f(t,x,a)+o(h), $$
$$ \int_t^{t+h} r(s,X_s,a)\,ds = h\,r(t,x,a)+o(h). $$
Substituting into the DPP, cancelling $V(t,x)$, dividing by $h$, and letting $h\downarrow 0$:
$$ 0=\sup_{a\in\mathcal A}\left\{r(t,x,a)+\nabla_x V(t,x)^\top f(t,x,a)+\partial_t V(t,x)\right\}, $$
which rearranges to $-\partial_t V(t,x) = H(t,x,\nabla_x V(t,x))$. $\quad\blacksquare$
Connection to Hamilton–Jacobi. What Bellman realized in the 1950s is that the partial differential equation produced by dynamic programming has exactly the same structure as the 19th-century Hamilton–Jacobi equation from classical mechanics. Writing the running reward as minus a Lagrangian, $r(t,x,a)=-L(t,x,a)$, define
$$ H(t,x,p):=\sup_{a\in\mathcal A}\{p^\top f(t,x,a)-L(t,x,a)\}. $$
The HJB equation then becomes identical in form to the Hamilton–Jacobi equation for the action $S(t,q)$,
$$ \frac{\partial S}{\partial t}+H\!\left(q,\frac{\partial S}{\partial q}\right)=0. $$
Under the correspondence $S\leftrightarrow V$ and $q\leftrightarrow x$, the two equations are the same at the level of PDE structure.
We now move to the stochastic setting: continuous time, continuous state and action spaces, and Itô dynamics. Assume the system evolves according to the SDE
$$ dX_t = f(X_t,a_t)\,dt + \Sigma(X_t,a_t)\,dW_t $$
where $X_t$ is the state, $a_t$ is the control, $W_t$ is a standard Wiener process, and $f$ and $\Sigma$ describe drift and diffusion. The reward is given by $r(x,a)$, and the objective is to maximize expected discounted reward over an infinite horizon:
$$ J(\pi):=\mathbb{E}\Big[\int_0^\infty e^{-\rho t}r(X_t,a_t)\,dt\Big],\qquad a_t\sim \pi(\cdot\mid X_t) $$
where $\rho>0$ is the discount rate. The associated value function is
$$ V(x):=\sup_\pi \mathbb{E}\Big[\int_0^\infty e^{-\rho t}r(X_t,a_t)\,dt \Big| X_0=x\Big] $$
Theorem (Hamilton-Jacobi-Bellman equation for a controlled diffusion). Under suitable regularity conditions:
- $f(\cdot,a)$, $\Sigma(\cdot,a)$, $r(\cdot,a)$ are continuous in $(x,a)$; Lipschitz in $x$ uniformly in $a$.
- $\Sigma\Sigma^\top(x,a)$ is bounded and uniformly nondegenerate (for classical $C^2$ theory; if you drop this you typically work in viscosity form).
- $r$ is bounded (or has at most linear growth with enough integrability).
- $V\in C^2(\mathbb R^d)$ and bounded (or polynomial growth, with the usual technical modifications).
Then the value function satisfies the Hamilton-Jacobi-Bellman (HJB) PDE
$$ \rho V(x)=\max_{a\in \mathcal{A}}\Big\{ r(x,a)+\mathcal{L}^a V(x)\Big\}.\tag{1} $$
where $\mathcal{L}^a$ is the infinitesimal generator under action $a$:
$$ \mathcal{L}^a \varphi(x):=\nabla \varphi(x)^\top f(x,a) +\tfrac12 \mathrm{Tr}\big(\Sigma\Sigma^\top(x,a)\,\nabla^2 \varphi(x)\big) $$
Proof. The structure is the same as in the deterministic case. The only new ingredient is the short-time expansion of $\mathbb E_x[V(X_h)]$: by Itô’s formula,
$$ \mathbb E_x[V(X_h)] = V(x) + h\,\mathcal L^a V(x) + o(h), $$
where the generator $\mathcal{L}^a$ replaces the directional derivative $\nabla V^\top f$, adding the curvature term $\tfrac{1}{2}\mathrm{Tr}(\Sigma\Sigma^\top\nabla^2 V)$ coming from the quadratic variation of $W$. The rest is unchanged: substitute into the DPP, cancel $V(x)$, divide by $h$, and let $h\downarrow 0$ to obtain (1). $\quad\blacksquare$
The same argument also yields the non-autonomous HJB when $f$, $\Sigma$, and $r$ depend explicitly on time; see Appendix C.
Historical Note: In 1960, Rudolf E. Kalman published his seminal paper on the linear-quadratic regulator (LQR) problem [8], which is a continuous-time optimal control problem with linear dynamics and quadratic cost. The solution to the LQR problem is given by the algebraic Riccati equation, which can be derived from the Hamilton-Jacobi-Bellman (HJB) equation for continuous-time control problems.
Define the continuous-time analogue of the Q-function by
$$ Q(x,a):=\frac{1}{\rho}\Big(r(x,a)+\mathcal{L}^a V(x)\Big).\tag{2} $$
From the HJB (1), it follows immediately that $V(x)=\max_{a} Q(x,a)$. This identity is the basis of policy improvement: once we have an estimate of $V$, the greedy action is $a^*(x) = \arg\max_a Q(x,a)$.
This stationary, discounted form is the RL convention used in the next two sections.
We solve the HJB numerically with policy iteration (PI), alternating between evaluating the current policy and improving it through the Q-function. Both the value $V_\theta$ and the policy $\alpha_\phi$ are represented by MLPs.
This algorithm is model-based: it assumes known dynamics through $f(x,a)$ and $\Sigma(x,a)$ (equivalently, access to the generator $\mathcal L^a$). The model is used both to simulate closed-loop trajectories in policy evaluation and to compute $\mathcal L^aV$ in policy improvement.
We iterate the following steps until convergence:
Policy evaluation (value under current policy $\alpha_k$):
$$ \rho V_k(x)=r\big(x,\alpha_k(x)\big)+\mathcal{L}^{\alpha_k(x)}V_k(x). $$
In practice, we estimate $V_k\approx V^{\alpha_k}$ by Monte Carlo rollouts of the closed-loop SDE and fit $V_\theta$ by regression.
Policy improvement (greedy with respect to $V_k$):
$$ \alpha_{k+1}(x)\in\arg\max_{a\in\mathcal A}\{r(x,a)+\mathcal L^aV_k(x)\} =\arg\max_{a\in\mathcal A} Q_k(x,a), $$
where
$$ Q_k(x,a):=\frac{1}{\rho}\big(r(x,a)+\mathcal L^aV_k(x)\big). $$
With a differentiable actor $\alpha_\phi$, this becomes gradient ascent on
$$ \max_\phi\;\mathbb E_x\big[Q_k\big(x,\alpha_\phi(x)\big)\big]. $$
Diagnosis / stopping: monitor the pointwise HJB residual
$$ \mathcal R_{\mathrm{HJB}}(x)=\rho V(x)-\max_a\{r(x,a)+\mathcal L^aV(x)\}, $$
and stop when sampled norms of $\mathcal R_{\mathrm{HJB}}$ and parameter changes plateau.
Intuition: evaluation estimates the value landscape induced by the current policy, and improvement moves the policy uphill on that landscape. Repeating the two steps drives $(V,\alpha)$ toward a fixed point of the HJB.
The generator requires $\nabla V$ and $\nabla^2 V$, obtained from $V_\theta$ via autograd. The diffusion $\Sigma(x,a)$ is problem-given (model-based setting).
```python
import torch
from torch import autograd

def compute_generator(V_net, x, f_xa, Sigma_xa):
    """L^a V(x) = ∇V · f + ½ Tr(ΣΣᵀ ∇²V); x must have requires_grad=True."""
    V = V_net(x)                                              # (batch, 1)
    grad_V = autograd.grad(V.sum(), x, create_graph=True)[0]  # (batch, d)
    drift = (grad_V * f_xa).sum(-1, keepdim=True)             # ∇V · f
    d = x.shape[1]
    # Hessian ∇²V, one row per coordinate via a second autograd pass.
    H = torch.stack([autograd.grad(grad_V[:, i].sum(), x,
                                   create_graph=True)[0] for i in range(d)], dim=1)
    A = Sigma_xa @ Sigma_xa.transpose(-1, -2)                 # ΣΣᵀ, (batch, d, d)
    diff = 0.5 * (A * H).sum(dim=(-2, -1)).unsqueeze(-1)      # ½ Tr(ΣΣᵀ ∇²V)
    return drift + diff
```
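The stopping criterion above can then be estimated at sampled states. A sketch, assuming the max over actions is approximated by the current (near-greedy) actor:

```python
def hjb_residual(V_net, policy_net, problem, x, rho):
    # R_HJB(x) = rho V(x) - { r(x, a) + L^a V(x) } at a = policy(x).
    x = x.detach().requires_grad_(True)   # the generator needs grads w.r.t. x
    a = policy_net(x)
    LV = compute_generator(V_net, x, problem.drift(x, a), problem.diffusion(x, a))
    return rho * V_net(x) - (problem.reward(x, a) + LV)
```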
During policy improvement, $\nabla V$ and $\nabla^2 V$ are detached from $\theta$ so gradients flow only through $\phi$.
For a fixed policy $\alpha$, $V^\alpha$ solves the linear PDE
$$ \rho V^\alpha(x)=r\big(x,\alpha(x)\big)+\mathcal L^{\alpha(x)}V^\alpha(x). $$
By the Feynman-Kac representation, for any truncation horizon $T>0$,
$$ V^\alpha(x)= \mathbb E_x\!\left[\int_0^\infty e^{-\rho s}\,r\big(X_s,\alpha(X_s)\big)\,ds\right]=\mathbb E_x\!\left[\int_0^T e^{-\rho s}\,r\big(X_s,\alpha(X_s)\big)\,ds + e^{-\rho T}V^\alpha(X_T)\right] $$
In Monte Carlo policy evaluation, we approximate this expectation with simulated trajectories and use the critic to bootstrap the terminal value at time $T$.
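A sketch of how those regression targets can be produced, assuming the `problem` interface used throughout and a `noise_dim` for the Brownian increments (both names are illustrative):

```python
import math
import torch

def mc_value_targets(V_net, policy_net, problem, x0, rho, T, dt, noise_dim):
    # Euler-Maruyama rollout of the closed-loop SDE, accumulating discounted
    # reward and bootstrapping the terminal value with the current critic.
    with torch.no_grad():
        x, G, disc = x0.clone(), torch.zeros(x0.shape[0], 1), 1.0
        for _ in range(int(T / dt)):
            a = policy_net(x)
            G = G + disc * problem.reward(x, a) * dt
            dW = math.sqrt(dt) * torch.randn(x.shape[0], noise_dim, 1)
            x = x + problem.drift(x, a) * dt + (problem.diffusion(x, a) @ dW).squeeze(-1)
            disc *= math.exp(-rho * dt)
        return G + disc * V_net(x)   # targets for fitting V_theta by regression
```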
At collocation points, compute $Q_k(x,\alpha_\phi(x))$ using the detached $\nabla V_k$, $\nabla^2 V_k$, and maximise $\mathbb{E}[Q_k]$ w.r.t. $\phi$:
```python
grad_V, H = compute_generator_detached(V_net, x)  # frozen V: ∇V, ∇²V, no θ-graph
a = policy_net(x)                                 # differentiable in φ
f_xa, S_xa = problem.drift(x, a), problem.diffusion(x, a)
L_V = generator_from_precomputed(grad_V, H, f_xa, S_xa)
Q = (problem.reward(x, a) + L_V) / rho
loss = -Q.mean()                                  # gradient ascent on Q
opt_pi.zero_grad()                                # clear stale gradients
loss.backward()
opt_pi.step()
```
Policy iteration is model-based. A complementary route is Q-learning, which can be implemented model-free from sampled transitions.
In continuous time, the Q-function satisfies the PDE
$$ \rho Q(x,a)=r(x,a)+\mathcal L^a\big(\max_{a'\in\mathcal A}Q(x,a')\big).\tag{3} $$
With neural networks, set
$$ Q_\psi(x,a)\approx Q(x,a),\qquad a_\omega(x)\approx \arg\max_{a}Q_\psi(x,a), $$
where $Q_\psi$ (critic) and $a_\omega$ (actor) are MLPs.
Using short transitions $(X_t,a_t,r_t,X_{t+\Delta t})$ and a small step size $\Delta t$, a practical TD target is
$$ y_t = r_t\,\Delta t + e^{-\rho\Delta t}\,\bar V(X_{t+\Delta t}), \qquad \bar V(x):=Q_{\bar\psi}(x,a_\omega(x))\ \text{(or }\max_a Q_{\bar\psi}(x,a)\text{)}. $$
Then train the critic with
$$ \mathcal L_Q(\psi)=\mathbb E\big[(Q_\psi(X_t,a_t)-y_t)^2\big]. $$
The actor is updated by ascent on
$$ \max_\omega\;\mathbb E\big[Q_\psi(X_t,a_\omega(X_t))\big]. $$
This mirrors the usual actor-critic split: one network fits state-action values, while the other outputs actions that maximize them.
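A sketch of one such update in PyTorch, assuming `Q_net`, a slowly updated target copy `Q_target`, and `actor` are MLPs with the obvious signatures, and `batch` holds sampled transitions (all names are illustrative):

```python
import math
import torch

def td_update(Q_net, Q_target, actor, opt_q, opt_a, batch, rho, dt):
    x, a, r, x_next = batch
    with torch.no_grad():                     # TD target from the frozen copy
        y = r * dt + math.exp(-rho * dt) * Q_target(x_next, actor(x_next))
    q_loss = (Q_net(x, a) - y).pow(2).mean()  # critic regression on the target
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()
    a_loss = -Q_net(x, actor(x)).mean()       # actor: ascent on Q(x, a_omega(x))
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()
```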
The linear-quadratic regulator is the canonical continuous-time control benchmark: linear dynamics, quadratic cost, and a closed-form solution. That makes it ideal for validating a numerical solver.
Dynamics (additive noise, 1-D scalar):
$$dX_t = (\alpha\,X_t + \beta\,a_t)\,dt + \sigma\,dW_t$$
Reward (negative quadratic cost):
$$r(x,a) = -\tfrac{1}{2}(q\,x^2 + r_a\,a^2)$$
| Symbol | Meaning | Value |
|---|---|---|
| $\alpha$ | open-loop drift (stable if $<0$) | $-0.5$ |
| $\beta$ | control effectiveness | $1.0$ |
| $q$ | state cost weight | $1.0$ |
| $r_a$ | action cost weight | $0.1$ |
| $\sigma$ | diffusion (noise intensity) | $0.3$ |
| $\rho$ | discount rate | $0.1$ |
Substituting a quadratic ansatz $V(x) = -\tfrac{1}{2}Px^2 - c$ into the HJB and optimising over $a$ yields (see Appendix A for the full derivation):
The closed-form objects are:
$$ a^*(x)=-\frac{\beta P}{r_a}x=: -Kx, $$
$$ \rho P = q + 2\alpha P - \frac{\beta^2}{r_a}P^2, $$
$$ c = \frac{\sigma^2 P}{2\rho}. $$
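These quantities are easy to check numerically. A small sanity-check sketch with the table's parameters, taking the positive root of the scalar Riccati quadratic $\frac{\beta^2}{r_a}P^2+(\rho-2\alpha)P-q=0$:

```python
import math

alpha, beta, q, r_a, sigma, rho = -0.5, 1.0, 1.0, 0.1, 0.3, 0.1
a2, a1, a0 = beta**2 / r_a, rho - 2 * alpha, -q    # quadratic in P
P = (-a1 + math.sqrt(a1**2 - 4 * a2 * a0)) / (2 * a2)
K = beta * P / r_a            # optimal feedback gain: a*(x) = -K x
c = sigma**2 * P / (2 * rho)  # constant offset in V
print(P, K, c)
```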
```python
class StochasticLQR(ControlProblem):
    def drift(self, x, a):      # f(x, a) = A x + B a
        return x @ A.T + a @ B.T
    def diffusion(self, x, a):  # constant Σ = D (additive noise)
        return D.unsqueeze(0).expand(x.shape[0], -1, -1)
    def reward(self, x, a):     # r(x, a) = -½ (xᵀQx + aᵀRa)
        return -0.5 * ((x @ Q * x).sum(-1, keepdim=True)
                       + (a @ R * a).sum(-1, keepdim=True))

P, c, K = solve_are(A, B, Q, R, D, rho=0.1)  # exact Riccati solution
solver = PolicyIteration(problem, cfg)
history = solver.solve()                     # neural PI
```
Learned $V_\theta$ and $\alpha_\phi$ closely match $V^*(x) = -\tfrac{1}{2}Px^2 - c$ and $a^*(x) = -Kx$:

Sample trajectories under the learned policy ($x_0=1.5$) and cumulative discounted reward:

Convergence diagnostics (value-fit MSE, policy objective, HJB residual):

Merton’s (1969) portfolio problem asks how an investor should allocate wealth between a risk-free bond and a risky asset while also choosing a consumption rate. The objective is to maximize expected lifetime CRRA (constant relative risk aversion) utility of consumption. It also admits a closed-form solution, making it a useful second benchmark with multiplicative noise rather than the additive noise of LQR.
State: wealth $X_t > 0$. Controls: $a_t = (\pi_t, k_t)$ — risky-asset fraction and consumption-to-wealth ratio $k = c/X$.
Dynamics (geometric / multiplicative noise):
$$dX_t = \big[r_f + \pi_t(\mu - r_f) - k_t\big] X_t dt + \pi_t \sigma X_t dW_t$$
Reward (CRRA utility of consumption flow, $\gamma \neq 1$):
$$r(x,a) = \frac{(k\,x)^{1-\gamma}}{1-\gamma}$$
| Symbol | Meaning | Value |
|---|---|---|
| $r_f$ | risk-free rate | $0.03$ |
| $\mu$ | risky asset expected return | $0.08$ |
| $\sigma$ | risky asset volatility | $0.20$ |
| $\gamma$ | relative risk aversion (CRRA) | $2.0$ |
| $\rho$ | subjective discount rate | $0.05$ |
Substituting a power-law ansatz $V(x) = \frac{A}{1-\gamma}x^{1-\gamma}$ into the HJB and optimising over $(\pi, k)$ yields (see Appendix B for the full derivation):
The closed-form controls and value are:
$$ \pi^*=\frac{\mu-r_f}{\gamma\sigma^2}=0.625, $$
$$ k^*=\frac{\rho-(1-\gamma)M}{\gamma},\qquad M:=r_f+\frac{(\mu-r_f)^2}{2\gamma\sigma^2},\qquad k^*\approx 0.0478, $$
$$ V^*(x)=\frac{A}{1-\gamma}x^{1-\gamma},\qquad A=\left(\frac{\gamma}{\rho-(1-\gamma)M}\right)^\gamma. $$
Both optimal controls are constant — independent of wealth and time.
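A quick numeric check of these constants, using the parameter table above:

```python
r_f, mu, sigma, gamma, rho = 0.03, 0.08, 0.20, 2.0, 0.05
pi_star = (mu - r_f) / (gamma * sigma**2)         # 0.625
M = r_f + (mu - r_f)**2 / (2 * gamma * sigma**2)  # ≈ 0.04563
k_star = (rho - (1 - gamma) * M) / gamma          # ≈ 0.0478
A = (gamma / (rho - (1 - gamma) * M))**gamma      # value-function coefficient
```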
```python
class MertonProblem(ControlProblem):
    def drift(self, x, a):      # a = (π, k): risky fraction, consumption rate
        pi, cr = a[:, 0:1], a[:, 1:2]
        return (self.r_f + pi * (self.mu - self.r_f) - cr) * x
    def diffusion(self, x, a):  # multiplicative noise: π σ x
        return (a[:, 0:1] * self.sigma * x).unsqueeze(-1)
    def reward(self, x, a):     # CRRA utility of consumption c = k x
        c = (a[:, 1:2] * x).clamp(min=1e-8)
        return c.pow(1 - self.gamma) / (1 - self.gamma)
```
Learned value function matches the exact power-law $V^*\propto x^{1-\gamma}$; both controls converge to the analytical constants:

Sample wealth trajectories under the learned policy ($X_0 = 1$) and cumulative discounted reward:

Convergence diagnostics:

The same HJB machinery also appears in diffusion models once reverse-time sampling is written as a control problem. Let $p_{\text{data}}(x)$ be the target data distribution. For simplicity, consider a forward diffusion whose noise coefficient depends only on time, as in standard score-based SDE formulations:
$$ dY_t = f(Y_t, t)\,dt + \sigma(t)\,dB_t, \qquad Y_0 \sim p_{\text{data}}. $$
Let $p_t(x)$ denote the marginal density of $Y_t$. Under standard regularity assumptions, the time reversal of this process is again a diffusion. Instead of writing time backward from $T$ down to $0$, define
$$ X_t := Y_{T-t}, \qquad t\in[0,T]. $$
Then $X_0\sim p_T$, $X_T\sim p_{\text{data}}$, and $X_t$ has marginal $p_{T-t}$.
To expose the control structure [9], define the reverse-time drift and diffusion coefficients
$$ \mu(x, t) := -f(x, T-t), \qquad \Sigma(t) := \sigma(T-t). $$
Now consider a family of controlled diffusions $X_t^u$ driven by an arbitrary control field $u(x, t)$:
$$ dX_t^u = \big(\mu(X_t^u, t) + \Sigma(t) u(X_t^u, t)\big)\,dt + \Sigma(t)\,dW_t, \qquad X_0^u \sim p_T. $$
The goal is to choose $u$ so that the terminal law of $X_T^u$ matches $p_{\text{data}}$.
Now define the candidate value function
$$ V(x, t) := -\log p_{T-t}(x). $$
This is the negative log-density of the reverse-time marginals, so its terminal value is
$$ V(x, T) = -\log p_{\text{data}}(x). $$
To identify the PDE satisfied by $V$, recall that the forward marginals $p_t$ solve the Fokker-Planck equation for $Y_t$. Consequently, $\rho_t(x) := p_{T-t}(x) = e^{-V(x, t)}$ satisfies the reverse-time Fokker-Planck PDE
$$ \partial_t \rho_t = -\operatorname{div}(\mu\,\rho_t) - \tfrac{1}{2}\operatorname{Tr}\big(\Sigma\Sigma^\top \nabla_x^2 \rho_t\big). $$
Substituting $\rho_t = e^{-V}$, $\nabla_x \rho_t = -e^{-V}\nabla_x V$, and $\nabla_x^2 \rho_t = e^{-V}(\nabla_x V\nabla_x V^\top - \nabla_x^2 V)$ into that PDE and dividing by $-e^{-V}$ yields
$$ \partial_t V = \operatorname{div}\mu - \mu \cdot \nabla_x V + \tfrac{1}{2}\|\Sigma^\top \nabla_x V\|^2 - \tfrac{1}{2}\operatorname{Tr}\big(\Sigma\Sigma^\top \nabla_x^2 V\big). $$
To rewrite this as a control problem, introduce a control variable $u$ through the convex-conjugate identity for the quadratic function $g(y) = \frac{1}{2}\|y\|^2$:
$$ \tfrac{1}{2}\|y\|^2 = \sup_{u\in\mathbb{R}^d} \left\{ u \cdot y - \tfrac{1}{2}\|u\|^2 \right\}. $$
Setting $y = -\Sigma^\top \nabla_x V$ lets us rewrite the quadratic gradient term as
$$ \tfrac{1}{2}\|-\Sigma^\top \nabla_x V\|^2 = \sup_{u} \left\{ u^\top (-\Sigma^\top \nabla_x V) - \tfrac{1}{2}\|u\|^2 \right\} = -\inf_{u} \left\{ \tfrac{1}{2}\|u\|^2 + (\Sigma u) \cdot \nabla_x V \right\}. $$
This is the key step: it replaces a quadratic gradient term with a linear one that can be interpreted as controlled drift.
Plugging this back into the PDE for $V$, multiplying by $-1$, and collecting the $u$-independent terms inside the infimum gives the finite-horizon HJB equation
$$ -\partial_t V = \inf_u \left\{ \tfrac{1}{2}\|u\|^2 - \operatorname{div}\mu + (\mu + \Sigma u)\cdot \nabla_x V + \tfrac{1}{2}\operatorname{Tr}\big(\Sigma\Sigma^\top \nabla_x^2 V\big) \right\}. \tag{4} $$
This PDE shows that $V(x,t) = -\log p_{T-t}(x)$ is exactly the value function of a stochastic control problem with dynamics $X_t^u$ and cost
$$ J(u; x, t) = \mathbb{E}\left[ \int_t^T \left( \tfrac{1}{2}\|u(X_s^u,s)\|^2 - \operatorname{div}\mu(X_s^u,s) \right) ds - \log p_{\text{data}}(X_T^u) \;\middle|\; X_t^u=x \right]. $$
The optimal control law $u^*(x, t)$ is the minimizer in the HJB equation. From the quadratic optimization above, the minimizer of $\frac{1}{2}\|u\|^2 + u^\top (\Sigma^\top \nabla_x V)$ is obtained by setting its derivative with respect to $u$ to zero:
$$ u^* + \Sigma^\top \nabla_x V = 0 \implies u^*(x, t) = -\Sigma^\top(t)\nabla_x V(x, t). $$
Since $V(x, t) = -\log p_{T-t}(x)$, we have $\nabla_x V = -\nabla_x \log p_{T-t}(x)$. Substituting this identity gives the exact optimal control law
$$ u^*(x, t) = \Sigma^\top(t)\nabla_x \log p_{T-t}(x) = \sigma(T-t)^\top \nabla_x \log p_{T-t}(x). $$
Therefore the controlled drift becomes
$$ \mu(x,t)+\Sigma(t)u^*(x,t) =-f(x,T-t)+\Sigma(t)\Sigma(t)^\top \nabla_x \log p_{T-t}(x), $$
which is exactly the reverse-time score correction.
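In code, sampling then amounts to Euler-Maruyama simulation of the optimally controlled SDE, with a learned score network standing in for $\nabla_x\log p_{T-t}$. A sketch, assuming a scalar noise schedule $\sigma(t)$ and callables `score_net`, `f`, `sigma` (all names illustrative):

```python
import math
import torch

def sample(score_net, f, sigma, x0, T, n_steps):
    # Simulate dX = (mu + Sigma u*) dt + Sigma dW, with u* the score control.
    x, dt = x0, T / n_steps
    for i in range(n_steps):
        t = i * dt                    # reversed time: X_t ~ p_{T-t}
        s = sigma(T - t)
        drift = -f(x, T - t) + s**2 * score_net(x, T - t)
        x = x + drift * dt + s * math.sqrt(dt) * torch.randn_like(x)
    return x                          # approximately distributed as p_data
```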
Applying Itô’s formula to $V(X_s^u, s)$ along an arbitrary controlled trajectory and then using the HJB gives the verification identity
$$ J(u; x, t) = V(x, t) + \frac{1}{2} \mathbb{E}\left[ \int_t^T \| u(X_s^u,s) - u^*(X_s^u,s) \|^2 ds \;\middle|\; X_t^u = x \right]. $$
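A sketch of that computation: by Itô's formula, the drift of $V(X_s^u,s)$ is $\partial_s V+(\mu+\Sigma u)\cdot\nabla_x V+\tfrac12\operatorname{Tr}(\Sigma\Sigma^\top\nabla_x^2 V)$. Substituting the PDE for $V$ and completing the square with $u^*=-\Sigma^\top\nabla_x V$,

$$ \tfrac{1}{2}\|u\|^2+u^\top\Sigma^\top\nabla_x V=\tfrac{1}{2}\|u-u^*\|^2-\tfrac{1}{2}\|u^*\|^2, $$

reduces that drift to $\operatorname{div}\mu+\tfrac12\|u-u^*\|^2-\tfrac12\|u\|^2$. Taking expectations from $t$ to $T$, using the terminal condition $V(x,T)=-\log p_{\text{data}}(x)$, and rearranging gives the identity.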
This identity is the control-theoretic backbone of diffusion models: the cost of any control $u$ exceeds the optimal value $V(x,t)$ by exactly half the expected squared deviation of $u$ from the score-induced control $u^*$ along the controlled trajectory.
So diffusion-based generative modeling can be viewed as a finite-horizon stochastic optimal control problem whose optimal policy is precisely the score-induced reverse-time drift correction.
[1] Jia, Yanwei, and Xun Yu Zhou. “q-Learning in continuous time.” Journal of Machine Learning Research 24, no. 161 (2023): 1-61.
[2] Jia, Yanwei, and Xun Yu Zhou. “Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms.” Journal of Machine Learning Research 23, no. 275 (2022): 1-50.
[3] Moll, Benjamin. "Hamilton-Jacobi-Bellman Equations, Stochastic Differential Equations." Lecture notes. https://benjaminmoll.com/wp-content/uploads/2019/07/Lecture4_ECO521_web.pdf
[4] Fleming, Wendell H., and H. Mete Soner. Controlled Markov processes and viscosity solutions. New York, NY: Springer New York, 2006.
[5] Yong, Jiongmin, and Xun Yu Zhou. Stochastic controls: Hamiltonian systems and HJB equations. Vol. 43. Springer Science & Business Media, 1999.
[6] Bellman, Richard. “On the Theory of Dynamic Programming.” Proceedings of the National Academy of Sciences 38, no. 8 (1952): 716-719. https://doi.org/10.1073/pnas.38.8.716.
[7] Bellman, Richard Ernest. An Introduction to the Theory of Dynamic Programming. Santa Monica, CA: RAND Corporation, 1953. https://www.rand.org/pubs/reports/R245.html.
[8] Kalman, Rudolf E. “Contributions to the Theory of Optimal Control.” Boletin de la Sociedad Matematica Mexicana 5 (1960): 102-119. https://boletin.math.org.mx/pdf/2/5/BSMM%282%29.5.102-119.pdf.
[9] Berner, Julius, Lorenz Richter, and Karen Ullrich. “An optimal control perspective on diffusion-based generative modeling.” Transactions on Machine Learning Research, 2024. https://arxiv.org/abs/2211.01364.
Ansatz. Guess $V(x) = -\tfrac{1}{2}Px^2 - c$ with $P > 0$. Then $V'=-Px$, $V''=-P$.
Generator. With drift $f = \alpha x + \beta a$ and constant diffusion $\sigma$:
$$ \mathcal{L}^a V =V'(\alpha x+\beta a)+\tfrac{1}{2}\sigma^2V'' =-Px(\alpha x+\beta a)-\tfrac{1}{2}\sigma^2P =-\alpha P x^2-\beta P a x-\tfrac{1}{2}\sigma^2P. $$
HJB. Substituting into $\rho V = \max_a \{r + \mathcal{L}^a V\}$:
$$ -\tfrac{1}{2}\rho P x^2 - \rho c =\max_{a}\Big\{-\tfrac{1}{2}qx^2-\tfrac{1}{2}r_a a^2-\alpha P x^2-\beta P a x-\tfrac{1}{2}\sigma^2P\Big\}. $$
Optimality in $a$. The RHS is concave in $a$; set $\partial_a(\cdot) = 0$:
$$ -r_a\,a-\beta P x=0 \implies a^*(x)=-\frac{\beta P}{r_a}\,x=: -Kx. $$
Riccati equation. Substituting $a^* = -Kx$ back and matching the $x^2$ coefficient and the constant:
$$x^2:\quad \rho P = q + 2\alpha P - \frac{\beta^2}{r_a}P^2, \qquad \text{const:}\quad c = \frac{\sigma^2 P}{2\rho}$$
The first is the discounted algebraic Riccati equation. In the scalar case it is a quadratic in $P$. $\quad\blacksquare$
Ansatz. Power-law value: $V(x) = \frac{A}{1-\gamma}\,x^{1-\gamma}$, $A > 0$. Then:
$$V'(x) = A\,x^{-\gamma},\qquad V''(x) = -\gamma A\,x^{-\gamma-1}$$
Generator. Using drift $\mu_X x = [r_f + \pi(\mu-r_f) - k]x$ and diffusion $\sigma_X x = \pi\sigma x$:
$$\mathcal{L}^a V = A\,x^{-\gamma}\cdot\mu_X x - \tfrac{1}{2}\gamma A\,x^{-\gamma-1}\cdot\sigma_X^2 x^2 = A\,x^{1-\gamma}\big[r_f + \pi(\mu-r_f) - k - \tfrac{1}{2}\gamma\pi^2\sigma^2\big]$$
HJB. Substituting and dividing through by $x^{1-\gamma} > 0$:
$$\frac{\rho A}{1-\gamma} = \max_{\pi,\,k}\left\{\frac{k^{1-\gamma}}{1-\gamma} + A\big[r_f + \pi(\mu-r_f) - k - \tfrac{1}{2}\gamma\pi^2\sigma^2\big]\right\}$$
Optimality in $\pi$. FOC $\partial_\pi(\cdot) = 0$: $\;A[(\mu-r_f) - \gamma\pi\sigma^2] = 0$:
$$\pi^* = \frac{\mu - r_f}{\gamma\,\sigma^2}$$
This is the myopic portfolio rule — independent of wealth and time. Higher risk aversion $\gamma$ or volatility $\sigma$ reduces exposure.
Optimality in $k$. FOC $\partial_k(\cdot) = 0$: $\;k^{-\gamma} - A = 0$, so $k^* = A^{-1/\gamma}$.
Solving for $A$. Define the certainty-equivalent growth rate $M := r_f + \frac{(\mu-r_f)^2}{2\gamma\sigma^2}$. Substituting the optimisers back:
$$ \frac{\rho A}{1-\gamma} =\frac{(A^{-1/\gamma})^{1-\gamma}}{1-\gamma}+A\big[M-A^{-1/\gamma}\big] =\frac{A^{(\gamma-1)/\gamma}}{1-\gamma}+AM-A^{(\gamma-1)/\gamma}. $$
Multiplying by $1-\gamma$ and collecting terms gives
$$ \rho A=(1-\gamma)AM+\gamma A^{(\gamma-1)/\gamma}. $$
Since $A>0$, divide by $A^{(\gamma-1)/\gamma}$ to obtain
$$ A^{1/\gamma}=\frac{\gamma}{\rho-(1-\gamma)M}. $$
Therefore
$$ A = \left(\frac{\gamma}{\rho - (1-\gamma)M}\right)^\gamma, \qquad k^* = \frac{\rho - (1-\gamma)M}{\gamma} $$
The denominator $\rho - (1-\gamma)M$ must be positive — this is the feasibility condition ensuring lifetime utility is finite. With our parameters: $M \approx 0.04563$, so $k^* \approx 0.0478$. $\quad\blacksquare$
Let the dynamics and reward depend on time:
$$ dX_t=f(t,X_t,a_t)\,dt+\Sigma(t,X_t,a_t)\,dW_t,\qquad r=r(t,x,a). $$
For the discounted infinite-horizon problem, define the time-dependent value
$$ V(t,x):=\sup_\pi \mathbb E\Big[\int_t^\infty e^{-\rho(s-t)} r(s,X_s,a_s)\,ds\ \Big|\ X_t=x\Big]. $$
Then the time-dependent generator is
$$ \mathcal L_t^a \varphi(x)=\nabla \varphi(x)^\top f(t,x,a)+\tfrac12\mathrm{Tr}\big(\Sigma\Sigma^\top(t,x,a)\nabla^2\varphi(x)\big), $$
and the HJB becomes
$$ \rho V(t,x)=\max_{a\in\mathcal A}\Big\{r(t,x,a)+\partial_t V(t,x)+\mathcal L_t^a V(t,x)\Big\}. $$
Equivalently,
$$ -\partial_t V(t,x)=\max_{a\in\mathcal A}\Big\{r(t,x,a)+\mathcal L_t^a V(t,x)\Big\}-\rho V(t,x). $$
In the autonomous case, $V(t,x)$ is time-independent, so $\partial_t V=0$ and you recover (1).
For the finite-horizon deterministic case, set $\Sigma\equiv 0$ and define
$$ V(t,x):=\sup_{a_\cdot}\left[\int_t^T r(s,X_s,a_s)\,ds+g(X_T)\,\middle|\,X_t=x\right]. $$
Then
$$ -\partial_t V(t,x)=\sup_{a\in\mathcal A}\left\{r(t,x,a)+\nabla_x V(t,x)^\top f(t,x,a)\right\}, \qquad V(T,x)=g(x). $$
Writing $r=-L$ and
$$ H(t,x,p):=\sup_{a\in\mathcal A}\{p^\top f(t,x,a)-L(t,x,a)\}, $$
this becomes
$$ \partial_t V(t,x)+H\bigl(t,x,\nabla_x V(t,x)\bigr)=0, \qquad V(T,x)=g(x), $$
which is the classical Hamilton-Jacobi form.
But the examples you quoted were not my examples, at least not their primary movers (the NaNs could be caused by overflow, but that overflow can have a deeper cause). The examples I gave have/had very different root causes at play, and the fixes required some facility with math: not to the extent that you have to be capable of discovering new math, or something as complicated as the geometry and topology of strings, but nonetheless math at the level of grad school or an advanced and gifted undergrad.
Coming back to numeric overflow that you mention. I can imagine a software engineer eventually figuring out that overflow was a root cause (sometimes they will not). However there's quite a gap between overflow recognition and say knowledge of numerical analysis that will help guide a fix.
You say > "literally every single example"... can be dealt without much math. I would be very keen to learn from you about how to deal with this one, say. Without much math.
    The labelers labelled only the instances that are easy to label,
    not chosen uniformly from the data. How to train with such skewed
    label selection (without relabeling properly)?
This is not a gotcha, but genuine curiosity, because it is always useful to understand a solution different from your own (mine).