{"id":2719,"date":"2025-03-29T07:02:20","date_gmt":"2025-03-29T07:02:20","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/03\/29\/from-physics-to-probability-hamiltonian-mechanics-for-generative-modeling-and-mcmc\/"},"modified":"2025-03-29T07:02:20","modified_gmt":"2025-03-29T07:02:20","slug":"from-physics-to-probability-hamiltonian-mechanics-for-generative-modeling-and-mcmc","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/03\/29\/from-physics-to-probability-hamiltonian-mechanics-for-generative-modeling-and-mcmc\/","title":{"rendered":"From Physics to Probability: Hamiltonian Mechanics for Generative Modeling and MCMC"},"content":{"rendered":"<p>    From Physics to Probability: Hamiltonian Mechanics for Generative Modeling and MCMC<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n    <!-- no image --><br \/>\n \t<BR><br \/>\n<BR><\/BR><\/p>\n<div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/non-L-pendulum.png?ssl=1\" alt=\"\" class=\"wp-image-600546\"><figcaption class=\"wp-element-caption\">Phase space of a nonlinear pendulum. Photo by the <a href=\"https:\/\/soran-ghaderi.github.io\/\">author<\/a>.<\/figcaption><\/figure>\n<p class=\"has-text-align-justify wp-block-paragraph\"><mdspan datatext=\"el1743188451748\" class=\"mdspan-comment\">Hamiltonian<\/mdspan> mechanics is a way to describe how physical systems, like planets or pendulums, move over time, focusing on energy rather than just forces. By reframing complex dynamics through energy lenses, this 19th-century physics framework now powers cutting-edge generative AI. It uses generalized coordinates ( q ) (like position) and their conjugate momenta ( p ) (related to momentum), forming a phase space that captures the system\u2019s state. This approach is particularly useful for complex systems with many parts, making it easier to find patterns and conservation laws.<\/p>\n<\/div>\n<h2 class=\"wp-block-heading\">Table of Contents<\/h2>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#mathematical-reformation-from-second-order-to-phase-flow\">Mathematical Reformation: From Second-Order to First Order <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2699.png?ssl=1\" alt=\"\u2699\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#lagrangian-prelude-action-principles\">Lagrangian Prelude: Action Principles<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#newton-vs-lagrange-vs-hamilton-a-philosophical-showdown\">Newton vs. Lagrange vs. Hamilton: A Philosophical Showdown<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#hamiltons-equations-the-geometry-of-phase-space\">Hamilton\u2019s Equations: The Geometry of Phase Space <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2699.png?ssl=1\" alt=\"\u2699\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#symplecticity-the-sacred-invariant\">Symplecticity: The Sacred Invariant<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#how-are-we-going-to-solve-it\">How are We Going to Solve it?<\/a><\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/towardsdatascience.com\/#symplectic-numerical-integrators\">Symplectic Numerical Integrators <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f4bb.png?ssl=1\" alt=\"\ud83d\udcbb\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#leapfrog-verlet\">Leapfrog Verlet<\/a><\/li>\n<\/ul>\n<\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#why-symplectic-matters\">Why Symplecticity Matters<\/a><\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/towardsdatascience.com\/#hamiltonian-monte-carlo\">Hamiltonian Monte Carlo<\/a><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#from-phase-space-to-probability-space\">From Phase Space to Probability Space<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#the-hmc-algorithm\">The HMC Algorithm<\/a><\/li>\n<\/ul>\n<\/li>\n<li class=\"wp-block-list-item\"><a style=\"color: OrangeRed;\" href=\"https:\/\/towardsdatascience.com\/#torchebm-library\"><strong>TorchEBM Library<\/strong> <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f4da.png?ssl=1\" alt=\"\ud83d\udcda\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#connection-with-energy-based-models\">Connection with Energy-Based Models<\/a><\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/towardsdatascience.com\/#potential-research-directions\">Potential Research Directions <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f52e.png?ssl=1\" alt=\"\ud83d\udd2e\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#symplecticity-in-machine-learning-models\">Symplecticity in Machine Learning Models<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#hmc-for-complex-distributions\">HMC for Complex Distributions<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#combining-hamiltonian-mechanics-with-other-ml-techniques\">Combining Hamiltonian Mechanics with Other ML Techniques<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#hamiltonian-gans\">Hamiltonian GANs<\/a><\/li>\n<\/ul>\n<\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#research-collab-call-out\">Wanna Team Up on This? <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f913.png?ssl=1\" alt=\"\ud83e\udd13\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#conclusion\">Conclusion<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/towardsdatascience.com\/#references-and-useful-links\">References and Useful Links <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f4da.png?ssl=1\" alt=\"\ud83d\udcda\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/a><\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\" id=\"mathematical-reformation-from-second-order-to-phase-flow\">Mathematical Reformation: From Second-Order to First Order <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2699.png?ssl=1\" alt=\"\u2699\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>\n<\/h2>\n<p class=\"wp-block-paragraph\">Newton\u2019s ( F = mddot{q} ) requires solving second-order differential equations, which become unwieldy for constrained systems or when identifying conserved quantities.<\/p>\n<p class=\"wp-block-paragraph\"><strong>The Core Idea<\/strong><\/p>\n<p class=\"wp-block-paragraph\"><strong>Hamiltonian mechanics splits<\/strong> ( ddot{q} = F(q)\/m ) <strong>into two first-order equations<\/strong> by introducing conjugate momentum ( p ):<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>begin{align*}<br \/>dot{q} = frac{partial H}{partial p} &amp; text{(Position)}, quad dot{p} = -frac{partial H}{partial q} &amp; text{(Momentum)}<br \/>end{align*}<br \/>]<\/p>\n<p class=\"wp-block-paragraph\">It decomposes acceleration into complementary momentum\/position flows. This phase space perspective reveals hidden geometric structure.<\/p>\n<h2 class=\"wp-block-heading\" id=\"lagrangian-prelude-action-principles\">Lagrangian Prelude: Action Principles<\/h2>\n<p class=\"wp-block-paragraph\">The Lagrangian ( mathcal{L}(q, dot{q}) = K \u2013 U ) leads to Euler-Lagrange equations via variational calculus:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nfrac{d}{dt}left( frac{partial mathcal{L}}{partial dot{q}} right) \u2013 frac{partial mathcal{L}}{partial q} = 0<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>Kinetic Energy Symbol<\/strong><\/p>\n<p class=\"wp-block-paragraph\">Note that the ( K ) in the ( mathcal{L}(q, dot{q}) = K \u2013 U ) is also represented as <strong>( T )<\/strong>.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">\n<p class=\"wp-block-paragraph\">But these remain second-order. The critical leap comes through <strong>Legendre Transformation ( (dot{q} rightarrow p) )<\/strong>. The Hamiltonian is derived from the Lagrangian through a <strong>Legendre transformation<\/strong> by defining the conjugate momentum as ( p_i = frac{partial mathcal{L}}{partial dot{q}_i} ); then the Hamiltonian can be written as:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nH(q,p) = sum_i p_i dot{q}_i \u2013 mathcal{L}(q, dot{q})<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">We can write ( H(q,p) ) more intuitively as:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nH(q,p) = K(p) + U(q)<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">This flips the script: instead of ( dot{q} )-centric dynamics, we get <strong>symplectic phase flow<\/strong>.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>Why This Matters<\/strong><\/p>\n<p class=\"wp-block-paragraph\">The Hamiltonian becomes the system\u2019s total energy ( H = K + U ) for many physical systems. It also provides a framework where time evolution is a <strong>canonical transformation<\/strong> \u2013 a symmetry preserving the fundamental Poisson bracket structure ( {q_i, p_j} = delta_{ij} ).<\/p>\n<p class=\"wp-block-paragraph\">For more about canonical, non-canonical transformations, and Poisson bracket, including detailed math and examples, check out the <a href=\"https:\/\/soran-ghaderi.github.io\/torchebm\/latest\/blog\/hamiltonian-mechanics\/#lagrangian-prelude-action-principles\"><span style=\"color: OrangeRed;\">TorchEBM<\/span> post on Hamiltonian mechanics<\/a>.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">This transformation is not canonical because it does not preserve the Poisson bracket structure.<\/p>\n<h2 class=\"wp-block-heading\" id=\"newton-vs-lagrange-vs-hamilton-a-philosophical-showdown\">Newton vs. Lagrange vs. Hamilton: A Philosophical Showdown<\/h2>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Aspect<\/th>\n<th>Newtonian<\/th>\n<th>Lagrangian<\/th>\n<th>Hamiltonian<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td><strong>State Variables<\/strong><\/td>\n<td>Position ( x ) and velocity ( dot{x} )<\/td>\n<td>Generalized coordinates ( q ) and velocities ( dot{q} )<\/td>\n<td>Generalized coordinates ( q ) and conjugate momenta ( p )<\/td>\n<\/tr>\n<tr>\n<td><strong>Formulation<\/strong><\/td>\n<td>Second-order differential equations ( (F=ma) )<\/td>\n<td>Principle of least action (( delta int L , dt = 0 )): ( L = K \u2013 U )<\/td>\n<td>First-order differential equations from Hamiltonian function (Phase flow ( (dH) )): ( H = K + U )<\/td>\n<\/tr>\n<tr>\n<td><strong>Identifying Symmetries<\/strong><\/td>\n<td>Manual identification or through specific methods<\/td>\n<td>Noether\u2019s theorem<\/td>\n<td>Canonical transformations and Poisson brackets<\/td>\n<\/tr>\n<tr>\n<td><strong>Machine Learning Connection<\/strong><\/td>\n<td>Physics-informed neural networks, simulations<\/td>\n<td>Optimal control, reinforcement learning<\/td>\n<td>Hamiltonian Monte Carlo (HMC) sampling, energy-based models<\/td>\n<\/tr>\n<tr>\n<td><strong>Energy Conservation<\/strong><\/td>\n<td>Not inherent (must be derived)<\/td>\n<td>Built-in through conservation laws<\/td>\n<td>Central (Hamiltonian is energy)<\/td>\n<\/tr>\n<tr>\n<td><strong>General Coordinates<\/strong><\/td>\n<td>Possible, but often cumbersome<\/td>\n<td>Natural fit<\/td>\n<td>Natural fit<\/td>\n<\/tr>\n<tr>\n<td><strong>Time Reversibility<\/strong><\/td>\n<td>Yes<\/td>\n<td>Yes<\/td>\n<td>Yes, especially in symplectic formulations<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<h2 class=\"wp-block-heading\" id=\"hamiltons-equations-the-geometry-of-phase-space\">Hamilton\u2019s Equations: The Geometry of Phase Space <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2699.png?ssl=1\" alt=\"\u2699\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>\n<\/h2>\n<p class=\"wp-block-paragraph\">The phase space is a mathematical space where we can represent the set of possible states of a physical system. For a system with ( n ) degrees of freedom, the phase space is a ( 2n )-dimensional space, often visualized as a map where each point ( (q, p) ) represents a unique state. The evolution of the system is described by the motion of a point in this space, governed by Hamilton\u2019s equations.<\/p>\n<figure class=\"wp-block-video\"><video controls src=\"https:\/\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/pendulum_phase_space-1.mp4\"><\/video><figcaption class=\"wp-element-caption\">Phase space portrait of a nonlinear pendulum showing oscillatory motion (closed orbits), rotational motion (wavy trajectories), and separatrices (red curves) connecting unstable equilibrium points. Position (q) and momentum (p) dynamics illustrate energy conservation principles fundamental to Hamiltonian systems. The animation is done by the <a href=\"https:\/\/soran-ghaderi.github.io\/\">author<\/a>.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">This formulation offers several advantages. It makes it straightforward to identify conserved quantities and symmetries through canonical transformations and Poisson brackets, which provides deeper insights into the system\u2019s behavior. For instance, Liouville\u2019s theorem states that the volume in phase space occupied by an ensemble of systems remains constant over time, expressed as:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nfrac{partial rho}{partial t} + {rho, H} = 0<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">or equivalently:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nfrac{partial rho}{partial t} + sum_i left(frac{partial rho}{partial q_i}frac{partial H}{partial p_i} \u2013 frac{partial rho}{partial p_i}frac{partial H}{partial q_i}right) = 0<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">where ( rho(q, p, t) ) is the density function. This helps us to represent the phase space flows and how they preserve area under symplectic transformations. Its relation to <a href=\"https:\/\/en.wikipedia.org\/wiki\/Symplectic_geometry\">symplectic geometry<\/a> enables mathematical properties that are directly relevant to many numerical methods. For instance, it enables <a href=\"https:\/\/towardsdatascience.com\/tag\/hamiltonian-monte-carlo\/\" title=\"hamiltonian monte carlo\">hamiltonian monte carlo<\/a> to perform well in high-dimensions by defining MCMC strategies that increases the chances of accepting a sample (particle).<\/p>\n<h2 class=\"wp-block-heading\" id=\"symplecticity-the-sacred-invariant\">Symplecticity: The Sacred Invariant<\/h2>\n<p class=\"wp-block-paragraph\">Hamiltonian flows preserve the symplectic 2-form ( omega = sum_i dq_i wedge dp_i ).<\/p>\n<p class=\"wp-block-paragraph\"><strong>Symplectic 2-form ( omega )<\/strong><br \/>The symplectic 2-form, denoted by ( omega = sum_i dq_i wedge dp_i ), is a mathematical object used in symplectic geometry. It measures the area of parallelograms formed by vectors in the tangent space of a phase space.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">( dq_i ) and ( dp_i ): Infinitesimal changes in position and momentum coordinates.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>( wedge )<\/strong>: The wedge product, which combines differential forms in an antisymmetric way meaning that ( dq_i wedge dp_i = -dp_i wedge dq_i ).<\/li>\n<li class=\"wp-block-list-item\">\n<strong>( sum_i )<\/strong>: Sum over all degrees of freedom.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">Imagine a phase space where each point represents a state of a physical system. The symplectic form assigns a value to each pair of vectors, effectively measuring the area of the parallelogram they span. This area is preserved under Hamiltonian flows.<br \/><strong>Key Properties<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Closed<\/strong>: ( domega = 0 ) which means its exterior derivative is zero ( domega=0 ). This property ensures that the form does not change under continuous transformations.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Non-degenerate<\/strong>: The form is non-degenerate if ( domega(X,Y)=0 ) for all ( Y )s, then ( X=0 ). This ensures that every vector has a unique \u201cpartner\u201d vector such that their pairing under ( omega ) is non-zero.<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\">\n<p><strong>Example<\/strong><br \/>For a simple harmonic oscillator with one degree of freedom, ( omega = dq wedge dp ). This measures the area of parallelograms in the phase space spanned by vectors representing changes in position and momentum.<br \/><strong>A Very Simplistic PyTorch Code<\/strong>:<br \/>While PyTorch doesn\u2019t directly handle differential forms, you can conceptually represent the symplectic form using tensors:<\/p>\n<p><script src=\"https:\/\/gist.github.com\/soran-ghaderi\/267c7d1877e36ec014aa7e9397e9eede.js\"><\/script><\/p>\n<p class=\"wp-block-paragraph\">This code illustrates the antisymmetric nature of the wedge product.<\/p>\n<p class=\"wp-block-paragraph\">Numerically, this means good integrators must respect:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nfrac{partial (q(t + epsilon), p(t + epsilon))}{partial (q(t), p(t))}^T J frac{partial (q(t + epsilon), p(t + epsilon))}{partial (q(t), p(t))} = J \\ text{where } J = begin{pmatrix} 0 &amp; I \\ -I &amp; 0 end{pmatrix}<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\"><strong>Breaking Down the Formula<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Geometric numerical integration<\/strong>: Solves differential equations while preserving geometric properties of the system.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Symplecticity<\/strong>: A geometric property inherent to Hamiltonian systems. It ensures that the area of geometric structures (e.g., parallelograms) in phase space ( (q, p) ) remains constant over time. This is encoded in the symplectic form ( omega = sum_i dq_i wedge dp_i ).<\/li>\n<li class=\"wp-block-list-item\">A numerical method is <strong>symplectic<\/strong>: If it preserves ( omega ). The Jacobian matrix of the transformation from ( (q(t), p(t)) ) to ( (q(t + epsilon), p(t + epsilon)) ) must satisfy the condition above.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Jacobian matrix<\/strong> ( frac{partial (q(t + epsilon), p(t + epsilon))}{partial (q(t), p(t))} ): Quantifies how small changes in the initial state ( (q(t), p(t)) ) propagate to the next state ( (q(t + epsilon), p(t + epsilon)) ).<\/li>\n<li class=\"wp-block-list-item\">( q(t) ) and ( p(t) ): Position and momentum at time ( t ).<\/li>\n<li class=\"wp-block-list-item\">( q(t + epsilon) ) and ( p(t + epsilon) ): Updated position and momentum after one time step ( epsilon ).<\/li>\n<li class=\"wp-block-list-item\">\n<strong>( frac{partial}{partial (q(t), p(t))} )<\/strong>: Partial derivatives with respect to the initial state.<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\" id=\"how-are-we-going-to-solve-it\">How are We Going to Solve it?<\/h2>\n<p class=\"wp-block-paragraph\">Numerical solvers for differential equations inevitably introduce errors that affect solution accuracy. These errors manifest as deviations from the true trajectory in phase space, particularly noticeable in energy-conserving systems like the harmonic oscillator. The errors fall into two main categories: local truncation error, arising from the approximation of continuous derivatives with discrete steps (proportional to ( mathcal{O}(epsilon^n+1) ) where ( epsilon ) is the step size and n depends on the method); and global accumulation error, which compounds over integration time.<\/p>\n<p class=\"wp-block-paragraph\"><strong>Forward Euler Method Fails at This!<\/strong><\/p>\n<p class=\"wp-block-paragraph\"><strong>Key Issue: Energy Drift from Non-Symplectic Updates<\/strong><\/p>\n<p class=\"wp-block-paragraph\">The forward Euler method (FEM) violates the geometric structure of Hamiltonian systems, leading to <strong>energy drift<\/strong> in long-term simulations. Let\u2019s dissect why.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">For a detailed exploration of how methods like Forward Euler perform in Hamiltonian systems and why they don\u2019t preserve symplecticity\u2014including mathematical breakdowns and practical examples\u2014check out this post on Hamiltonian mechanics from the <a href=\"https:\/\/soran-ghaderi.github.io\/torchebm\/latest\/blog\/hamiltonian-mechanics\/#how-are-we-going-to-solve-it\"><span style=\"color: OrangeRed;\">TorchEBM<\/span> library documentation<\/a>.<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">To overcome this, we turn to symplectic integrators\u2014methods that respect the underlying geometry of Hamiltonian systems, leading us naturally to the Leapfrog Verlet method, a powerful symplectic alternative. <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f680.png?ssl=1\" alt=\"\ud83d\ude80\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/p>\n<h2 class=\"wp-block-heading\" id=\"symplectic-numerical-integrators\">Symplectic Numerical Integrators <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f4bb.png?ssl=1\" alt=\"\ud83d\udcbb\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>\n<\/h2>\n<h3 class=\"wp-block-heading\" id=\"leapfrog-verlet\">Leapfrog Verlet<\/h3>\n<p class=\"wp-block-paragraph\">For a separable Hamiltonian ( H(q,p) = K(p) + U(q) ), where the corresponding probability distribution is given by:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nP(q,p) = frac{1}{Z} e^{-U(q)} e^{-K(p)},<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">the Leapfrog Verlet integrator proceeds as follows:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nbegin{aligned}<br \/>\np_{i}left(t + frac{epsilon}{2}right) &amp;= p_{i}(t) \u2013 frac{epsilon}{2} frac{partial U}{partial q_{i}}(q(t)) \\<br \/>\nq_{i}(t + epsilon) &amp;= q_{i}(t) + epsilon frac{partial K}{partial p_{i}}left(pleft(t + frac{epsilon}{2}right)right) \\<br \/>\np_{i}(t + epsilon) &amp;= p_{i}left(t + frac{epsilon}{2}right) \u2013 frac{epsilon}{2} frac{partial U}{partial q_{i}}(q(t + epsilon))<br \/>\nend{aligned}<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">This St\u00f6rmer-Verlet scheme preserves symplecticity exactly, with local error ( mathcal{O}(epsilon^3) ) and global error ( mathcal{O}(epsilon^2) ). You can read more about <cite><a href=\"https:\/\/lemesurierb.people.charleston.edu\/numerical-methods-and-analysis-python\/main\/ODE-IVP-6-multi-step-methods-introduction-python.html\">numerical methods and analysis in Python here<\/a><\/cite>.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>How Exactly?<\/strong><\/p>\n<p class=\"wp-block-paragraph\">Want to know exactly how the Leapfrog Verlet method ensures symplecticity with detailed equations and proofs? The <a href=\"https:\/\/soran-ghaderi.github.io\/torchebm\/latest\/blog\/hamiltonian-mechanics\/#leapfrog-verlet\"><span style=\"color: OrangeRed;\">TorchEBM library<\/span> documentation on Leapfrog Verlet<\/a> breaks it down step-by-step.<\/p>\n<\/blockquote>\n<h2 class=\"wp-block-heading\" id=\"why-symplectic-matters\">Why Symplecticity Matters<\/h2>\n<p class=\"wp-block-paragraph\">They\u2019re the <strong>reversible neural nets<\/strong> of physics simulations!<\/p>\n<p class=\"wp-block-paragraph\">Symplectic integrators like Leapfrog Verlet are critical for <strong>long-term stability<\/strong> in Hamiltonian systems.<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Phase space preservation<\/strong>: The volume in ( (q, p) )-space is conserved exactly, avoiding artificial energy drift.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Approximate energy conservation<\/strong>: While energy ( H(q,p) ) is not perfectly conserved (due to ( mathcal{O}(epsilon^2) ) error), it oscillates near the true value over exponentially long timescales.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Practical relevance<\/strong>: This makes symplectic integrators indispensable in molecular dynamics and Hamiltonian Monte Carlo (HMC), where accurate sampling relies on stable trajectories.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"1024\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/numerical_err-1-1024x1024.png?resize=1024%2C1024&#038;ssl=1\" alt=\"\" class=\"wp-image-600550\"><figcaption class=\"wp-element-caption\">Comparison of numerical integration methods for a simple harmonic oscillator in phase space. Color gradients indicate error magnitude with brighter colors showing larger divergence from the exact solution (white). Euler\u2019s method (a) exhibits energy growth, Modified Euler\u2019s method (b) shows improved stability, while Leapfrog maintains excellent energy conservation at small stepsize (c) but develops geometric distortion at larger stepsize (d). Photo by the <a href=\"https:\/\/soran-ghaderi.github.io\/\">author<\/a>.<br \/><\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Euler\u2019s method (first-order) systematically injects energy into the system, causing the characteristic outward spiral seen in the plots. Modified Euler\u2019s method (second-order) significantly reduces this energy drift. Most importantly, symplectic integrators like the Leapfrog method preserve the geometric structure of Hamiltonian systems even with relatively large step sizes by maintaining phase space volume conservation. This structural preservation is why Leapfrog remains the preferred method for long-time simulations in molecular dynamics and astronomy, where energy conservation is critical despite the visible polygon-like discretization artifacts at large step sizes.<\/p>\n<p class=\"wp-block-paragraph\">Non-symplectic methods (e.g., Euler-Maruyama) often fail catastrophically in these settings.<\/p>\n<figure class=\"wp-block-table\">\n<table class=\"has-fixed-layout\">\n<thead>\n<tr>\n<th>Integrator<\/th>\n<th>Symplecticity<\/th>\n<th>Order<\/th>\n<th>Type<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Euler Method<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/274c.png?ssl=1\" alt=\"\u274c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>1<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Symplectically Euler<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2705.png?ssl=1\" alt=\"\u2705\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>1<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Leapfrog (Verlet)<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2705.png?ssl=1\" alt=\"\u2705\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>2<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Runge-Kutta 4<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/274c.png?ssl=1\" alt=\"\u274c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>4<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Forest-Ruth Integrator<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2705.png?ssl=1\" alt=\"\u2705\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>4<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Yoshida 6th-order<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2705.png?ssl=1\" alt=\"\u2705\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>6<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Heun\u2019s Method (RK2)<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/274c.png?ssl=1\" alt=\"\u274c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>2<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Third-order Runge-Kutta<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/274c.png?ssl=1\" alt=\"\u274c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>3<\/td>\n<td>Explicit<\/td>\n<\/tr>\n<tr>\n<td>Implicit Midpoint Rule<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2705.png?ssl=1\" alt=\"\u2705\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>2<\/td>\n<td>Implicit (solving equations)<\/td>\n<\/tr>\n<tr>\n<td>Fourth-order Adams-Bashforth<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/274c.png?ssl=1\" alt=\"\u274c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>4<\/td>\n<td>Multi-step (explicit)<\/td>\n<\/tr>\n<tr>\n<td>Backward Euler Method<\/td>\n<td><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/274c.png?ssl=1\" alt=\"\u274c\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><\/td>\n<td>1<\/td>\n<td>Implicit (solving equations)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/figure>\n<p class=\"wp-block-paragraph\">For more details on things like local and global errors or what these integrators are best suited for, there\u2019s a handy write-up over at <a href=\"https:\/\/soran-ghaderi.github.io\/torchebm\/latest\/blog\/hamiltonian-mechanics\/#why-symplectic-matters\">Hamiltonian mechanics: Why Symplecticity Matters<\/a> that covers it all.<\/p>\n<h2 class=\"wp-block-heading\" id=\"hamiltonian-monte-carlo\">Hamiltonian Monte Carlo<\/h2>\n<p class=\"wp-block-paragraph\">Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) method that leverages Hamiltonian dynamics to efficiently sample from complex probability distributions, particularly in Bayesian statistics and <a href=\"https:\/\/towardsdatascience.com\/tag\/machine-learning\/\" title=\"Machine Learning\">Machine Learning<\/a>.<\/p>\n<h3 class=\"wp-block-heading\" id=\"from-phase-space-to-probability-space\">From Phase Space to Probability Space<\/h3>\n<p class=\"wp-block-paragraph\">HMC interprets target distribution ( P(z) ) as a Boltzmann distribution:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nP(z) = frac{1}{Z} e^{frac{-E(z)}{T}}<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">Substituting into this formulation, the Hamiltonian gives us a joint density:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\nP(q,p) = frac{1}{Z} e^{-U(q)} e^{-K(p)} \\ text{where } U(q) = -log[p(q), p(q|D)]<br \/>\n]<\/p>\n<p class=\"wp-block-paragraph\">where ( p(q|D) ) is the likelihood of the given data ( D ) and T=1 and therefore removed. We estimate our posterior distribution using the potential energy ( U(q) ) since ( P(q,p) ) consists of two independent probability distributions.<\/p>\n<p class=\"wp-block-paragraph\">Augment with artificial momentum ( p sim mathcal{N}(0,M) ), then simulate Hamiltonian dynamics to propose new ( q\u2019 ) based on the distribution of the position variables ( U(q) ) which acts as the \u201cpotential energy\u201d of the target distribution ( P(q) ), thereby creating valleys at high-probability regions.<\/p>\n<p class=\"wp-block-paragraph\">For more on HMC, check out <a href=\"https:\/\/arxiv.org\/abs\/1701.02434\">this explanation<\/a> or <a href=\"https:\/\/mc-stan.org\/docs\/2_19\/reference-manual\/hamiltonian-monte-carlo.html\">this tutorial<\/a>.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"> <strong>Physical Systems<\/strong>: ( H(q,p) = U(q) + K(p) ) represents total energy<\/p>\n<p class=\"wp-block-paragraph\"> <strong>Sampling Systems<\/strong>: ( H(q,p) = -log P(q) + frac{1}{2}p^T M^{-1} p ) defines exploration dynamics<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">The kinetic energy with the popular form of ( K(p) = frac{1}{2}p^T M^{-1} p ), often Gaussian, injects momentum to traverse these landscapes. Crucially, the mass matrix ( M ) plays the role of a <strong>preconditioner<\/strong> \u2013 diagonal ( M ) adapts to parameter scales, while dense ( M ) can align with correlation structure. ( M ) is symmetric, positive definite and typically diagonal.<\/p>\n<p class=\"wp-block-paragraph\"><strong>What is Positive Definite?<\/strong><\/p>\n<p class=\"wp-block-paragraph\"><strong>Positive Definite<\/strong>: For any non-zero vector ( x ), the expression ( x^T M x ) is always positive. This ensures stability and efficiency.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" height=\"512\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/quadratics-1-1024x512.png?resize=1024%2C512&#038;ssl=1\" alt=\"\" class=\"wp-image-600551\"><figcaption class=\"wp-element-caption\">Illustration of different quadratic forms in two variables that shows how different covariance matrices<br \/>influence the shape of these forms. The plots depict:<\/p>\n<p>a) <strong>Positive Definite Form<\/strong>: A bowl-shaped surface where all eigenvalues are positive, indicating a minimum.<\/p>\n<p>b) <strong>Negative Definite Form<\/strong>: An inverted bowl where all eigenvalues are negative, indicating a maximum.<\/p>\n<p>c) <strong>Indefinite Form<\/strong>: A saddle-shaped surface with both positive and negative eigenvalues, indicating neither a maximum nor a minimum.<\/p>\n<p>Each subplot includes the matrix ( M ) and the corresponding quadratic form (Q(x) = x^T M x). Photo by the <a href=\"https:\/\/soran-ghaderi.github.io\/\">author<\/a>.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">[<br \/>x^T M x &gt; 0<br \/>]<\/p>\n<p class=\"wp-block-paragraph\"><strong>Kinetic Energy Choices<\/strong><\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Gaussian (Standard HMC)<\/strong>: ( K(p) = frac{1}{2}p^T M^{-1} p )<br \/>Yields Euclidean trajectories, efficient for moderate dimensions.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Relativistic (Riemannian HMC)<\/strong>: ( K(p) = sqrt{p^T M^{-1} p + c^2} )<br \/>Limits maximum velocity, preventing divergences in ill-conditioned spaces.<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Adaptive (Surrogate Gradients)<\/strong>: Learn ( K(p) ) via neural networks to match target geometry.<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>Key Intuition<\/strong><\/p>\n<p class=\"wp-block-paragraph\">The Hamiltonian ( H(q,p) = U(q) + frac{1}{2}p^T M^{-1} p ) creates an energy landscape where momentum <strong>carries the sampler through high-probability regions<\/strong>, avoiding random walk behavior.<\/p>\n<\/blockquote>\n<h3 class=\"wp-block-heading\" id=\"the-hmc-algorithm\">The HMC Algorithm<\/h3>\n<p class=\"wp-block-paragraph\">The algorithm involves:<\/p>\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Initialization<\/strong>: Start with an initial position ( q_0 ) and sample momentum ( p_0 sim mathcal{N}(0,M) ).<\/li>\n<li class=\"wp-block-list-item wp-block-list\">\n<strong>Leapfrog Integration<\/strong>: Use the leapfrog method to approximate Hamiltonian dynamics. For a step size ( epsilon ) and L steps, update:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Half-step momentum: ( p(t + frac{epsilon}{2}) = p(t) \u2013 frac{epsilon}{2} frac{partial U}{partial q}(q(t)) )<\/li>\n<li class=\"wp-block-list-item\">Full-step position: ( q(t + epsilon) = q(t) + epsilon frac{partial K}{partial p}(p(t + frac{epsilon}{2})) ), where ( K(p) = frac{1}{2} p^T M^{-1} p ), so ( frac{partial K}{partial p} = M^{-1} p )<\/li>\n<li class=\"wp-block-list-item\">Full-step momentum: ( p(t + epsilon) = p(t + frac{epsilon}{2}) \u2013 frac{epsilon}{2} frac{partial U}{partial q}(q(t + epsilon)) )<\/li>\n<\/ul>\n<p>This is repeated L times to get proposed ( dot{q} ) and ( dot{p} ).<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Metropolis-Hastings Acceptance<\/strong>: Accept the proposed ( dot{q} ) with probability ( min(1, e^{H(q_0,p_0) \u2013 H(dot{q},dot{p})}) ), where ( H(q,p) = U(q) + K(p) ).<\/li>\n<\/ol>\n<p class=\"wp-block-paragraph\">This process generates a Markov chain with stationary distribution ( P(q) ), leveraging Hamiltonian dynamics to take larger, more efficient steps compared to random-walk methods.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>Why Better Than Random Walk?<\/strong><\/p>\n<p class=\"wp-block-paragraph\">HMC navigates high-dimensional spaces along energy contours \u2013 like following mountain paths instead of wandering randomly!<\/p>\n<\/blockquote>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>Recap of the Hamilton\u2019s equations?<\/strong><\/p>\n<p class=\"wp-block-paragraph\">[<br \/>begin{cases}<br \/>dot{q} = nabla_p K(p) = M^{-1}p &amp; text{(Guided exploration)} \\<br \/>dot{p} = -nabla_q U(q) = nabla_q log P(q) &amp; text{(Bayesian updating)}<br \/>end{cases}<br \/>]<\/p>\n<\/blockquote>\n<p class=\"wp-block-paragraph\">This coupled system drives ( (q,p) ) along iso-probability contours of ( P(q) ), with momentum <strong>rotating<\/strong> rather than <strong>resetting<\/strong> at each step like in Random Walk Metropolis\u2013think of following mountain paths instead of wandering randomly! The key parameters \u2013 integration time ( tau = Lepsilon ) and step size ( epsilon ) \u2013 balance exploration vs. computational cost:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">\n<strong>Short ( tau )<\/strong>: Local exploration, higher acceptance<\/li>\n<li class=\"wp-block-list-item\">\n<strong>Long ( tau )<\/strong>: Global moves, risk of U-turns (periodic orbits)<\/li>\n<\/ul>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><strong>Key Parameters and Tuning<\/strong><\/p>\n<p class=\"wp-block-paragraph\">Tuning ( M ) to match the covariance of ( P(q) ) (e.g., via warmup adaptation) and setting ( tau sim mathcal{O}(1\/lambda_{text{max}}) ), where ( lambda_{text{max}} ) is the largest eigenvalue of ( nabla^2 U ), often yields optimal mixing.<\/p>\n<\/blockquote>\n<h4 class=\"wp-block-heading\" id=\"torchebm-library\" style=\"color: OrangeRed;\">TorchEBM Library <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f4da.png?ssl=1\" alt=\"\ud83d\udcda\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>\n<\/h4>\n<p class=\"wp-block-paragraph\">Oh, by the way, I\u2019ve been messing around with this stuff in Python and threw together a library called <a href=\"https:\/\/soran-ghaderi.github.io\/torchebm\/latest\/\" target=\"_blank\" style=\"color: OrangeRed;\" rel=\"noopener\"><strong>TorchEBM<\/strong><\/a>. It\u2019s got some tools for energy-based, score matching, diffusion- and flow-based models and HMC bits I\u2019ve been playing with. Nothing fancy, just a researcher\u2019s sandbox for testing ideas like these. If you\u2019re into coding this kind of thing, poke around on <a href=\"https:\/\/github.com\/soran-ghaderi\/torchebm\" target=\"_blank\" style=\"color: OrangeRed;\" rel=\"noopener\">TorchEBM GitHub<\/a> and lemme know what you think\u2014PRs welcome! Been fun tinkering with it while writing this post.<\/p>\n<h2 class=\"wp-block-heading\" id=\"connection-with-energy-based-models\">Connection with Energy-Based Models<\/h2>\n<p class=\"wp-block-paragraph\">Energy-based models (EBMs) are a class of generative models that define a probability distribution over data points using an energy function. The probability of a data point is proportional to ( e^{-E(x)} ), where ( E(x) ) is the energy function. This formulation is directly analogous to the Boltzmann distribution in statistical physics, where the probability is related to the energy of a state. In Hamiltonian mechanics, the Hamiltonian function ( H(q, p) ) represents the total energy of the system, and the probability distribution in phase space is given by ( e^{-H(q,p)\/T} ), where ( T ) is the temperature.<\/p>\n<p class=\"wp-block-paragraph\">In EBMs, Hamiltonian Monte Carlo (HMC) is often used to sample from the model\u2019s distribution. HMC leverages Hamiltonian dynamics to propose new states, which are then accepted or rejected based on the Metropolis-Hastings criterion. This method is particularly effective for high-dimensional problems, as it reduces the correlation between samples and allows for more efficient exploration of the state space. For instance, in image generation tasks, HMC can sample from the distribution defined by the energy function, facilitating the generation of high-quality images.<\/p>\n<p class=\"wp-block-paragraph\">EBMs define probability through Hamiltonians:<\/p>\n<p class=\"wp-block-paragraph\">[<br \/>\np(x) = frac{1}{Z}e^{-E(x)} quad leftrightarrow quad H(q,p) = E(q) + K(p)<br \/>\n]<\/p>\n<h2 class=\"wp-block-heading\" id=\"potential-research-directions\">Potential Research Directions <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f52e.png?ssl=1\" alt=\"\ud83d\udd2e\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>\n<\/h2>\n<h3 class=\"wp-block-heading\" id=\"symplecticity-in-machine-learning-models\">Symplecticity in Machine Learning Models<\/h3>\n<figure class=\"wp-block-image aligncenter size-large\"><img data-recalc-dims=\"1\" height=\"536\" width=\"1024\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/contributor.insightmediagroup.io\/wp-content\/uploads\/2025\/03\/hamilNN-1024x536.png?resize=1024%2C536&#038;ssl=1\" alt=\"\" class=\"wp-image-600641\"><figcaption class=\"wp-element-caption\">Overview of the Hamiltonian Neural Networks architecture. Image from the <a href=\"https:\/\/arxiv.org\/abs\/1906.01563\">HNN<\/a> paper.<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\">Incorporate the symplectic structure of Hamiltonian mechanics into machine learning models to preserve properties like energy conservation, which is crucial for long-term predictions. Generalizing Hamiltonian Neural Networks (HNNs), as discussed in <a href=\"https:\/\/greydanus.github.io\/2019\/05\/15\/hamiltonian-nns\/\">Hamiltonian Neural Networks<\/a>, to more complex systems or developing new architectures that preserve symplecticity<\/p>\n<h3 class=\"wp-block-heading\" id=\"hmc-for-complex-distributions\">HMC for Complex Distributions<\/h3>\n<p class=\"wp-block-paragraph\">HMC for sampling from complex, high-dimensional, and multimodal distributions, such as those encountered in deep learning. Combining HMC with other techniques, like parallel tempering, could handle distributions with multiple modes more effectively.<\/p>\n<h3 class=\"wp-block-heading\" id=\"combining-hamiltonian-mechanics-with-other-ml-techniques\">Combining Hamiltonian Mechanics with Other ML Techniques<\/h3>\n<p class=\"wp-block-paragraph\">Integrate Hamiltonian mechanics with reinforcement learning to guide exploration in continuous state and action spaces. Using it to model the environment could improve exploration strategies, as seen in potential applications in robotics. Additionally, using Hamiltonian mechanics to define approximate posteriors in variational inference could lead to more flexible and accurate approximations.<\/p>\n<h3 class=\"wp-block-heading\" id=\"hamiltonian-gans\">Hamiltonian GANs<\/h3>\n<p class=\"wp-block-paragraph\">Employing Hamiltonian formalism as an inductive bias for the generation of physically plausible videos with neural networks.<\/p>\n<h2 class=\"wp-block-heading\" id=\"research-collab-call-out\" style=\"color: OrangeRed;\">Wanna Team Up on This? <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f913.png?ssl=1\" alt=\"\ud83e\udd13\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"><br \/>\n<\/h2>\n<p class=\"wp-block-paragraph\">If some of you brilliant folks wherever you\u2019re doing high-level wizardry are into research collaboration, I\u2019d love to chat generative models over coffee <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/2615.png?ssl=1\" alt=\"\u2615\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> (virtual or IRL (London)). If you\u2019re into pushing these ideas further, hit me up! Follow me on <a href=\"https:\/\/twitter.com\/soranghadri\" target=\"_balnk\" style=\"color: OrangeRed;\" rel=\"noopener\">Twitter<\/a>\/<a href=\"https:\/\/bsky.app\/profile\/soranghaderi.bsky.social\" target=\"_balnk\" rel=\"noopener\">BlueSky<\/a> or <a href=\"https:\/\/github.com\/soran-ghaderi\" target=\"_balnk\" style=\"color: OrangeRed;\" rel=\"noopener\">GitHub<\/a>\u2014I\u2019m usually rambling about this stuff there. Also on <a href=\"https:\/\/www.linkedin.com\/in\/soran-ghaderi\" target=\"_blank\" rel=\"noopener\">LinkedIn<\/a> and <a href=\"https:\/\/soran-ghaderi.medium.com\/\" target=\"_blank\" rel=\"noopener\">Medium<\/a>\/<a href=\"https:\/\/towardsdatascience.com\/author\/soran-ghaderi\/\" target=\"_blank\" rel=\"noopener\">TDS<\/a> if you\u2019re curious. To find more about my research interests, check out my <a href=\"https:\/\/soran-ghaderi.github.io\/\" style=\"color: OrangeRed;\" target=\"_blank\" rel=\"noopener\">personal website<\/a>.<\/p>\n<hr class=\"wp-block-separator has-css-opacity\">\n<h2 class=\"wp-block-heading\" id=\"conclusion\">Conclusion<\/h2>\n<p class=\"wp-block-paragraph\">Hamiltonian mechanics reframes physical systems through energy, using phase space to reveal symmetries and conservation laws via first-order equations. Symplectic integrators like Leapfrog Verlet preserve this structure, ensuring stability in simulations\u2014crucial for applications like molecular dynamics and Hamiltonian Monte Carlo (HMC). HMC leverages these dynamics to sample complex distributions efficiently, bridging classical physics with modern machine learning.<\/p>\n<hr class=\"wp-block-separator has-css-opacity\">\n<h2 class=\"wp-block-heading\" id=\"references-and-useful-links\">References and Useful Links <img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/s.w.org\/images\/core\/emoji\/15.0.3\/72x72\/1f4da.png?ssl=1\" alt=\"\ud83d\udcda\" class=\"wp-smiley\" style=\"height: 1em; max-height: 1em;\"> <\/h2>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/lemesurierb.people.charleston.edu\/numerical-methods-and-analysis-python\/main\/ODE-IVP-6-multi-step-methods-introduction-python.html\">An Introduction to Multistep Methods: Leap-frog<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/bayesianbrad.github.io\/posts\/2019_hmc.html\">The beginners guide to Hamiltonian Monte Carlo<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/bjlkeng.io\/posts\/hamiltonian-monte-carlo\/\">Hamiltonian Monte Carlo<\/a><\/li>\n<li class=\"wp-block-list-item\">\n<a href=\"https:\/\/mc-stan.org\/docs\/2_19\/reference-manual\/hamiltonian-monte-carlo.html\">Hamiltonian Monte Carlo \u2013 Stan<\/a> \u2013 Stan explained<\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/profoundphysics.com\/hamiltonian-mechanics-for-dummies\/\">Hamiltonian Mechanics For Dummies: An Intuitive Introduction<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/Hamiltonian_mechanics\">Hamiltonian mechanics Wikipedia page<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/www.macs.hw.ac.uk\/~simonm\/mechanics.pdf\">An introduction to Lagrangian and Hamiltonian mechanics Lecture notes<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/phys.libretexts.org\/Bookshelves\/Classical_Mechanics\/Classical_Mechanics_%28Tatum%29\/14:_Hamiltonian_Mechanics\">Hamiltonian Mechanics \u2013 Jeremy Tatum, University of Victoria<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/greydanus.github.io\/2019\/05\/15\/hamiltonian-nns\/\">Hamiltonian Neural Networks \u2013 Blog<\/a><\/li>\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/arxiv.org\/abs\/1906.01563\">Hamiltonian Neural Networks<\/a><\/li>\n<li class=\"wp-block-list-item\">Other: <a href=\"https:\/\/greydanus.github.io\/\">Natural Intelligence \u2013 A blog by Sam Greydanus<\/a> \u2013 Many interesting topics<\/li>\n<\/ul>\n<p class=\"wp-block-shortcode\">[]<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/from-physics-to-probability-hamiltonian-mechanics-for-generative-modeling-and-mcmc\/\">From Physics to Probability: Hamiltonian Mechanics for Generative Modeling and MCMC<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Soran Ghaderi<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/from-physics-to-probability-hamiltonian-mechanics-for-generative-modeling-and-mcmc\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>From Physics to Probability: Hamiltonian Mechanics for Generative Modeling and MCMC Phase space of a nonlinear pendulum. Photo by the author. Hamiltonian mechanics is a way to describe how physical systems, like planets or pendulums, move over time, focusing on energy rather than just forces. By reframing complex dynamics through energy lenses, this 19th-century physics [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[62,67,2186,2187,70,229,983],"tags":[2189,2188,607],"class_list":["post-2719","post","type-post","status-publish","format-standard","hentry","category-aimldsaimlds","category-deep-dives","category-generative-modeling","category-hamiltonian-monte-carlo","category-machine-learning","category-math","category-mcmc-sampling","tag-hamiltonian","tag-partial","tag-space"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2719"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=2719"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/2719\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=2719"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=2719"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=2719"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}