Kepler Orbit Reconstruction: PINN vs Hamiltonian Neural Networks
1. Kepler's Problem: The Physics
The two-body gravitational problem is one of the oldest and most elegant problems in classical mechanics. A planet of mass \(m\) orbits a much heavier star of mass \(M\) under Newton's law of universal gravitation. In the center-of-mass frame, the reduced two-body problem collapses to a single particle of reduced mass \(\mu\) moving in a central force field.
Newton's Law of Gravitation
Working in 2D with \(GM=1\) for simplicity, the equations of motion are:
\[\ddot x = -\frac{GM\,x}{r^3},\qquad \ddot y = -\frac{GM\,y}{r^3},\qquad r=\sqrt{x^2+y^2}\]
Conic Section Solutions
The general solution to the two-body problem is a conic section. The orbit type is determined by total energy \(E\):
- \(E < 0\) → Ellipse (bound orbit) — the Keplerian case
- \(E = 0\) → Parabola (marginally unbound)
- \(E > 0\) → Hyperbola (unbound, fly-by)
For a bound elliptical orbit, the key orbital elements are:
| Parameter | Symbol | Description |
|---|---|---|
| Semi-major axis | \(a\) | Half the longest diameter of the ellipse |
| Eccentricity | \(e\) | Shape parameter: 0 = circle, 0<e<1 = ellipse |
| Orbital period | \(T = 2\pi\sqrt{a^3/GM}\) | Time for one complete orbit |
| Periapsis | \(r_{\min} = a(1-e)\) | Closest approach to star |
| Apoapsis | \(r_{\max} = a(1+e)\) | Farthest distance from star |
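The table's formulas are one-liners; here is a quick numerical sanity check for the benchmark values used throughout this article (\(a=1, e=0.5, GM=1\)). The helper function is hypothetical, not part of the article's script:

```python
import numpy as np

# Hypothetical helper: evaluates the orbital-element formulas from the table.
def orbital_elements(a, e, GM=1.0):
    return {
        "T": 2 * np.pi * np.sqrt(a**3 / GM),  # orbital period
        "r_min": a * (1 - e),                 # periapsis distance
        "r_max": a * (1 + e),                 # apoapsis distance
    }

els = orbital_elements(a=1.0, e=0.5)
print(els)  # T ≈ 6.2832, r_min = 0.5, r_max = 1.5
```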
Kepler's Three Laws
1. Law of Ellipses. Each planet moves in an ellipse with the star at one focus.
\[r(\theta) = \frac{a(1-e^2)}{1 + e\cos\theta}\]
2. Law of Equal Areas. The radius vector sweeps equal areas in equal times.
\[\frac{dA}{dt} = \frac{L_z}{2m} = \text{const}\]
3. Harmonic Law. The square of the orbital period is proportional to the cube of the semi-major axis.
\[T^2 \propto a^3 \quad\Rightarrow\quad T = 2\pi\sqrt{\frac{a^3}{GM}}\]
The Hamiltonian (Total Energy)
The Kepler problem is a Hamiltonian system. The Hamiltonian \(H\) equals the total mechanical energy and is conserved along every trajectory:
\[H(x,y,p_x,p_y) = \tfrac{1}{2}\bigl(p_x^2 + p_y^2\bigr) - \frac{GM}{\sqrt{x^2+y^2}}\]
Here \(p_x = \dot x = v_x\) and \(p_y = \dot y = v_y\) are the canonical momenta (for unit mass).
Angular Momentum Conservation
By Noether's theorem, the rotational symmetry of the gravitational potential (\(V \sim 1/r\)) implies conservation of the z-component of angular momentum:
\[L_z = x\,p_y - y\,p_x = \text{const}\]
This is why Kepler's 2nd law holds: area swept per unit time \(= L_z/2m\) is constant.
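The constancy of the areal velocity is easy to confirm numerically. Using the two apsidal states of the benchmark orbit (speeds \(v_\text{peri}=\sqrt3\) and \(v_\text{apo}=\sqrt{1/3}\), derived later in the article), \(\tfrac12(x v_y - y v_x)\) comes out identical at both points:

```python
import numpy as np

# Areal velocity dA/dt = (x*vy - y*vx)/2 for a unit-mass planet.
def areal_velocity(x, y, vx, vy):
    return 0.5 * (x * vy - y * vx)

# Benchmark orbit (a=1, e=0.5, GM=1): states at periapsis and apoapsis
peri = areal_velocity(0.5, 0.0, 0.0, np.sqrt(3.0))        # r=a(1-e), tangential v
apo = areal_velocity(-1.5, 0.0, 0.0, -np.sqrt(1.0 / 3.0)) # r=a(1+e), tangential v
print(peri, apo)  # both ≈ 0.4330 = L/2 = sqrt(3)/4
```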
✏ Analytical Solution — Pen & Paper
Before touching any code, let's derive the exact solution to the 2-D Kepler problem from first principles. Every symbol the neural networks later try to approximate has a closed form here. The derivation proceeds in six logical steps.
- Write Newton's law in Cartesian & switch to polar
- Identify two first integrals: angular momentum \(L\) and energy \(E\)
- Derive the orbit equation via Binet's substitution \(u=1/r\)
- Link orbital elements \((a,e)\) to \((E,L)\)
- Introduce eccentric anomaly → Kepler's equation \(M = E_a - e\sin E_a\)
- Recover Cartesian coordinates \(x(t),\,y(t)\)
Step 1 — Equations of Motion
A planet of mass \(m\) orbits a fixed star of mass \(M\) (with \(GM=\mu\)). Newton's second law gives:
\[\ddot{\mathbf r} = -\frac{\mu}{r^3}\,\mathbf r,\qquad \mathbf r = (x, y),\quad r = |\mathbf r|\]
In polar coordinates \((r,\theta)\) the same equations become the radial and tangential equations:
\[\ddot r - r\dot\theta^2 = -\frac{\mu}{r^2},\qquad \frac{1}{r}\frac{d}{dt}\bigl(r^2\dot\theta\bigr) = 0\]
The tangential equation immediately gives the first integral → Step 2.
Step 2 — Two Conservation Laws
From \(\frac{d}{dt}(r^2\dot\theta)=0\):
\[L \;=\; r^2\dot\theta \;=\; x\,v_y - y\,v_x \;=\; \text{const}\]
Kepler's 2nd law: \(\frac{dA}{dt}=\frac{L}{2}=\text{const}\)
Multiply radial eq. by \(\dot r\), integrate:
\[E \;=\; \tfrac{1}{2}(\dot r^2+r^2\dot\theta^2) - \frac{\mu}{r} \;=\; \text{const}\]
Bound orbit: \(E<0\), parabola: \(E=0\), hyperbola: \(E>0\)
Step 3 — Orbit Shape: Binet's Equation
Substitute \(u=1/r\) and use \(\dot\theta=Lu^2\) to convert the time derivatives to \(\theta\)-derivatives:
\[\dot r = -L\frac{du}{d\theta},\qquad \ddot r = -L^2u^2\frac{d^2u}{d\theta^2}\]
The radial equation becomes the Binet equation — a simple harmonic oscillator:
\[\frac{d^2u}{d\theta^2} + u = \frac{\mu}{L^2}\]
General solution:
\[u(\theta) = \frac{\mu}{L^2}\bigl(1 + e\cos(\theta-\omega)\bigr)\]
Revert to \(r\) (choosing periapsis at \(\theta=0\)):
\[r(\theta) = \frac{p}{1 + e\cos\theta},\qquad p = \frac{L^2}{\mu} = a(1-e^2)\]
This is the conic section equation. \(e<1\): ellipse, \(e=1\): parabola, \(e>1\): hyperbola. \(p\) is the semi-latus rectum.
Step 4 — Orbital Elements from \(E\) and \(L\)
For an ellipse \((e<1,\;E<0)\), the geometry gives semi-major axis \(a\) and semi-minor axis \(b\):
\[a = -\frac{\mu}{2E},\qquad b = a\sqrt{1-e^2},\qquad e = \sqrt{1 + \frac{2EL^2}{\mu^2}}\]
Kepler's 3rd law gives the orbital period:
\[T = 2\pi\sqrt{\frac{a^3}{\mu}}\]
For our benchmark (\(\mu=1,\;a=1,\;e=0.5\)): \(E=-0.5\), \(L=\sqrt{p}=\sqrt{3}/2\approx0.866\), \(T=2\pi\approx6.283\).
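These benchmark numbers follow directly from the formulas above; a three-line check:

```python
import numpy as np

mu, a, e = 1.0, 1.0, 0.5
E = -mu / (2 * a)                     # orbital energy
L = np.sqrt(mu * a * (1 - e**2))      # angular momentum = sqrt(p)
T = 2 * np.pi * np.sqrt(a**3 / mu)    # orbital period
print(E, L, T)  # -0.5, 0.8660..., 6.2831...
```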
Step 5 — Time Dependence & Kepler's Equation
We have \(r(\theta)\) but need \(r(t)\). From \(L=r^2\dot\theta\):
\[dt = \frac{r^2}{L}\,d\theta\]
This integral is awkward. Kepler's brilliant trick: introduce the eccentric anomaly \(E_a\) via:
\[r = a(1 - e\cos E_a)\]
Substituting into the time integral and simplifying gives the famous transcendental equation:
\[M = E_a - e\sin E_a\]
where the mean anomaly \(M = \frac{2\pi}{T}(t-t_0)\) is linear in time — a perfect clock.
Draw a circumscribed circle of radius \(a\) around the ellipse. Project the planet's position vertically onto this circle. The angle from the centre of the ellipse to that projected point is \(E_a\). \(\theta\) (true anomaly) is the angle from the focus.
The relation between the anomalies, \(\cos\theta = \frac{\cos E_a - e}{1 - e\cos E_a}\), is derived by equating the two forms of \(r\): the orbit equation and \(a(1-e\cos E_a)\).
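The equivalence of the two forms of \(r\) can be confirmed numerically, using the standard relation \(\cos\theta = (\cos E_a - e)/(1 - e\cos E_a)\) between true and eccentric anomaly:

```python
import numpy as np

a, e = 1.0, 0.5
p = a * (1 - e**2)                                   # semi-latus rectum
Ea = np.linspace(0, 2 * np.pi, 100)                  # eccentric anomaly samples
cos_theta = (np.cos(Ea) - e) / (1 - e * np.cos(Ea))  # true-anomaly relation

r_orbit = p / (1 + e * cos_theta)                    # conic-section form
r_anomaly = a * (1 - e * np.cos(Ea))                 # eccentric-anomaly form
print(np.max(np.abs(r_orbit - r_anomaly)))           # agree to rounding
```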
Given \(M\), find \(E_a\). Define \(f(E_a)=E_a-e\sin E_a - M\). Then \(f'(E_a)=1-e\cos E_a\).
```python
import numpy as np

def eccentric_anomaly(M, e, tol=1e-12):
    """Newton-Raphson solver for Kepler's equation M = Ea - e*sin(Ea)."""
    Ea = np.asarray(M, dtype=float).copy()  # initial guess: Ea ≈ M (good for small e)
    for _ in range(50):                     # ~3-5 iterations sufficient for e < 0.99
        f = Ea - e * np.sin(Ea) - M
        fp = 1.0 - e * np.cos(Ea)
        delta = f / fp
        Ea -= delta
        if np.max(np.abs(delta)) < tol:
            break
    return Ea  # converges quadratically
```
Step 6 — Cartesian Reconstruction
Once \(E_a(t)\) is known, Cartesian position and velocity follow directly:
\[x = a(\cos E_a - e),\qquad y = b\sin E_a,\qquad b = a\sqrt{1-e^2}\]
Here the origin is at the focus (the Sun / star). The planet moves faster near periapsis (\(E_a\approx 0\), \(r=a(1-e)\)) and slower near apoapsis (\(E_a\approx\pi\), \(r=a(1+e)\)).
```python
def kepler_cartesian(t, a, e, mu=1.0):
    """Exact analytic trajectory: returns (x, y, vx, vy) at times t."""
    T = 2*np.pi * np.sqrt(a**3 / mu)      # Kepler 3rd law
    M = 2*np.pi / T * t                   # mean anomaly
    Ea = eccentric_anomaly(M, e)          # solve Kepler's eq.
    b = a * np.sqrt(1 - e**2)             # semi-minor axis
    r = a * (1 - e * np.cos(Ea))          # radial distance
    x = a * (np.cos(Ea) - e)
    y = b * np.sin(Ea)
    fac = 2*np.pi*a / (T * (1 - e*np.cos(Ea)))   # a · dEa/dt
    vx = -fac * np.sin(Ea)
    vy = fac * b/a * np.cos(Ea)
    return x, y, vx, vy
```
📋 Complete Recipe (cheat-sheet)
| Quantity | Formula | Value (benchmark) |
|---|---|---|
| Semi-major axis | \(a=-\mu/(2E)\) | \(1.0\) |
| Eccentricity | \(e=\sqrt{1-b^2/a^2}\) | \(0.5\) |
| Period | \(T=2\pi\sqrt{a^3/\mu}\) | \(2\pi\approx6.283\) |
| Semi-latus rectum | \(p=L^2/\mu=a(1-e^2)\) | \(0.75\) |
| Mean anomaly | \(M=2\pi t/T\) | linear in \(t\) |
| Kepler's equation | \(M=E_a-e\sin E_a\) | solve by N–R |
| Radial distance | \(r=a(1-e\cos E_a)\) | \(0.5\leq r\leq1.5\) |
| Cartesian pos. | \(x=a(\cos E_a-e),\;y=b\sin E_a\) | exact orbit |
Why is this hard for neural networks? The mapping \(t\mapsto(x,y)\) involves solving a transcendental equation at each time step, the output is multi-scale (slow near apoapsis, fast near periapsis), and small errors in phase accumulate quadratically in trajectory distance. That is why PINNs struggle and structure-preserving architectures like HNNs are needed.
2. Solving with Numerical Integration (RK45 Reference)
Before training any neural network, we generate a high-accuracy reference trajectory using scipy.integrate.odeint (the Fortran LSODA solver, tolerance \(10^{-11}\)). The second-order ODE is split into a first-order system:
\[\dot x = v_x,\qquad \dot y = v_y,\qquad \dot v_x = -\frac{GM\,x}{r^3},\qquad \dot v_y = -\frac{GM\,y}{r^3}\]
Initial Conditions at Periapsis
We launch from periapsis (closest approach), where the velocity is purely tangential:
\[x_0 = a(1-e),\qquad y_0 = 0,\qquad v_{x,0} = 0,\qquad v_{y,0} = \sqrt{\frac{GM(1+e)}{a(1-e)}}\]
The last expression is the vis-viva equation evaluated at periapsis. For \(a=1,\,e=0.5,\,GM=1\): \(x_0=0.5\), \(v_{y,0}=\sqrt{3}\approx1.732\).
Python Code
```python
from scipy.integrate import odeint
import numpy as np

GM, a, e = 1.0, 1.0, 0.5
x0 = a * (1 - e)                              # periapsis distance = 0.5
vy0 = np.sqrt(GM * (1 + e) / (a * (1 - e)))   # vis-viva at periapsis ≈ 1.732
T = 2 * np.pi * np.sqrt(a**3 / GM)            # orbital period ≈ 6.283

def kepler_ode(state, t):
    x, y, vx, vy = state
    r = np.sqrt(x**2 + y**2)
    return [vx, vy, -GM * x / r**3, -GM * y / r**3]

t = np.linspace(0, 1.5 * T, 1000)
sol = odeint(kepler_ode, [x0, 0, 0, vy0], t, rtol=1e-11, atol=1e-13)

# Conservation check
x_s, y_s, vx_s, vy_s = sol[:, 0], sol[:, 1], sol[:, 2], sol[:, 3]
r_s = np.sqrt(x_s**2 + y_s**2)
E = 0.5*(vx_s**2 + vy_s**2) - GM/r_s          # should be -0.5
L = x_s*vy_s - y_s*vx_s                       # should be sqrt(3)/2 ≈ 0.866
print(f"E = {E.mean():.6f} ± {E.std():.2e}")
print(f"L = {L.mean():.6f} ± {L.std():.2e}")
```
Conservation Checks (Exact Values)
| Conserved Quantity | Formula | Exact Value (\(a=1,e=0.5,GM=1\)) | RK45 Drift |
|---|---|---|---|
| Total energy \(E\) | \(-GM/(2a)\) | \(-0.500000\) | \(<10^{-10}\) |
| Angular momentum \(L\) | \(\sqrt{GM\,a(1-e^2)}\) | \(0.866025\) | \(<10^{-10}\) |
The reference integrator with tight tolerances conserves both quantities to near machine precision over 1.5 orbital periods. This gives us the ground truth against which all neural network methods are judged.
3. Why is this Hard for Vanilla PINN?
A naïve PINN with default hyperparameters catastrophically fails on the Kepler problem. Below are the seven root causes:
7 Classic Failure Modes of Vanilla PINN on Kepler
- Phase drift on periodic dynamics. The mapping \(t \mapsto (x,y)\) must be exact — any small error in period accumulates over orbits, and the network has no mechanism to lock phase.
- No initial condition (IC) enforcement. Without an explicit IC loss, infinitely many trajectories (all with the same orbit shape, different phases) satisfy the ODE residual \(= 0\). The network picks an arbitrary one.
- Physics loss massively underweighted. Default \(\lambda_\text{phys}=10^{-3}\) turns the ODE constraint into mere regularization; the network fits the training data points but ignores the governing equations.
- Too few collocation points. 50 random time samples are grossly insufficient for a nonlinear Hamiltonian system that needs to enforce two coupled second-order equations continuously in time.
- Noisy second-order autograd gradients. Computing \(\ddot x = \partial^2 f / \partial t^2\) via two passes of automatic differentiation amplifies numerical noise, especially in shallow networks with saturating activations.
- Singularity at \(r \to 0\). The gravitational force \(\sim r^{-2}\) and the ODE right-hand side \(\sim r^{-3}\) diverge near the origin. Any collocation point near the star produces gradient explosion, destabilizing training.
- No inductive bias for conservation laws. The neural network has no architectural reason to conserve energy or angular momentum. It will drift off the correct energy shell during extrapolation.
A vanilla PINN treats the Kepler problem as pure function approximation with a soft physics penalty. But Kepler orbits live on a specific level set of the Hamiltonian, enforced by two independent conservation laws. Without architectural inductive bias, the network has no reason to stay on that manifold during extrapolation.
🎬 Standard NN training — fits training window (blue), diverges in extrapolation (grey). No physics constraint.
🎬 Improved PINN training — IC + physics losses guide the trajectory. Slower convergence, better physical consistency.
4. Improved PINN: The Fixes
Five Key Improvements
- ✅ N_COLLOC = 500 (was 50) — dense coverage of the time domain to enforce ODE continuously
- ✅ \(\lambda_\text{phys} = 1.0\) (was \(10^{-3}\)) — equal weight for physics and data losses
- ✅ IC loss with \(\lambda_\text{IC} = 10.0\) — strongly penalizes deviation from the correct starting position and velocity
- ✅ Time normalized to \([0,1]\) — substitution \(\tau = t/T_\text{sim}\) eliminates scale mismatch in autograd gradients
- ✅ LR decay schedule (StepLR) — reduces learning rate by ×0.5 every 2000 epochs for stable convergence
Composite Loss Function
\[\mathcal{L} = \mathcal{L}_\text{data} + \lambda_\text{IC}\,\mathcal{L}_\text{IC} + \lambda_\text{phys}\,\mathcal{L}_\text{phys}\]
Each component has a precise role:
| Loss Term | Formula | Purpose | Weight |
|---|---|---|---|
| \(\mathcal{L}_\text{data}\) | \(\frac{1}{N}\sum_i\|\hat{\mathbf{r}}(t_i)-(x_i,y_i)\|^2\) | Fit observed trajectory points | 1.0 |
| \(\mathcal{L}_\text{IC}\) | \(|\hat{\mathbf{r}}(0)-(x_0,y_0)|^2 + |\hat{\dot{\mathbf{r}}}(0)-(v_{x0},v_{y0})|^2\) | Pin starting position & velocity | 10.0 |
| \(\mathcal{L}_\text{phys}\) | \(\text{MSE}(\ddot x + GM x/r^3,\; \ddot y + GM y/r^3)\) | Enforce Newton's gravity law | 1.0 |
Physics Residual with Time Normalization
Normalizing time to \(\tau = t/T_\text{sim}\) transforms the ODE: \(\partial^2 x/\partial\tau^2 = T_\text{sim}^2\,\partial^2 x/\partial t^2\). The residual becomes:
\[\frac{\partial^2 x}{\partial\tau^2} + T_\text{sim}^2\,\frac{GM\,x}{r^3} = 0,\qquad \frac{\partial^2 y}{\partial\tau^2} + T_\text{sim}^2\,\frac{GM\,y}{r^3} = 0\]
```python
import torch

# τ = t / T_sim → d²x/dτ² = T_sim² · d²x/dt²
# Network predicts (x, y) as a function of τ ∈ [0, 1]
def physics_loss(model, tau_colloc, T_sim, GM=1.0):
    tau_c = tau_colloc.requires_grad_(True)
    xy = model(tau_c)                 # (N, 2)
    x, y = xy[:, 0:1], xy[:, 1:2]
    # First derivatives w.r.t. τ
    vx_tau = torch.autograd.grad(x.sum(), tau_c, create_graph=True)[0]
    vy_tau = torch.autograd.grad(y.sum(), tau_c, create_graph=True)[0]
    # Second derivatives w.r.t. τ (= T²·acceleration in real time)
    ax_tau = torch.autograd.grad(vx_tau.sum(), tau_c, create_graph=True)[0]
    ay_tau = torch.autograd.grad(vy_tau.sum(), tau_c, create_graph=True)[0]
    r = torch.sqrt(x**2 + y**2).clamp(min=1e-3)   # avoid singularity
    T2 = T_sim**2
    # ODE residuals: d²x/dτ² = -T² · GM·x/r³
    res_x = ax_tau + T2 * GM * x / r**3
    res_y = ay_tau + T2 * GM * y / r**3
    return (res_x**2 + res_y**2).mean()
```
Without normalization, the orbital period \(T\approx6.28\) creates a mismatch between the input scale (\(t\in[0,9.4]\)) and the typical NN weight initialization (designed for unit-scale inputs). Normalizing \(\tau\in[0,1]\) makes gradient magnitudes consistent across all autograd passes, dramatically improving training stability.
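A quick finite-difference check of the chain-rule scaling (a toy cosine signal, not the trained network) confirms the identity \(\partial^2 x/\partial\tau^2 = T_\text{sim}^2\,\partial^2 x/\partial t^2\):

```python
import numpy as np

T = 2 * np.pi            # orbital period (GM = a = 1)
T_sim = 1.5 * T          # simulation window
t = np.linspace(0.0, T_sim, 2001)
x = np.cos(2 * np.pi * t / T)       # toy trajectory component

dt = t[1] - t[0]
dtau = dt / T_sim                   # spacing in normalized time τ = t/T_sim
d2x_dt2 = np.gradient(np.gradient(x, dt), dt)
d2x_dtau2 = np.gradient(np.gradient(x, dtau), dtau)

err = np.max(np.abs(d2x_dtau2 - T_sim**2 * d2x_dt2))
print(err)  # tiny (pure rounding): d²x/dτ² = T_sim² · d²x/dt² holds numerically
```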
5. Hamiltonian Neural Networks (HNN)
Concept and Reference
The Hamiltonian Neural Network (HNN) framework was introduced by Greydanus, Dzamba & Yosinski (NeurIPS 2019) [1]. The key insight: every Hamiltonian system has the form
\[\dot q = \frac{\partial H}{\partial p},\qquad \dot p = -\frac{\partial H}{\partial q}\]
If we parameterize \(H\) with a neural network \(H_\theta\), we can compute the right-hand sides via autograd and train to match observed accelerations. This enforces the symplectic structure at the architectural level.
For orbital mechanics specifically, Schiassi et al. (Springer 2023) [2] demonstrated that physics-informed methods (using X-TFC) can solve Kepler's initial value problem far faster than traditional iterative methods, with applications to asteroid tracking and satellite orbit determination.
Architecture
The HNN takes the full phase-space state \(\mathbf{z} = (x, y, p_x, p_y)\in\mathbb{R}^4\) and outputs a scalar \(H_\theta(\mathbf{z})\in\mathbb{R}\), implemented as a plain MLP with Tanh hidden layers (4 → 128 → 128 → 128 → 1; see the code below).
Hamilton's Equations via Autograd
\[\dot x = \frac{\partial H_\theta}{\partial p_x},\qquad \dot y = \frac{\partial H_\theta}{\partial p_y},\qquad \dot p_x = -\frac{\partial H_\theta}{\partial x},\qquad \dot p_y = -\frac{\partial H_\theta}{\partial y}\]
These four equations are the time derivatives of the state vector, computed purely from the learned Hamiltonian — no explicit dynamics model is needed.
HNN Training Loss
Given a dataset of states \(\{(x_i, y_i, v_{xi}, v_{yi})\}\) and accelerations \(\{(a_{xi}, a_{yi})\}\) from the RK45 reference, we train by enforcing Hamilton's equations:
\[\mathcal{L}_\text{HNN} = \frac{1}{N}\sum_i \Bigl[\Bigl(\tfrac{\partial H_\theta}{\partial p_x} - v_{xi}\Bigr)^2 + \Bigl(\tfrac{\partial H_\theta}{\partial p_y} - v_{yi}\Bigr)^2 + \Bigl(\tfrac{\partial H_\theta}{\partial x} + a_{xi}\Bigr)^2 + \Bigl(\tfrac{\partial H_\theta}{\partial y} + a_{yi}\Bigr)^2\Bigr]\]
PyTorch Implementation
```python
import torch
import torch.nn as nn

class HamiltonianNN(nn.Module):
    def __init__(self, n_hidden=128, n_layers=3):
        super().__init__()
        layers = [nn.Linear(4, n_hidden), nn.Tanh()]
        for _ in range(n_layers - 1):
            layers += [nn.Linear(n_hidden, n_hidden), nn.Tanh()]
        layers.append(nn.Linear(n_hidden, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        """z: (N, 4) = [x, y, px, py] → H: (N, 1)"""
        return self.net(z)

    def time_derivative(self, z):
        """Compute dz/dt from Hamilton's equations via autograd."""
        z_r = z.detach().requires_grad_(True)
        H = self.net(z_r).sum()
        dH = torch.autograd.grad(H, z_r, create_graph=True)[0]  # (N, 4)
        # z = [x, y, px, py]: dq/dt = +∂H/∂p, dp/dt = -∂H/∂q
        dzdt = torch.stack([
            dH[:, 2],   # dx/dt  = +∂H/∂px
            dH[:, 3],   # dy/dt  = +∂H/∂py
            -dH[:, 0],  # dpx/dt = -∂H/∂x
            -dH[:, 1],  # dpy/dt = -∂H/∂y
        ], dim=1)
        return dzdt
```
```python
def hnn_loss(model, z_data, dzdt_data):
    """
    z_data:    (N, 4) phase-space states [x, y, vx, vy]
    dzdt_data: (N, 4) true time derivatives [vx, vy, ax, ay]
    """
    dzdt_pred = model.time_derivative(z_data)
    return nn.functional.mse_loss(dzdt_pred, dzdt_data)

# Training loop (z_train, dzdt_train come from the RK45 reference data)
model = HamiltonianNN(n_hidden=128, n_layers=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.5)

for epoch in range(10000):
    optimizer.zero_grad()
    loss = hnn_loss(model, z_train, dzdt_train)
    loss.backward()
    optimizer.step()
    scheduler.step()
```
Why HNN Conserves Energy: The Symplectic Identity
Define the symplectic matrix \(J = \begin{pmatrix}0 & I \\ -I & 0\end{pmatrix}\). Hamilton's equations read \(\dot{\mathbf{z}} = J\nabla H\). Then:
\[\frac{d}{dt}H(\mathbf{z}(t)) = \nabla H \cdot \dot{\mathbf{z}} = \nabla H \cdot J\nabla H = 0\]
The last equality holds because \(J\) is antisymmetric: \(\mathbf{v}\cdot J\mathbf{v} = 0\) for any \(\mathbf{v}\). This is the symplectic identity — it is independent of the specific form of \(H\). Therefore, any neural network \(H_\theta\), however imperfectly trained, will conserve its own learned energy \(H_\theta\) exactly along the integrated trajectory.
Consequence: If \(H_\theta \approx H_\text{true}\), then the integrated trajectory will automatically lie near the correct Keplerian energy shell — with no special loss terms or regularization required.
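The antisymmetry argument can be demonstrated in a few lines: for the 4×4 symplectic matrix and arbitrary vectors standing in for \(\nabla H_\theta\) at some state, \(\mathbf v\cdot J\mathbf v\) vanishes identically:

```python
import numpy as np

# J for z = (x, y, px, py); v @ J @ v = 0 for ANY v because J is antisymmetric,
# which is exactly why the learned H_theta is conserved regardless of training.
I2 = np.eye(2)
J = np.block([[np.zeros((2, 2)), I2],
              [-I2, np.zeros((2, 2))]])

rng = np.random.default_rng(0)
vals = [g @ J @ g for g in rng.normal(size=(5, 4))]  # g stands in for ∇H_theta
print(vals)  # all zero up to floating-point rounding
```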
Inference: Integrating the Learned HNN
After training, we use scipy.solve_ivp to integrate the HNN's learned dynamics forward in time:
```python
from scipy.integrate import solve_ivp
import numpy as np
import torch

hnn_model.eval()

def hnn_rhs(t, state):
    """Right-hand side for scipy integrator: uses HNN gradients."""
    with torch.enable_grad():
        z = torch.tensor(state, dtype=torch.float32).unsqueeze(0).requires_grad_(True)
        H = hnn_model(z).sum()
        dH = torch.autograd.grad(H, z)[0].squeeze().detach().numpy()
    # Hamilton's equations: dq/dt = +∂H/∂p, dp/dt = -∂H/∂q
    return [dH[2], dH[3], -dH[0], -dH[1]]

# Integrate from periapsis over 1.5 orbital periods
z0 = [x0, 0.0, 0.0, vy0]
t_span = (0, 1.5 * T)
t_eval = np.linspace(0, 1.5 * T, 1000)
sol = solve_ivp(hnn_rhs, t_span, z0, t_eval=t_eval,
                method='RK45', rtol=1e-9, atol=1e-11)
x_hnn, y_hnn = sol.y[0], sol.y[1]

# Conservation check
r_hnn = np.sqrt(x_hnn**2 + y_hnn**2)
E_hnn = 0.5*(sol.y[2]**2 + sol.y[3]**2) - GM/r_hnn
L_hnn = x_hnn*sol.y[3] - y_hnn*sol.y[2]
print(f"HNN energy drift  ΔE = {E_hnn.std():.4e}")
print(f"HNN ang-mom drift ΔL = {L_hnn.std():.4e}")
```
6. The True Hamiltonian: What HNN Should Learn
The closed-form Hamiltonian for the Kepler problem is:
\[H(x,y,p_x,p_y) = \underbrace{\tfrac{1}{2}\bigl(p_x^2+p_y^2\bigr)}_{\text{kinetic}} - \underbrace{\frac{GM}{\sqrt{x^2+y^2}}}_{\text{potential}}\]
By the virial theorem for bound Keplerian orbits, the time-averaged kinetic and potential energies satisfy \(\langle\text{KE}\rangle = -\frac{1}{2}\langle\text{PE}\rangle\), giving total energy \(E = -GM/(2a)\). For our benchmark parameters \((a=1, e=0.5, GM=1)\): \(E = -0.5\).
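As a consistency check, evaluating \(H\) at the two apsides (using the apsidal speeds from the table below) recovers \(E=-GM/(2a)\) at both points:

```python
import numpy as np

GM, a, e = 1.0, 1.0, 0.5
v_peri = np.sqrt(GM * (1 + e) / (a * (1 - e)))   # speed at periapsis
v_apo = np.sqrt(GM * (1 - e) / (a * (1 + e)))    # speed at apoapsis

E_peri = 0.5 * v_peri**2 - GM / (a * (1 - e))    # H at r = a(1-e)
E_apo = 0.5 * v_apo**2 - GM / (a * (1 + e))      # H at r = a(1+e)
print(E_peri, E_apo)  # both -0.5 = -GM/(2a)
```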
Benchmark Values
| Quantity | Formula | Numerical Value (\(a=1, e=0.5, GM=1\)) |
|---|---|---|
| Orbital energy \(E\) | \(-GM/(2a)\) | \(-0.500\) |
| Angular momentum \(L\) | \(\sqrt{GM\,a(1-e^2)}\) | \(\sqrt{0.75} \approx 0.866\) |
| Periapsis \(r_\min\) | \(a(1-e)\) | \(0.500\) |
| Apoapsis \(r_\max\) | \(a(1+e)\) | \(1.500\) |
| Orbital period \(T\) | \(2\pi\sqrt{a^3/GM}\) | \(\approx 6.283\) |
| Periapsis speed | \(\sqrt{GM(1+e)/(a(1-e))}\) | \(\sqrt{3}\approx1.732\) |
| Apoapsis speed | \(\sqrt{GM(1-e)/(a(1+e))}\) | \(\sqrt{1/3}\approx0.577\) |
Verification Code
```python
# Verify HNN has learned H_true accurately (x_rk etc. from the RK45 reference)
with torch.no_grad():
    z_test = torch.tensor(np.column_stack([x_rk, y_rk, vx_rk, vy_rk]),
                          dtype=torch.float32)
    H_learned = hnn_model(z_test).numpy().flatten()

# True Hamiltonian
r_rk = np.sqrt(x_rk**2 + y_rk**2)
H_true = 0.5*(vx_rk**2 + vy_rk**2) - GM/r_rk   # should be -0.500 everywhere
print(f"H_true    mean = {H_true.mean():.6f}  std = {H_true.std():.2e}")
print(f"H_learned mean = {H_learned.mean():.6f}  std = {H_learned.std():.2e}")
print(f"Correlation r² = {np.corrcoef(H_true, H_learned)[0,1]**2:.6f}")
```
A well-trained HNN will show \(r^2 > 0.999\) correlation between \(H_\theta\) and \(H_\text{true}\), with the learned value differing by at most a constant offset (which is physically irrelevant since only \(\nabla H\) enters the equations of motion).
7. Three-Way Comparison
We compare three approaches on the same Kepler benchmark (\(a=1, e=0.5, GM=1\), 27 training points from 1000-point ground truth, extrapolating to 1.5× the training window). Metrics: mean absolute error (MAE) on trajectory and drift in conserved quantities during extrapolation.
| Method | Train MAE | Extrapol. MAE | \(|\Delta E|\) (extrapol.) | \(|\Delta L|\) (extrapol.) | Energy Conserved? |
|---|---|---|---|---|---|
| Standard NN | \(0.00130\) | \(1.188\) | \(0.205\) — large | \(0.686\) — large | ✗ |
| Improved PINN | \(0.975\) ⚠️ | \(2.794\) ❌ worst | \(0.254\) — moderate | \(0.116\) ✅ best | ≈ (via soft ODE) |
| HNN 🏆 | \(0.00953\) | \(1.043\) ✅ best | \(0.667\) ‡ (see note) | \(0.458\) | ✓ \(H_\theta\) exactly conserved |
‡ HNN's \(|\Delta E|\) uses physical energy \(E = \frac{1}{2}v^2 - GM/r\) evaluated on the HNN trajectory. The learned \(H_\theta\) is a different scalar that IS exactly conserved; the discrepancy reflects that \(H_\theta \neq E\) exactly without additional supervision.
Property Comparison Cards
Standard NN
- Learns \(t\mapsto(x,y)\) directly
- No physics constraints
- ❌ Energy NOT conserved
- ❌ Angular momentum drifts
- ❌ Extrapolation fails badly
- ❌ Phase drift accumulates
- ✅ Fast training (no autograd)
Improved PINN
- Learns \(\tau\mapsto(x,y)\) with ODE penalty
- Soft physics: \(\lambda_\text{IC}=10\), \(\lambda_\text{phys}=1\)
- ✅ Best \(|\Delta L|\) = 0.116 — ODE enforces it
- ⚠️ Train MAE = 0.975 — stiff loss landscape
- ❌ Worst Extrapol. MAE = 2.794
- 📖 Physics loss competes with data loss
- ⚠️ Slow (2nd-order autograd)
Hamiltonian NN 🏆
- Learns \(H_\theta(\mathbf{z})\) — scalar energy
- Hard physics via architecture
- ✅ \(H_\theta\) conserved exactly by construction
- ✅ Best Extrapol. MAE = 1.043
- ✅ Symplectic structure built-in
- ✅ Generalizes to any IC on same E-shell
- ⚠️ \(H_\theta \neq E_\text{physical}\) without supervision
Figure: Eight-panel comparison — trajectories (train & extrapolation), x(t), y(t) error, energy E(t), angular momentum L(t), and training loss curves for all three methods. Generated by Kepler_PINN.py.
8. Discussion: Why PINNs Struggle with Kepler
The Conceptual Problem
The difficulty is not merely numerical — it is structural. Kepler orbits are the paradigmatic example of a Hamiltonian (conservative) dynamical system: they live on a precise energy shell in phase space, with two independent conservation laws (\(H\) and \(L_z\)) constraining the trajectory to a one-dimensional curve in 4D phase space.
- Hamiltonian systems have no dissipation. Standard neural networks have an implicit dissipative bias — they are universal approximators of functions, but the function space they explore during gradient descent tends to favor smooth, damping-like solutions. Without explicit symplectic constraints, the network will tend to slowly collapse the orbit inward or outward.
- Non-unique phase problem. The parameterization \(t\mapsto(x,y)\) is highly ambiguous: there are infinitely many valid trajectories (all with the correct orbit shape but different initial phases) that satisfy the ODE residual \(= 0\). Without a strong IC loss, the network has no way to select the correct orbit from this continuous family.
- PINN works best in limited regimes. PINNs succeed when: the time horizon is short, the system is weakly nonlinear, there is known dissipation (which breaks the phase ambiguity), and the ODE is not stiff. Kepler orbits violate all of these: they are periodic over long times, strongly nonlinear near periapsis, conservative, and near-singular at \(r\to0\).
- HNN has the correct inductive bias. By learning \(H_\theta\) instead of the trajectory, the HNN operates in a function space that is guaranteed to be symplectic. The symmetry is baked into the architecture: Hamilton's equations are antisymmetric in \((q, p)\), which means any \(H_\theta\) automatically satisfies \(\nabla H_\theta \cdot J\nabla H_\theta = 0\).
- Noether's theorem and neural networks. Conservation laws are the fingerprints of symmetries (Noether's theorem, 1918). The gravitational potential \(V\sim 1/r\) has time-translation symmetry (\(\to\) energy conservation) and rotational symmetry (\(\to\) angular momentum conservation). A PINN enforces the ODE, so if it works perfectly, these emerge automatically. But the HNN's symplectic architecture directly embeds the time-translation symmetry — the HNN cannot violate conservation of its learned Hamiltonian \(H_\theta\), even with imperfect training.
Architectural inductive bias matters more than loss engineering. Adding more physics loss terms (PINN improvements) is fighting an uphill battle against the fundamental mismatch between the dissipative function space explored by gradient descent and the conservative manifold of Keplerian orbits. HNN solves this by operating in a Hamiltonian-compatible function space from the outset.
9. Connection to Real Space Applications
The Kepler IVP is not an academic exercise — it is the foundation of modern astrodynamics. Physics-informed machine learning has direct applications in operational space missions:
🌍 Orbit Determination
Given 3 angular observations of an asteroid or satellite, determine the full 6-element orbital state (classical Gauss problem). Traditional methods (Gauss–Gibbs iteration) require dozens of iterations per solution. PINN/X-TFC approaches (Schiassi et al. 2023) solve the same IVP in a single forward pass after training, enabling real-time orbit determination.
🚀 Satellite Orbit Propagation
Ground-tracking stations need fast, accurate orbit propagation for collision avoidance and station-keeping. HNNs trained on perturbed Keplerian orbits (J2 perturbation, atmospheric drag) can propagate orbits 10–100× faster than full numerical integration while maintaining long-term energy accuracy.
☄️ Asteroid Tracking
Planetary defense requires rapid orbit determination from sparse photometric data (often only 3–5 observations). Physics-informed networks constrained by Kepler's laws can dramatically reduce the observation cadence needed for reliable orbit determination, improving early warning times.
🛸 Space Mission Design
Trajectory optimization for interplanetary missions requires solving boundary-value problems over Keplerian arcs (Lambert's problem). Differentiable HNNs enable gradient-based optimization of transfer trajectories, opening new possibilities for automated mission planning and fuel-optimal maneuver design.
🛰️ Maneuvering Satellite Tracking arXiv 2024
Ruprecht et al. (MIT Lincoln Laboratory, 2024) applied PINNs to track GEO satellites using continuous low-thrust electric propulsion — a regime where classical SGP4 completely fails. Their residual PINN learned only the unmodeled thrust \(\boldsymbol{a_T}(t)\) on top of the known Keplerian physics, reducing observation residuals from 123 arcsec → 1 arcsec and 5-day position error from 3860 km → 164 km.
📡 GNSS Error Prediction Open Source
GNSS_Error_Predictor extends the PINN idea to satellite clock errors and ephemeris errors. Its Physics-Informed Transformer encodes Kepler orbital periods and Allan clock variance directly into the attention mechanism, while a Neural Diffusion Model provides calibrated uncertainty estimates — predicting navigation errors 15 min to 24 hours ahead.
⚡ Key Insight from Ruprecht et al. 2024 — The Residual PINN Approach
Instead of learning the full dynamics from scratch, the DNN learns only the deviation from known physics (Cowell's formulation):
\[\ddot{\mathbf r} = -\frac{GM}{r^3}\,\mathbf r \;+\; \mathbf a_T(t),\qquad \text{with only } \mathbf a_T \text{ learned by the DNN}\]
- The DNN only needs to approximate a small residual — far easier than learning all of orbital mechanics from angles-only observations.
- 30 angular observations over 2 days were enough to identify an acceleration of order \(10^{-8}\) km/s² (electric propulsion thrust).
- For our Kepler toy problem: if we added a small perturbation (J₂, drag, SRP), this approach would learn the perturbation while the Keplerian backbone keeps the orbit stable.
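The structure of the residual approach can be sketched in a few lines. The thrust term below is a hypothetical stand-in for the learned DNN residual (a tiny constant acceleration, magnitude chosen only for illustration), added on top of the known Keplerian backbone:

```python
import numpy as np

GM = 1.0

def kepler_acc(x, y):
    """Known two-body acceleration (the Keplerian 'backbone')."""
    r = np.sqrt(x**2 + y**2)
    return np.array([-GM * x / r**3, -GM * y / r**3])

def thrust_residual(t):
    """Stand-in for the learned residual a_T(t): a tiny constant thrust
    (hypothetical magnitude, illustration only)."""
    return np.array([0.0, 1e-7])

def cowell_rhs(t, state):
    """Cowell-style right-hand side: known physics + small learned residual."""
    x, y, vx, vy = state
    ax, ay = kepler_acc(x, y) + thrust_residual(t)
    return [vx, vy, ax, ay]

# The residual is many orders of magnitude smaller than the Keplerian term,
# so the network only has to capture a small correction.
ratio = np.linalg.norm(thrust_residual(0.0)) / np.linalg.norm(kepler_acc(0.5, 0.0))
print(ratio)  # ≈ 2.5e-8
```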
The X-TFC Method (Schiassi et al. 2023)
The Extreme Theory of Functional Connections (X-TFC) method combines:
- Theory of Functional Connections (TFC): A mathematical framework that analytically satisfies boundary and initial conditions by construction — no IC loss term needed, the IC is embedded in the functional representation.
- Extreme Learning Machines (ELM): Single-hidden-layer networks with randomly fixed input weights and only the output layer trained, enabling fast closed-form (least-squares) training.
Applied to Kepler's equation, X-TFC achieved solution times orders of magnitude faster than classical iterative methods, with machine-precision accuracy — demonstrating that physics-informed ML is approaching operational readiness for astrodynamics.
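The X-TFC recipe can be illustrated on a toy IVP \(y'=-y,\;y(0)=1\) (a minimal sketch, not the paper's setup; network size and sampling are arbitrary choices). The constrained expression \(y(t)=1+t\,g(t)\) satisfies the IC by construction, and with an ELM ansatz for \(g\) the ODE residual is linear in the output weights, so "training" is a single least-squares solve:

```python
import numpy as np

# ELM features: g(t) = sum_j w_j * tanh(a_j t + b_j), (a_j, b_j) fixed random
rng = np.random.default_rng(1)
n = 60
aj = rng.uniform(-4, 4, n)
bj = rng.uniform(-4, 4, n)
t = np.linspace(0, 1, 200)[:, None]          # collocation points, shape (200, 1)

sig = np.tanh(aj * t + bj)                   # features,          (200, n)
dsig = aj * (1 - sig**2)                     # d/dt of features,  (200, n)

# Constrained expression y = 1 + t*g  =>  y' + y = (sig + t*dsig + t*sig) @ w + 1
A = sig + t * dsig + t * sig                 # coefficient matrix of w
rhs = -np.ones(len(t))                       # move the constant +1 to the RHS
w, *_ = np.linalg.lstsq(A, rhs, rcond=None)  # one linear solve, no gradient descent

y1 = 1 + 1.0 * (np.tanh(aj * 1.0 + bj) @ w)  # evaluate y(1)
print(y1, np.exp(-1))                        # y(1) ≈ e⁻¹ ≈ 0.3679
```

The same pattern (IC embedded analytically, least-squares training of the output layer) is what makes X-TFC fast enough for operational Kepler solvers.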
References
- Greydanus, S., Dzamba, M., & Yosinski, J. (2019). Hamiltonian Neural Networks. Advances in Neural Information Processing Systems (NeurIPS), 32. arXiv:1906.01563
- Schiassi, E., De Florio, M., D'Ambrosio, A., Mortari, D., & Furfaro, R. (2023). Physics-Informed Neural Networks and Functional Interpolation for Solving the Matrix Differential Riccati Equation & Kepler's Orbit Determination. In Advances in Space Research / Springer. DOI:10.1016/j.asr.2023.01.023
- Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686–707. DOI:10.1016/j.jcp.2018.10.045
- Noether, E. (1918). Invariante Variationsprobleme. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, 235–257. (The foundational paper proving conservation laws ↔ symmetries.)
- Ruprecht, J. D., Tierney, M., & Sullenberger, R. (2024). Physics-Informed Neural Networks for Satellite State Estimation. IEEE Aerospace Conference. MIT Lincoln Laboratory. arXiv:2403.19736 — Demonstrates residual PINN (Cowell's formulation) reducing GEO satellite tracking error from 3860 km → 164 km over 5-day propagation.
- Jha, A. (2024). GNSS Error Predictor — Hybrid Physics-Informed Neural Network for GNSS Error Prediction with Uncertainty Quantification. Open-source project, Apache 2.0. github.com/Amit-jha98/GNSS_Error_Predictor — Physics-Informed Transformer + Neural Diffusion Model for multi-horizon satellite clock & ephemeris error prediction.
10. Complete Script Structure
Below is the full Python script Kepler_PINN.py, organized into 12 labeled sections. Each section can be toggled independently for ablation studies.
"""
Kepler_PINN.py — Kepler orbit reconstruction: Standard NN vs Improved PINN vs HNN
Sections 1–12 labeled for easy reference.
"""
# ══════════════════════════════════════════════════════════════════════════════
# § 1 IMPORTS & CONFIGURATION
# ══════════════════════════════════════════════════════════════════════════════
import numpy as np
import torch
import torch.nn as nn
from scipy.integrate import odeint, solve_ivp
import matplotlib.pyplot as plt
SEED = 42
torch.manual_seed(SEED)
np.random.seed(SEED)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Physical parameters
GM, a, e = 1.0, 1.0, 0.5
x0 = a * (1 - e)
vy0 = np.sqrt(GM * (1 + e) / (a * (1 - e)))
T = 2 * np.pi * np.sqrt(a**3 / GM) # orbital period ≈ 6.283
T_SIM = 1.5 * T # simulate 1.5 orbits
# ══════════════════════════════════════════════════════════════════════════════
# § 2 GENERATE REFERENCE TRAJECTORY (RK45)
# ══════════════════════════════════════════════════════════════════════════════
def kepler_ode(state, t):
x, y, vx, vy = state
r = np.sqrt(x**2 + y**2)
return [vx, vy, -GM * x / r**3, -GM * y / r**3]
t_ref = np.linspace(0, T_SIM, 1000)
sol = odeint(kepler_ode, [x0, 0, 0, vy0], t_ref, rtol=1e-11, atol=1e-13)
x_ref, y_ref, vx_ref, vy_ref = sol.T
r_ref = np.sqrt(x_ref**2 + y_ref**2)
ax_ref = -GM * x_ref / r_ref**3
ay_ref = -GM * y_ref / r_ref**3
# Training vs extrapolation split (1 orbit train, 0.5 orbit extrapolate)
n_train = int(1000 * (T / T_SIM))
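# ── Optional check: eccentricity via the Laplace-Runge-Lenz vector ──
# e_vec = ((v² - GM/r)·r_vec - (r·v)·v_vec) / GM is conserved for a 1/r
# potential; its norm should stay ≈ e = 0.5 along the reference trajectory.
v2_ref = vx_ref**2 + vy_ref**2
rv_ref = x_ref * vx_ref + y_ref * vy_ref
ecc_ref = np.hypot((v2_ref - GM / r_ref) * x_ref - rv_ref * vx_ref,
                   (v2_ref - GM / r_ref) * y_ref - rv_ref * vy_ref) / GM
assert np.allclose(ecc_ref, e, atol=1e-5)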
# ══════════════════════════════════════════════════════════════════════════════
# § 3 STANDARD NEURAL NETWORK (BASELINE)
# ══════════════════════════════════════════════════════════════════════════════
class StandardNN(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(
nn.Linear(1, 64), nn.Tanh(),
nn.Linear(64, 64), nn.Tanh(),
nn.Linear(64, 64), nn.Tanh(),
nn.Linear(64, 2) # output: (x, y)
)
def forward(self, t):
return self.net(t)
# ══════════════════════════════════════════════════════════════════════════════
# § 4 IMPROVED PINN
# ══════════════════════════════════════════════════════════════════════════════
class PINNModel(nn.Module):
def __init__(self):
super().__init__()
self.net = nn.Sequential(
nn.Linear(1, 64), nn.Tanh(),
nn.Linear(64, 64), nn.Tanh(),
nn.Linear(64, 64), nn.Tanh(),
nn.Linear(64, 2) # output: (x, y)
)
def forward(self, tau): # tau ∈ [0, 1] (normalized time)
return self.net(tau)
def pinn_physics_loss(model, tau_c, T_sim, GM=1.0, eps=1e-3):
    """ODE residual in normalized time τ = t / T_sim (chain rule brings in T_sim²)."""
    tau_c = tau_c.requires_grad_(True)
    xy = model(tau_c)
    x, y = xy[:, 0:1], xy[:, 1:2]
    vx_tau = torch.autograd.grad(x.sum(), tau_c, create_graph=True)[0]
    vy_tau = torch.autograd.grad(y.sum(), tau_c, create_graph=True)[0]
    ax_tau = torch.autograd.grad(vx_tau.sum(), tau_c, create_graph=True)[0]
    ay_tau = torch.autograd.grad(vy_tau.sum(), tau_c, create_graph=True)[0]
    r = (x**2 + y**2).sqrt().clamp(min=eps)  # clamp guards the 1/r³ singularity
    T2 = T_sim**2
    res_x = ax_tau + T2 * GM * x / r**3      # d²x/dτ² + T² GM x / r³ = 0
    res_y = ay_tau + T2 * GM * y / r**3
    return (res_x**2 + res_y**2).mean()
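# ── Optional autograd sanity check (independent of any model) ──
# For x(τ) = sin(2πτ), chaining two grad calls with create_graph=True, exactly
# as in pinn_physics_loss, should yield d²x/dτ² = -(2π)² sin(2πτ).
tau_chk = torch.linspace(0.1, 0.9, 5).unsqueeze(1).requires_grad_(True)
x_chk = torch.sin(2 * torch.pi * tau_chk)
dx_chk = torch.autograd.grad(x_chk.sum(), tau_chk, create_graph=True)[0]
d2x_chk = torch.autograd.grad(dx_chk.sum(), tau_chk)[0]
assert torch.allclose(d2x_chk, -(2 * torch.pi)**2 * torch.sin(2 * torch.pi * tau_chk), atol=1e-3)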
# Hyperparameters
LAMBDA_PHYS = 1.0
LAMBDA_IC = 10.0
N_COLLOC = 500
# ══════════════════════════════════════════════════════════════════════════════
# § 5 HAMILTONIAN NEURAL NETWORK
# ══════════════════════════════════════════════════════════════════════════════
class HamiltonianNN(nn.Module):
def __init__(self, n_hidden=128, n_layers=3):
super().__init__()
layers = [nn.Linear(4, n_hidden), nn.Tanh()]
for _ in range(n_layers - 1):
layers += [nn.Linear(n_hidden, n_hidden), nn.Tanh()]
layers.append(nn.Linear(n_hidden, 1))
self.net = nn.Sequential(*layers)
def forward(self, z):
return self.net(z)
def time_derivative(self, z):
z_r = z.detach().requires_grad_(True)
H = self.net(z_r).sum()
dH = torch.autograd.grad(H, z_r, create_graph=True)[0]
return torch.stack([dH[:,2], dH[:,3], -dH[:,0], -dH[:,1]], dim=1)
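# ── Equivalent form of time_derivative: dz/dt = J ∇H ──
# The hand-stacked return above is exactly multiplication by the canonical
# symplectic matrix J = [[0, I₂], [-I₂, 0]]; shown only to make the Hamiltonian
# structure explicit (for any gradient batch dH of shape [N, 4], dH @ J_sympl.T
# equals the stacked version).
J_sympl = torch.tensor([[0., 0., 1., 0.],
                        [0., 0., 0., 1.],
                        [-1., 0., 0., 0.],
                        [0., -1., 0., 0.]])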
# ══════════════════════════════════════════════════════════════════════════════
# § 6 BUILD TRAINING DATASETS
# ══════════════════════════════════════════════════════════════════════════════
# Common: normalize time to [0,1] for PINN
tau_train = torch.tensor(t_ref[:n_train] / T_SIM, dtype=torch.float32).unsqueeze(1).to(device)
xy_train = torch.tensor(np.column_stack([x_ref[:n_train], y_ref[:n_train]]),
dtype=torch.float32).to(device)
tau_all = torch.tensor(t_ref / T_SIM, dtype=torch.float32).unsqueeze(1).to(device)
xy_all = torch.tensor(np.column_stack([x_ref, y_ref]), dtype=torch.float32).to(device)
# HNN: phase-space state + time derivatives
z_train = torch.tensor(np.column_stack([x_ref[:n_train], y_ref[:n_train],
vx_ref[:n_train], vy_ref[:n_train]]),
dtype=torch.float32).to(device)
dzdt_train = torch.tensor(np.column_stack([vx_ref[:n_train], vy_ref[:n_train],
ax_ref[:n_train], ay_ref[:n_train]]),
dtype=torch.float32).to(device)
# IC tensors for PINN
ic_pos = torch.tensor([[x0, 0.0]], dtype=torch.float32).to(device)
ic_vel = torch.tensor([[0.0, vy0]], dtype=torch.float32).to(device)
tau_0 = torch.zeros(1, 1, dtype=torch.float32).to(device)
# ══════════════════════════════════════════════════════════════════════════════
# § 7 TRAIN STANDARD NN
# ══════════════════════════════════════════════════════════════════════════════
nn_model = StandardNN().to(device)
nn_optim = torch.optim.Adam(nn_model.parameters(), lr=1e-3)
nn_sched = torch.optim.lr_scheduler.StepLR(nn_optim, step_size=2000, gamma=0.5)
for epoch in range(8000):
nn_optim.zero_grad()
pred = nn_model(tau_train)
loss = nn.functional.mse_loss(pred, xy_train)
loss.backward()
nn_optim.step()
nn_sched.step()
# ══════════════════════════════════════════════════════════════════════════════
# § 8 TRAIN IMPROVED PINN
# ══════════════════════════════════════════════════════════════════════════════
pinn_model = PINNModel().to(device)
pinn_optim = torch.optim.Adam(pinn_model.parameters(), lr=1e-3)
pinn_sched = torch.optim.lr_scheduler.StepLR(pinn_optim, step_size=2000, gamma=0.5)
tau_colloc = torch.rand(N_COLLOC, 1).to(device)  # collocation points (sampled once, kept fixed)
for epoch in range(10000):
pinn_optim.zero_grad()
# Data loss
pred_train = pinn_model(tau_train)
loss_data = nn.functional.mse_loss(pred_train, xy_train)
# IC loss
pred_ic = pinn_model(tau_0)
loss_pos = nn.functional.mse_loss(pred_ic, ic_pos)
# IC velocity via autograd
tau_0_g = tau_0.requires_grad_(True)
xy_0 = pinn_model(tau_0_g)
vxy_pred = torch.autograd.grad(xy_0.sum(), tau_0_g, create_graph=True)[0] / T_SIM
loss_vel = nn.functional.mse_loss(vxy_pred, ic_vel)
loss_ic = loss_pos + loss_vel
# Physics loss
loss_phys = pinn_physics_loss(pinn_model, tau_colloc, T_SIM)
# Total
loss = loss_data + LAMBDA_PHYS * loss_phys + LAMBDA_IC * loss_ic
loss.backward()
pinn_optim.step()
pinn_sched.step()
# ══════════════════════════════════════════════════════════════════════════════
# § 9 TRAIN HNN
# ══════════════════════════════════════════════════════════════════════════════
hnn_model = HamiltonianNN(n_hidden=128, n_layers=3).to(device)
hnn_optim = torch.optim.Adam(hnn_model.parameters(), lr=1e-3)
hnn_sched = torch.optim.lr_scheduler.StepLR(hnn_optim, step_size=2000, gamma=0.5)
for epoch in range(10000):
hnn_optim.zero_grad()
dzdt_pred = hnn_model.time_derivative(z_train)
loss = nn.functional.mse_loss(dzdt_pred, dzdt_train)
loss.backward()
hnn_optim.step()
hnn_sched.step()
# ══════════════════════════════════════════════════════════════════════════════
# § 10 INTEGRATE HNN WITH SCIPY
# ══════════════════════════════════════════════════════════════════════════════
hnn_model.eval()
def hnn_rhs(t, state):
    # Gradients of H are needed even at inference time, hence enable_grad();
    # the state tensor must live on the same device as the model.
    with torch.enable_grad():
        z = torch.tensor(state, dtype=torch.float32, device=device).unsqueeze(0).requires_grad_(True)
        H = hnn_model(z).sum()
        dH = torch.autograd.grad(H, z)[0].squeeze().detach().cpu().numpy()
    return [dH[2], dH[3], -dH[0], -dH[1]]
sol_hnn = solve_ivp(hnn_rhs, [0, T_SIM], [x0, 0, 0, vy0],
t_eval=t_ref, method='RK45', rtol=1e-9, atol=1e-11)
# ══════════════════════════════════════════════════════════════════════════════
# § 11 EVALUATE CONSERVATION LAWS
# ══════════════════════════════════════════════════════════════════════════════
def conservation_metrics(x, y, vx, vy, label):
    # Drift quantified as the standard deviation of E and L along the trajectory
    r = np.sqrt(x**2 + y**2)
    E = 0.5 * (vx**2 + vy**2) - GM / r
    L = x * vy - y * vx
    print(f"{label}: ΔE = {E.std():.4e} | ΔL = {L.std():.4e}")
return E, L
# RK45 reference
E_ref, L_ref = conservation_metrics(x_ref, y_ref, vx_ref, vy_ref, "RK45 ref")
# HNN
E_hnn, L_hnn = conservation_metrics(
sol_hnn.y[0], sol_hnn.y[1], sol_hnn.y[2], sol_hnn.y[3], "HNN ")
# ══════════════════════════════════════════════════════════════════════════════
# § 12 PLOT RESULTS
# ══════════════════════════════════════════════════════════════════════════════
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
with torch.no_grad():
nn_pred = nn_model(tau_all).cpu().numpy()
pinn_pred = pinn_model(tau_all).cpu().numpy()
hnn_x, hnn_y = sol_hnn.y[0], sol_hnn.y[1]
for ax, (x_p, y_p), title, color in zip(
axes,
[(nn_pred[:,0], nn_pred[:,1]),
(pinn_pred[:,0], pinn_pred[:,1]),
(hnn_x, hnn_y)],
['Standard NN', 'Improved PINN', 'HNN'],
['tab:red', 'tab:orange', 'tab:green']):
ax.plot(x_ref, y_ref, 'k--', lw=1.5, label='RK45 (truth)')
ax.plot(x_p, y_p, color=color, lw=2, label=title)
ax.scatter([0], [0], c='gold', s=120, zorder=5, label='Star')
ax.set_aspect('equal')
ax.set_title(title, fontsize=13, fontweight='bold')
ax.legend(fontsize=9)
ax.set_xlabel('x [AU]'); ax.set_ylabel('y [AU]')
plt.tight_layout()
plt.savefig('kepler_comparison.svg', bbox_inches='tight')
plt.show()
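# ── Optional quantitative summary (not shown in the plots above) ──
# Mean position error over the extrapolation window (beyond the training orbit),
# computed from the predictions already evaluated for plotting.
for name, (xp, yp) in [('Standard NN', (nn_pred[:, 0], nn_pred[:, 1])),
                       ('Improved PINN', (pinn_pred[:, 0], pinn_pred[:, 1])),
                       ('HNN', (hnn_x, hnn_y))]:
    err = np.sqrt((xp - x_ref)**2 + (yp - y_ref)**2)[n_train:].mean()
    print(f"{name}: mean extrapolation error = {err:.3e}")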