Builders & Makers Monday, 4 May 2026

Evolution Strategies: The Old Optimisation Method Competing With RL


Evolution Strategies (ES) isn't new. It's been around since the 1970s - a way to optimise systems by mutating parameters, testing variants, and keeping what works. Black-box optimisation without gradients. For decades, it lived in the background of AI research, useful in niche cases but overshadowed by gradient-based methods. Now it's back, and it's competitive with reinforcement learning for fine-tuning large language models.

The reason? ES doesn't need perfect credit assignment. Reinforcement learning struggles when it's hard to tell which action caused which outcome - especially in long sequences where cause and effect are distant. ES sidesteps that problem entirely by treating the model as a black box. Perturb the weights, test performance, keep the good mutations. No gradients required.
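The perturb-test-keep loop described above fits in a few lines. This is a toy sketch on a small parameter vector, not an LLM — the names and hyperparameters are illustrative, and the update rule is the standard reward-weighted ES estimator:

```python
import numpy as np

def es_step(theta, reward_fn, rng, pop_size=100, sigma=0.1, lr=0.01):
    """One Evolution Strategies update: sample random perturbations of the
    parameters, score each variant with the black-box reward, and move
    theta toward the perturbations that scored well. No gradients of
    reward_fn are ever computed."""
    noise = rng.standard_normal((pop_size, theta.size))
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])
    # Standardise rewards so the step size is invariant to reward scale
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # The reward-weighted sum of perturbations approximates the gradient
    # of the expected reward under the perturbation distribution
    return theta + lr / (pop_size * sigma) * adv @ noise

# Toy black-box reward: peaks at theta = (3, 3)
reward = lambda th: -np.sum((th - 3.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(300):
    theta = es_step(theta, reward, rng)
print(theta)  # drifts toward [3, 3]
```

The same loop applies whether `theta` has two entries or billions; what changes at LLM scale is how the perturbations are represented, which is where the low-rank trick below comes in.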

Why This Matters for Post-Training

Fine-tuning LLMs after pre-training is messy. You're often optimising for fuzzy objectives - things like "generate more helpful responses" or "follow instructions better". These are hard to capture in a clean loss function. Reinforcement learning from human feedback (RLHF) is the standard approach, but it's complicated. You need reward models, policy gradients, and careful tuning to avoid instability.

Evolution Strategies offers a simpler path. EGGROLL, a recent implementation, makes ES GPU-efficient by using low-rank perturbations. Instead of mutating millions of parameters individually, it perturbs a small subspace and projects those changes across the model. This keeps memory overhead low and makes ES viable at the scale of modern LLMs.
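A minimal sketch of the low-rank idea (illustrative only, not EGGROLL's actual code): rather than sampling a dense noise matrix the same shape as each weight matrix, sample two thin factors and take their product. The full-size perturbation is implied by the factors and never has to be stored per variant.

```python
import numpy as np

def low_rank_perturbation(shape, rank=4, sigma=0.1, rng=None):
    """Build a full-size weight perturbation from two thin random factors.
    Dense noise for a (d_out, d_in) matrix costs d_out * d_in floats per
    variant; the factored form costs only rank * (d_out + d_in)."""
    rng = rng or np.random.default_rng(0)
    d_out, d_in = shape
    A = rng.standard_normal((d_out, rank))
    B = rng.standard_normal((d_in, rank))
    # Scale so each entry of the product has variance ~ sigma^2
    return (sigma / np.sqrt(rank)) * A @ B.T, (A, B)

delta, (A, B) = low_rank_perturbation((1024, 1024))
dense_cost = 1024 * 1024          # floats for a dense noise matrix
factored_cost = A.size + B.size   # floats actually sampled and stored
print(factored_cost, "vs", dense_cost)  # 8192 vs 1048576
```

At rank 4 the factored perturbation here is over a hundred times cheaper to store than dense noise, and the gap widens as the weight matrices grow.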

The trade-off is that ES is sample-inefficient. You need to test many variants to find good ones. But in post-training scenarios - where you're fine-tuning on specific tasks with clear evaluation metrics - that's often acceptable. You're not training from scratch. You're adjusting a pre-trained model, and ES can explore that adjustment space effectively without needing the infrastructure complexity of RLHF.

When to Use ES Over RL

Evolution Strategies works best when:

Credit assignment is hard. If your task involves long sequences where it's unclear which part of the output caused success or failure, gradients become noisy. ES doesn't care - it evaluates the whole output and adjusts accordingly.

Your reward function is simple but non-differentiable. Maybe you're optimising for human preference scores, or task completion rates, or some other metric that doesn't have clean gradients. ES treats the reward as a black box and optimises directly.

You want to avoid RL infrastructure. RLHF requires reward models, policy networks, value functions, and careful hyperparameter tuning. ES is conceptually simpler - generate variants, test them, keep the best ones. Fewer moving parts.
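The second point above can be made concrete. Here is a toy piecewise-constant reward: its gradient is zero almost everywhere, so backpropagation gets no signal at all, yet the same perturb-and-score loop still climbs it. The reward function and hyperparameters are purely illustrative.

```python
import numpy as np

target = np.array([1.0, 2.0, 3.0])

def step_reward(theta):
    """Piecewise-constant reward: flat between integer boundaries, so its
    gradient is zero almost everywhere. Useless to backprop through,
    perfectly fine as a black box for ES."""
    return -float(np.sum(np.abs(np.round(theta) - target)))

rng = np.random.default_rng(1)
theta = np.zeros(3)
pop, sigma, lr = 100, 0.5, 0.05
for _ in range(400):
    noise = rng.standard_normal((pop, theta.size))
    rewards = np.array([step_reward(theta + sigma * eps) for eps in noise])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    theta = theta + lr / (pop * sigma) * adv @ noise
print(np.round(theta))  # settles at the target despite zero gradients
```

The perturbation noise effectively smooths the staircase reward into a slope ES can follow - the same reason it tolerates noisy, lumpy rewards like preference scores or task completion rates.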

The downside is sample efficiency. RL can learn from fewer examples when gradients are informative. ES needs more evaluations because it's exploring blindly. But for tasks where evaluation is cheap and gradients are messy, that trade-off works.

What This Unlocks

EGGROLL's low-rank perturbation approach makes ES practical for large models. Previously, mutating millions of parameters was prohibitively expensive in both memory and compute. By constraining mutations to a low-dimensional subspace, EGGROLL keeps costs manageable while still exploring effectively.

This opens up post-training workflows that don't depend on RLHF. You can fine-tune models for specific tasks using simpler infrastructure. You can optimise for objectives that are hard to express as differentiable loss functions. And you can do it without needing deep RL expertise on your team.

Evolution Strategies won't replace gradient-based methods entirely. But for a specific class of problems - post-training tasks with fuzzy objectives and hard credit assignment - it's proving competitive. And the simplicity matters. Less infrastructure complexity means more teams can experiment with fine-tuning without needing RL specialists.

Old methods don't die. They just wait for the right moment to be useful again.

About the Curator

Richard Bland
Founder, Marbl Codes

27+ years in software development, curating the tech news that matters.
