This week’s Tech Deep Dive looks at where LLMs fall short, using a deceptively simple example: the Subset Sum Problem.
The Problem
Imagine you have a group of T people that you need to allocate across N houses. Each house has a different capacity, and your goal is to pick the right combination of houses to fit everyone perfectly.
Mathematically, this turns into the subset sum problem, finding a subset of numbers that add up to a target (in this case, the total number of people).
Sounds easy? It’s not.
Subset sum is what computer scientists call NP-complete, meaning there’s no known efficient (polynomial-time) algorithm to solve it. The only guaranteed way is to check every possible combination... However, the number of combinations grows exponentially with the size of your list.
Even with just 50 or 60 elements, a brute-force approach could take years on a normal computer.
Why This Matters
NP problems show up everywhere, from the travelling salesman problem (TSP) to routing and resource allocation. These are real-world challenges that can’t easily be cracked by pattern recognition alone.
And that’s where LLMs hit their limits.
Testing LLMs vs Classical Methods
In a recent internal experiment, we compared three approaches to solving a version of this problem:
GPT (LLM-based reasoning)
Google OR-Tools (CPSAT solver)
A dynamic programming method
The results were clear:
For small or medium-sized cases (e.g. 100 houses and 1,000 people), GPT gave reasonable answers in about 30 seconds.
As problem size grew, response times shot up (up to four minutes) and accuracy dropped.
For large instances, API calls simply timed out.
Meanwhile, the solver and dynamic programming methods handled large cases within seconds, remained stable, and didn’t require GPUs or API calls.
The solver even provided partial solutions when interrupted, something LLMs can’t yet do.
The Takeaway
LLMs are powerful reasoning engines, but they’re not all-purpose problem solvers.
When it comes to hard computational problems, especially those with exponential complexity, traditional algorithms and well-designed heuristics still win.
Sometimes, the best AI solution isn’t about scaling up the model it’s about picking the right tool for the job.








