The biggest changes for me in 2025 were:
- Health: I was diagnosed as pre-diabetic. In response, I increased the amount of vegetables in my diet and drastically reduced the amount of sugars and carbohydrates. As a result, I lost 30 pounds, and my pre-diabetes is in remission. So far, so good!
- AI Agents: I started the year by copying and pasting code into chatbots. Then I wrote my own primitive agents that called LLM APIs. By the end of the year, I was heavily using commercial agents.
As I write this in late December, I find myself cycling between Antigravity, Gemini CLI, Jules, and several internal-to-my-company agents. Each one has its strengths. All of them regularly receive boosts in performance as better prompts and better base models are introduced.
Advent of Code
This year I thoroughly investigated the ability of LLMs to solve Advent of Code puzzles.
At the beginning of the year, I had to write my own agentic harness to solve the puzzles. Over the course of the year, though, the standard agents available to me started to support running tools and code. By the end of the year, multiple agents were able to solve all the puzzles by themselves, given only a simple prompt that explained the rules of the contest. To minimize the load on the Advent of Code servers, I cached all the puzzle data locally.
| Agent | Model | Wall Time (minutes) | Notes |
|---|---|---|---|
| Antigravity | gemini-3-flash-preview | 32 | Had to be encouraged to continue. Struggled to solve 2025-10-2, but eventually got it. |
| Gemini CLI | gemini-3-flash-preview | 13 | Solved all the puzzles in one shot. |
| Jules | Gemini 3 | 40 | Had to be encouraged to continue. |
That level of performance means that Advent of Code is “saturated” as a benchmark. All the agents can solve all the problems using a wide variety of base models and implementation languages. The only topic left to explore is the agents’ ability to navigate the Advent of Code website and create their own contest accounts. I have no doubt several agents could accomplish that, but it’s orthogonal to solving the actual puzzles. And I don’t want to bother the Advent of Code website with synthetic activity.
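In the same spirit of not bothering the servers, here is a minimal sketch of the kind of local caching mentioned above. It is not my actual harness: the function name `cached_puzzle_input`, the `year/dayNN.txt` directory layout, and the `fetch` callback are all hypothetical, chosen only to illustrate the idea that each puzzle input is downloaded at most once.

```python
from pathlib import Path
from typing import Callable


def cached_puzzle_input(
    year: int,
    day: int,
    cache_dir: Path,
    fetch: Callable[[int, int], str],
) -> str:
    """Return the puzzle input, calling fetch() only on a cache miss.

    `fetch` is a hypothetical callback that performs the real download.
    The cache layout (cache_dir/<year>/dayNN.txt) is an assumption.
    """
    path = cache_dir / str(year) / f"day{day:02d}.txt"

    # Cache hit: read from disk and never touch the network.
    if path.exists():
        return path.read_text(encoding="utf-8")

    # Cache miss: download once, then store the result for later runs.
    text = fetch(year, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text
```

With a wrapper like this, every agent run after the first reads the same files from disk, so repeated experiments add no load to the website.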
Programming Languages
Because of the power, speed, and flexibility of agents, I find myself using English as my programming language. I direct the LLMs to use whatever implementation language is appropriate for the task. This year I vibe-coded the most in Python, but for many tasks I’m starting to shift towards more strongly typed languages like Go and Rust. In my experience, strongly typed languages reduce the number of rounds of debugging required to implement features.
Looking forward to 2026
I expect the performance of LLMs and agents to keep improving.
I hope to keep improving my ability to use LLMs and agents to solve interesting problems.
Best wishes to you and your family for a happy and prosperous 2026!