Something Big Is Happening
80M+ views · Fortune special edition · Wikipedia entry within 2 weeks
TL;DR
AI agents now autonomously complete multi-hour expert tasks. The capability curve doubles every 4–7 months. Shumer compares this moment to the "this seems overblown" phase of Covid, but with far greater implications.
Reasoning Seed
A Reasoning Seed is a structured prompt you can copy into your AI reasoning tool (Claude, ChatGPT, Obsidian, Notion). It contains the article's thesis, its core tension, and our lab context — ready for your own analysis.
Click the button below to copy it as Markdown (a sketch of the seed's contents appears after the fields below). More ways to engage with this content appear in the discussion questions further down.
Tension: If the doubling rate holds, who decides which expert tasks fall next?
Lab context: Directly relevant to how fractional teams structure projects. When AI agents autonomously complete multi-hour expert tasks, team sizes, roles, and cadence all change.
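As an illustration, here is a minimal sketch of what the copied seed might contain, assembled from the thesis, tension, and lab context above. The exact template the button emits may differ, and the closing prompt section is our illustrative addition:

```markdown
# Reasoning Seed: Something Big Is Happening

## Thesis
AI agents now autonomously complete multi-hour expert tasks;
the capability curve doubles every 4–7 months (METR).

## Tension
If the doubling rate holds, who decides which expert tasks fall next?

## Lab Context
Fractional teams: when agents complete multi-hour expert tasks
autonomously, team sizes, roles, and project cadence all change.

## Prompt
Analyze the thesis against the tension, using the lab context.
Where does the argument hold, and where does it break?
```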
Key Insights
1 — The February 2026 Quality Leap: A New Era
By late 2025, according to Shumer, the best engineers had already delegated the majority of their coding work to AI. On February 5, 2026, models arrived that “make everything before look like a different epoch.” Anyone who hasn’t tried AI in recent months wouldn’t recognize today’s state of the art.
2 — METR Data: The Doubling Rate Is Accelerating
METR measures the length, in human-expert time, of real-world tasks that a model can complete end-to-end without human help. A year ago: ~10 minutes. Then one hour, then several hours. The latest result (Claude Opus 4.5, Nov 2025): tasks that take experts nearly five hours. The doubling rate: ~7 months, trending toward 4 months.
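To make the curve concrete, here is a back-of-the-envelope projection from the figures quoted above. The 5-hour starting point and the two doubling times come from the article; the clean exponential model is our only assumption, not METR's methodology:

```python
# Projects the "autonomous task horizon" forward under exponential growth:
# horizon(t) = h0 * 2**(t / doubling_time). Illustrative only.

H0_HOURS = 5.0  # horizon of Claude Opus 4.5, Nov 2025 (per the article)

def horizon(months_elapsed: float, doubling_months: float) -> float:
    """Projected autonomous task horizon in hours, months after Nov 2025."""
    return H0_HOURS * 2 ** (months_elapsed / doubling_months)

for months in (7, 14, 21):
    slow = horizon(months, doubling_months=7)  # current ~7-month rate
    fast = horizon(months, doubling_months=4)  # the 4-month trend
    print(f"+{months:2d} months: {slow:6.1f} h (7-mo doubling) "
          f"vs {fast:6.1f} h (4-mo doubling)")
```

Under the 7-month rate, the horizon reaches a full work week (~40 hours) in 21 months; under the 4-month trend, it gets there in 12. That gap is why the "trending toward 4 months" caveat matters more than the headline number.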
3 — AI Builds Itself: GPT-5.3 Codex
OpenAI wrote in the technical documentation for GPT-5.3 Codex (Feb 5, 2026): “Our first model that was instrumental in creating itself.” Early versions debugged their own training and managed deployment. For Shumer, this is a symbolically decisive threshold — self-improving systems have arrived.
4 — Judgment, Not Just Correctness
The latest models make decisions that feel like judgment — “an intuitive sense for the right call, not just the technically correct one.” Shumer describes his own workflow: he formulates in plain English what he wants, leaves for 4 hours, and comes back to finished output — not a draft, but the final product.
5 — The Covid Analogy
“I think we’re in the ‘this seems overblown’ phase of something much, much bigger than Covid.” He explicitly addresses the text to “non-tech friends and family” — making it accessible, but also vulnerable to accusations of alarmism.
6 — Recommendation: Experiment, Now
Core message on CNBC: “People in the workforce should start to use and experiment with AI tools so they can understand what’s coming.” He implies that access to premium models becomes a differentiating factor — those who use paid tools will be faster than those who don’t.
Critical Assessment
What Holds Up
- The capability curve is real and data-backed (METR)
- The self-improvement threshold at GPT-5.3 is documented, not speculative
- The call to experiment is pragmatic and responsible
- Fortune, Microsoft, and DEV Community agree: "the conversation the industry needed"
What Needs Context
- Conflict of interest: Shumer is an AI CEO; Forbes calls parts of the text "a sales pitch"
- Tone: Fortune criticizes “doomsday packaging” that stifles innovative energy
- Track record: The Guardian recalls his “world’s top open-source model” claim, which didn’t hold up
- Agency question: DEV Community emphasizes that there is "still a human hand on the tiller"; the trajectory depends on human decisions (funding, regulation, infrastructure)
Discussion Questions for the Next Lab
01 Matching Our Own Experience: Does the described quality curve align with what we see in our projects? Where are the gaps between Shumer’s portrayal and our reality?
02 Service Model Implications: If five-hour tasks become autonomously solvable, what changes in the pricing, staffing, and scoping of our fractional engagements?
03 Judgment vs. Craft: Does Shumer's "AI now has judgment" thesis apply to design decisions? Where does human judgment remain irreplaceable?
04 Client Enablement: How do we prepare our clients for these shifts without falling into the alarmism criticized by Fortune?
Sources
- Original: Matt Shumer — Something Big Is Happening
- Fortune — Something big is happening in AI
- Fortune — Counterpoint: the only thing he got right
- DEV Community — A Response
- Wikipedia — Something Big Is Happening
Glossary
METR (Model Evaluation & Threat Research) An organization that measures AI model capabilities through real-world tasks. Its metric captures the length of task, in human-expert time, that a model can complete autonomously, without human assistance.
Self-Improvement The ability of an AI system to contribute to its own improvement — such as debugging its own training or managing its own deployment. GPT-5.3 Codex is considered the first documented example.
Doubling Rate The time interval at which measurable AI model capabilities double. According to METR data, currently at approximately 7 months, trending toward 4 months.
Curated by David Latz · Panoptia February 2026
Related Field Notes
LLM Knowledge Bases: Why Everyone Lands on the Same Stack
Apr 3, 2026 · Andrej Karpathy
Claude Code's Source Code Leaked — What the Architecture Reveals About the Future of AI Agents
Apr 1, 2026 · Carl Franzen (VentureBeat)
Agent Memory: Why Your AI Has Amnesia and How to Fix It
Mar 27, 2026 · Casius Lee (Oracle)