Something Big Is Happening
80M+ views · Fortune special edition · Wikipedia entry within 2 weeks
TL;DR
AI agents now autonomously complete multi-hour expert tasks. The capability curve doubles every 4–7 months. Shumer compares this moment to the "this seems overblown" phase of Covid, but with far greater implications.
Reasoning Seed
A Reasoning Seed is a structured prompt you can copy into your AI reasoning tool (Claude, ChatGPT, Obsidian, Notion). It contains the article's thesis, its core tension, and our lab context — ready for your own analysis.
Click the button below to copy it as Markdown (a sketch of the seed's contents appears after the fields below). More ways to engage with this content appear in the discussion questions further down.
Tension: If the doubling rate holds, who decides which expert tasks fall next?
Lab context: Directly relevant to how fractional teams structure projects. When AI agents autonomously complete multi-hour expert tasks, team sizes, roles, and cadence all change.
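As an illustration, here is a minimal sketch of what the copied seed might contain, assembled from the thesis, tension, and lab context above. The exact template the button emits may differ, and the closing prompt section is our illustrative addition:

```markdown
# Reasoning Seed: Something Big Is Happening

## Thesis
AI agents now autonomously complete multi-hour expert tasks;
the capability curve doubles every 4–7 months (METR).

## Tension
If the doubling rate holds, who decides which expert tasks fall next?

## Lab Context
Fractional teams: when agents complete multi-hour expert tasks
autonomously, team sizes, roles, and project cadence all change.

## Prompt
Analyze the thesis against the tension, using the lab context.
Where does the argument hold, and where does it break?
```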
Key Insights
1 — The February 2026 Quality Leap: A New Era
By late 2025, according to Shumer, the best engineers had already delegated the majority of their coding work to AI. On February 5, 2026, models arrived that “make everything before look like a different epoch.” Anyone who hasn’t tried AI in recent months wouldn’t recognize today’s state of the art.
2 — METR Data: The Doubling Rate Is Accelerating
METR measures the length, in human-expert time, of real-world tasks that a model can complete end-to-end without human help. A year ago: ~10 minutes. Then one hour, then several hours. The latest result (Claude Opus 4.5, Nov 2025): tasks that take experts nearly five hours. The doubling rate: ~7 months, trending toward 4 months.
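To make the curve concrete, here is a back-of-the-envelope projection from the figures quoted above. The 5-hour starting point and the two doubling times come from the article; the clean exponential model is our only assumption, not METR's methodology:

```python
# Projects the "autonomous task horizon" forward under exponential growth:
# horizon(t) = h0 * 2**(t / doubling_time). Illustrative only.

H0_HOURS = 5.0  # horizon of Claude Opus 4.5, Nov 2025 (per the article)

def horizon(months_elapsed: float, doubling_months: float) -> float:
    """Projected autonomous task horizon in hours, months after Nov 2025."""
    return H0_HOURS * 2 ** (months_elapsed / doubling_months)

for months in (7, 14, 21):
    slow = horizon(months, doubling_months=7)  # current ~7-month rate
    fast = horizon(months, doubling_months=4)  # the 4-month trend
    print(f"+{months:2d} months: {slow:6.1f} h (7-mo doubling) "
          f"vs {fast:6.1f} h (4-mo doubling)")
```

Under the 7-month rate, the horizon reaches a full work week (~40 hours) in 21 months; under the 4-month trend, it gets there in 12. That gap is why the "trending toward 4 months" caveat matters more than the headline number.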
3 — AI Builds Itself: GPT-5.3 Codex
OpenAI wrote in the technical documentation for GPT-5.3 Codex (Feb 5, 2026): “Our first model that was instrumental in creating itself.” Early versions debugged their own training and managed deployment. For Shumer, this is a symbolically decisive threshold — self-improving systems have arrived.
4 — Judgment, Not Just Correctness
The latest models make decisions that feel like judgment — “an intuitive sense for the right call, not just the technically correct one.” Shumer describes his own workflow: he formulates in plain English what he wants, leaves for 4 hours, and comes back to finished output — not a draft, but the final product.
5 — The Covid Analogy
“I think we’re in the ‘this seems overblown’ phase of something much, much bigger than Covid.” He explicitly addresses the text to “non-tech friends and family” — making it accessible, but also vulnerable to accusations of alarmism.
6 — Recommendation: Experiment, Now
Core message on CNBC: “People in the workforce should start to use and experiment with AI tools so they can understand what’s coming.” He implies that access to premium models becomes a differentiating factor — those who use paid tools will be faster than those who don’t.
Critical Assessment
What Holds Up
- The capability curve is real and data-backed (METR)
- The self-improvement threshold at GPT-5.3 is documented, not speculative
- The call to experiment is pragmatic and responsible
- Fortune, Microsoft, and DEV Community agree: "the conversation the industry needed"
What Needs Context
- Conflict of interest: Shumer is an AI CEO; Forbes calls parts of the text "a sales pitch"
- Tone: Fortune criticizes “doomsday packaging” that stifles innovative energy
- Track record: The Guardian recalls his “world’s top open-source model” claim, which didn’t hold up
- Agency question: DEV Community emphasizes that there is "still a human hand on the tiller"; the trajectory depends on human decisions (funding, regulation, infrastructure)
Discussion Questions for the Next Lab
01 Matching Our Own Experience: Does the described quality curve align with what we see in our projects? Where are the gaps between Shumer’s portrayal and our reality?
02 Service Model Implications: If five-hour tasks become autonomously solvable, what changes in the pricing, staffing, and scoping of our fractional engagements?
03 Judgment vs. Craft: Does Shumer's "AI now has judgment" thesis apply to design decisions? Where does human judgment remain irreplaceable?
04 Client Enablement: How do we prepare our clients for these shifts without falling into the alarmism criticized by Fortune?
Sources
- Original: Matt Shumer — Something Big Is Happening
- Fortune — Something big is happening in AI
- Fortune — Counterpoint: the only thing he got right
- DEV Community — A Response
- Wikipedia — Something Big Is Happening
Glossary
METR (Model Evaluation & Threat Research) An organization that measures AI model capabilities through real-world tasks. Its metric captures the length of task, in human-expert time, that a model can complete autonomously, without human assistance.
Self-Improvement The ability of an AI system to contribute to its own improvement — such as debugging its own training or managing its own deployment. GPT-5.3 Codex is considered the first documented example.
Doubling Rate The time interval at which measurable AI model capabilities double. According to METR data, currently at approximately 7 months, trending toward 4 months.
Curated by David Latz · Panoptia February 2026
Related Field Notes
LLM Knowledge Bases: Why Everyone Lands on the Same Stack
Apr 3, 2026 · Andrej Karpathy
Claude Code's Source Code Leaked — What the Architecture Reveals About the Future of AI Agents
Apr 1, 2026 · Carl Franzen (VentureBeat)
Agent Memory: Why Your AI Has Amnesia and How to Fix It
Mar 27, 2026 · Casius Lee (Oracle)