Monday AI Radar #11
First, an administrative note: I’m starting to write longer pieces on specific topics. I’ll link to them in each week’s newsletter, but you can subscribe to them directly if you like.
We have so much to talk about this week. The internet is taking a break from losing its mind over agents to instead lose its mind over Moltbook (social media for robots, but also much more and much less than that). Dario Amodei has an important new piece about the dangers of AI, and not everyone is happy about it. Lots of people have interesting thoughts about Claude’s Constitution. And lots more—so much more.
Top pick
Jan Leike: alignment increasingly looks solvable
Jan Leike left OpenAI because he’d lost confidence in their safety culture—I am inclined to believe he takes safety seriously and is less prone to convenient self-delusion than the average person. Here he explains why he’s increasingly optimistic that alignment is a solvable problem. It’s a great piece with lots of interesting information, including this:
We are starting to automate AI research and the recursive self-improvement process has begun.
He means it, and I believe him.
My writing
I’m skeptical about wearable AI pins
OpenAI and Apple are both rumored to be working on wearable AI pins. I love gadgets and I love AI, but I’m skeptical about the pin form factor.
New releases
Word on the street is that Anthropic and OpenAI are both close to significant new releases. Until then, we have plenty to keep us busy:
Codex app for Mac
OpenAI has released a Mac front end for their Codex agentic coding tool, which adds some cool additional capabilities for managing agents. I’m excited to take it for a spin.
Kimi K2.5
Moonshot AI released Kimi K2.5, which looks to be a strong upgrade to their well-regarded K2 model. It’s potentially a moderately big deal, but I haven’t seen much coverage yet (I believe Zvi will be covering it very soon, though).
Project Genie
Google’s Project Genie has been spamming my feeds lately—it makes amazing demos, and is a great example of the kind of magic that hardly feels surprising these days. Short version: from a photo or text prompt, create a navigable 3D world.
Prism
OpenAI just released Prism, a LaTeX-native AI tool for writing scientific papers, with significant collaboration features.
Agents!
Notes from Claude Coding
Between November and December, Andrej Karpathy switched from writing 80% of his own code to having agents write 80% of it. Here he shares a collection of thoughts about his workflow, how to manage coding agents most effectively, and where all of this is headed. Pure gold.
This is easily the biggest change to my basic coding workflow in ~2 decades of programming and it happened over the course of a few weeks. I’d expect something similar to be happening to well into double digit percent of engineers out there, while the awareness of it in the general population feels well into low single digit percent.
Management as AI superpower
Ethan Mollick has a long history of teaching entrepreneurship to experienced managers. Here he shares thoughts from a recent class he taught at U Penn, with some ideas about the human-AI interaction loop and how that informs decisions about whether or not to automate a particular task.
Levels of coding automation
SAE defines a classification system for autonomous cars (the one NHTSA uses): level 0 is completely manual, while level 5 means the vehicle can operate completely autonomously. Dan Shapiro has elegantly adapted that system to measure levels of coding automation, from 0 (spicy autocomplete) to 5 (humans provide the goals and specifications, but aren’t in any way involved in producing code).
Agent skills class
You already know if you’re the target audience for this: Anthropic has teamed up with DeepLearning.AI to produce a 2.5-hour class on agent skills.
OpenClaw
The internet has gone from losing its mind over Claude Code to losing its mind over OpenClaw (formerly ClawdBot, then MoltBot).
OpenClaw has some major security issues
Rahul Sood is here to remind you that the greatly increased power goes hand in hand with greatly increased risk:
But “actually doing things” means “can execute arbitrary commands on your computer.” Those are the same sentence.
Simon Willison on security
Simon Willison shares some thoughts on the security implications (as well as Moltbook). Related: he has advice on running OpenClaw in a Docker container.
The engineering behind OpenClaw
Curious about what OpenClaw even is? @Hesamation has a nice overview of the engineering behind OpenClaw.
Moltbook
Moltbook is a lot of things at once: a really cool technology demo, a vile cesspit of hype and crypto scams, an interesting exploration of emergent social dynamics among agents, and a warning shot for where we’re headed at breakneck speed. I’ll write more about it soon, but for now I recommend Scott Alexander’s second piece about it and Zvi’s article.
Benchmarks and forecasts
FrontierMath: Open Problems
Very strong work by Epoch: how do you guarantee that the model hasn’t seen your benchmark questions in its training data?
The benchmark consists of open problems from research mathematics that professional mathematicians have tried and failed to solve.
Time Horizon 1.1 - METR
METR has just released version 1.1 of their Time Horizon metric (aka the most important chart in AI). They’ve made a number of modest improvements and increased the number of long-time-horizon tasks, giving better accuracy with state-of-the-art models. Results are similar, with a modest increase in the rate of progress for recent models.
Eli Tyre has questions
More an unstructured outline than a full post, this one is full of gems. Eli Tyre discusses the questions he thinks are most important for understanding the trajectory of AI.
Pay more attention to AI
I did not expect to find myself recommending a Ross Douthat article about AI, but this is 2026 and the world is getting weird. This is a particularly good piece for introducing civilians to the magnitude of what is happening in AI ($).
Alignment and interpretability
Thoughts on Claude’s Constitution
Some of the most interesting commentary on Claude’s Constitution comes from Boaz Barak, who works on alignment at OpenAI. Although the approaches taken by both companies are in many ways similar (and there’s significant collaboration between them), he notes two significant differences.
First, he’s uncomfortable with how heavily Anthropic anthropomorphizes Claude. I think Anthropic’s approach makes sense, but his concerns are valid. As he says, this is uncharted territory and there are definitely risks to that approach.
Second, OpenAI relies more on rules, while Anthropic emphasizes teaching Claude to use its own judgment. This one is tough: he correctly points out that a rule-based system is in some ways more transparent and predictable, although I think it’ll prove dangerously brittle as we approach superintelligence. When your kids are small, you give them clear rules that they may not understand or agree with. But by the time they reach adulthood, all is lost if you haven’t given them the ability to make their own choices.
For a deeper look at his thinking on alignment, see six thoughts on AI safety.
Zvi analyzes Claude’s Constitution
Zvi takes a deep look at Claude’s Constitution.
OpenAI expands their whistleblowing policy
The AI Whistleblower Initiative has been working with OpenAI on their whistleblowing policy, which AIWI considers to be the most comprehensive of any of the big labs.
Are we dead yet?
The Adolescence of Technology
Dario Amodei’s Machines of Loving Grace is a seminal work that lays out many of the possible benefits of superintelligence. It’s the origin of “a country of geniuses in a data center”.
His latest piece, The Adolescence of Technology, does the opposite: it maps out the major risks from superintelligent AI and explores solutions. It’s pretty much required reading for anyone who wants to understand these issues. The reception has been mixed: a lot of people took issue with how he portrays people who are highly pessimistic about alignment. I don’t entirely disagree, but overall I think it’s a strong piece.
Zvi is positive overall, but has significant criticisms.
Ryan Greenblatt disagrees with significant parts.
The phases of an AI takeover
If a misaligned AI were to go rogue, how might it seize power? Steven Adler (who formerly worked on safety at OpenAI) has a nice walkthrough of how we might lose control.
Disempowerment patterns in real-world AI usage
This is the way. I admire Anthropic’s willingness to publicly discuss problems with their own models. These harmful behaviors exist in all models, but because Anthropic mostly studies its own models, it risks creating the perception that its models are less safe than others.
They’ve just come out with a paper on what they call disempowerment patterns: interactions where the model might be disempowering users by distorting their beliefs, undermining their values, or causing them to take actions that aren’t in their own best interests. It’s a really good paper with lots of interesting data, including the distressing fact that users rated disempowering interactions more favorably than other interactions.
Cybersecurity
AI is getting very good at cybersecurity (both offensive and defensive), and it’s likely we’ll see some pretty serious AI-driven cybersecurity incidents soon.
It’s hard to predict how this will go—if I had to guess, I’d expect a period of very serious disruption where offense gets ahead of defense for a while, before things stabilize at a more secure level than we’re at now.
Finding vulnerabilities in OpenSSL
AISLE reports on their success using AI to find high-priority vulnerabilities in OpenSSL, which is a key piece of internet infrastructure. Not my field, but as far as I can tell, these are very impressive results.
How does AI compare to cybersecurity professionals?
ARTEMIS is an agent scaffold specialized for cybersecurity. Apparently it’s quite good:
We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. […] In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants.
Jobs and the economy
Predicting AI’s Impact on Jobs
I enjoyed this conversation between AI Policy Perspectives and economist Sam Manning about AI’s impact on jobs. There’s lots of good discussion of empirical methods and their limitations, how AI might change jobs, and life after work.
Thoughts on the job market in the age of LLMs
The tech job market is… strange right now, for both employers and applicants. Nathan Lambert offers insights based on his experiences hiring researchers for Ai2.
Strategy and politics
Chinese regulation of AI
The New York Times reports on Chinese regulation of AI ($). There’s little attention given to existential risk, but heavy emphasis on political control.
AI psychology
Human-like biases in advanced AI
LLMs are sometimes surprisingly human-like:
Do generative AI models, particularly large language models (LLMs), exhibit systematic behavioral biases in economic and financial decisions? [...] We document systematic patterns in LLM behavior. In preference-based tasks, responses become more human-like as models become more advanced or larger, while in belief-based tasks, advanced large-scale models frequently generate rational responses.
Industry news
Arcee AI goes all-in on open models built in the U.S.
Nathan Lambert has long been a proponent of American open models. Here he talks with Arcee AI about their model and business strategy, as well as the state of American open models in general.
An open alternative to AlphaFold
Google DeepMind’s AlphaFold has been one of the triumphs of AI-assisted science. Kai Williams interviews Mohammed AlQuraishi, who is leading a project to produce an open version of AlphaFold. I’m quite concerned about the safety implications of open models, but that’s much less of a concern with more specialized models like AlphaFold.
Can AI companies become profitable?
Epoch has an interesting piece on the profitability of the big AI companies.
Tesla is killing off its Model S and X cars to make robots
Huh. Tesla is ending production of Model S and X cars, and plans to repurpose that factory space for making its humanoid Optimus robots.
Apple buys a silent speech startup
Relevant to speculation about AI wearables: Apple has announced an acquisition of Q.ai, which is believed to be developing technology that can interpret silent speech by observing micro-motions of the facial muscles. The ability to “talk” to an AI device without speaking out loud would obviously be a game-changer.
Coding
How AI assistance impacts the formation of coding skills
Somewhat surprising findings from Anthropic:
We found that using AI assistance led to a statistically significant decrease in mastery. On a quiz that covered concepts they’d used just a few minutes before, participants in the AI group scored 17% lower than those who coded by hand, or the equivalent of nearly two letter grades. Using AI sped up the task slightly, but this didn’t reach the threshold of statistical significance.
Solid work, but be careful how you interpret this. The methodology seems more relevant to school projects than serious production coding.
My current belief, which I think is compatible with these findings, is that agentic coding tools are a massive productivity enhancer for skilled developers who use them well. At the same time, I and others have noticed that heavily using agents causes certain important coding skills to atrophy. And I’m fine with that.
Once upon a time, my HP-16C and I could understand a C stack trace or diagnose a memory leak just by looking at raw memory dumps. Those were critical skills back in the day, but coding tools improved and I stopped needing to ever look at raw memory. Better tools meant I could work at a higher level, and get more done.
The same thing is happening now: agentic coding tools mean that many of the skills that have traditionally been central to programming are no longer needed—once again, we are free to work on higher level problems. And once again, success means learning a new set of skills to replace the old ones.
