Anthropic built a model too risky to release
AI Innovation


Admin
3 min read


Hey folks, Keshav here. Ben is at AI Engineer this week, so I’m covering the intro.

A mistimed blog post last week leaked Anthropic’s next model, Claude Mythos. Well, it’s real, and it shows massive benchmark improvements over Opus 4.6:

  • 53.4% → 77.8% on SWE-bench Pro

  • 65.4% → 82% on Terminal-Bench 2.0

But we’re not getting access to it anytime soon. Why? Because it is really good at finding and exploiting software vulnerabilities. On Firefox exploit generation, Opus managed 2 working exploits out of hundreds of attempts; Mythos hit 181.

It found long-standing bugs in critical software projects, including a 27-year-old bug in OpenBSD and a 16-year-old bug in FFmpeg, among others.

Instead of a public release, Anthropic is giving 12 companies access to a preview version of Mythos under “Project Glasswing” to find vulnerabilities in critical software. As part of the project, Anthropic is committing $100M in model usage credits and $4M in donations to open-source security organizations.

Theo made a video on this, and I like his point: “Mythos is to Opus what Opus is to Sonnet.”


I tweeted a list of the companies Meta has acquired over the past year with nothing to show for it, and soon after, Meta released details of its latest model, Muse Spark. At a glance, it sits somewhere between Sonnet 4.6 and Opus 4.6. It isn’t usable yet: API access is coming, and there are promises of an open-source release too (RIP Llama).

Many people are dunking on Meta for a not-so-frontier model release after spending billions and staying silent for a year, but I think it’s a good step forward. Plus, have you used Instagram search over the past couple of months? It’s gotten really good, courtesy of AI.

As always, Ethan Mollick has a good recap of the state of frontier models: Google, OpenAI and Anthropic lead, Meta joins the pack for now, xAI has fallen off, and the best Chinese models are still 7-9 months behind.

P.S. Factory’s desktop app is now out of beta. It comes with a cloud computer, the ability to use other apps on your device and, of course, an easy way to run and manage multiple Droid sessions.


Ben’s Bites is brought to you by Attio, the AI CRM

Honestly, no one gets excited about a CRM. But then they try Attio. It connects to Claude Code and n8n through its MCP server, bridging the gap between my customer data and my apps. And there’s more, like flagging churn risk and turning customer feedback into Linear projects. Try it now.


Headlines

  • Claude Managed Agents: build and deploy agents from Claude’s developer console and let Anthropic handle the infrastructure instead of building it yourself. For example, Notion is using Managed Agents to build a “delegate tasks to Claude” feature (see Anthropic’s engineering blog post on building this).


  • Cursor has a new design mode for annotating and targeting UI elements in the browser. Plus, you can now run Cursor on any machine and control it from anywhere, including your phone.

  • The Gemini app finally has projects, which Google calls notebooks. They work like Claude/ChatGPT projects (move chats in and out of notebooks, notebook-specific files and memories), with the added ability to sync notebooks between the Gemini app and NotebookLM.

  • Clicky is an ambient AI buddy on your Mac. It sees your screen, talks to you and points at things to guide you (demo). Farza built (and open-sourced) it as a learning tool, but people are using it for everything.

  • Choosing an accurate speech-to-text model is harder than it looks. Benchmarking one is even harder. See why standard word error rate falls short, and what better STT evaluation actually looks like.*



About Admin

Admin is a tech writer specializing in PC hardware and component price analysis. With over 5 years of experience in the tech industry, they provide insights into market trends and help consumers make informed purchasing decisions.
