Quick scan before we dig in. Google DeepMind dropped an AI co-mathematician built on Gemini 3.1 Pro that just set a new record on FrontierMath Tier 4, and it actually helped a human solve a 60-year-old open problem. OpenAI countered Anthropic's Mythos with GPT-5.5-Cyber, a defender-only cyber model locked behind a vetted access program. Bitcoin is hovering near 80,000 dollars with a rising wedge and traders getting loud on social media, which usually ends one way. Network difficulty just dropped 2.3% on May 1, the sixth cut of the year, and seven major mining pools including Foundry, Antpool, and F2pool joined the Stratum V2 working group. And in Europe, Adam Back wrote another check into Capital B and got the convertible terms repriced in his favor. Let's get into it.
The headline number is 47.9% on FrontierMath Tier 4. That's the hardest tier of a benchmark designed to break frontier models, and DeepMind's new AI co-mathematician solved 23 of 48 problems, including 3 that no prior model had cracked. GPT-5.5 Pro held the previous record at 39.6%. Epoch AI ran the test blind, with up to 48 hours of compute per question.
But the benchmark isn't really the story. The architecture is. This isn't one giant model brute-forcing answers. It's a hierarchy. A project coordinator delegates to workstream coordinators, which in turn dispatch specialist agents for literature retrieval, code execution, counterexample search, and proof verification using Gemini Deep Think. Reviewer agents audit the proofs before anything ships. It's basically a small research lab simulated in software.
The proof of concept is what matters. Topologist Marc Lackenby used the system to attack Kourovka Notebook problem 21.10, an open question about whether every finite group has what's called a simply finite presentation. The system pursued proof and disproof in parallel. It produced a draft, the reviewer agents flagged a gap, Lackenby steered it toward the right strategy, and they assembled a complete proof together. A second pass caught two minor errors. End result: a 60-year-old open problem, closed.
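To make the hierarchy concrete, here is a minimal sketch of the coordinator, workstream, specialist, and reviewer layers, with proof and disproof running as separate workstreams. Every class and function name is hypothetical, and the "agents" are stand-in lambdas. This illustrates the shape described above and is not DeepMind's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names are hypothetical; the specialists are toy stand-ins, not real models.

@dataclass
class Specialist:
    role: str                      # e.g. "literature", "counterexample", "proof"
    run: Callable[[str], str]      # takes a goal, returns a finding

@dataclass
class Workstream:
    goal: str
    specialists: List[Specialist] = field(default_factory=list)

    def execute(self) -> List[str]:
        # The workstream coordinator dispatches its goal to each specialist.
        return [f"{s.role}: {s.run(self.goal)}" for s in self.specialists]

def review(findings: List[str]) -> List[str]:
    # Stand-in reviewer: flag any finding that admits a gap.
    return [f for f in findings if "gap" in f.lower()]

def coordinate(workstreams: List[Workstream]) -> dict:
    # Project coordinator: run every workstream, then audit before shipping.
    findings = [f for w in workstreams for f in w.execute()]
    flagged = review(findings)
    return {"findings": findings, "flagged": flagged, "ship": not flagged}

prove = Workstream("prove the conjecture", [
    Specialist("literature", lambda goal: "found related lemma"),
    Specialist("proof", lambda goal: "draft proof has a gap in step 3"),
])
disprove = Workstream("disprove the conjecture", [
    Specialist("counterexample", lambda goal: "no counterexample up to order 100"),
])

result = coordinate([prove, disprove])
print(result["ship"], result["flagged"])
# False ['proof: draft proof has a gap in step 3']
```

In this toy run the reviewer blocks the draft, which is the point where a human like Lackenby would step in and steer.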
The limitations are real. Reviewers can converge on plausible but wrong reasoning. The team describes death spirals where the system endlessly revises without progress. Access is restricted to a small group of mathematicians. But pair this with AlphaEvolve, DeepMind's coding agent, and the picture is clear. AlphaEvolve is now improving DNA error correction by 30%, lifting power-grid optimization feasibility from 14% to 88%, and proposing quantum circuits with 10x lower error on Google's Willow chip. The agentic-research thesis is starting to deliver hard, verifiable wins in domains where you can't fake the answer. Math doesn't care about vibes.
While DeepMind chases proofs, the rest of the agent ecosystem is getting more practical and a lot more interesting. Anthropic shipped a meaningful update to Claude Managed Agents this week. Three things stand out. First, dreaming, which is a scheduled process that reviews past sessions and memory stores, extracts patterns, and refines what the agent has learned. Think of it as the agent consolidating memory while off the clock. Second, outcomes, where you define a rubric and a separate grader, in its own context window, evaluates whether the work meets the bar and tells the agent what to fix. Internal benchmarks show 8 to 10 point gains on hard tasks. Third, multi-agent orchestration, where a lead agent delegates to subagents working in parallel on a shared filesystem. Netflix is already using it in production for their platform team.
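The outcomes idea, a rubric plus a separate grader that tells the agent what to fix, is easy to sketch as a loop. This is a toy under stated assumptions: the function names, the rubric format, and the stand-in agent are all hypothetical, and nothing here calls Anthropic's actual API.

```python
from typing import Callable, List, Tuple

# Hypothetical names throughout; this does not call any Anthropic API.
Rubric = List[Tuple[str, Callable[[str], bool]]]  # (criterion name, check)

def grade(work: str, rubric: Rubric) -> List[str]:
    # The grader sees only the work and the rubric, never the agent's history.
    return [name for name, check in rubric if not check(work)]

def run_with_outcomes(agent: Callable[[str, List[str]], str],
                      task: str, rubric: Rubric, max_rounds: int = 3):
    feedback: List[str] = []
    work = ""
    for round_no in range(1, max_rounds + 1):
        work = agent(task, feedback)      # agent revises using the last feedback
        feedback = grade(work, rubric)    # list of criteria still failing
        if not feedback:
            return work, round_no         # every criterion passed
    return work, max_rounds               # give up instead of revising forever

rubric: Rubric = [
    ("has a summary", lambda w: "Summary:" in w),
    ("cites a source", lambda w: "Source:" in w),
]

def toy_agent(task: str, feedback: List[str]) -> str:
    # Stand-in agent: adds whatever the grader said was missing.
    work = "Summary: " + task
    if "cites a source" in feedback:
        work += "\nSource: internal notes"
    return work

final, rounds = run_with_outcomes(toy_agent, "weekly report", rubric)
print(rounds)  # 2
```

The round cap is a design choice worth keeping in any version of this loop: without it, an agent and grader that never agree will revise indefinitely.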
Meanwhile OpenAI launched the Operator API. Autonomous web actions starting at 15 cents per completed task, up to 100 tasks per minute, with logs and screenshots for debugging. Zapier and Notion are early integrators. And Google is doing something interesting and a little awkward at the same time. They quietly shut down Project Mariner, their vision-based browser agent, because it cost roughly 85x more compute than text-based coding agents. The capabilities are being folded into Gemini, including a new macOS agent that can drive your Mac, organize files, scan invoices into Sheets, and draft follow-ups from your Meet transcripts.
The pattern across all three labs: vision-first browser agents are losing to code-first agents because the unit economics don't work. Anthropic and OpenAI both leaned heavily into structured tool-use and code execution. Google is now retreating to the same playbook. If you're building on agents, the lesson is to bet on architectures where the model writes and runs code rather than staring at pixels. Cheaper, faster, more reliable, and easier to verify. The pixel-pushing era of agents was a brief detour.
Bitcoin mining is in a strange place. Difficulty fell 2.3% on May 1, the sixth downward adjustment of 2026. Year to date, difficulty is down 10.7%, from 148.3 trillion to 132.47 trillion. Hashrate slipped below 1 zettahash per second, sitting between 899 and 958 exahash per second. The next adjustment around May 17 could go down again.
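Those numbers can be sanity-checked with a few lines of arithmetic. The sketch below recomputes the year-to-date change and the hashrate implied by the current difficulty, using the standard relationship that a block at difficulty D takes about D times 2^32 hashes on average, against a 600-second target. The variable names are just for illustration.

```python
# Difficulty figures as quoted above.
start_2026 = 148.3e12
current = 132.47e12

def pct_change(old: float, new: float) -> float:
    return (new - old) / old

ytd = pct_change(start_2026, current)
print(f"Year-to-date change: {ytd:.1%}")  # Year-to-date change: -10.7%

# Expected hashes per block is difficulty * 2**32; blocks target 600 seconds.
TARGET_BLOCK_SECONDS = 600
implied_hashrate = current * 2**32 / TARGET_BLOCK_SECONDS  # hashes per second
print(f"Implied hashrate: {implied_hashrate / 1e18:.0f} EH/s")
```

The implied figure lands inside the 899 to 958 exahash range, which is what you would expect if blocks are arriving close to the ten-minute target.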
Why is hashrate falling? Two reasons, both structural. First, public miners are at or below breakeven. Production costs sit around 80,000 to 90,000 dollars per coin, and Bitcoin is right there at 80,000. Public miners sold a record 32,000 BTC in Q1, more than all four quarters of 2025 combined. Second, AI and HPC are eating their lunch on the energy side. Over 70 billion dollars in AI and HPC contracts have been announced in the mining sector. TeraWulf is the poster child this week, posting a 427 million dollar quarterly loss but doubling AI revenue, with HPC lease income up 117% quarter-on-quarter to 21 million dollars. They're literally repurposing power infrastructure away from hashing.
The one bright spot is hashprice, up to roughly 38 dollars per petahash per second per day, with fees per block up 12% week-over-week. Falling difficulty plus stable price equals fatter margins for whoever's still plugged in.
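The margin logic comes down to one line of arithmetic. A miner's expected share of blocks scales with one over difficulty, so if price, fees, and the miner's own hashrate all hold steady, a difficulty cut translates into slightly more than the same percentage gain in revenue per hash. A minimal sketch, with a hypothetical function name:

```python
def revenue_gain_from_difficulty_cut(cut: float) -> float:
    """Relative gain in expected BTC per unit of hashrate when difficulty
    falls by `cut`, assuming price, fees, and own hashrate are unchanged."""
    return 1 / (1 - cut) - 1

may_1 = revenue_gain_from_difficulty_cut(0.023)   # the May 1 adjustment
ytd = revenue_gain_from_difficulty_cut(0.107)     # year-to-date decline
print(f"{may_1:.2%}, {ytd:.1%}")  # 2.35%, 12.0%
```

So the May 1 cut alone is worth about 2.35% more Bitcoin per hash, and the full year-to-date decline about 12%, for anyone who kept their machines running.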
The more interesting story is structural. Seven major pools, Antpool, F2pool, Foundry, Spiderpool, Block Inc., MARA Foundation, and DMND, joined the Stratum V2 Working Group on May 7. Stratum V2 lets individual miners choose their own block templates instead of accepting whatever the pool operator dictates. It also adds end-to-end encryption and cuts bandwidth, and Braiins testing shows up to 7.4% higher profitability from faster template delivery and better fee capture. This is the most decentralization-positive mining news in years. If individual miners actually start picking their own templates at scale, you push back on a real censorship vector, including the OFAC-compliant block construction problem. The fact that Foundry and Antpool are at the table, given they mined 48% of last week's blocks between them, is what makes this credible.
Adam Back wrote another check, this one for 1.1 million euros, into Capital B, the Europe-listed Bitcoin treasury company formerly known as The Blockchain Group. The structure is what makes this worth talking about, because it's a different model from the Strategy playbook everyone copies.
The deal: Back subscribes to 10 million share warrants at 11 euro cents each, exercisable at the higher of 84 euro cents or what Capital B calls mNAV 1.1, which is 110% of the euro value of Bitcoin held per fully diluted share. In other words, the strike price floats with Bitcoin. At the same time, Capital B cut the conversion price on Back's existing convertible bonds in half, from 5.17 euros to 2.59 euros, and removed the share-price condition that previously gated conversion. The bonds carry zero coupon and can be redeemed in Bitcoin, euros, or shares.
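Here is how that floating strike works in practice, as a rough sketch. The 84 cent floor and the 1.1 multiple come from the terms above, and the Bitcoin holding is the roughly 2,943 BTC the company reports. The fully diluted share count and the euro Bitcoin prices are made-up illustration values, not Capital B's actual figures.

```python
def warrant_strike(btc_held: float, btc_price_eur: float,
                   fully_diluted_shares: float,
                   floor_eur: float = 0.84, mnav_multiple: float = 1.1) -> float:
    """Strike = the higher of the floor or 110% of BTC value per diluted share."""
    nav_per_share = btc_held * btc_price_eur / fully_diluted_shares
    return max(floor_eur, mnav_multiple * nav_per_share)

BTC_HELD = 2943
SHARES = 400e6   # HYPOTHETICAL fully diluted share count, for illustration only

# HYPOTHETICAL euro prices for Bitcoin
low = warrant_strike(BTC_HELD, 70_000, SHARES)    # floor applies
high = warrant_strike(BTC_HELD, 120_000, SHARES)  # mNAV 1.1 term applies
print(f"{low:.2f} EUR, {high:.2f} EUR")  # 0.84 EUR, 0.97 EUR

# Halving the conversion price roughly doubles shares received per euro of bonds.
shares_multiple = 5.17 / 2.59
print(f"{shares_multiple:.2f}x")  # 2.00x
```

Under these assumed inputs the floor binds at lower Bitcoin prices and the mNAV term takes over as Bitcoin rises, which is what it means for the strike to float with Bitcoin.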
Back now sits at roughly 9.97% fully diluted. Capital B holds about 2,943 BTC, around 234 million dollars worth.
The interesting bit isn't the size, it's the explicit metric: Bitcoin per fully diluted share. That's what they're optimizing. Strategy uses BTC yield as a similar concept, but Capital B has hard-coded it into the actual instruments. The warrant strike, the conversion price, the whole capital structure is denominated against Bitcoin held per share rather than stock price. If you're a shareholder, you don't care if the equity rallies, you care if the Bitcoin-per-share number goes up.
This is the second wave of treasury company design. The first wave was simple: raise dollars, buy Bitcoin, watch the multiple expand. We saw what happens when that goes wrong this week, with Trump Media reporting a 406 million dollar quarterly loss, mostly unrealized markdowns on Bitcoin bought near last summer's peak and Cronos tokens from the Crypto.com deal. Buying high with leverage and crossing your fingers is not a strategy. Capital B is trying something more disciplined, with instruments that align everyone around the per-share Bitcoin metric. Worth watching whether other European treasuries copy the structure, and worth watching whether Adam Back keeps stacking equity in vehicles that are basically Bitcoin-denominated by design.
One prediction. Within 18 months, at least one major Bitcoin mining pool will run Stratum V2 by default, and at least one mid-tier public miner will quietly shut down or spin off its mining segment to become a pure AI infrastructure company. The economic gravity is too strong in both directions.