
Summary:

News / Update

Google’s Gemini 3 family dominated headlines, with Pro rolling out broadly across the Gemini app, AI Studio/API, Vertex AI, VS Code/Copilot CLI, and even shipping into Google Search’s AI Mode on day one, an unusually fast path from model to user impact. Anthropic, Microsoft, and NVIDIA announced a deep partnership to run Claude on Azure (including Microsoft 365 Copilot and Excel’s Agent Mode), signaling multi-cloud reach and more compute behind Anthropic’s roadmap. Funding remained hot: Tenstorrent is raising $800M to challenge NVIDIA, Gamma secured $68M at a $2.1B valuation, and open-source agent project OpenHands landed an $18.8M Series A. You.com and Databricks are launching a Model Context Protocol Marketplace with governed, real-time web access, while xAI released Grok 4.1 as a frontier conversational model. Elsewhere, Baidu’s AI bet showed strain as Ernie Bot lagged and revenue slipped; a Meta AI researcher departed to found a human-centered startup; Accel named finalists for its US AI 100 amid an 84% surge in AI funding this year; and industry events proliferated, from a New York hackathon on agent reasoning to DeepMind’s seminar on automating AI research and Synthesia’s awards for AI video creators. Rumors also pointed to turbulence at OpenAI, underscoring the sector’s volatility.

New Tools

A wave of agentic and developer tooling arrived. Google unveiled Antigravity, a next-gen, agent-first IDE and platform that orchestrates task-centric AI agents across editor, terminal, and browser with parallel workspaces; early users praised its autonomy with human oversight. The Allen Institute for AI introduced Deep Research Tulu, an open toolkit for training agents that plan, search, and synthesize long-form research. Nous Research connected its Atropos RL environment framework to Thinking Machines’ Tinker API, letting users run scalable RL jobs on managed clusters without provisioning their own GPUs and lowering cost barriers to experimentation. A new Bubblewrap-based code sandbox boots a full Alpine Linux environment in under 2 ms, enabling safer, near-instant execution of LLM-generated code. Sourceful’s Riverflow 2 Preview topped image editing leaderboards by combining an advanced reasoning engine with open diffusion models. Databricks and You.com announced an MCP Marketplace that brings governed, real-time web context into enterprise workflows, another sign of tools converging around standardized agent interfaces.
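For readers new to the Model Context Protocol (MCP), the "standardized agent interface" here is a set of JSON-RPC 2.0 messages exchanged between an agent (client) and a tool server. The minimal sketch below builds a `tools/call` request, the MCP method a client uses to invoke a server-side tool. The tool name `web_search` and its `query` argument are hypothetical placeholders; nothing in the announcement confirms which tools the Databricks/You.com marketplace actually exposes.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP-style tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",        # MCP uses JSON-RPC 2.0 framing
        "id": request_id,        # lets the client match the server's response
        "method": "tools/call",  # MCP method for invoking a server-side tool
        "params": {
            "name": tool_name,       # which tool the server should run
            "arguments": arguments,  # tool-specific input, validated by the server
        },
    }
    return json.dumps(message)


# Hypothetical tool and argument names, for illustration only.
payload = make_tool_call(1, "web_search", {"query": "Gemini 3 release notes"})
print(payload)
```

Because every tool is invoked through the same envelope, a marketplace can add governance (auth, logging, rate limits) at the protocol layer without each agent needing custom integration code.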

LLMs

Gemini 3 Pro and Deep Think set an aggressive new bar for model performance. Reported results include roughly doubling the previous state of the art on ARC-AGI-2, leadership on long-context evals and LiveBench, top marks on agentic benchmarks (e.g., Vending-Bench Arena, Stagehands), and a jump to the top of LMArena with a score above 1500 Elo. On specialized tests, claims cite a 2.2x advantage over GPT-5 on LisanBench, large margins on MathArena Apex (e.g., 23.4% vs GPT-5.1’s 1%), and strong code and reasoning wins against Claude and Grok. Pro also showed unusually strong encoding/decoding skill (decoding Base64 nested four layers deep) and competitive price-performance. Google attributes the leap to improved pre- and post-training, countering narratives that scaling alone has plateaued. Deep Think, the reasoning mode credited with medal-winning math and programming performance, underpins these gains and is said to be still improving. Early community takes describe Gemini 3 as a step beyond Claude and Gemini 2.5, with Kimi K2 emerging as a promising rival on high-level reasoning. Meanwhile, xAI’s Grok 4.1 emphasized conversational and emotional intelligence, keeping competitive pressure high across modalities.
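To make "Base64 nested four layers" concrete: each layer Base64-encodes the output of the previous one, so a model must peel the layers off in order, with no tooling, to recover the original text. The exact prompts used in the reported test aren't described here; this is only an illustration of what the nesting means, using Python's standard `base64` module.

```python
import base64


def nest_b64(text: str, layers: int) -> str:
    """Apply Base64 encoding `layers` times, each layer encoding the previous output."""
    data = text.encode("utf-8")
    for _ in range(layers):
        data = base64.b64encode(data)
    return data.decode("ascii")


def unnest_b64(encoded: str, layers: int) -> str:
    """Reverse nest_b64 by decoding the same number of layers."""
    data = encoded.encode("ascii")
    for _ in range(layers):
        data = base64.b64decode(data)
    return data.decode("utf-8")


wrapped = nest_b64("hello", 4)
print(len(wrapped))                # 24 characters, up from 5
print(unnest_b64(wrapped, 4))      # hello
```

Base64 output is about 4/3 the size of its input, so each layer grows the string and obscures its structure further; a single mis-decoded character early on corrupts every later layer.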

Features

Product integrations accelerated. Gemini 3 is live inside Google Search’s AI Mode from day one and now plugs into VS Code and the GitHub Copilot CLI for faster front-end and agentic workflows. LangChain shipped reliability middleware for cross-provider model fallback and added first-class support for building agents with Gemini 3. Microsoft introduced Copilot Mode in Edge for Business, automating multi-step workflows and analyzing up to 30 tabs with enterprise safeguards. Teams can now host a private VS Code marketplace for internal extensions, a long-standing developer request. LlamaExtract added granular PER_TABLE_ROW extraction to turn tables and lists into structured JSON at scale. Claude’s availability widened via Azure, powering Microsoft 365 Copilot scenarios and Excel’s Agent Mode.
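Cross-provider model fallback, as in the LangChain reliability middleware mentioned above, boils down to trying providers in priority order and moving on when one fails. The sketch below shows that pattern in plain Python; it is not LangChain's API, and `flaky_primary` and `backup` are hypothetical stand-ins for real SDK clients.

```python
from typing import Callable, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the fallback chain has errored."""


def call_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower, provider-specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real model clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


used, answer = call_with_fallback(
    "Summarize Q3 sales", [("primary", flaky_primary), ("backup", backup)]
)
print(used, answer)
```

Returning the provider name alongside the answer keeps fallbacks observable, which matters when different providers have different costs or output styles.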

Showcases & Demos

Developers showcased Gemini 3 as a rapid software builder: turning images, PDFs, and sketches into working apps, generating full websites from a single prompt, and producing a complex 3D LEGO editor with UI and spatial logic in one shot. “Vibe coding” projects appeared quickly (maze games, website designers, and interactive visualizations), highlighting collaborative agent workflows and brisk iteration. Creatives reported more natural, coherent storytelling than in prior generations, while educators demoed scientific tutors and explorable explainers. Bench-style creative challenges across models (e.g., modern SVG art, game coding, puzzle-solving) reinforced Gemini 3’s versatility. On the low-cost hardware front, DSPy-driven prompt optimization notably boosted chat-to-SQL accuracy on a Raspberry Pi, hinting at wider access to high-quality AI behavior.

Tutorials & Guides

Educational resources focused on agent design and reasoning improvements. A GRPO (Group Relative Policy Optimization) explainer contrasted gradient-based RL training with training-free, comparison-based methods for improving reasoning. CrewAI launched a Coursera course on designing, developing, and deploying real-world multi-agent systems. Weekly research roundups highlighted progress in reinforcement learning, intelligence efficiency metrics, and novel training strategies. Upcoming deep dives include a technical session on Kimi K2’s trillion-parameter MoE with heavy tool use, and a Google DeepMind seminar on automating AI research, both aimed at practitioners pushing state-of-the-art workflows.
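The core idea behind GRPO, as introduced with DeepSeekMath, is to drop the learned value model and instead score each sampled completion relative to the other completions for the same prompt: its advantage is its reward minus the group mean, divided by the group standard deviation. The sketch below computes only that advantage step; the clipped policy-gradient loss and KL penalty that complete GRPO are omitted. It uses the population standard deviation plus a small `eps` to avoid division by zero, an assumption on our part, since implementations differ on both choices.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one prompt's group of sampled completions (GRPO-style)."""
    mu = mean(rewards)       # group baseline replaces a learned value function
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1 (correct) or 0 (incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal, which is why GRPO setups favor prompts the model solves only some of the time.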

Discussions & Ideas

Conversations centered on how AI is used, built, and felt. Studies and household anecdotes suggest people form instant rapport with virtual beings, underscoring the social and ethical weight of companion AI. Product teams champion “ambient agents” as a rising UX paradigm that blurs app boundaries. A new survey indicates most production AI teams favor open-source models and wire agents directly to databases, signaling a shift from chat-centric UX to backend-integrated intelligence. With data center demand projected to hit city-scale power levels by 2030, researchers argue for a pivot to “intelligence per watt” and more capable on-device AI. Despite Gemini Deep Think’s benchmark leap, researchers are probing why performance isn’t yet saturating ARC-AGI and what those gains reveal about reasoning. Broader cultural debates continue, from a TED talk on AI’s role in filmmaking to reflections on nature’s agent ecosystems as inspiration for digital intelligence design.