📰 AI News Daily — 13 Sept 2025

TL;DR (Top 5 Highlights)

Google debuts VaultGemma, an open model trained end-to-end with differential privacy—signaling new standards for privacy-preserving AI.
OpenAI and Microsoft rework their partnership; both plan multi‑billion UK data centers, escalating cloud and AI infrastructure competition.
Alibaba unveils the Zhenwu AI chip and introduces ultra‑efficient Qwen3 Next models, intensifying China’s AI silicon and model race.
U.S. FTC opens a probe into AI chatbots’ child safety, foreshadowing tighter guardrails for content, privacy, and product design.
Zoox begins free robotaxi rides in Las Vegas, marking tangible progress toward commercial autonomous mobility.

DeepAgents: A production-ready framework for building Claude Code‑style agents with planning, file I/O, and sub‑agents, helping teams turn prototypes into dependable developer assistants faster.
Coinbase Payments MCP: On‑chain payments for AI agents across multiple blockchains, enabling autonomous apps to transact natively and expanding real utility for agent ecosystems.
Microsoft AI Shopping Agent: Personal shopping guidance with brand‑specific knowledge, sharpening the race in conversational commerce and improving product discovery for retailers and consumers.
Adobe AI Agents (Experience Platform): Natural‑language agents automate marketing, support, and site optimization—reducing manual workflows and lifting engagement across enterprise customer journeys.
Stable Audio 2.5 (Stability AI): Fast, high‑quality music generation with inpainting and mood prompts, lowering creative barriers for audio teams and brand sound design.
Lightning Spreadsheet QA App: A lightweight browser tool delivers sub‑300ms Q&A over spreadsheets using a local model—bringing private, near‑real‑time analytics to the edge.

Google VaultGemma: Open weights and a technical report demonstrate end‑to‑end differential privacy at scale, offering a credible blueprint for privacy‑first model training.
OpenAI: Hints of a new GPT‑5 variant in Codex‑CLI, higher API rate limits, and better calibration—suggesting fewer hallucinations and stronger reliability, pending full details.
Ling‑mini‑2.0: A 16B MoE trained on 20T+ tokens with RL post‑training, offering higher throughput and open checkpoints—balancing cost, speed, and reasoning power.
MobileLLM‑R1 (Meta) and peers: Sub‑billion models achieve strong reasoning—especially in math—showing data‑efficient training can rival larger models for targeted tasks.
Qwen3 Next 80B (Alibaba): The hybrid 80B‑A3B design activates only ~3B of its 80B parameters per token and, using Gated DeltaNet, claims up to 90% lower training cost and 10x faster inference.
Evaluation Scrutiny: New entrants like K2‑Think face contamination and comparability questions, underscoring the need for transparent, standardized testing.
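To make "activates ~3B parameters per token" concrete, here is a minimal sketch of top-k expert routing, the general mechanism sparse MoE models use to run only a few experts per token. It is an illustration under stated assumptions, not the routing used by Qwen3 Next or Ling‑mini‑2.0: the function names, the four-expert router scores, and the choice of k=2 are all hypothetical, and real routers typically add load balancing and shared experts.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-probability experts and renormalize their weights."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def active_fraction(total_params, active_params):
    """Share of the weights that participate in one token's forward pass."""
    return active_params / total_params

if __name__ == "__main__":
    # Hypothetical router scores for one token over 4 experts (assumption).
    print(route_top_k([0.1, 2.0, -1.0, 1.5], k=2))
    # Figures from the Qwen3 Next item: ~3B active out of 80B total.
    print(active_fraction(80e9, 3e9))
```

With the figures quoted above, roughly 3/80 of the weights (under 4%) take part in each token's forward pass, which is where the claimed throughput gains would come from.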

OpenAI on Hallucinations: New work argues confident falsehoods stem from current training regimes, indicating systematic mitigation—not elimination—will define trustworthy AI.
World‑Models Workshop (Montreal): Talks by Bengio and LeCun highlight progress toward grounded, planning‑capable agents—bridging perception, memory, and control.
RL for LLMs Survey: A comprehensive review of reward engineering and real‑world applications across coding, robotics, and agents helps teams choose practical RL strategies.
Learning Physics from Video (Meta): Findings show models can acquire core physical intuitions from video, informing better world‑modeling and robotics.
Document Retrieval Advances: ColPali‑style encoders improve retrieval on long, structured documents; datasets like FinePDFs and turn‑detection corpora boost benchmarking.
Agent Failure Modes: Studies map seven common failures; tools like HAL and Docent aim to diagnose issues and guide robust agent design.

OpenAI–Microsoft Restructure: New governance and funding terms reportedly value OpenAI’s PBC above $100B, with nonprofit profit share falling to 20% and IPO talk re‑ignited.
UK Data Centers (Nvidia, OpenAI): Multi‑billion investments expand AI compute in Britain, strengthening Europe’s role in global model training and deployment.
FTC Child Safety Probe: U.S. regulators examine chatbot protections for minors, signaling incoming rules on content safety, data use, and product defaults.
Alibaba Zhenwu AI Chip: China’s AI silicon race heats up as Alibaba debuts a new accelerator—alongside fresh analyses of 2025 cloud GPU costs and strategies.
Zoox Robotaxi (Las Vegas): Free public rides mark a milestone for autonomous mobility, pressuring rivals and advancing real‑world safety validation.
Cybersecurity Alerts: Villager automates pentesting via natural language; SpamGPT turbocharges phishing; “EvilAI” malware spreads with AI‑generated code—raising enterprise risk.

Anthropic: A practical playbook for co‑developing tools with Claude Code helps teams ship safer, more capable coding agents.
DeepMind: Pragmatic GPU usage strategies translate research insights into cost‑efficient training and inference operations.
Better RAG by Design: Late chunking and DSPy Tool abstractions enable hybrid vector‑plus‑graph retrieval, improving answer accuracy on complex documents.
LlamaIndex: An end‑to‑end guide to building observable, evaluated PDF agents shows how to measure quality before production.
Planning 2025 Compute: Hardware primers and cloud GPU market analyses equip teams to budget, right‑size, and future‑proof AI infrastructure.
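For readers new to late chunking, mentioned in the RAG item above: the idea is to embed the whole document in one pass so every token vector carries document-wide context, and only then pool token vectors into per-chunk embeddings. The sketch below shows just the pooling step, assuming the token embeddings and chunk boundaries already exist. The names late_chunk and mean_pool and the toy 2-dimensional vectors are illustrative assumptions, not any library's API.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def late_chunk(token_embeddings, chunk_spans):
    """Pool contextualized token embeddings into one vector per chunk.

    token_embeddings: per-token vectors from ONE forward pass over the
        whole document (assumed to come from a long-context encoder).
    chunk_spans: (start, end) token index pairs, end exclusive.
    """
    chunk_vectors = []
    for start, end in chunk_spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid chunk span: ({start}, {end})")
        chunk_vectors.append(mean_pool(token_embeddings[start:end]))
    return chunk_vectors

if __name__ == "__main__":
    # Toy token embeddings for a 4-token document (made-up values).
    tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 0.0]]
    print(late_chunk(tokens, [(0, 2), (2, 4)]))
```

The contrast with conventional chunking is the order of operations: splitting first and embedding each chunk separately discards cross-chunk context, whereas pooling after a single full-document pass keeps it.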

Gemini 2.5 Flash Image vs. Imagen 4.0 Ultra: Community leaderboards show a near tie on text‑to‑image, underscoring rapid convergence in consumer‑grade generative quality.
Google Nano Banana: Viral 3D “action‑figure” transforms demonstrate playful, low‑friction creation—bringing 3D generation to mainstream audiences.
Kling AI Avatars: Lip‑sync up to 60 seconds from a single image across realistic, anime, animal, and 3D styles—praised for expressive, broadcast‑quality results.
Seedream 4: Strong performance in image generation and editing boosts ByteDance’s creator tooling ambitions against Google’s Gemini ecosystem.
Voice of AGI Showcase: Voice avatars and real‑time agents impress developers, highlighting fast progress in multimodal interactivity and latency.
MCP Hackathon Africa 2025: A 40‑city event advances AI sovereignty, centering local languages and priorities while accelerating regional developer ecosystems.

Evaluation Reform: Current tests may incentivize guessing; public dashboards and deeper log analysis aim for realistic, transparent measurement. In one case, an apparent SWE‑bench “breakout” was traced to a bug.
Agents Still Struggle: Seven failure modes persist on hard tasks; new, live benchmarks like LiveMCP‑101 push toward practical, end‑to‑end reliability.
Smarter Training Bets: Data‑efficient reasoning and post‑training compute can outperform raw scale; some teams trial on‑policy RL directly in production loops.
How Models Think: Evidence for beneficial attention sinks and stronger hierarchical reasoning adds nuance to capabilities—and to where engineering effort should focus.
Scientific Integrity: Studies show LLM annotators can fabricate outcomes; rigorous guardrails remain essential for research workflows and high‑stakes domains.
Culture Meets AI: Meme‑heavy datasets reportedly degrade facial recognition—an odd twist where internet culture shapes real‑world surveillance accuracy.
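The claim that current tests may incentivize guessing comes down to simple arithmetic, sketched below. The scoring scheme is an assumption for illustration (1 point for a correct answer, 0 for abstaining, an optional penalty for a wrong answer) and is not taken from any specific benchmark. The function names are hypothetical.

```python
def expected_score(p_correct, wrong_penalty=0.0):
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct - (1.0 - p_correct) * wrong_penalty

def should_guess(p_correct, wrong_penalty=0.0):
    """Answering beats abstaining (score 0) only if its expected score is positive."""
    return expected_score(p_correct, wrong_penalty) > 0.0

def break_even_confidence(wrong_penalty):
    """Confidence at which guessing and abstaining score the same."""
    return wrong_penalty / (1.0 + wrong_penalty)

if __name__ == "__main__":
    print(should_guess(0.2))            # accuracy-only grading
    print(should_guess(0.2, 1.0))       # wrong answers cost one point
    print(break_even_confidence(1.0))
```

Under accuracy-only grading, any nonzero chance of being right makes guessing the better strategy, so a model is never rewarded for saying it does not know. Adding a penalty for wrong answers moves the break-even confidence above zero, which is one way an evaluation could reward calibrated abstention.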

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.