📰 AI News Daily — 25 Oct 2025
TL;DR (Top 5 Highlights)
- AMD inks landmark chip pact with OpenAI, eyeing $100B upside and a 10% equity option—chip race intensifies.
- Anthropic gains access to a million Google TPUs, sharpening the cloud-and-browser rivalry.
- OpenAI’s Sora 2 hits 1M installs in five days; Android launch announced.
- Salesforce unveils MuleSoft Agent Fabric to govern enterprise AI agents at scale.
- UK issues a national framework for AI scribing; OpenAI adds UK data residency for enterprises.
🛠️ New Tools
- Mistral AI launched a production-grade agent studio with deep observability and controls, helping teams debug tools, track actions, and safely deploy complex, multi-step assistants into real workflows.
- Meta open-sourced CTran, a cross-vendor GPU collectives library that works across NVIDIA and AMD, widening hardware choices and lowering vendor lock-in for high-performance distributed AI training.
- Nanochat by Andrej Karpathy offers a fully open, low-cost chatbot pipeline—data, training, and serving—giving developers an end-to-end baseline to experiment, extend skills, and benchmark agent behaviors.
- vLLM ecosystem added TPU support, compressed tensors, DeepSeek-OCR integration, and MoE via transformers, pushing faster, cheaper serving for multimodal and mixture-of-experts workloads across diverse hardware.
- Runway opened automated Workflows to all plans, plus advanced fine-tuning and ad tools, making enterprise-grade video automation accessible to smaller teams and marketing orgs.
- OpenRouter introduced :exacto, a stricter tool-calling mode that reduces misfires and hallucinated tool usage, improving reliability for production agents executing API calls and multi-tool chains.
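The :exacto item above follows OpenRouter's existing convention of suffix-based model variants. As a rough sketch of what a stricter tool-calling request might look like, the payload below uses the standard chat-completions shape; the model slug and the exact behavior of the `:exacto` suffix are assumptions based on the blurb, not confirmed API details.

```python
# Hypothetical sketch: building an OpenRouter-style chat-completions payload
# that opts into the stricter ":exacto" tool-calling variant. All specific
# names here (model slug, suffix semantics) are illustrative assumptions.

def build_exacto_request(model: str, user_msg: str, tools: list) -> dict:
    """Append the :exacto suffix and assemble a standard tool-calling payload."""
    return {
        "model": f"{model}:exacto",          # stricter tool-calling variant
        "messages": [{"role": "user", "content": user_msg}],
        "tools": tools,
        "tool_choice": "auto",               # let the model decide when to call
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Fetch current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

payload = build_exacto_request("openai/gpt-4.1-mini", "Weather in Oslo?", [weather_tool])
```

The point of a stricter mode is that the payload stays identical; only the model variant changes, so production agents can A/B the two slugs without code changes.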
🤖 LLM Updates
- Meta released ScaleRL, a framework to predict how different reinforcement learning strategies scale with model size, helping researchers choose methods that actually improve larger LLMs instead of plateauing.
- Lookahead improves multi-model routing by estimating outcomes before full inference, boosting selection accuracy and cutting costs when dispatching queries across specialized models or tiers.
- Baseten hit >650 tokens/s and ~110 ms TTFT on a 120B open model; paired with 4‑bit QAT, it signals cheaper, faster LLM serving at scale.
- MiniMax M2 emerged as a low-latency model strong on coding and agent tasks, with free trials showcasing responsiveness that suits tool use, routing, and interactive applications.
- Researchers proposed Scaf‑GRPO, giving models targeted hints only when stuck, yielding a 44% boost on complex reasoning benchmarks and pointing toward more autonomous, self-correcting LLM training loops.
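The Lookahead routing idea above can be sketched as a cheap pre-inference estimator feeding a cost-adjusted argmax; the estimator, scores, and costs below are entirely made up for illustration.

```python
# Illustrative lookahead-style router: estimate each candidate model's expected
# quality *before* running full inference, then dispatch to the best
# cost-adjusted option. All names and numbers are hypothetical.

def route(query, models, estimate_quality):
    """Pick the model maximizing predicted quality minus a cost penalty."""
    best = max(models, key=lambda m: estimate_quality(query, m) - m["cost"])
    return best["name"]

models = [
    {"name": "small-fast", "cost": 0.1},
    {"name": "large-accurate", "cost": 0.5},
]

# Toy estimator: longer queries are predicted to need the larger model.
def toy_estimate(query, model):
    complexity = min(len(query.split()) / 10, 1.0)
    return complexity if model["name"] == "large-accurate" else 1.0 - complexity

short_pick = route("hi", models, toy_estimate)
long_pick = route(
    "Summarize this contract and flag every clause about liability limits",
    models, toy_estimate,
)
```

In a real deployment the estimator would itself be a small learned model, which is where the cost savings come from: one cheap forward pass replaces trial inference on every tier.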
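The Scaf‑GRPO item above describes a "hint only when stuck" curriculum. A minimal sketch of that control flow, with a toy solver standing in for the model and all names hypothetical:

```python
# Illustrative Scaf-GRPO-style loop: sample rollouts with no hint first; only
# if every rollout fails, retry with progressively stronger hints. The solver,
# hint names, and success probabilities are toy stand-ins.
import random

random.seed(0)

def scaffolded_rollouts(problem, hints, solve, n=8):
    """Escalate scaffolding until at least one rollout succeeds."""
    for hint in [None, *hints]:                     # None = unassisted attempt
        rollouts = [solve(problem, hint) for _ in range(n)]
        if any(r["correct"] for r in rollouts):
            return hint, rollouts                   # train on these rollouts
    return hints[-1], rollouts                      # fully hinted fallback

# Toy solver: succeeds more often as hints get stronger.
def toy_solve(problem, hint):
    p = {None: 0.0, "lemma": 0.3, "first_step": 0.9}[hint]
    return {"correct": random.random() < p}

hint_used, _ = scaffolded_rollouts("prove X", ["lemma", "first_step"], toy_solve)
```

The appeal is that the model only sees scaffolding it actually needed, so the training signal stays close to unassisted problem solving.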
📑 Research & Papers
- Anthropic released ImpossibleBench, stress-testing whether coding agents follow specifications versus game reward signals—highlighting fragile incentives and the need for robust, spec-faithful agent evaluations.
- Multiple teams demonstrated black-box techniques to prove model theft via memorization patterns and training-order fingerprints, advancing provenance science and offering practical recourse when weights remain inaccessible.
- Tahoe AI open-sourced Tahoe‑x1, a 3B single-cell foundation model setting new marks on cancer tasks, signaling rapid progress for open biotech models that run on modest hardware.
- MedSAM’s medical segmentation work surged past major citation milestones, reflecting broad clinical interest in foundation segmentation models and their potential to standardize labeling across institutions.
- New theory tied classic graph algorithms to Transformer attention mechanics, sharpening our understanding of what LLMs can compute and guiding architectures for better structure-aware reasoning.
🏢 Industry & Policy
- AMD struck a multi-year AI chip deal with OpenAI, with potential $100B revenue and a 10% equity option; yet manufacturing remains bottlenecked by TSMC, underscoring supply-chain concentration.
- Anthropic and Google expanded their cloud pact, giving Claude access to roughly a million TPUs—fueling faster training and stoking competition over AI-first browsing and assistant experiences.
- OpenAI’s Sora 2 video app passed one million downloads in five days and is coming to Android, stepping up competition with Meta’s short-form video apps and reshaping mobile creative workflows.
- Salesforce unveiled MuleSoft Agent Fabric to register, monitor, and orchestrate thousands of enterprise AI agents across platforms, advancing security, compliance, and real-world deployments beyond pilot demos.
- Policy momentum: the UK rolled out data residency for ChatGPT business customers and a national framework for AI scribing, while the Illinois State Bar issued guidance for ethical AI in legal practice.
- Safety front: OpenAI backed biosecurity startup Valthos with $30M; lawsuits over ChatGPT’s safety and deepfake harms mounted; and the new Atlas browser raised prompt-injection security concerns.
📚 Tutorials & Guides
- Together AI published a step-by-step guide to train and deploy Nanochat on instant GPU clusters; Karpathy showed how to add new skills, speeding practical customization.
- Engineers shared a deep PyTorch bug-hunting journey that surfaced optimizer, memory, and kernel nuances—valuable for diagnosing performance regressions in production training loops.
- A clear explainer reframed RL environments as benchmarks with automatic verifiers, helping practitioners design measurable tasks and avoid reward-hacking traps in agent training.
- Modular released Mojo GPU Puzzles with 34 progressive challenges across NVIDIA, AMD, and Apple GPUs, providing hands-on practice for performance tuning and kernel craftsmanship.
- Stanford made a full AI curriculum available online, bundling lectures and materials that help learners traverse fundamentals to advanced topics without expensive courseware.
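The "RL environments as benchmarks with automatic verifiers" framing from the explainer above can be sketched minimally: a task is just a prompt plus a programmatic checker, and the reward is whatever the verifier says. All names here are illustrative.

```python
# Minimal sketch of the verifier-as-environment framing: binary reward comes
# straight from an automatic checker, leaving no learned judge to reward-hack.
from dataclasses import dataclass
from typing import Callable

@dataclass
class VerifiedTask:
    prompt: str
    verify: Callable[[str], bool]   # automatic verifier: answer -> pass/fail

    def reward(self, answer: str) -> float:
        return 1.0 if self.verify(answer) else 0.0

# Example: an arithmetic task verified by checking the final number.
task = VerifiedTask(
    prompt="What is 17 * 24?",
    verify=lambda ans: ans.strip().endswith("408"),
)
```

Designing the verifier is where the benchmark lives: a checker that is too loose (here, any answer ending in "408") is exactly the kind of reward-hacking trap the explainer warns about.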
🎬 Showcases & Demos
- Apple Vision Pro streaming decoder delivered 4K-per-eye PC VR at 120 Hz wirelessly with low latency, demonstrating ample headroom for demanding mixed-reality experiences while multitasking.
- Veo 3.1 impressed early testers with frame-to-frame editing and extend features, signaling rapid gains in controllability for professional-grade, AI-assisted video creation workflows.
- Wan 2.2 with Glif produced convincing character swaps approaching film quality, hinting at near-term disruption to VFX pipelines for ads, trailers, and short-form content.
- Higgsfield Popcorn enabled typed edits to generate multiple precise video variations with robust subject or background locking, boosting iteration speed for editors and social teams.
💡 Discussions & Ideas
- Neuro-symbolic approaches and the proposed Tensor Logic language aim to unify neural, symbolic, and probabilistic reasoning—promising more reliable planning, verification, and generalization in agent systems.
- A “coverage profile” view links pretraining to downstream performance, encouraging dataset design and evaluation methods that measure where models truly generalize versus merely memorize.
- Practitioners argue prompt optimization can beat traditional interpretability for actionable gains, while warning many RAG systems “shortcut” answers via shallow retrieval rather than genuine synthesis.
- Benchmark critiques note that open-source coding tests often mismatch real developer prompts; skeptics are shifting from worries about correctness to concerns about the sheer volume of AI-generated code flooding assisted development.
- Community voices defend open-source as a check on concentrated power, while reexamining whether past over-caution on nuclear/GMOs increased overall risk—urging balanced, evidence-based AI governance.
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.