📰 AI News Daily — 21 Oct 2025
TL;DR (Top 5 Highlights)
- IBM plugs Groq’s LPU inference into watsonx, reporting major speed and cost wins for enterprise AI.
- Multi-billion chip deals between OpenAI and Nvidia/AMD/Broadcom intensify the compute arms race.
- Alibaba’s Qwen3 scales to trillion-parameter MoE; DeepSeek V3.1 shines in real-money trading benchmarks.
- A major AWS outage knocked top apps offline, underscoring cloud fragility beyond AI.
- Video and robotics leap forward: Krea’s open real-time video model, Veo 3.1 VFX upgrades, and Unitree’s lifelike H2 humanoid.
🛠️ New Tools
- Krea open-sourced its 14B real-time video model under Apache 2.0; it streams long-form clips at double-digit FPS on a single accelerator, making high-quality video generation accessible to more creators and teams.
- DeepSeek OCR, released on Hugging Face, reads handwriting across 100+ languages and compresses visual context dramatically, enabling cheaper, higher-throughput document pipelines for enterprise-scale ingestion.
- dstack launched a UI for spinning up GPU dev environments that open directly in VS Code and Cursor, shrinking setup time and standardizing reproducible, cloud-backed AI development workflows.
- TabbyAPI added tensor parallelism, boosting throughput on multi-GPU servers for self-hosted code assistants and chat, reducing latency and infrastructure costs for scaling inference.
- Keras 3 now includes built-in GPTQ quantization across JAX, TensorFlow, and PyTorch, cutting memory and serving costs while preserving accuracy for many production workloads.
- LangChain integrated Model Context Protocol (MCP) for human-in-the-loop checkpoints, improving oversight and recoverability in agent workflows for regulated or safety-critical deployments (a minimal sketch of the checkpoint pattern follows this list).
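The sketch below shows the human-in-the-loop checkpoint pattern such an integration targets, built on LangGraph's interrupt/resume primitives and a checkpointer. The MCP-served tool is stubbed as a plain `send_email()` function, and the graph wiring is an illustrative assumption rather than the library's documented MCP recipe.

```python
# Minimal human-in-the-loop checkpoint using LangGraph's interrupt/resume.
# The MCP-served tool is stubbed as send_email(); in a real integration the
# tool would come from an MCP server (e.g. via langchain-mcp-adapters).
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt


class State(TypedDict):
    draft: str
    status: str


def send_email(text: str) -> None:  # stand-in for an MCP tool call
    print(f"sent: {text}")


def review_and_send(state: State) -> State:
    # interrupt() pauses the graph at a checkpoint and surfaces the payload
    # to a human; execution resumes with whatever value they supply.
    decision = interrupt({"draft": state["draft"], "question": "Approve send?"})
    if decision == "approve":
        send_email(state["draft"])
        return {"draft": state["draft"], "status": "sent"}
    return {"draft": state["draft"], "status": "rejected"}


builder = StateGraph(State)
builder.add_node("review_and_send", review_and_send)
builder.add_edge(START, "review_and_send")
builder.add_edge("review_and_send", END)
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "demo"}}
# First invocation pauses at the interrupt and persists state.
graph.invoke({"draft": "Hello from the agent", "status": "pending"}, config)
# A human (or an approval UI) later resumes the same thread with a decision.
result = graph.invoke(Command(resume="approve"), config)
print(result["status"])  # -> "sent"
```

The checkpointer is what makes the pause recoverable: the thread can sit for hours between the interrupt and the human's decision without losing agent state.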
🤖 LLM Updates
- Claude Sonnet 4.5 and GLM 4.6 climbed web-dev leaderboards; Baseten claims fastest GLM 4.6 serving—evidence of maturing performance and deployment options across vendors.
- Alibaba Qwen3 added a trillion-parameter MoE LLM and an open-weight VLM with up to a million-token window, pushing context length and multimodal capability for enterprise and research.
- DeepSeek V3.1 paired open access with aggressive pricing and standout live trading benchmarks; prompt sensitivity highlights gaps between static tests and real-world agentic performance.
- Safety and reasoning advanced via new misalignment classifiers, ByteDance’s ReSA safety dataset, and CaRT, a method teaching models when to stop gathering information and act.
- Rumors point to stronger reasoning in Gemini 3 Pro and speed/accuracy gains in Kimi K2, while 2025 agent benchmarks like TerminalBench reflect rapid iteration toward robust agents.
- OpenAI added ChatGPT “selective forgetting,” letting users choose what is retained or discarded—improving privacy, personalization, and compliance for long-running assistant use.
📑 Research & Papers
- Hugging Face hosts the 308GB CommonForms VLM dataset for form understanding, offering rich multimodal supervision to train and evaluate document-heavy vision-language systems.
- NVIDIA previewed QeRL, a lighter, faster reinforcement learning approach that reduces compute overhead—promising quicker iteration in decision-making agents and robotics.
- Astronomers used Google Gemini to identify supernovae from few examples with interpretable outputs, improving trust and accelerating analysis across massive astronomical datasets.
- DeePFAS advances detection of persistent “forever chemicals” with AI, aiding environmental monitoring, public health, and regulatory compliance.
🏢 Industry & Policy
- IBM integrated Groq LPU inference into watsonx, reporting major speed and cost gains on enterprise workloads—evidence specialized accelerators can materially shift AI TCO.
- A widespread AWS outage disrupted OpenAI, Snapchat, Canva, Signal, and Duolingo, spotlighting systemic cloud fragility rather than AI-specific failure modes.
- Nvidia, AMD, and Broadcom secured multi-billion OpenAI chip deals, intensifying the hardware arms race and underscoring capital concentration in AI compute supply chains.
- Modular broadened support to seven GPU architectures and set records on AMD’s MI355 series, while AMD’s efficiency gains continue narrowing the gap with NVIDIA.
- China advanced alternative lithography paths—SSMB, nanoimprint, and multi-beam e-beam—signaling competitive pressure on EUV and long-term diversification in chipmaking.
- Security stacks hardened: Amazon Bedrock Guardrails added customizable protections against content and encoding attacks, Microsoft released an open cybersecurity benchmark, and OpenAI tightened Sora 2 consent protocols amid deepfake concerns.
📚 Tutorials & Guides
- Latent Space released an Open Model Pretraining Masterclass, distilling 2025 best practices and research highlights into pragmatic recipes for training open models.
- Stanford published a fully open, step-by-step blueprint for building language models, a rare end-to-end reference from data to evaluation.
- Hugging Face launched a comprehensive robotics course covering classical control, real-world RL, generative methods, and generalist policies—an accessible on-ramp to embodied AI.
- A hands-on guide shows how to apply GPTQ quantization in Keras 3 across JAX, TensorFlow, and PyTorch, cutting memory while preserving accuracy in production (first sketch after this list).
- A practical text-to-SQL walkthrough wires open models into an orchestration layer to answer complex database questions, demonstrating retrieval, verification, and governance patterns (second sketch after this list).
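For the quantization guide, here is a minimal sketch of the flow, assuming GPTQ plugs into Keras 3's existing `model.quantize()` entry point; the `GPTQConfig` class, its argument names, and the KerasHub preset string are assumptions for illustration, not confirmed API.

```python
# Hypothetical sketch of GPTQ post-training quantization in Keras 3.
# Assumptions (not confirmed against a specific Keras release): the "gptq"
# mode on model.quantize(), the GPTQConfig class and its argument names,
# and the KerasHub preset string. int8/float8 modes use the same entry point.
import keras
import keras_hub

# Load any KerasHub causal LM; the preset name is illustrative.
model = keras_hub.models.CausalLM.from_preset("gemma2_instruct_2b_en")

# GPTQ needs a small calibration set to choose per-layer quantization params.
calibration_texts = [
    "Keras 3 runs on JAX, TensorFlow, and PyTorch backends.",
    "Quantization trades a little accuracy for memory and latency.",
]

# Assumed config object; real field names may differ.
gptq_config = keras.quantizers.GPTQConfig(
    dataset=calibration_texts,
    tokenizer=model.preprocessor.tokenizer,
    weight_bits=4,
    group_size=128,
)

# Same entry point as model.quantize("int8") / model.quantize("float8").
model.quantize("gptq", config=gptq_config)

print(model.generate("Quantized models are", max_length=32))
```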
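For the text-to-SQL walkthrough, a condensed sketch of the usual loop follows: build a schema-aware prompt, validate the generated SQL on a read-only path, then execute. The `generate_sql()` stub stands in for whichever open model and orchestration layer the guide actually uses.

```python
# Schematic text-to-SQL loop: schema-aware prompt -> generated SQL ->
# validation (SELECT-only check plus EXPLAIN) -> execution.
# generate_sql() is a stub standing in for an open model behind any
# OpenAI-compatible endpoint or orchestration framework.
import sqlite3

SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL, placed_at TEXT);
"""


def generate_sql(question: str, schema: str) -> str:
    # Replace with a real model call built from the schema and question;
    # a canned answer keeps the sketch runnable end to end.
    return (
        "SELECT customer, SUM(total) AS spend FROM orders "
        "GROUP BY customer ORDER BY spend DESC LIMIT 5"
    )


def validate(conn: sqlite3.Connection, sql: str) -> None:
    # Governance gate: allow only read-only statements, and let SQLite
    # parse/plan the query (EXPLAIN) before any rows are touched.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only read-only SELECT statements are allowed")
    conn.execute(f"EXPLAIN {sql}")


def answer(question: str) -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO orders (customer, total, placed_at) VALUES (?, ?, ?)",
        [("ada", 120.0, "2025-10-20"), ("grace", 75.5, "2025-10-21"), ("ada", 30.0, "2025-10-21")],
    )
    sql = generate_sql(question, SCHEMA)
    validate(conn, sql)
    return conn.execute(sql).fetchall()


print(answer("Who are our top customers by spend?"))
```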
🎬 Showcases & Demos
- Krea's open 14B model streamed long-form, real-time video on a single accelerator, highlighting rapid efficiency gains for live creative workflows and prototyping.
- Google Veo 3.1 topped video leaderboards, adding start-to-end frame transitions and surgical object removal—bringing cinematic VFX within reach of individual creators.
- Sora 2 refined moderation to reduce false positives after user feedback, aiming for safer and less disruptive creative sessions at scale.
- Unitree H2 debuted a taller, more lifelike humanoid with a redesigned hip, reflecting swift progress in capable, affordable robotics platforms.
- A Glif-based mobile agent blended AI with live footage for on-the-go Hollywood-style effects, foreshadowing real-time, on-device post-production.
💡 Discussions & Ideas
- Developers argue AI-generated code hasn’t sped delivery due to review bottlenecks and inconsistent “coding personalities,” even as idea-to-app time keeps falling.
- The “AI Operating System” concept gained traction as a unifying layer for intelligent apps—standardizing memory, tools, governance, and policy across agents.
- Safety debates intensified: LLMs as insider threats, calls to validate model-judges against humans, and evidence VLM evaluators can flip with minor prompt changes.
- Andrej Karpathy emphasized RL breakthroughs as pivotal for AGI and reframed how modern AI systems should be understood, while others warned safety activism can drift from technical realities.
- OpenAI retracted claims about solving Erdős problems, renewing calls for rigor, transparency, and careful evaluation in AI research communications.
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.