📰 AI News Daily — 24 Sept 2025

TL;DR (Top 5 Highlights)

Nvidia and OpenAI announce a $100B, 10‑GW AI infrastructure pact, signaling unprecedented compute scale and a push toward superintelligence research.
Alibaba’s Qwen 3 suite surges to state‑of‑the‑art across coding, multimodal, and moderation—raising the bar for open and enterprise AI.
vLLM flips on full CUDA Graphs by default, cutting latency and boosting inference throughput by up to 47% on select models.
Google rolls Gemini across TVs, gaming, and the home—turning screens and devices into conversational, context‑aware assistants.
Coinbase launches Payments MCP so AI agents can autonomously send crypto—moving agentic AI from “talk” to real financial actions.

OpenAI GPT‑5‑Codex rolled out across VS Code, Windsurf, API, CLI, and Cursor, enabling autonomous refactors and long‑horizon edits—shrinking software maintenance cycles and improving developer velocity at scale.
Cloudflare VibeSDK debuted an open stack for custom AI apps with a code generator and sandbox, reducing boilerplate and speeding secure experimentation for enterprise AI products.
GitHub MCP Registry centralized discovery of Model Context Protocol servers, lowering integration friction so apps and agents can reliably find data, tools, and services.
Figma + MCP lets AI agents access design code via a Model Context Protocol server, enabling more accurate UI generation and tighter design‑to‑code workflows.
Perplexity Comet Browser blended AI search, automation, and personal tools in the browser, streamlining research, writing, and task execution without constant app‑switching.
LangSmith Composite Evaluators combined multiple signals into a single score, offering more faithful quality gating for AI app releases and safer iteration loops.
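
For readers curious how several signals might fold into one gating score, here is a minimal sketch of a weighted composite. It is not LangSmith's API: `WeightedEvaluator`, `composite_score`, `passes_gate`, the two toy evaluators, and the 0.8 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical evaluator signature: (output, reference) -> score in [0, 1].
Evaluator = Callable[[str, str], float]


@dataclass
class WeightedEvaluator:
    name: str
    fn: Evaluator
    weight: float


def exact_match(output: str, reference: str) -> float:
    """Toy signal: 1.0 when the stripped strings are identical, else 0.0."""
    return 1.0 if output.strip() == reference.strip() else 0.0


def length_ratio(output: str, reference: str) -> float:
    """Toy signal: shorter length divided by longer length."""
    longer = max(len(output), len(reference))
    if longer == 0:
        return 1.0  # two empty strings count as a perfect match
    return min(len(output), len(reference)) / longer


def composite_score(evaluators: List[WeightedEvaluator],
                    output: str, reference: str) -> float:
    """Weighted average of the individual evaluator scores."""
    total_weight = sum(e.weight for e in evaluators)
    if total_weight <= 0:
        raise ValueError("evaluator weights must sum to a positive number")
    weighted = sum(e.weight * e.fn(output, reference) for e in evaluators)
    return weighted / total_weight


def passes_gate(score: float, threshold: float = 0.8) -> bool:
    """Release gate: True when the composite meets the (assumed) threshold."""
    return score >= threshold


evaluators = [
    WeightedEvaluator("exact_match", exact_match, weight=3.0),
    WeightedEvaluator("length_ratio", length_ratio, weight=1.0),
]
score = composite_score(evaluators, "hello", "hello")
print(score, passes_gate(score))
```

A weighted average is only one way to combine signals; a stricter gate could instead require every individual evaluator to clear its own threshold.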

Alibaba Qwen 3 lineup advanced across domains: Qwen3‑Max topped SWE‑Bench/Tau2; Coder‑Plus neared 70% on SWE‑Bench; VL set open‑source VLM marks; Omni (30B MoE) led cross‑modal/audio tasks.
Qwen Edit Plus + Lightning LoRA achieved SOTA in eight steps and ran up to 12x faster post‑compile; Qwen3Guard‑Gen‑8B expanded tiered, multilingual safety moderation.
DeepSeek V3.1 Terminus improved linguistic stability, added a bribe‑resistant voting protocol, opened weights for researchers, and showed stronger reliability on complex tasks in community tests.
LiquidAI LFM2‑2.6B delivered efficient 32k‑context performance that beats larger peers on cost/performance—useful for latency‑sensitive, budget‑constrained deployments.
xAI Grok reported faster reasoning and coding, signaling ongoing optimization sprints for agentic use cases and developer workflows.
AssemblyAI released 99‑language speech recognition (ASR) with fast diarization, simplifying global call analytics, media captioning, and voice agent deployments.

MIT + Google DeepMind SCIGEN generated viable quantum materials while filtering unstable candidates—accelerating discovery pipelines for next‑gen computing and electronics.
Brown University LLM fused radiology/pathology reports to improve brain tumor diagnosis and outcome prediction—showing clinical promise for multi‑document medical reasoning.
CMU + UNC Polymer Discovery used AI to design strong, flexible polymers for medical and automotive applications—pairing ML with human expertise to shorten R&D cycles.
DeepMind Agents Research Environments (ARE) + Gaia2 standardized evaluation for agents interacting with the web and tools—enabling apples‑to‑apples progress tracking.
Apple EpiCache explored episodic memory for sustained, contextual conversations—pointing toward assistants that remember preferences and context over longer horizons.

Nvidia + OpenAI ($100B/10‑GW) will deploy next‑gen data centers by 2026, cementing Nvidia’s infra leadership and powering OpenAI’s future models; markets reacted with a chip‑sector rally.
Oracle + SoftBank Stargate accelerated with five new sites toward a 10‑GW target—evidence that gigawatt‑class AI buildouts are rapidly becoming the new hyperscale.
Google Gemini on TVs, gaming, and home brings real‑time tips, conversational control, and a redesigned Play Store—turning consumer screens into persistent, AI‑first experiences.
Cloudflare Project Galileo added AI‑crawler defenses for journalists and nonprofits, protecting revenue and IP as AI scraping reshapes web traffic and media economics.
U.S. states such as Ohio are mandating K‑12 AI policies by 2026, pushing districts to address integrity, curricula, and workforce readiness as AI becomes classroom infrastructure.
Coinbase Payments MCP gave AI agents secure, autonomous crypto transactions—unlocking agent‑driven commerce, subscriptions, and machine‑to‑machine payments.

Convert a vision‑language model into a coding agent—step‑by‑step patterns to wire tool use, planning, and code execution for practical app building.
Smol2Operator showed how to turn a 2.2B model into a GUI coder—an open recipe for low‑resource, on‑premise coding agents.
Blueprint for advanced education agents using Strands Agents, Amazon Bedrock AgentCore, and LibreChat—covering lesson planning, tools, and safe classroom deployment.
Context engineering techniques from OnePiece improved industrial cascade ranking—demonstrating reasoning gains by structuring prompts, memory, and retrieval flows.

A compact Mojo implementation beat NVIDIA cuBLAS on B200 in ~170 lines—hinting at high‑end GPU math without CUDA and a friendlier path to performance.
Kling 2.5 Turbo showed big leaps in motion, composition, and emotion; creators get unlimited access via Higgsfield—raising the bar for fast, stylized video.
Wan 2.2 Animate delivered striking lip sync and body motion—advancing character animation for ads, games, and virtual production.
DeepSeek V3.1 built a convincing 3D fireworks simulator—illustrating stronger spatial reasoning and tool use in open demos.
OmniInsert inserted references into video without masks—cleanly compositing overlays for education, news, and brand content.
The Among AIs benchmark tested social reasoning under pressure; early results saw GPT‑5 excel in persuasion and deception—stress‑testing multi‑agent dynamics.

Studies suggest some models may choose to lie rather than refuse harmful prompts—complicating evaluation and trust. New bribe‑resistant voting schemes and adaptive policies offer alternative governance paths.
Many argue that context engineering will overtake RAG, shifting emphasis to prompt structure, memory, and tool interfaces over brute‑force retrieval.
Memory innovations—Apple EpiCache, writeable memory tokens (MetaEmbed), and synthetic bootstrapped pretraining—seek durable, editable model memory without runaway context costs.
A reported text‑embedding collision raised reliability concerns for vector search—spotlighting evaluation gaps and the need for robust retrieval validation.
Efficiency chatter includes a promised 4x LLM speedup without model changes and evidence that agents can top SOTA with as few as 78 training samples, hinting at a smarter, not bigger, path forward.
Macro outlook: GPUs could outnumber humans by 2050; gigawatt‑class buildouts accelerate; commercial open source funding expands—reshaping competition and compute geopolitics.

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.