📰 AI News Daily — 25 Dec 2025

TL;DR (Top 5 Highlights)

NVIDIA licenses Groq’s ultra-fast inference tech to sharpen performance; GroqCloud remains independent.
OpenAI launches GPT-5.2 and explores ads in ChatGPT while eyeing a $100B raise.
Disney inks a $1B deal with OpenAI for character-driven AI storytelling starting in 2026.
Intel touts a 12x reticle-size breakthrough, potentially reshaping advanced chip manufacturing economics.
AI shopping agents could reroute $1T in U.S. retail by 2030, pressuring Amazon’s core model.

Anthropic Bloom: Open-source agentic evaluation framework automates alignment and behavior testing at scale, reducing friction for safety teams and researchers running large, targeted evaluation suites.
OpenAI Codex Skills: New standard for shareable, natural-language coding workflows lets teams build, reuse, and automate development tasks—boosting consistency and accelerating software delivery across toolchains.
DingTalk AgentOS: Alibaba’s workplace OS coordinates AI workflow agents across enterprise tools and new hardware, signaling a push toward agent-first productivity in large organizations.
Zoom AI Companion 3.0 (Web): Free browser release adds smarter docs, reflection reports, and meeting features, positioning Zoom as a central hub for integrated, AI-driven knowledge work.
Lemon Slice-2: Interactive talking-video avatars via API/widget give voice agents lifelike faces and expressions, improving customer engagement for support, onboarding, and education experiences.
Qwen-Image-Edit-2511: Wider access on Replicate and TostUI broadens open-source image editing capabilities, enabling low-friction creative workflows and community-driven tooling.

OpenAI GPT-5.2: Bigger context, stronger reasoning, and improved tool use; high-reasoning variants push state-of-the-art performance while a Poetiq pipeline hits 75% on ARC-AGI-2 at modest cost.
GLM-4.7: New open-weight leader delivers multilingual coding, real-time streaming, and 3D object handling at near-real-time latency, and is being rapidly adopted across providers and dev tooling.
MiniMax M2.1: Strong long-horizon reasoning, low latency, and excellent cost-efficiency; now available to BlackboxAI’s 30M developers, expanding access to competitive, affordable coding performance.
Google Gemini 3 Flash: Fastest and most cost-effective Gemini yet targets real-time, high-volume workloads, setting a new price-performance bar for enterprise-grade applications.
SWE 1.5: Best free coding model in community tests, increasingly rivaling paid options—reinforcing open-source viability for production-grade software engineering tasks.
Qwen3 Agentic Search: Dramatic upgrade from 1–2 to 15+ web turns with triple the accuracy on BrowseComp-Plus, making web-grounded agents more reliable for complex research tasks.

NitroGen (NVIDIA + Stanford): Generalist game-playing system trained on 40,000 hours across 1,000+ titles; releasing dataset and weights to catalyze open research on multi-game policy learning.
DeepSearchQA (Google): New benchmark stress-tests multi-step web research, pushing methods beyond shallow retrieval and encouraging stronger agents for fact-finding and synthesis.
PoPE + CoT Monitorability (OpenAI): PoPE proposes a fix to RoPE positional encoding, while a separate monitorability framework shows that longer reasoning improves transparency but bigger models are harder to monitor, guiding safer chain-of-thought use.
Learning from Raw Codebases: Fresh results show agents can acquire robust coding skills directly from uncurated repositories, reducing dependence on expensive curation pipelines.
NVIDIA Isaac Lab: Robots trained entirely in simulation successfully transfer to the real world without real-world data, promising safer, faster iteration for embodied AI.
CAIS Conference: A new venue dedicated to agentic AI systems aims to standardize evaluation, safety practices, and real-world deployment lessons for multi-agent and tool-using systems.

Disney + OpenAI ($1B): Partnership enables AI-generated stories with 200+ Disney characters on OpenAI platforms starting 2026, redefining IP licensing, creator workflows, and fan engagement.
Intel Reticle Breakthrough: A 12x reticle-size advance could upend the NVIDIA/TSMC advantage, enabling larger dies and new packaging strategies that shift compute cost and performance curves.
OpenAI Monetization Shift: Plans to integrate ads and sponsored content into ChatGPT reflect a broader pivot toward sustainable revenue while balancing user trust and privacy expectations.
ServiceNow Buys Armis ($7.75B): ServiceNow integrates Armis’s asset intelligence into the Now Platform, consolidating AI-driven cybersecurity and IT operations for large enterprises.
AI Shopping Agents vs. Amazon: Autonomous shopping assistants could divert $1T in U.S. retail by 2030, pressuring Amazon’s marketplace model and accelerating agent-centric commerce.
China’s AI Supercomputing Agent: A national platform automates complex research workflows, democratizing access to high-end compute and accelerating scientific discovery across domains.

Hugging Face Courses: Free, regularly updated AI curricula with active learner communities help newcomers and professionals master modern tooling and techniques.
Research Roundups: Curated surveys cover multimodal reasoning, large-scale RL, objective LLM assessment, and faster decoding—accelerating literature review for practitioners.
101 Generative Papers Slides: Comprehensive slide decks distill seminal works from fundamentals to applications, offering a structured path for deep generative model study.
LoRA on Qwen Image Edit: Practical guide uses AI Toolkit with a 3-bit accuracy-recovery adapter, enabling low-VRAM finetuning for high-quality image editing tasks.

Gemini 3 Flash Writer Agent: Generates full-length novels in minutes at negligible cost, illustrating how ultra-fast models expand creative automation at scale.
GLM-4.7 + Opencode: Built and validated a fashion website end-to-end, fixing assets in real time—evidence of practical, multi-step autonomy in production-like settings.
LlamaCloud “Santa” Pipeline: An automated ingestion and structured-extraction pipeline processes thousands of wish lists, showcasing dependable data ops and schema enforcement for seasonal workloads.
Local Throughput Milestone: GLM-4.7 reaches 63 tokens/second on an M3 Ultra using batching and tensor parallelism, highlighting the promise of high-performance local inference.
Waymo + Gemini: Waymo pilots Gemini as an in-ride assistant, improving passenger support and personalization as AV services compete on experience, not just autonomy.

Productivity Reality Check: METR finds agents becoming more autonomous, yet developer productivity gains remain hard to measure—suggesting a gap between perceived speedups and durable business outcomes.
Benchmarking Is Broken: Tokenization mismatches, rate limits, and missing parameters skew comparisons; practitioners warn leaderboards overstate progress without robust, scenario-driven evaluations.
Fewer, Better Agents: Google reports well-designed single agents often outperform multi-agent swarms, shifting focus from agent count to coordination quality, tool use, and observability.
Open vs. Closed Trade-offs: Many teams report open models handle substantial real work without quality compromise, pressuring pricing and differentiation for closed providers.
Safety and Governance: Deepfake harms escalate as Google and OpenAI tools face scrutiny; lawsuits by Disney and Universal against Midjourney foreshadow stricter accountability frameworks.
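
A quick illustration of the tokenization point in "Benchmarking Is Broken": the sketch below compares two hypothetical runs by tokens per second and by characters per second. The model names, token counts, and timings are made up for illustration and are not measurements of any real system. It assumes each model reports tokens using its own tokenizer, which is one way a mismatch can arise.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunResult:
    model: str          # hypothetical model label
    output_text: str    # text the model produced
    output_tokens: int  # token count as reported by that model's own tokenizer
    seconds: float      # wall-clock generation time


def tokens_per_second(r: RunResult) -> float:
    # Depends on the tokenizer: a finer-grained tokenizer inflates this number.
    return r.output_tokens / r.seconds


def chars_per_second(r: RunResult) -> float:
    # Tokenizer-independent: measures how much text was actually delivered.
    return len(r.output_text) / r.seconds


def rank(results: List[RunResult], key: Callable[[RunResult], float]) -> List[str]:
    # Highest score first.
    return [r.model for r in sorted(results, key=key, reverse=True)]


# Made-up data: both runs emit the same 1,200 characters of text.
text = "x" * 1200
results = [
    RunResult("model-a", text, output_tokens=400, seconds=10.0),
    RunResult("model-b", text, output_tokens=300, seconds=9.0),
]

print(rank(results, tokens_per_second))
print(rank(results, chars_per_second))
```

With these invented numbers the two rankings disagree: model-a leads on tokens per second only because its tokenizer splits the same text into more pieces, while model-b delivers the text sooner. Normalizing to a shared unit (characters, words, or a single reference tokenizer) is one possible mitigation, not a complete fix for the other issues listed.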

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.