📰 AI News Daily — 18 Dec 2025
TL;DR (Top 5 Highlights)
- Google launches Gemini 3 Flash globally, making it the default “Fast” experience across Search and Gemini with strong multimodal performance at lower cost and latency.
- Amazon is in talks to invest $10B in OpenAI, potentially pairing Trainium chips with Nvidia GPUs and reshaping AI infrastructure competition.
- NVIDIA unveils open-source Nemotron 3, pretrained on 3T tokens, accelerating a developer-led cycle of better data and better LLMs.
- China reportedly built a prototype EUV lithography machine using ex-ASML expertise, signaling a potential geopolitical shift in advanced chipmaking.
- Yann LeCun plans to leave Meta to found a startup focused on world models, raising the stakes in embodied and predictive AI.
🛠️ New Tools
- Google Opal: Instantly builds AI-powered “mini apps” inside Gemini from natural language and a visual editor, lowering the barrier to internal tools and personal automations.
- Google CC: A Gemini-powered assistant that summarizes Gmail, Calendar, and Drive into personalized daily briefings, streamlining planning and reducing context switching for busy professionals.
- xAI Grok Voice Agent API: Real-time, multilingual speech agents with tool use and search; early demo on Pollen Robotics’ Reachy Mini shows hands-free, embodied control for robotics and IoT.
- TRELLIS.2 on fal: Converts a single image into high-fidelity, PBR-textured 3D assets, speeding game and VFX pipelines while shrinking photogrammetry and manual modeling workloads.
- Tencent HY World 1.5 (WorldPlay): Open-source, real-time world modeling stack with streaming video diffusion, enabling persistent environments for agents, simulation, and interactive media.
- TurboDiffusion: Video diffusion accelerated 100–200x, cutting iteration time for creators and researchers and enabling near-real-time experimentation with higher-fidelity motion.
🤖 LLM Updates
- Google Gemini 3 Flash: Dominates early text, vision, coding, and tool-use benchmarks while running cheaper and faster; now the default “Fast” mode across Gemini and Search AI Mode worldwide.
- NVIDIA Nemotron 3: Open family trained on 3T tokens; Nano variants see rapid adoption on Hugging Face, reinforcing an open ecosystem for fine-tuning, evaluation, and domain specialization.
- DeepSeek V3.2 Coding + Caching: Hits Opus 4.5–tier pass@5 at a fraction of the price; smart caching trims average coding task costs to around $0.10, improving viability at scale.
- Frontier reasoning: Reports suggest GPT‑5 autonomously proved an IMProofBench math problem, hinting at stronger self-directed reasoning for complex, formal domains.
- OpenAI GPT Image 1.5: Delivers 4x faster, more precise generation and editing with improved instruction following, directly challenging Google Gemini in creative workflows and developer APIs.
- OpenAI model routing change: The router is retired and requests now default to the faster GPT‑5.2 Instant, highlighting the tension between speed, safety, and consistency in large-scale deployments and developer defaults.
📑 Research & Papers
- OpenAI FrontierScience: A new benchmark spanning complex bio, chem, and physics tasks pushes beyond standard QA, incentivizing grounded reasoning and tool use for real scientific inquiry.
- Google factuality audit: A real-world benchmark finds leading chatbots—including Gemini and ChatGPT—miss about one-third of answers, underscoring brittle factuality and the need for human oversight.
- GyroSwin (EU/UK): AI-accelerated plasma turbulence simulations run up to 1,000x faster, enabling faster iteration in fusion reactor design and moving clean energy research closer to practicality.
- Shanghai AI Lab’s MemVerse: An open multimodal “hippocampus” for agents improves cross-modal recall and speed, unlocking more reliable, memory-augmented behavior in complex environments.
- Patronus AI Generative Simulators: Procedurally create dynamic tasks for continual agent training, shrinking the train–test gap and better preparing systems for messy, real-world workflows.
- Stanford NLP at 25: A retrospective maps the field’s evolution from symbolic to neural to agentic systems, guiding future research on robust reasoning, data stewardship, and evaluation.
🏢 Industry & Policy
- Amazon x OpenAI ($10B talks): Potential Trainium + Nvidia stack would scale OpenAI’s training capacity and push Amazon deeper into AI infrastructure, intensifying hyperscaler competition.
- China’s EUV prototype: Reuters reports a homegrown EUV system leveraged ex-ASML talent. If viable, it could blunt export controls and reconfigure advanced semiconductor supply chains.
- Yann LeCun’s startup: The Meta chief scientist plans a new company centered on world models, raising competition in grounded reasoning and embodied intelligence.
- OpenAI taps George Osborne: The former UK Chancellor will lead global sovereign AI efforts and the $500B Stargate data center push, signaling deeper government partnerships.
- Tencent hires ex-OpenAI scientist: Yao Shunyu will lead a new AI Infrastructure Department, consolidating model and platform efforts as Tencent races for global talent.
- US healthcare AI policy: Dozens of 2025 laws target mental health chatbots, transparency, and sandboxing, balancing safety with innovation as clinical AI adoption accelerates.
📚 Tutorials & Guides
- SkyPilot RL + Search: Walkthrough on training agents to use Google Search with reinforcement learning across multi-cloud setups, emphasizing reliability, cost control, and reproducibility; a launch sketch follows this list.
- dspy-helm: A practical framework for holistic LLM benchmarking that goes beyond single metrics, helping teams compare models fairly across tasks, tool use, and prompt robustness.
- MCP infrastructure deep dive: Compares internal versus external MCP servers, showcasing speed-ups in fastmcp 2.14 and Remix servers for dependable agentic operations; a minimal server sketch also follows below.
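For the SkyPilot item in this list, a minimal sketch of what launching such a training job looks like, assuming SkyPilot's Python API (`sky.Task`, `sky.Resources`, `sky.launch`); the training script, requirements file, accelerator choice, and cluster name are placeholders rather than details from the walkthrough.

```python
# Minimal SkyPilot launch sketch; script names and resources are hypothetical.
import sky

# Describe the job once: environment setup plus the command to run.
task = sky.Task(
    name="rl-search-agent",
    setup="pip install -r requirements.txt",  # placeholder dependency install
    run="python train_rl_search.py",          # placeholder RL + search training script
)

# Request a GPU wherever SkyPilot finds capacity; spot instances keep costs down.
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))

# Provision a cluster on an available cloud and start the job.
sky.launch(task, cluster_name="rl-search")
```

Because the task is declared once and launched through SkyPilot, the same definition can be rerun across clouds, which is where the walkthrough's reproducibility and cost-control emphasis comes in.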
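For the MCP item in this list, a minimal FastMCP server sketch showing the general shape of an internal tool server; the tool is a made-up stub, and nothing here depends on features specific to fastmcp 2.14 beyond the basic `FastMCP` and `@mcp.tool()` API.

```python
# Minimal MCP tool server using FastMCP; the tool below is an illustrative stub.
from fastmcp import FastMCP

mcp = FastMCP("internal-tools")


@mcp.tool()
def lookup_ticket(ticket_id: str) -> str:
    """Return a short status string for an internal ticket (stubbed here)."""
    # A real internal server would call an internal API or database instead.
    return f"Ticket {ticket_id}: open, assigned to the platform team"


if __name__ == "__main__":
    # Defaults to the stdio transport; HTTP-based transports cover the remote/external case.
    mcp.run()
```

Whether a server like this counts as internal or external is largely a matter of where it runs and which transport it exposes; the tool code itself stays the same.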
🎬 Showcases & Demos
- Grok Voice Agent + Reachy Mini: Real-time, multilingual voice control with tool-calling demonstrates hands-free robotics, from basic manipulation to web-connected tasks.
- Marble → NVIDIA Isaac Sim: Teams generate simulation-ready environments and import at scale, boosting data diversity and speeding robot training cycles.
- AI + puppetry filmmaking: Creators blend physical puppets with AI animation for a “real-life Toy Story” short, illustrating hybrid pipelines for fast, cinematic storytelling.
- Reachy Mini Lite ships: Early recipients are already experimenting with embodied AI, accelerating community feedback loops and practical, on-device agent testing.
💡 Discussions & Ideas
- LLMs as judges: Faster evaluations are promising but require bias calibration and transparent rubrics to avoid amplifying model quirks in safety and quality assessments.
- RL fragility: Many reinforcement learning stacks remain unstable and compute-hungry; researchers call for standardization, stronger baselines, and more sample-efficient objectives.
- VLA demo realism: Polished vision-language-action agent demos often mask heavy orchestration; practitioners emphasize data plumbing, memory, and latency engineering over model swaps.
- PDR vs. long chains: Parallelized, refined reasoning beats sprawling chains of thought in reliability and cost, suggesting a pragmatic path to stronger tool-augmented reasoning; a rough sketch follows this list.
- Enterprise bottleneck = culture: Adoption lags less from model limits than organizational inertia; clear ownership, data readiness, and incentive design are emerging success factors.
- Strategy watch: US power constraints for AI may ease by 2030; labs pursue exclusive domain data; even failed startups monetize codebases as synthetic training corpora.
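To make the PDR item above concrete: a rough sketch of drafting several short answers in parallel and then running a single refine pass. `ask_model` is a placeholder for whatever chat-completion call your stack uses, `n_drafts` and the prompts are illustrative, and this is a generic pattern rather than any specific paper's method.

```python
# Sketch of parallel-draft-then-refine reasoning; ask_model is a hypothetical helper.
from concurrent.futures import ThreadPoolExecutor


def ask_model(prompt: str) -> str:
    """Placeholder for a real chat-completion call (OpenAI, Gemini, a local model, ...)."""
    raise NotImplementedError


def parallel_draft_refine(question: str, n_drafts: int = 4) -> str:
    # 1) Draft several short, independent attempts in parallel instead of one long chain.
    draft_prompt = f"Answer concisely with brief reasoning:\n{question}"
    with ThreadPoolExecutor(max_workers=n_drafts) as pool:
        drafts = list(pool.map(ask_model, [draft_prompt] * n_drafts))

    # 2) A single refine pass distills the drafts into one answer and flags disagreements.
    numbered = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    refine_prompt = (
        f"Question:\n{question}\n\n{numbered}\n\n"
        "Compare the drafts, resolve any disagreements, and give one final answer."
    )
    return ask_model(refine_prompt)
```

The cost argument in the item comes from the drafts being short and independent (so they parallelize well) while the refine step stays a single bounded call.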
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.