📰 AI News Daily — 12 Dec 2025
TL;DR (Top 5 Highlights)
- OpenAI ships GPT-5.2 across ChatGPT, API, Copilot, and partners, adding Thinking/Pro/Instant tiers, later cutoff, and big efficiency gains that quickly attracted Perplexity, Cursor, and enterprises.
- Disney signs a $1B, three-year deal with OpenAI to generate character-safe Sora content; simultaneously escalates IP tensions with Google over alleged Gemini copyright misuse.
- Google launches the Gemini Interactions API, Deep Research agent, and experimental Disco/GenTabs—while Gemini powers the Pentagon’s GenAI.mil rollout to ~3 million users.
- Security and governance tighten: the EU probes Google’s AI scraping, 42 state AGs push chatbot oversight, and ~1,000 exposed MCP servers spotlight urgent enterprise hardening needs.
- Salesforce buys Informatica for $8B to unify “trusted context” for AI agents; Oracle reports a 438% surge in AI cloud commitments fueled by OpenAI demand.
🛠️ New Tools
- Cohere Rerank 4 debuts across API, AWS SageMaker, and Microsoft Foundry, delivering faster, stronger reranking for search/RAG. Better retrieval precision and lower latency translate directly into higher-quality, cheaper production pipelines.
- Google Disco/GenTabs turns live browser tabs and chats into instant apps powered by Gemini. It compresses multi-step planning into one surface, boosting personal productivity and lightweight internal tool creation.
- Adobe + ChatGPT integrate free Photoshop, Express, and Acrobat actions in chat. Creators edit images and documents conversationally—streamlining workflows across web, desktop, and iOS without switching tools.
- UnslothAI releases new training kernels that triple LLM training speed while cutting VRAM needs. Faster iteration and smaller hardware footprints reduce costs and broaden who can fine-tune models.
- CopilotKit ships useAgent and a dedicated Dev Browser for coding agents, making it easy to wire agents into frontends and curb token burn during web automation—improving reliability and operating costs for agentic apps.
- SkyPilot ships an enterprise-scale update for massive GPU fleets and multi-cloud orchestration, simplifying cost-aware scheduling, preemption handling, and autoscaling for training and inference at scale.
🤖 LLM Updates
- OpenAI GPT-5.2 raises the bar in coding, math, long-context, and agent reliability, with top ARC-AGI-1 and strong SWE-bench results. Cost-effective tiers help, though it trails Opus 4.5/Grok 4 on LisanBench.
- Runway Gen-4.5 expands creative/scientific ambitions, signaling native audio and larger roadmaps. NVIDIA infrastructure underpins frontier training, while CoreWeave scales Runway’s training and inference.
- Mistral Devstral 2 emerges as a leading open-source coding model, pushing competitive performance while preserving transparency and local control for enterprises wary of proprietary lock-in.
- Amazon Nova 2 models target small businesses with stronger reasoning and multimodal features at competitive prices, democratizing automation without heavy ML expertise.
- Google Gemini TTS expands to 24 languages with realistic, customizable voices and multi-speaker support—unlocking higher-quality audio for e-learning, audiobooks, and product voice features globally.
- Ecosystem momentum: DeepSeek v3.2 sets price-performance marks on Chinese-language tasks; lighter models like Trinity Mini and Rnj-1-Instruct gain traction for cost-sensitive deployments.
📑 Research & Papers
- The FACTS Benchmark Suite (with Kaggle) introduces rigorous, cross-modal tests of factual reliability for text, search, and image prompts—improving transparency and standardization for model comparisons.
- AI-driven extreme weather modeling now delivers faster, more accurate forecasts, improving disaster preparedness and public safety as climate-change-induced events intensify and response windows shrink.
- An AI system flags missed Alzheimer’s diagnoses from medical records, addressing inequities in care and enabling earlier interventions—evidence that practical clinical AI can narrow disparities.
- New studies show models accurately diagnose brain tumors non-invasively and predict cardiovascular events in angina patients, pointing to earlier, safer diagnostics and personalized treatment pathways.
- Research warns that models trained only on benign data can still harbor covert backdoors, reinforcing the need for robust evaluations, red-teaming, and secure training pipelines.
- Genesis Pearl draws attention at NeurIPS for multimodal progress—illustrating rapid advances beyond text and the growing importance of integrated perception and reasoning.
🏢 Industry & Policy
- Disney inks a $1B, three-year partnership with OpenAI to bring 200+ characters into Sora video/image generation under Disney guardrails, while issuing a cease-and-desist to Google over Gemini outputs.
- The EU opens a fresh antitrust probe into Google’s AI web-scraping and content sourcing practices, signaling potential shifts in compensation, opt-outs, and competitive dynamics for training data.
- U.S. government ramps up AI adoption: the DoD deploys Gemini via GenAI.mil to nearly 3 million users; the DOT modernizes with Salesforce AI and moves its workforce to Google Workspace with Gemini.
- Salesforce acquires Informatica for $8B, aiming to unify fragmented enterprise data into “trusted context” for AI agents—boosting decision quality, compliance, and cross-app automation.
- Oracle reports a 438% surge in AI cloud commitments following OpenAI partnerships—evidence that infrastructure providers are capturing outsized demand from frontier model training and inference.
- 42 state AGs urge stricter chatbot oversight as the Model Context Protocol (MCP) joins the Linux Foundation’s Agentic AI initiative—amid ~1,000 exposed MCP servers and rising demand for validation tools from Vectara and Bigeye.
📚 Tutorials & Guides
- New practitioner guides map end-to-end RAG failure modes—covering indexing, filtering, reranking, and grounding—delivering materially higher answer accuracy and stability in production systems.
- Methodology primers caution against over-reading leaderboards; experts advocate a single “best score” per model and careful interpretation of semi-private datasets and shifting scoring conventions.
- OpenAI launches AI Foundations and ChatGPT Foundations for Teachers via Coursera and employer pilots—credentialing workers and K–12 educators for safer, more effective AI use at scale.
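The retrieve-then-rerank pattern those RAG guides center on can be sketched in a few lines. This is a toy illustration, not any vendor's API: retrieval here is raw token overlap, and the rerank step is a hypothetical length-normalized re-score of the shortlist—the stage where the guides report the largest precision gains.

```python
# Toy retrieve-then-rerank pipeline (illustrative only; scoring functions are
# stand-ins for a real embedding retriever and a trained reranker).

def tokenize(text: str) -> set[str]:
    # Crude normalization: lowercase and strip basic punctuation.
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # First-pass retrieval: rank documents by raw token overlap with the query.
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def rerank(query: str, shortlist: list[str]) -> list[str]:
    # Second pass: re-score the shortlist, penalizing long, diffuse documents.
    q = tokenize(query)
    def score(d: str) -> float:
        t = tokenize(d)
        return len(q & t) / (len(t) ** 0.5)
    return sorted(shortlist, key=score, reverse=True)

docs = [
    "Reranking improves retrieval precision in RAG pipelines.",
    "Indexing strategy determines what the retriever can ever find, plus notes on storage engines and sharding.",
    "Grounded answers cite the retrieved passages.",
]
query = "How does reranking improve RAG precision?"
best = rerank(query, retrieve(query, docs))[0]
```

Swapping the stand-in functions for a real retriever and reranker keeps the same two-stage shape, which is why reranking upgrades slot so cleanly into existing pipelines.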
🎬 Showcases & Demos
- Starcloud-1 runs Gemma in orbit and beams text from space—a milestone for off-world inference and resilient, low-latency edge compute.
- WonderZoom demonstrates multi-scale 3D scene generation, enabling richer worldbuilding for games, film previz, and simulation-heavy workflows.
- Meta SAM 3 shows robust object segmentation on noisy dashcam footage, underscoring progress in real-world perception beyond curated datasets.
- EMMA highlights unified multimodal generation and editing, pointing to simplified pipelines for creators juggling text, images, and audio.
- Waymo robotaxi rides give a glimpse of autonomous mobility at scale, illustrating how safety, coverage, and cost curves are maturing.
- A controlled study shows an autonomous agent compromising Stanford systems—spotlighting the dual-use power of agentic AI and the urgency of safeguards.
💡 Discussions & Ideas
- ROI of AI-generated code remains contested; despite productivity anecdotes, companies demand measurable impact. Funding follows belief—Port raises $100M for agentic engineering—yet buyers want robust governance and proof of value.
- Benchmarking norms evolve: emphasis on a single best score per model, quick-check tests (e.g., verified SimpleQA, chess puzzles), and transparency around semi-private datasets to curb leaderboard gaming.
- Market dynamics shift: revenue multiples compress faster for model providers than app-layer startups; infrastructure players (NVIDIA, CoreWeave, Oracle) capture momentum as training/inference demand accelerates.
- Adoption signals strengthen: a major law firm adopts Perplexity Enterprise for research; Pew finds 30% of U.S. teens use chatbots daily, intensifying safety and mental-health debates.
- Efficiency advances—like rapidly training a 140M-parameter model on a single node—hint at widening access as costs fall and smaller teams achieve credible results.
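The "single best score per model" convention discussed above is simple to operationalize: collapse repeated leaderboard runs into one headline number per model before ranking. A minimal sketch (model names and scores are made up for illustration):

```python
# Collapse repeated benchmark runs to one best score per model, then rank.
# Data is hypothetical; real leaderboards would also track dataset versions
# and scoring conventions, which this sketch omits.
from collections import defaultdict

runs = [
    ("model-a", 71.2), ("model-a", 74.8), ("model-a", 73.1),
    ("model-b", 69.5), ("model-b", 70.0),
]

best = defaultdict(float)
for model, score in runs:
    best[model] = max(best[model], score)  # keep only each model's best run

leaderboard = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Reporting only the best run per model curbs one form of leaderboard gaming (flooding the board with near-duplicate submissions), though as the methodology primers note, it still says nothing about variance or dataset leakage.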
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.