📰 AI News Daily — 10 Dec 2025
TL;DR (Top 5 Highlights)
- OpenAI, Anthropic, Microsoft, Google back the new Agentic AI Foundation; Anthropic’s MCP donated to the Linux Foundation to standardize safer, interoperable agents.
- The Pentagon launches secure GenAI.mil on Google Gemini, bringing generative AI to nearly 3 million personnel for operations, intelligence, and productivity.
- India proposes mandatory royalties for AI training on copyrighted content, potentially resetting global norms for creator compensation and model development costs.
- OpenAI commits $4.6B for a Sydney GPU supercluster and nationwide upskilling, positioning Australia as a strategic AI infrastructure hub.
- The FDA clears AIM-NASH, the first AI tool for MASH liver biopsy analysis, accelerating drug trials and standardizing complex imaging evaluations.
🛠️ New Tools
- CTGT launched adjustable LLM guardrails, letting teams edit behavior and safety constraints without retraining, reducing iteration cost while tightening policy compliance across deployments.
- AWS unveiled a goal-driven agent builder that abstracts orchestration and error handling, helping developers ship reliable multi-step agents faster with production-grade observability.
- Google Workspace Studio enables no-code AI automations across Gmail, Drive, and Chat; early users report up to 90% faster drafting, driving bottom-up productivity.
- Amazon Autonomous Agents perform long-running tasks without supervision, promising major workflow gains in support, logistics, and operations while raising new governance requirements.
- iFixit FixBot debuts as a free repair assistant and mobile app, guiding DIY fixes with step-by-step instructions; a paid tier with advanced features beyond the free tier's limits is planned.
- Marble and EgoEdit expand creative tooling—prompt-to-3D world generation and egocentric streaming/editing—for faster prototyping, immersive content, and real-world automation workflows.
🤖 LLM Updates
- Mistral released new open code models—Devstral 2 (123B) and a 24B variant—plus unrestricted open models with 256K context, improving enterprise-scale coding and long-document understanding.
- Zhipu AI GLM-4.6V shipped as an open multimodal model with strong visual reasoning and function calling, enabling developers to integrate vision-language capabilities into practical applications.
- Jais 2 (70B), an open-weight Arabic LLM from Abu Dhabi, advances Modern Standard Arabic and dialect support, bolstering regional AI capacity for research and production.
- OpenAI is reportedly accelerating GPT-5.2 to improve speed, reasoning, and reliability amid rising competition—signaling a shift toward dependable daily performance over flashy demos.
- OpenAI “Confession” adds self-assessment prompts to flag response quality and bias, aiming for more transparent chatbots and improved user trust in high-stakes workflows.
- Zhipu AutoGLM enables on-device smartphone control across many apps, promising private, stable assistants for complex mobile tasks without cloud dependency.
📑 Research & Papers
- Investigations into ARC-AGI contamination show training–evaluation overlap can inflate reported gains, reinforcing the need for rigorous dataset hygiene and independent replication.
- Stanford’s 2025 AI Transparency Index finds leading labs becoming less open year over year, complicating auditing, safety research, and public accountability across the ecosystem.
- The UK AI Security Institute ran red-vs-blue interpretability exercises, stress-testing detection of malicious behaviors and informing practical standards for model monitoring and red-teaming.
- OfficeQA introduces grounded enterprise evaluations, measuring how well models complete realistic office tasks, encouraging benchmarks aligned with day-to-day productivity outcomes.
- SAPO proposes more stable reinforcement learning for large and MoE models, reducing training instability and improving policy quality in complex optimization settings.
- GRAPE unifies positional encodings across architectures, simplifying design choices while maintaining accuracy, which may streamline model portability and hybrid system research.
🏢 Industry & Policy
- Agentic AI Foundation launched with OpenAI, Anthropic, Microsoft, Google and the Linux Foundation; MCP donation and Agent Client Protocol momentum aim to standardize agent interoperability and safety.
- India proposes mandatory royalties for training on copyrighted content, potentially establishing a global template for creator compensation and reshaping AI cost structures.
- The Pentagon’s GenAI.mil platform, powered by Google Gemini, will deliver secure generative AI to millions, accelerating analysis, planning, and operational workflows across the U.S. military.
- The EU opened an antitrust probe into Google over AI use of publisher content, testing whether AI features like Overviews require new licensing and compensation models.
- OpenAI will invest $4.6B in an Australian GPU supercluster and upskill 1.2M workers, strengthening regional infrastructure and talent while diversifying global compute supply.
- IBM will acquire Confluent for $11B, blending real-time data streaming with enterprise AI stacks to modernize analytics pipelines and competitive cloud offerings at scale.
📚 Tutorials & Guides
- LangChain compares voice-agent architectures—STT–LLM–TTS “sandwich” vs direct speech-to-speech—explaining latency, controllability, and extensibility trade-offs for production voice experiences.
- A comprehensive survey charts the lifecycle of code-focused LLMs—data, training, prompting, tooling, and security—providing architects a blueprint for building safer, robust code assistants.
- A Stanford guest lecture demystifies recurring computational motifs inside transformers, offering mental models to reason about performance, generalization, and emergent capabilities.
- New “Physics of LMs” installments present reproducible, textbook-style references for principled architecture research, helping practitioners replicate findings and avoid evaluation pitfalls.
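The cascaded "sandwich" architecture LangChain compares above can be sketched as three composed stages. This is a minimal illustration of the pattern, not LangChain's API; the stage functions below are hypothetical stubs standing in for real STT, LLM, and TTS services:

```python
import time

# Hypothetical stubs for the three cascade stages; in production each
# would call a real speech-to-text, LLM, and text-to-speech service.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")      # pretend transcription

def llm_respond(prompt: str) -> str:
    return f"Echo: {prompt}"          # pretend generation

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")       # pretend synthesis

def voice_agent(audio: bytes) -> tuple[bytes, float]:
    """Cascade STT -> LLM -> TTS and report end-to-end latency."""
    start = time.perf_counter()
    transcript = speech_to_text(audio)
    reply = llm_respond(transcript)
    audio_out = text_to_speech(reply)
    return audio_out, time.perf_counter() - start

audio_out, latency = voice_agent(b"hello")
```

Each stage can be swapped, logged, or guarded independently, which is the controllability advantage of the sandwich; direct speech-to-speech models collapse the stages for lower latency but give up those interception points.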
🎬 Showcases & Demos
- Waymo details disciplined, fully autonomous data operations as a template for embodied AI at scale—closed-loop data curation, simulation, and deployment feeding reliable field performance.
- Live demos highlight Aria’s conversational music performance with a grand piano and a Figma-to-production-code pipeline, illustrating creative and developer workflows converging with AI.
- A head-to-head crowned Kling 2.6 best at subtle facial expressions and skin realism, signaling rapid quality gains in consumer-grade generative video tools.
- Qdrant powers semantic search across 100K+ product images, demonstrating practical retrieval at e-commerce scale with improved discovery, faster queries, and reduced manual tagging.
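The retrieval pattern behind a semantic image-search demo like the one above is nearest-neighbor ranking over embedding vectors. A stdlib-only sketch (toy 3-dimensional embeddings and hypothetical product names, not Qdrant's actual client API) shows the core cosine-similarity ranking:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy catalogue: product name -> illustrative image embedding.
# Real systems store vectors from an image encoder in a vector DB.
catalogue = {
    "red sneaker":  [0.9, 0.1, 0.0],
    "blue sandal":  [0.1, 0.9, 0.1],
    "red high-top": [0.8, 0.2, 0.1],
}

def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the top-k products ranked by similarity to the query."""
    ranked = sorted(catalogue,
                    key=lambda name: cosine(query_vec, catalogue[name]),
                    reverse=True)
    return ranked[:k]

results = search([1.0, 0.0, 0.0])  # query embedding near the "red" items
```

A dedicated vector database replaces the linear scan with approximate nearest-neighbor indexes, which is what makes 100K+ item search fast in practice.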
💡 Discussions & Ideas
- Debate over AGI definitions persists; attendees and commentators urge hands-on engagement with frontier models to tighten claims, reduce hype, and align expectations with measured capability.
- “Deep agents” promise long-horizon autonomy, yet multi-step reasoning remains brittle; researchers stress evaluation rigor and reliability before high-stakes deployments.
- Methodology under scrutiny: peer review is overloaded and historically fallible; teams advocate stronger baselines, ablations, and open evaluations for durable progress.
- Product philosophy shifts toward procedural skills and lightweight tool use over heavy agent stacks, emphasizing predictability, cost control, and auditability in production.
- Renewed interest in symbolic–neural hybrids for math, “dark leisure” where workers hide AI-driven productivity, and concerns that photorealism crowds out experimentation in creative communities.
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.