📰 AI News Daily — 12 Jan 2026

TL;DR (Top 5 Highlights)

Google launches a Universal Commerce Protocol with major retailers, setting open standards for agent-driven shopping and payments.
GPT‑5.2 reportedly produced a proof of an Erdős problem accepted by Terence Tao, intensifying debate over AI’s role in frontier math.
U.S. government’s Intel stake rises to $18B amid chip rally, underscoring massive public investment in AI infrastructure.
Netflix boosts recommendations by scaling models from 50M to 1B parameters with tailored scaling laws and alignment.
Tsinghua unveils a breakthrough shortest‑path algorithm, surpassing a decades‑old benchmark with implications for logistics and world modeling.

Nanobot (MCP-based) — Independent platform to build and embed LLM agents with unified context and memory across apps. Streamlines stateful agent deployment and reduces integration overhead for multi-surface experiences.
Nanocode Claude Loop — A minimal, dependency‑free Claude agent loop (~250 lines) for rapid experimentation. Lowers complexity for prototyping agent behaviors and testing control flows.
Dolphin — Document AI that converts PDFs/images into structured Markdown or JSON, reconstructing layout, reading order, tables, and formulas. Speeds reliable data extraction for reporting and analytics.
WARP — Rust-based multi‑vector search engine with Python bindings, claiming up to 10× faster performance on large-scale workloads. Cuts retrieval latency and compute cost for production RAG systems.
JupyVibe — Specialized AI agents inside Jupyter notebooks to plan code and organize research. Keeps experimentation, documentation, and iteration tightly integrated for data teams.
Mistral Vibe — Hackable framework with consistent APIs and simple Python/uv packaging to swap any LLM. Encourages fast, flexible model‑mixing without bespoke glue code.

GPT‑5.2 — Claimed to produce a proof of Erdős Problem #397 reportedly accepted by Terence Tao. If validated, it marks a step‑change in AI‑assisted mathematical discovery and verification.
Claude (Media Orchestration) — Expands beyond chat/code to coordinate image, video, and audio tools from a single prompt. Enables long‑form, multi‑tool storytelling and production pipelines.
Claude Code 4.5 + LangSmith — Adds automation for complex scientific writing and intricate coding; one‑minute LangSmith integration for turnkey monitoring and traces. Improves observability and reliability.
Ollama + Apple MLX — Adds image generation locally on Mac without a discrete GPU. Makes creative workflows accessible on‑device, improving privacy and portability for makers.
FineTranslations — Releases 1T+ English‑aligned tokens (derived from FineWeb2 via Gemma3 27B). Accelerates multilingual training and improves cross‑lingual generalization for global applications.

MIT — Recursive Language Models — Proposes architectures targeting ~100× longer inputs via recursion. Promises scalable context handling for codebases, research corpora, and multi‑document reasoning.
Sakana AI — FwPKM — Fast‑weight Product Key Memory augments transformers for durable long-term memory and improved reasoning beyond standard attention. Offers a path to more sample‑efficient thinking.
Netflix Scaling Laws — Tailored scaling and alignment took generative recommenders from 50M to 1B parameters with major performance gains. Guides practical, cost‑aware scaling of production models.
Tsinghua Shortest‑Path Breakthrough — Surpasses a decades‑old benchmark, improving pathfinding efficiency. Impacts logistics, simulation, and world modeling where graph computations dominate.
SWE‑EVO Benchmark — Evaluates coding agents on software evolution tasks, not just single fixes. Better reflects real engineering dynamics like maintenance, refactors, and regressions.
Deep Delta Learning — Delta Operator — Introduces a training operator for deeper, more stable networks. Could unlock improved depth without prohibitive optimization headaches.

Google, Mastercard, Visa + Retailers — Universal Commerce Protocol powers agent-driven shopping and secure payments with backing from Shopify, Etsy, Wayfair, Target, Walmart. Standardizes agent commerce and reshapes merchant–customer relationships.
Lawmakers vs. Grok — U.S. senators urge Apple/Google to remove Grok after it generated sexualized images of women and minors; French authorities are also investigating. Raises accountability stakes for app stores and AI safety.
OpenAI + Common Sense Media — Co-developing a kids’ online safety bill for stronger protections and age verification. Could set new compliance baselines for platforms and AI apps.
U.S. Intel Stake — Government’s bet on Intel grows to $18B as chip stocks surge. Signals sustained public backing for domestic AI compute and semiconductor capacity.
OpenAI + SoftBank — Investing $1B in SB Energy to build renewable‑powered, multi‑gigawatt U.S. data centers. Aligns AI growth with sustainability and grid‑scale capacity.
Alphabet, Apple, and Gemini — Alphabet overtakes Apple in market value amid AI momentum; Apple to tap Google Gemini for a next‑gen Siri. AI alliances are reshaping Big Tech’s competitive order.

Stanford CS336 Review — Practical engineering lessons for building and scaling LLMs, from data curation to deployment. A strong primer for practitioners moving beyond toy examples.
Google Prompting Study — Simply repeating a prompt can materially improve accuracy without adding generated tokens or latency, although the input itself grows. A low‑cost tactic to lift baseline performance quickly.
LLM + Knowledge Graphs Survey — Comprehensive guide to extraction, fusion, and reasoning with LLMs. Bridges classic KG techniques with end‑to‑end, modern language model pipelines.
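
The prompt-repetition tactic from the Google Prompting Study item above can be sketched in a few lines. This is a minimal illustration, not the study's own code: the helper name `repeat_prompt`, the default of two copies, and the blank-line separator are all assumptions made for the example, and no model call is included.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt concatenated with itself `times` times.

    Hypothetical helper: the repetition count and separator are
    illustrative choices, not values taken from the study.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    # Strip surrounding whitespace so the copies are joined cleanly.
    return separator.join([prompt.strip()] * times)


if __name__ == "__main__":
    question = "Which is larger, 9.11 or 9.9? Answer with the number only."
    # The repeated text is what would be sent to a model as the input.
    print(repeat_prompt(question))
```

The input sent to the model is longer after repetition, so it is worth measuring accuracy and cost on your own workload before adopting the tactic.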

Kling VIDEO 2.6 — Sharp action, realistic motion, expressive gestures, and a one‑photo dance feature, plus a public challenge. Sets a new bar for consumer‑grade video generation fidelity.
Claude Video Orchestration — Live demos composing multi‑tool video projects from natural prompts. Highlights rapid progress in tool‑use and complex media pipelines.
Grokipedia — Generated a 10,000‑word AI biography with strong detail but a fabricated personal claim. Demonstrates long‑form power—and lingering reliability risks.
“Lily” (AI Film Award) — Creative AI earns mainstream recognition as “Lily” wins a prominent award backed by Google Gemini and the 1Billion Summit, signaling new creative norms.

Agent Observability — Rich traces and metrics are seen as keys to reliability, autonomy, and context engineering—enabling self‑improving feedback loops across runs.
Agent Foundations — Debates stress robust file systems and data planes as core infrastructure for capable, tool‑using agents.
Open Source Realities — Builders argue open source remains hard but essential, with growing collaboration from major companies improving sustainability and trust.
Prompt2Model’s Evolution — Its action abstractions as a research agent illustrate how good interfaces magnify scientific throughput.
Causal Interpretability — Some argue overdetermination may be inherent to these systems, reframing expectations of mechanistic explanations.
Future of Work — Commentators foresee a bifurcation between creators and regulators as automation absorbs routine tasks.

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.