📰 AI News Daily — 12 Feb 2026

TL;DR (Top 5 Highlights)

Zhipu’s GLM-5 open-weight MoE earns top rankings with low hallucination rates, a 200k context window, permissive licensing, and aggressive pricing, challenging closed models on both capability and cost.
OpenAI’s ads in ChatGPT spark backlash and resignations; Anthropic counters with ad‑free Claude upgrades, intensifying the race for privacy‑conscious consumers and enterprises.
Coinbase launches agent‑native wallets so autonomous agents can securely spend, earn, and trade—just as regulators rush to modernize payment rules for AI‑driven transactions.
NVIDIA unveils KVTC, compressing LLM KV caches by up to 20x—cutting memory, boosting throughput, and enabling cheaper, larger‑context inference at scale.
OpenAI teams with Foxconn on U.S. AI hardware while SK Hynix benefits from surging data‑center demand—signaling a rapid buildout of AI infrastructure.

🛠️ New Tools

Coinbase — Agent‑Native Wallets: New wallets let autonomous agents securely transact, enabling hands‑free commerce, payouts, and DeFi operations. It lowers friction for real agent workflows while centralizing permissions and risk controls.
AgentSkiller: A DAG‑based framework to synthesize reliable, multi‑turn training conversations at scale. It improves agent robustness and coverage without costly human data collection.
Mini‑SWE‑Agent 2.0: Distills powerful coding agents into ~100 lines of code while nearing state‑of‑the‑art results. It delivers practical, reproducible baselines that teams can extend with minimal overhead.
OpenAI — Responses API Upgrades: Adds pluggable “skills,” server‑side compression, and controlled browsing for long‑running agents. Developers gain safer, cheaper, more capable autonomy with fewer moving parts.
OpenAI — ChatGPT Document Viewer: Deep Research mode now previews, navigates, and exports reports in‑app. Knowledge workers get faster synthesis and traceability, reducing context switching across tools.
Astrix Security — OpenClaw Scanner: Free scanner finds misconfigured AI agents and risky permissions. Helps security teams rein in agent sprawl and protect sensitive data before breaches occur.

🤖 LLM Updates

Zhipu GLM‑5 (MoE 744B, ~40B active): Trained on 28.5T tokens with DeepSeek‑style attention, it reports strong long‑context performance, low hallucination rates, and robust coding, and it ships with permissive licensing and immediate vLLM support.
Qwen3‑Coder‑Next (80B) & MiniMax 2.5: New Chinese models accelerate agentic coding throughput and close gaps with closed systems—expanding high‑performance, open‑weight options for developers.
NVIDIA KVTC: Up to 20x KV‑cache compression slashes memory footprints and cost, unlocking longer contexts and denser batching for production‑scale inference.
Prism & Together Research: Training‑free spectral block‑sparse prefill (up to 5.1x faster at 128k context) and cache‑aware prefill/decode disaggregation (≈40% higher throughput) improve efficiency without retraining.
UnslothAI: Reports 12x faster fine‑tuning with 35% less VRAM. Makes iterative domain adaptation feasible on modest hardware and tighter budgets.
Anthropic Claude Opus 4.6: Independently rated top for agentic coding; the release also adds sabotage‑risk checks. Separately, Arena voting favors fast “good‑enough” outputs, boosting Grok Code Fast and Gemini 3 Flash.
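
To put the "up to 20x" KV‑cache figure in concrete terms, here is a rough back‑of‑envelope sketch. It uses the standard size formula for an uncompressed KV cache (one key and one value vector per token, per layer, per KV head). The model shape below is hypothetical and chosen only for illustration; it is not GLM‑5 or any NVIDIA reference configuration, and the 20x ratio is treated as a best case rather than a typical result.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Uncompressed KV-cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes 16-bit (fp16/bf16) cache entries.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape (assumption, for illustration only).
baseline = kv_cache_bytes(layers=64, kv_heads=8, head_dim=128, seq_len=200_000)

# Best case implied by "up to 20x" compression.
compressed = baseline / 20

GIB = 1024 ** 3
print(f"uncompressed: {baseline / GIB:.1f} GiB per sequence")
print(f"at 20x:       {compressed / GIB:.1f} GiB per sequence")
```

Under these assumed dimensions, a single 200k‑token sequence drops from roughly 49 GiB of cache to about 2.4 GiB, which is why compression of this kind translates into denser batching on the same hardware.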

📑 Research & Papers

Google DeepMind — Gemini Research Wins: Gemini helped tackle 18 complex problems across algorithms, ML, economics, and astrophysics—evidence of growing AI utility in frontier science.
DeepMind Aletheia & Gemini Deep Think: New math/science evaluations push from Olympiad to research‑grade tasks; Aletheia set a record 91.9% on IMO‑Proofbench Advanced, raising the evaluation bar.
Stanford/UT Austin/Harvard: Released previously unpublished, research‑level math problems for contamination‑free testing, shielding evaluations from training‑data leakage and leaderboard gaming.
MIT — Fragile LLM Rankings: Study shows popular leaderboards can be skewed by a small set of interactions, risking poor vendor choices. It calls for rigorous, tamper‑resistant evaluation.
Observational Memory for Agents: New memory technique cuts long‑context agent costs by ~90% while improving task persistence, enabling sustained autonomy in real workflows.
AI Digital Stethoscopes (McGill & partners): Researchers report earlier, more accurate TB detection at scale, with promising implications for high‑burden regions and global health screening programs.

🏢 Industry & Policy

Anthropic — $3M Open Benchmarks Grants: Funding aims to close evaluation gaps across safety, robustness, and agent reliability. Better measurement means safer deployments and cleaner vendor selection.
OpenAI — Ads Backlash & Resignations: Ads on ChatGPT Free/Go and a retailer pilot with Williams‑Sonoma ignite privacy and UX concerns—highlighting tension between monetization and trust.
OpenAI x Foxconn & SK Hynix: U.S. hardware co‑development and a memory supply surge underscore an AI infrastructure boom, localizing manufacturing and easing bottlenecks.
FDA vs. Moderna’s AI Flu Vaccine: The agency reportedly overrode its internal reviewers to block the application, spotlighting rising regulatory friction around AI‑assisted drug development and evidence standards.
OpenAI — Pentagon Access via GenAI.mil: Non‑classified use approved with standard controls. The “all lawful uses” clause stirs debate on oversight, reliability, and ethical guardrails.
Global Coalition vs. Nudification Tools: The European Commission, Interpol, and NGOs urge a worldwide ban, prioritizing child safety and consent—pressuring platforms to harden content policies.

📚 Tutorials & Guides

Build a GPT in 243 Lines (Python): A compact educational walkthrough demystifies core transformer components—ideal for learners seeking a practical, end‑to‑end understanding.
NVIDIA — Dynamo Playlist (16 Sessions): Expert talks on MoE, KV‑aware routing, and multimodal serving. A production‑focused blueprint for scaling inference efficiently.
Agentic Engine Optimization (AEO): Marketers are tuning sites and apps for AI assistants and voice search—future‑proofing discoverability as conversational interfaces mediate traffic.
Weekly Research Roundups: Hands‑on resources for RL task synthesis, adaptive environments, and text‑feedback training, helping teams prototype emerging techniques quickly.

🎬 Showcases & Demos

Google Gemini — Real‑World Visual Reasoning: Correctly identified the location and Post‑Impressionist style of a decades‑old family painting—evidence of steady multimodal gains on authentic artifacts.
ai.com (Super Bowl Spotlight): High‑profile debut of personal AI agents for shopping, email, and crypto tasks. Brings agentic automation to mainstream audiences and commerce.

💡 Discussions & Ideas

Anthropic & Practitioners: Momentum is shifting from single‑agent to hierarchical multi‑agent systems that reserve LLM reasoning for hard steps and rely on deterministic pipelines elsewhere for reliability and cost control.
Memory as Autonomy’s Backbone: Researchers explore agents that learn their own memory mechanisms, enabling persistence, preference modeling, and long‑horizon task success.
Ads, Data, and Incentives: The OpenAI ad rollout renews concerns over data protection and safety, spotlighting how business models shape user trust and product design.
Science Needs Domain Grounding: LLMs won’t advance research alone; curated domain data and specialized models remain essential for credible findings and reproducibility.
Platforms over Point Models: Teams prioritize integrated, end‑to‑end systems. The future: users describe behavior while code becomes invisible—shifting value from models to orchestration.
Open Source Collaboration Norms: Coding agents should empower human maintainers, follow OSS conventions, and accept that the final 10% of quality still relies on judgment and tooling.
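
The "deterministic first, LLM for hard steps" idea from the first item above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's architecture: `extract_order_id`, `route`, and `fake_llm` are hypothetical names, the regex task is an invented example, and the model call is a stub. It shows only the escalation pattern, not a full multi‑agent hierarchy.

```python
import re
from typing import Callable, Optional, Tuple


def extract_order_id(text: str) -> Optional[str]:
    """Deterministic step: a regex handles the easy, well-structured case."""
    match = re.search(r"\border[ #:]*(\d{4,})\b", text, flags=re.IGNORECASE)
    return match.group(1) if match else None


def route(text: str, llm_fallback: Callable[[str], str]) -> Tuple[str, str]:
    """Try the cheap deterministic path first; escalate to the LLM only on failure.

    Returns (path_taken, result) so callers can track how often the
    expensive path is actually used.
    """
    order_id = extract_order_id(text)
    if order_id is not None:
        return ("deterministic", order_id)
    return ("llm", llm_fallback(text))


def fake_llm(text: str) -> str:
    """Stand-in for a real model call (assumption: replace with your client)."""
    return "needs-human-review"


easy = route("Where is order #12345?", fake_llm)
hard = route("My package never arrived and I lost the receipt.", fake_llm)
print(easy)
print(hard)
```

Returning the path taken alongside the result makes it straightforward to measure what fraction of requests ever reach the model, which is the cost lever this pattern is meant to expose.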

Source Credits

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.