📰 AI News Daily - 12 Feb 2026
TL;DR (Top 5 Highlights)
- Zhipu's GLM-5 open-weight MoE lands top rankings with low hallucinations, 200k context, permissive licensing, and aggressive pricing, challenging closed models on capability and cost.
- OpenAI's ads in ChatGPT spark backlash and resignations; Anthropic counters with ad-free Claude upgrades, intensifying the race for privacy-conscious consumers and enterprises.
- Coinbase launches agent-native wallets so autonomous agents can securely spend, earn, and trade, just as regulators rush to modernize payment rules for AI-driven transactions.
- NVIDIA unveils KVTC, compressing LLM KV caches by up to 20x to cut memory, boost throughput, and enable cheaper, larger-context inference at scale.
- OpenAI teams with Foxconn on U.S. AI hardware while SK Hynix benefits from surging data-center demand, signaling a rapid buildout of AI infrastructure.
🛠️ New Tools
- Coinbase - Agent-Native Wallets: New wallets let autonomous agents securely transact, enabling hands-free commerce, payouts, and DeFi operations. It lowers friction for real agent workflows while centralizing permissions and risk controls.
- AgentSkiller: A DAG-based framework to synthesize reliable, multi-turn training conversations at scale. It improves agent robustness and coverage without costly human data collection.
- Mini-SWE-Agent 2.0: Distills powerful coding agents into ~100 lines while nearing state-of-the-art results. Delivers practical, reproducible baselines teams can extend with minimal overhead.
- OpenAI - Responses API Upgrades: Adds pluggable "skills," server-side compression, and controlled browsing for long-running agents. Developers gain safer, cheaper, more capable autonomy with fewer moving parts.
- OpenAI - ChatGPT Document Viewer: Deep Research mode now previews, navigates, and exports reports in-app. Knowledge workers get faster synthesis and traceability, reducing context switching across tools.
- Astrix Security - OpenClaw Scanner: Free scanner finds misconfigured AI agents and risky permissions. Helps security teams rein in agent sprawl and protect sensitive data before breaches occur.
🤖 LLM Updates
- Zhipu GLM-5 (MoE 744B, ~40B active): Trained on 28.5T tokens with DeepSeek-style attention, it posts strong long-context performance, low hallucination rates, and robust coding results, with permissive licensing and instant vLLM support.
- Qwen3-Coder-Next (80B) & MiniMax 2.5: New Chinese models accelerate agentic coding throughput and close gaps with closed systems, expanding high-performance, open-weight options for developers.
- NVIDIA KVTC: Up to 20x KV-cache compression slashes memory footprints and cost, unlocking longer contexts and denser batching for production-scale inference.
- Prism & Together Research: Training-free spectral block-sparse prefill (up to 5.1x at 128k) and cache-aware prefill/decode disaggregation (roughly 40% throughput gain) improve efficiency without retraining.
- UnslothAI: Reports 12x faster fine-tuning with 35% less VRAM. Makes iterative domain adaptation feasible on modest hardware and tighter budgets.
- Anthropic Claude Opus 4.6: Independently rated top for agentic coding and adds sabotage-risk checks. Arena voting favors fast "good-enough" outputs, boosting Grok Code Fast and Gemini 3 Flash.
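
To put the "up to 20x" KV-cache figure in perspective, here is a back-of-the-envelope sketch. It assumes a standard attention layout that stores one key and one value tensor per layer in fp16. The model shape is hypothetical and chosen only for illustration; it does not describe KVTC's method or any model named above.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Uncompressed KV-cache size: one K and one V tensor per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model shape (an assumption, not taken from the digest).
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=128_000)

# Best case quoted above: 20x compression.
compressed = baseline / 20

print(f"baseline:   {baseline / GIB:.2f} GiB per sequence")
print(f"compressed: {compressed / GIB:.2f} GiB per sequence")
```

Under these assumptions a single 128k-token sequence needs about 15.6 GiB of cache uncompressed and under 1 GiB at the 20x best case, which is why compression translates into longer contexts or denser batching on the same hardware.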
Research & Papers
- Google DeepMind - Gemini Research Wins: Gemini helped tackle 18 complex problems across algorithms, ML, economics, and astrophysics, evidence of growing AI utility in frontier science.
- DeepMind Aletheia & Gemini Deep Think: New math/science evaluations push from Olympiad to research-grade tasks; Aletheia set a 91.9% record on IMO-Proofbench Advanced, raising the evaluation bar.
- Stanford/UT Austin/Harvard: Released unpublished, research-level math problems for uncompromised testing. Shields evaluations from contamination and leaderboard gaming.
- MIT - Fragile LLM Rankings: Study shows popular leaderboards can be manipulated by small interaction sets, risking poor vendor choices. Calls for rigorous, tamper-resistant evaluation.
- Observational Memory for Agents: New memory technique cuts long-context agent costs by ~90% while improving task persistence, enabling sustained autonomy in real workflows.
- AI Digital Stethoscopes (McGill & partners): Earlier, more accurate TB detection at scale, with promising impact for high-burden regions and global health screening programs.
🏢 Industry & Policy
- Anthropic - $3M Open Benchmarks Grants: Funding aims to close evaluation gaps across safety, robustness, and agent reliability. Better measurement means safer deployments and cleaner vendor selection.
- OpenAI - Ads Backlash & Resignations: Ads on ChatGPT Free/Go and a retailer pilot with Williams-Sonoma ignite privacy and UX concerns, highlighting tension between monetization and trust.
- OpenAI x Foxconn & SK Hynix: U.S. hardware co-development and a memory supply surge underscore an AI infrastructure boom, localizing manufacturing and easing bottlenecks.
- FDA vs. Moderna's AI Flu Vaccine: Reported override of internal reviewers to block the application spotlights rising regulatory friction around AI-assisted drug development and evidence standards.
- OpenAI - Pentagon Access via GenAI.mil: Non-classified use approved with standard controls. The "all lawful uses" clause stirs debate on oversight, reliability, and ethical guardrails.
- Global Coalition vs. Nudification Tools: The European Commission, Interpol, and NGOs urge a worldwide ban, prioritizing child safety and consent and pressuring platforms to harden content policies.
Tutorials & Guides
- Build a GPT in 243 Lines (Python): A compact educational walkthrough demystifies core transformer components, ideal for learners seeking a practical, end-to-end understanding.
- NVIDIA - Dynamo Playlist (16 Sessions): Expert talks on MoE, KV-aware routing, and multimodal serving. A production-focused blueprint for scaling inference efficiently.
- Agentic Engine Optimization (AEO): Marketers are tuning sites and apps for AI assistants and voice search, future-proofing discoverability as conversational interfaces mediate traffic.
- Weekly Research Roundups: Hands-on resources for RL task synthesis, adaptive environments, and text-feedback training, helping teams prototype emerging techniques quickly.
🎬 Showcases & Demos
- Google Gemini - Real-World Visual Reasoning: Correctly identified the location and Post-Impressionist style of a decades-old family painting, evidence of steady multimodal gains on authentic artifacts.
- ai.com (Super Bowl Spotlight): High-profile debut of personal AI agents for shopping, email, and crypto tasks. Brings agentic automation to mainstream audiences and commerce.
💡 Discussions & Ideas
- Anthropic & Practitioners: Shift from single to hierarchical multi-agent systems: reserve LLM reasoning for hard steps while using deterministic pipelines for reliability and cost control.
- Memory as Autonomy's Backbone: Researchers explore agents that learn their own memory mechanisms, enabling persistence, preference modeling, and long-horizon task success.
- Ads, Data, and Incentives: The OpenAI ad rollout renews concerns over data protection and safety, spotlighting how business models shape user trust and product design.
- Science Needs Domain Grounding: LLMs won't advance research alone; curated domain data and specialized models remain essential for credible findings and reproducibility.
- Platforms over Point Models: Teams prioritize integrated, end-to-end systems. The future: users describe behavior while code becomes invisible, shifting value from models to orchestration.
- Open Source Collaboration Norms: Coding agents should empower human maintainers, follow OSS conventions, and accept that the final 10% of quality still relies on judgment and tooling.
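
The first item in this section, using deterministic pipelines where possible and reserving LLM reasoning for hard steps, can be sketched as a simple router. This is a minimal illustration under stated assumptions, not a full multi-agent system: `extract_date`, `call_llm`, and `handle` are hypothetical names, and `call_llm` is a stub standing in for a real model client.

```python
import re
from typing import Callable, Optional, Tuple


def extract_date(text: str) -> Optional[str]:
    """Deterministic step: return the first ISO date (YYYY-MM-DD) found, if any."""
    match = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return match.group(0) if match else None


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; swap in a real client here."""
    return f"[LLM answer for: {prompt}]"


def handle(task: str, llm: Callable[[str], str] = call_llm) -> Tuple[str, str]:
    """Try the cheap deterministic path first; escalate to the LLM only on a miss."""
    date = extract_date(task)
    if date is not None:
        return ("deterministic", date)
    return ("llm", llm(task))


print(handle("Ship the report by 2026-02-12"))
print(handle("Summarize the open questions in this thread"))
```

The first call is resolved by the regular expression and never reaches the model; only the second, which the deterministic step cannot answer, is escalated. Returning the route label alongside the result makes it easy to measure how often the expensive path is taken.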
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.