📰 AI News Daily — 14 Feb 2026
TL;DR (Top 5 Highlights)
- MiniMax M2.5 opens weights, tops SWE‑Bench Verified, and runs fast on Apple Silicon—strong, affordable option for coding and agent workflows.
- Google’s Gemini 3 Deep Think posts standout reasoning results, targeting complex scientific and engineering use cases.
- OpenAI teams with Cerebras on GPT‑5.3 Codex‑Spark, signaling a compute shift beyond Nvidia and promising faster agentic coding.
- State‑sponsored hackers are weaponizing AI (including Gemini), pushing enterprises to harden defenses and monitor “shadow” agents.
- ByteDance’s Seedance 2.0 triggers Hollywood backlash over deepfakes; features restricted amid intensifying IP and safety concerns.
🛠️ New Tools
- OpenAI Responses API adds server‑side compaction, shell containers, and an open “Skills” spec for agents—making multi‑million‑token sessions practical and standardizing interoperable, production‑ready agent capabilities.
- Cline CLI 2.0 launches a local‑first coding agent runner—no API keys required—supporting parallel agents, headless workflows, and free access to select models, lowering friction for automated software work.
- WebMCP debuts a “browser‑as‑API” paradigm so agents can transact across any site without custom UIs, expanding automation from scripted endpoints to the broader interactive web.
- Zed Editor introduces a headless mode, letting AI agents perform fully automated coding tasks end‑to‑end—moving closer to unattended, CI‑style AI development loops.
- LangChain’s deepagents gain a native “Box” file system, giving agents structured, auditable access to files and tools—improving reproducibility, security, and cross‑task coordination.
- AssemblyAI Universal 3 Pro adds prompt‑steerable transcription, letting teams tailor diarization, summarization, and formatting on the fly—accelerating audio analytics and downstream workflows.
🤖 LLM Updates
- MiniMax M2.5 opens its weights, climbs to the top of SWE‑Bench Verified, and runs locally on Apple Silicon—bringing high‑quality coding, tool use, and document handling to affordable, on‑device deployments.
- Google’s Gemini 3 Deep Think posts strong Codeforces and scientific reasoning scores, targeting complex STEM tasks—promising better accuracy and less hand‑holding for real research and engineering work.
- GLM‑5 arrives fully open source (744B parameters with sparse attention), edges leaders on LiveBench coding/analysis, and runs distributed via MLX on Mac Studios—broadening accessible, high‑throughput experimentation.
- Mistral’s compact Ministral 3 vision‑language models use a staged distillation approach to rival larger systems—offering strong multimodal performance at lower cost and latency.
- OpenAI and Cerebras unveil GPT‑5.3 Codex‑Spark, optimizing agentic coding with wafer‑scale hardware—reducing training bottlenecks and diversifying compute beyond Nvidia for faster iteration.
- OpenAI retires GPT‑4o to prioritize reliability and reduce bias—trading a “friendlier” persona for steadier, more trustworthy answers in enterprise and consumer settings.
📑 Research & Papers
- OpenAI’s GPT‑5.2 contributes a novel gluon amplitude formula validated by academic partners—signaling growing AI impact in frontier physics and symbolic reasoning beyond benchmark demos.
- QED‑Nano, a 4B theorem prover, matches far larger systems on formal reasoning—showing compact models can achieve strong math performance with careful training and evaluation.
- AIME 2026 expands rigorous math evaluation with new problems and formats—raising the bar for measurable long‑term reasoning progress across open and closed models.
- The Open Korean Historical Corpus releases 17.7M documents spanning 1,300 years—unlocking large‑scale, non‑English pretraining and cultural research with deep temporal coverage.
- CommonLID tests 109 languages on real web text, finding many language‑ID models—LLMs included—struggle in the wild, underscoring the need for stronger multilingual robustness.
- AI‑driven food safety tools now detect pathogens in roughly three hours—accelerating recalls and preventing outbreaks, with clear benefits for public health and supply‑chain reliability.
🏢 Industry & Policy
- Google reports state‑sponsored groups from China, Russia, Iran, and North Korea are leveraging Gemini for reconnaissance, phishing, and malware—elevating urgency for enterprise hardening and model‑extraction defenses.
- The U.S. Department of Defense adopts OpenAI’s ChatGPT on GenAI.mil for wargaming and research at scale—expanding secure, governed AI use to more than a million potential users.
- ByteDance’s Seedance 2.0 faces MPA accusations of copyright abuse; amid deepfake alarms, features are curtailed—spotlighting intensifying IP battles and demand for stronger safeguards in generative video.
- Reports peg Anthropic’s valuation near $380B–$500B, fueling IPO speculation—reflecting investor conviction in top labs while raising questions about sustainability and returns.
- Microsoft signals less reliance on OpenAI, investing in proprietary models and “medical super‑intelligence” by 2026—reshaping alliances and competition across foundation and domain‑specific AI.
- Top inference providers claim up to 10× cost cuts on Nvidia Blackwell with open models—reshaping the unit economics of real‑time AI and enabling broader, margin‑friendly deployments.
📚 Tutorials & Guides
- Efficient fine‑tuning on Apple Silicon: step‑by‑step guides show Qwen3 and Granite training with MLX and LoRA for ultra‑long contexts—achieving strong results without data‑center budgets.
- PPO vs DPO explained: practical walkthroughs highlight stability, sample efficiency, and deployment trade‑offs—helping teams pick the right preference‑optimization strategy for reasoning and tool‑use gains.
- “Context engineering” emerges beyond prompts—covering retrieval, memory, and tool schemas—to make autonomous agents more reliable under real‑world drift, adversarial inputs, and long‑horizon tasks.
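The MLX/LoRA tutorials above rest on one small idea: instead of updating a frozen pretrained weight W, train two low‑rank matrices A and B and add their scaled product to the base output. A minimal NumPy sketch of that forward pass (shapes, rank, and scaling are illustrative assumptions, not values from any cited guide):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8  # hypothetical layer sizes and LoRA rank

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus the low-rank update B @ A @ x, scaled by alpha / r.
    # Only A and B receive gradients during fine-tuning; W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts identical to the base layer.
assert np.allclose(lora_forward(x), W @ x)
```

The zero‑initialized B is the detail that makes LoRA safe to bolt on: training starts from exactly the pretrained behavior, and only the tiny A/B matrices (here 4×16 and 8×4 instead of 8×16 full updates) need to fit in memory.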
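The walkthroughs above contrast policy‑gradient (PPO‑style) training with direct preference optimization, whose appeal is a closed‑form loss over preference pairs. A minimal sketch of that loss, treating sequence log‑probabilities as plain floats (names and the beta default are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on one preference pair.

    Each argument is a sequence log-probability: pi_* from the policy
    being trained, ref_* from the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (vs the reference)
    # prefers the chosen response over the rejected one.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log sigmoid(margin): small when the policy ranks chosen above rejected.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization (policy == reference) the margin is 0 and loss is ln 2.
assert abs(dpo_loss(0.0, 0.0, 0.0, 0.0) - math.log(2)) < 1e-9
```

The trade‑off the guides highlight falls out of this shape: DPO needs no reward model or on‑policy rollouts, just pairs and a frozen reference, while PPO buys extra flexibility at the cost of a full sampling‑and‑critic loop.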
🎬 Showcases & Demos
- SWE Agent, strengthened by RL and robust harnesses, surpasses rivals on complex repositories—showing practical agent reliability gains, not just leaderboard spikes.
- A new humanoid robot hand reaches near‑human dexterity after multiple engineering cycles—demonstrating rapid iteration loops between simulation, hardware, and learning.
- Agents transact across the web programmatically (via browser‑as‑API), completing purchases and workflows without bespoke UIs—hinting at an automation layer atop existing websites.
- LlamaIndex‑powered analytics deliver automated recommendations in product demos—showing how retrieval and reasoning can directly drive business decisions in dashboards.
- Open‑weight models run locally at high speeds on Apple hardware—proving compelling offline performance for privacy‑sensitive, cost‑conscious applications.
💡 Discussions & Ideas
- Dario Amodei projects trillions in AI revenue by 2030—arguing massive compute bets carry high risk but outsized upside for economies willing to scale aggressively.
- Experts urge governments and nonprofits to fund bold AI “moonshots” in education and journalism—counterbalancing purely commercial incentives with public‑interest outcomes.
- Analysts scrutinize Google’s Gemini 3 risk disclosures and the missing system card—calling for clearer transparency as capability jumps accelerate.
- Debate grows over agent “economic leaderboards”—are they measuring durable value creation or overfitting to contrived tasks?
- Advocates argue open models—though 6–9 months behind—remain essential for reproducible research, safety studies, and broad access.
- Interviews with Jeff Dean and platform retrospectives on VS Code outline long‑horizon infra bets and how editor ecosystems became today’s AI development backbone.
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.