📰 AI News Daily — 14 Feb 2026
TL;DR (Top 5 Highlights)
- MiniMax M2.5 opens weights, tops SWE‑Bench Verified, and runs fast on Apple Silicon—strong, affordable option for coding and agent workflows.
- Google’s Gemini 3 Deep Think posts standout reasoning results, targeting complex scientific and engineering use cases.
- OpenAI teams with Cerebras on GPT‑5.3 Codex‑Spark, signaling a compute shift beyond Nvidia and promising faster agentic coding.
- State‑sponsored hackers are weaponizing AI (including Gemini), pushing enterprises to harden defenses and monitor “shadow” agents.
- ByteDance’s Seedance 2.0 triggers Hollywood backlash over deepfakes; features restricted amid intensifying IP and safety concerns.
🛠️ New Tools
- OpenAI Responses API adds server‑side compaction, shell containers, and an open “Skills” spec for agents—making multi‑million‑token sessions practical and standardizing interoperable, production‑ready agent capabilities.
- Cline CLI 2.0 launches a local‑first coding agent runner—no API keys required—supporting parallel agents, headless workflows, and free access to select models, lowering friction for automated software work.
- WebMCP debuts a “browser‑as‑API” paradigm so agents can transact across any site without custom UIs, expanding automation from scripted endpoints to the broader interactive web.
- Zed Editor introduces a headless mode, letting AI agents perform fully automated coding tasks end‑to‑end—moving closer to unattended, CI‑style AI development loops.
- LangChain’s deepagents gain a native “Box” file system, giving agents structured, auditable access to files and tools—improving reproducibility, security, and cross‑task coordination.
- AssemblyAI Universal 3 Pro adds prompt‑steerable transcription, letting teams tailor diarization, summarization, and formatting on the fly—accelerating audio analytics and downstream workflows.
🤖 LLM Updates
- MiniMax M2.5 opens its weights, climbs to the top of SWE‑Bench Verified, and runs locally on Apple Silicon—bringing high‑quality coding, tool use, and document handling to affordable, on‑device deployments.
- Google’s Gemini 3 Deep Think posts strong Codeforces and scientific reasoning scores, targeting complex STEM tasks—promising better accuracy and less hand‑holding for real research and engineering work.
- GLM‑5 arrives fully open source (744B parameters with sparse attention), edges leaders on LiveBench coding/analysis, and runs distributed via MLX on Mac Studios—broadening accessible, high‑throughput experimentation.
- Mistral’s compact Ministral 3 vision‑language models use a staged distillation approach to rival larger systems—offering strong multimodal performance at lower cost and latency.
- OpenAI and Cerebras unveil GPT‑5.3 Codex‑Spark, optimizing agentic coding with wafer‑scale hardware—reducing training bottlenecks and diversifying compute beyond Nvidia for faster iteration.
- OpenAI retires GPT‑4o to prioritize reliability and reduce bias—trading a “friendlier” persona for steadier, more trustworthy answers in enterprise and consumer settings.
📑 Research & Papers
- OpenAI’s GPT‑5.2 contributes a novel gluon amplitude formula validated by academic partners—signaling growing AI impact in frontier physics and symbolic reasoning beyond benchmark demos.
- QED‑Nano, a 4B theorem prover, matches far larger systems on formal reasoning—showing compact models can achieve strong math performance with careful training and evaluation.
- AIME 2026 expands rigorous math evaluation with new problems and formats—raising the bar for measurable long‑term reasoning progress across open and closed models.
- The Open Korean Historical Corpus releases 17.7M documents spanning 1,300 years—unlocking large‑scale, non‑English pretraining and cultural research with deep temporal coverage.
- CommonLID tests 109 languages on real web text, finding many language‑ID models—LLMs included—struggle in the wild, underscoring the need for stronger multilingual robustness.
- AI‑driven food safety tools now detect pathogens in roughly three hours—accelerating recalls and preventing outbreaks, with clear benefits for public health and supply‑chain reliability.
🏢 Industry & Policy
- Google reports state‑sponsored groups from China, Russia, Iran, and North Korea are leveraging Gemini for reconnaissance, phishing, and malware—elevating urgency for enterprise hardening and model‑extraction defenses.
- The U.S. Department of Defense adopts OpenAI’s ChatGPT on GenAI.mil for wargaming and research at scale—expanding secure, governed AI use to more than a million potential users.
- ByteDance’s Seedance 2.0 faces MPA accusations of copyright abuse; amid deepfake alarms, features are curtailed—spotlighting intensifying IP battles and demand for stronger safeguards in generative video.
- Reports peg Anthropic’s valuation near $380B–$500B, fueling IPO speculation—reflecting investor conviction in top labs while raising questions about sustainability and returns.
- Microsoft signals less reliance on OpenAI, investing in proprietary models and “medical super‑intelligence” by 2026—reshaping alliances and competition across foundation and domain‑specific AI.
- Top inference providers claim up to 10× cost cuts on Nvidia Blackwell with open models—reshaping the unit economics of real‑time AI and enabling broader, margin‑friendly deployments.
📚 Tutorials & Guides
- Efficient fine‑tuning on Apple Silicon: step‑by‑step guides show Qwen3 and Granite training with MLX and LoRA for ultra‑long contexts—achieving strong results without data‑center budgets.
- PPO vs DPO explained: practical walkthroughs highlight stability, sample efficiency, and deployment trade‑offs—helping teams pick the right preference‑optimization strategy for reasoning and tool‑use gains.
- “Context engineering” emerges beyond prompts—covering retrieval, memory, and tool schemas—to make autonomous agents more reliable under real‑world drift, adversarial inputs, and long‑horizon tasks.
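The MLX/LoRA tutorials above rest on one small idea: instead of updating a frozen pretrained weight W, train two low‑rank matrices A and B and add their scaled product to the base output. A minimal NumPy sketch of that forward pass (shapes, rank, and scaling are illustrative assumptions, not values from any cited guide):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8  # hypothetical layer sizes and LoRA rank

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus the low-rank update B @ A @ x, scaled by alpha / r.
    # Only A and B receive gradients during fine-tuning; W stays frozen.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer starts identical to the base layer.
assert np.allclose(lora_forward(x), W @ x)
```

The zero‑initialized B is the detail that makes LoRA safe to bolt on: training starts from exactly the pretrained behavior, and only the tiny A/B matrices (here 4×16 and 8×4 instead of 8×16 full updates) need to fit in memory.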
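The walkthroughs above contrast policy‑gradient (PPO‑style) training with direct preference optimization, whose appeal is a closed‑form loss over preference pairs. A minimal sketch of that loss, treating sequence log‑probabilities as plain floats (names and the beta default are illustrative):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO objective on one preference pair.

    Each argument is a sequence log-probability: pi_* from the policy
    being trained, ref_* from the frozen reference model.
    """
    # Implicit reward margin: how much more the policy (vs the reference)
    # prefers the chosen response over the rejected one.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log sigmoid(margin): small when the policy ranks chosen above rejected.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# At initialization (policy == reference) the margin is 0 and loss is ln 2.
assert abs(dpo_loss(0.0, 0.0, 0.0, 0.0) - math.log(2)) < 1e-9
```

The trade‑off the guides highlight falls out of this shape: DPO needs no reward model or on‑policy rollouts, just pairs and a frozen reference, while PPO buys extra flexibility at the cost of a full sampling‑and‑critic loop.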
🎬 Showcases & Demos
- SWE Agent, strengthened by RL and robust harnesses, surpasses rivals on complex repositories—showing practical agent reliability gains, not just leaderboard spikes.
- A new humanoid robot hand reaches near‑human dexterity after multiple engineering cycles—demonstrating rapid iteration loops between simulation, hardware, and learning.
- Agents transact across the web programmatically (via browser‑as‑API), completing purchases and workflows without bespoke UIs—hinting at an automation layer atop existing websites.
- LlamaIndex‑powered analytics deliver automated recommendations in product demos—showing how retrieval and reasoning can directly drive business decisions in dashboards.
- Open‑weight models run locally at high speeds on Apple hardware—proving compelling offline performance for privacy‑sensitive, cost‑conscious applications.
💡 Discussions & Ideas
- Dario Amodei projects trillions in AI revenue by 2030—arguing massive compute bets carry high risk but outsized upside for economies willing to scale aggressively.
- Experts urge governments and nonprofits to fund bold AI “moonshots” in education and journalism—counterbalancing purely commercial incentives with public‑interest outcomes.
- Analysts scrutinize Google’s Gemini 3 risk disclosures and the missing system card—calling for clearer transparency as capability jumps accelerate.
- Debate grows over agent “economic leaderboards”—are they measuring durable value creation or overfitting to contrived tasks?
- Advocates argue open models—though 6–9 months behind—remain essential for reproducible research, safety studies, and broad access.
- Interviews with Jeff Dean and platform retrospectives on VS Code outline long‑horizon infra bets and how editor ecosystems became today’s AI development backbone.
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.