📰 AI News Daily — 10 Feb 2026
TL;DR (Top 5 Highlights)
- OpenAI starts testing ads in ChatGPT; Anthropic vows to keep Claude ad‑free. OpenAI also denies “Dime” hardware rumors.
- GPT‑5.3‑Codex rolls out in Cursor and VS Code; reports suggest a broader GPT‑5.3 model launch is imminent.
- Massive breach at Chat & Ask AI exposes 300M messages from 25M users, renewing privacy alarms.
- EU pressures Meta to open WhatsApp to third‑party AI services, upping competition and user choice.
- ByteDance’s SeeDance 2.0 beta spotlights rapid gains in AI video, fueling a Chinese market rally.
🛠️ New Tools
- Box and LangChain launched a document‑intake agent that checks completeness, flags risks, and summarizes next steps, helping enterprises reduce onboarding errors and accelerate compliance-heavy workflows.
- deepagents 1.7.3 improves cross‑platform reliability across Linux, BusyBox, macOS, and Windows, making agent deployments more predictable in heterogeneous environments and reducing engineering effort for DevOps teams.
- fal released FLUX.2 Klein for real‑time, low‑latency image‑to‑image editing, enabling responsive creative workflows for designers, live content production, and interactive applications without heavy GPU requirements.
- OpenEnv from Hugging Face and Meta streamlines building reinforcement‑learning environments for language and vision agents, lowering experimentation friction and standardizing benchmarks for more reproducible research.
- LangSmith added instant tracing and debugging across 20+ frameworks and quietly became core infrastructure for several agent SDKs, giving teams unified observability to diagnose failures and improve reliability faster.
- Developer tooling saw upgrades: Composer 1.5 scales training 20×; GitHub Copilot CLI adds multi‑model voting; VS Code Insiders improves reliability; Codex Pro subscribers receive another 10–20% speed boost.
🤖 LLM Updates
- Anthropic Claude Opus 4.6 jumped to top ranks in Code and Text Arenas; nonprofits get free access. Perplexity switched Deep Research Max to Opus 4.6, improving factuality and coding assistance.
- OpenAI GPT‑5.3‑Codex is rolling out in Cursor and VS Code, running 25% faster and trained for vulnerability detection; reports suggest a broader GPT‑5.3 model rollout is imminent.
- Kimi K2.5 gained traction on OpenRouter and Qoder, showing strong coding and real‑world task performance, offering developers another competitive option alongside Claude, GPT, and Qwen families.
- Arcee Trinity Large (400B MoE, Apache‑2.0) joined OpenRouter’s elite tier, expanding open licensing choices for enterprises seeking high‑end performance without restrictive terms or vendor lock‑in.
- GLM‑5 surfaced on GitHub, reportedly scaling to 745B parameters with DeepSeek‑style sparse attention for longer context, signaling another escalation in the model‑size race and renewed interest in efficient attention.
- Qwen3‑Coder‑Next and Minimax‑M2.1 landed on Hugging Face endpoints with automatic context handling, simplifying tool adoption and reducing prompt‑engineering overhead for teams building coding assistants.
📑 Research & Papers
- Researchers trained diffusion models on a billion LLM activations, suggesting meta‑generative understanding of internal states and opening paths to interpretability, compression, and controllable behavior without expensive end‑to‑end retraining.
- Multi‑Head LatentMoE with head parallelism improved GPU utilization and throughput, demonstrating how architectural tweaks can deliver significant cost and latency reductions for large‑scale inference and training workloads.
- Google evaluated 180 multi‑agent configurations, finding big gains on parallelizable tasks but slowdowns on strictly sequential ones, guiding teams on when agent swarms help versus hurt real‑world performance.
- Security researchers found predictable vulnerabilities in code generated by large models like Claude, highlighting the need for stronger safeguards, automated audits, and secure‑by‑design training as AI coding adoption grows.
- Benchmark fragility resurfaced: SWE‑bench scores fell 5% after a formatting tweak; LLMs struggled with the Eleusis “game of science,” while chess‑variant tests revealed narrow, quirky strengths.
- Scientists from MIT and Harvard unveiled an AI tool mapping brainstem white‑matter bundles, promising better diagnosis and tracking for Alzheimer’s and Parkinson’s, and setting a standard in medical imaging accuracy.
🏢 Industry & Policy
- OpenAI began testing clearly labeled ads in ChatGPT for U.S. free and Go users, emphasizing privacy protections. Anthropic pledged to keep Claude ad‑free. OpenAI also dismissed viral “Dime” hardware rumors.
- The European Commission pressed Meta to open WhatsApp to third‑party AI services, warning of penalties. The move aims to spur competition and consumer choice across messaging ecosystems that are rapidly adding AI features.
- A Firebase lapse at Chat & Ask AI exposed 300 million messages from 25 million users, underscoring risks in AI apps and the need for stronger security baselines.
- Databricks reported AI agents now build most enterprise databases on its platform, while enterprise spending on Claude coding surged, signaling rapid, practical adoption of autonomous tooling inside large companies.
- ACM CAIS partnered with the AI Engineer World’s Fair to co‑feature accepted real‑world systems papers; new peer‑reviewed industry awards debut, with special poster sessions planned for 2026.
- ByteDance unveiled SeeDance 2.0 in China, sparking a market rally and showcasing rapidly improving, cinematic‑quality video generation, intensifying global competition in creative AI and tooling for media production.
📚 Tutorials & Guides
- A new course on reinforcement learning’s impact explains how RL shapes model behavior, improves reasoning, and changes deployment risks, helping practitioners tune systems and anticipate failures before costly launches.
- LangChain published a practical guide for testing LLM applications, covering unit tests, dataset baselines, and regression checks so teams can catch quality drops early and ship with greater confidence.
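The regression-check idea from LangChain's guide can be sketched in plain Python. This is a minimal illustration, not LangChain's own code: the `answer` stub and the small dataset below are hypothetical stand-ins for a real LLM call and a real evaluation set.

```python
# Minimal regression-check sketch: run a fixed dataset through the model
# and flag answers that miss required keywords, so quality drops surface
# before shipping.

def answer(question: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    canned = {
        "What is the capital of France?": "The capital of France is Paris.",
        "What is 2 + 2?": "2 + 2 equals 4.",
    }
    return canned.get(question, "I don't know.")

# Dataset baseline: each item pairs a prompt with keywords the answer
# must contain for the check to pass.
DATASET = [
    {"question": "What is the capital of France?", "must_contain": ["Paris"]},
    {"question": "What is 2 + 2?", "must_contain": ["4"]},
]

def run_regression(dataset: list[dict]) -> list[str]:
    """Return the questions whose answers miss a required keyword."""
    failures = []
    for case in dataset:
        reply = answer(case["question"])
        if not all(kw in reply for kw in case["must_contain"]):
            failures.append(case["question"])
    return failures

if __name__ == "__main__":
    failed = run_regression(DATASET)
    print(f"{len(DATASET) - len(failed)}/{len(DATASET)} checks passed")
```

In practice the canned dictionary would be replaced by an API call and the keyword check by richer scoring (exact-match, LLM-as-judge), but the shape is the same: a fixed dataset, a pass/fail rule, and a run on every change.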
🎬 Showcases & Demos
- Claude Code assembled a ~10,000‑line, locally runnable, agent‑powered video editor in minutes, demonstrating how modern agents can scaffold complex, customizable software with minimal human glue code.
- Google’s Perch 2.0 showed striking transfer learning: trained on bird audio, it accurately classifies whale vocalizations. An end‑to‑end bioacoustics demo aims to accelerate marine research and conservation.
- fal FLUX.2 Klein delivered ultra‑low‑latency image edits suitable for live creative tooling, enabling interactive art, streaming overlays, and rapid visual iteration without heavyweight infrastructure.
- SeeDance v2 wowed early testers with cinematic‑quality video, highlighting how quickly generative video is approaching production‑ready fidelity for advertising, entertainment, and social content creators.
- Genspark debuted a Super Bowl ad produced with generative AI starring Matthew Broderick, underscoring mainstream marketing’s embrace of AI‑accelerated production pipelines and celebrity‑driven creative experiments.
- AI.com drew viral attention with a Super Bowl spot promoting a personal‑agent platform, signaling intensifying competition around everyday autonomous assistants and renewed consumer curiosity post‑chatbot boom.
💡 Discussions & Ideas
- Commentators argued AI progress is accelerating and time horizons compressing, calling for richer world models and recursive architectures to overcome LLM limitations in planning, memory, and real‑world grounding.
- Debate sharpened around AI risk and capability claims: critiques of Dario Amodei, Yann LeCun’s warnings against conflating competence with intelligence, and concerns over voice AI’s brittleness tempered hype.
- Practitioners explained why RL‑trained reasoning can look strange yet useful, and stressed today’s models don’t self‑improve without costly retraining, guiding realistic expectations for roadmap planning.
- Small interface and formatting changes were shown to upend benchmark scores, reinforcing the need for robust evaluation suites, task variation, and continuous testing pipelines in production teams.
- Ethical tensions persisted around neighborhood surveillance tools like Ring, a widening China‑West capability gap in advanced ML tasks, and a bifurcation between power users and casuals as agentic tools advance.
- Forecasts ranged from blockbuster‑quality workplace video within years to $650B in AI‑infrastructure spend by 2026, constrained by energy and materials. Google clarified taxonomy: Veo (video generators) versus Genie (world models).
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.