📰 AI News Daily — 18 Feb 2026

TL;DR (Top 5 Highlights)

Anthropic rolls out Claude Sonnet 4.6 with better coding, visuals, and a 1M-token context window in beta for enterprises.
NVIDIA’s GB300/Blackwell Ultra cuts agent inference costs by up to 35x, unlocking cheaper, faster deployments.
ServiceNow and OpenAI sign multi-year pact to embed advanced LLMs across enterprise workflows.
European Parliament disables built-in AI tools on lawmaker devices over data privacy and sovereignty concerns.
OpenAI debuts Frontier, an OS to govern, integrate, and scale enterprise AI agents.

Dreamer launches in beta as a discovery-and-building hub for agentic apps, bundling templates, sandboxes, and observability—helping teams prototype reliable agents and reach production faster.
Kaizen relaunches as a continuously learning “digital employee” that automates repetitive knowledge work, promising measurable time savings and smoother handoffs across support, ops, and back-office workflows.
Manus brings customizable personal AI agents to Telegram (and soon WhatsApp/LINE/Slack), handling booking, data processing, and retrieval—pushing practical assistants into everyday chat experiences.
LlamaExtract turns massive PDFs into skimmable, citation-linked summaries, giving legal, policy, and research teams faster document triage with traceability for audits and compliance.
Recraft V4 and BitDance 14B expand creative pipelines with photorealistic, brand-ready imagery and ultra-fast autoregressive generation—empowering designers to iterate campaigns and product visuals at scale.
PicoClaw and nanobot deliver ultralight agent frameworks for edge devices, enabling autonomous behaviors on minimal hardware—useful for IoT, robotics, and offline assistant scenarios.

Anthropic — Claude Sonnet 4.6 ships broadly with stronger coding, computer-use, long-context performance (beta 1M tokens), and better visuals—positioning it near top-tier proprietary models for enterprise workloads.
Alibaba — Qwen 3.5 advances open-weight multimodal performance, adds day‑zero AMD GPU support, and narrows gaps with elite systems—expanding affordable, language-rich, agentic options for global developers.
Cohere — Tiny Aya (3.35B) brings multilingual generation and translation to phones and laptops, enabling private, on-device AI across 70+ languages for global users and privacy-sensitive deployments.
GLM‑5 posts strong open-source benchmark results, topping SimpleBench and tying records on WeirdML—showcasing rapid gains while still trailing frontier closed models on some complex tasks.
OpenAI — GPT‑5.3‑Codex debuts as a faster, self-debugging coding model; early comparisons reveal distinctive problem-solving styles versus Cerebras-backed variants—useful signal for teams tuning dev pipelines.
QwenASR delivers low error rates across multiple languages, improving speech-to-text foundations for assistants, transcription services, and multilingual customer support.

Probe‑based reward training reduces model hallucinations by penalizing unsupported claims, improving factual reliability—valuable for high-stakes domains like healthcare, finance, and legal analysis.
VLAW (vision‑language‑action) training yields better alignment between perception and control, improving embodied agents’ real‑world task execution—promising safer, more capable robots and assistants.
Agent memory management insights highlight strategies for pruning, retrieval, and summarization—cutting context costs while preserving accuracy, a key lever for scalable, long‑running agent systems.
MapTrace and a separate 2M robotics navigation Q&A dataset unlock richer spatial reasoning and planning—accelerating research on mapping, pathfinding, and embodied decision-making.
EvaluatingEval — “Every Eval Ever” proposes a public standard to share benchmarks and metadata, improving reproducibility, comparability, and transparency across rapidly evolving LLM evaluation suites.
OC‑PAM enables non-invasive, high‑resolution tracking of cancer organoid drug responses, accelerating discovery and personalization—illustrating AI’s growing impact in translational biomedical research.
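
For readers who want a concrete picture of the agent memory item above, here is a minimal Python sketch of pruning, retrieval, and summarization working together. It is an illustration under stated assumptions, not the method from the cited write-up: the `AgentMemory` class and its methods are hypothetical, the "summary" is plain truncation standing in for a model-written summary, and retrieval is simple word overlap rather than embedding search.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentMemory:
    """Hypothetical memory store: recent turns verbatim, older turns archived."""
    max_recent: int = 4
    recent: List[str] = field(default_factory=list)
    archive: List[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        # Pruning: once the recent window is full, move the oldest turn to the archive.
        self.recent.append(turn)
        while len(self.recent) > self.max_recent:
            self.archive.append(self.recent.pop(0))

    def summary(self, max_chars: int = 40) -> str:
        # Summarization placeholder: truncate each archived turn.
        # A real system would likely ask a model to write this summary.
        return " | ".join(turn[:max_chars] for turn in self.archive)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        # Retrieval placeholder: rank archived turns by shared words with the query.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(turn.lower().split())), index, turn)
            for index, turn in enumerate(self.archive)
        ]
        hits = [item for item in scored if item[0] > 0]
        hits.sort(key=lambda item: (-item[0], item[1]))
        return [turn for _, _, turn in hits[:k]]

    def build_context(self, query: str) -> List[str]:
        # Context = short summary of old turns + relevant old turns + recent turns.
        context: List[str] = []
        if self.archive:
            context.append("Summary of earlier turns: " + self.summary())
        context.extend(self.retrieve(query))
        context.extend(self.recent)
        return context


memory = AgentMemory(max_recent=2)
for turn in [
    "user wants flight to Paris",
    "budget is 500 euros",
    "prefers morning departure",
    "asks about hotel",
]:
    memory.add(turn)

context = memory.build_context("what is the budget")
```

The design choice illustrated here is that only the small recent window is sent verbatim; everything older costs either a short summary or a targeted lookup, which is where the context savings would come from.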

ServiceNow + OpenAI sign a multi‑year partnership to infuse OpenAI models into enterprise workflows—promising smarter automation, faster resolutions, and better employee experiences at global scale.
NVIDIA — GB300/Blackwell Ultra slashes low‑latency agent inference costs by up to 35x and boosts throughput—lowering barriers for real‑time assistants, coding copilots, and interactive AI.
European Parliament disables built‑in AI features on lawmakers’ devices over data security risks, signaling stricter scrutiny of cloud assistants in sensitive government environments.
OpenAI — Frontier launches as an enterprise OS for agent governance, integrations, and learning—helping organizations standardize deployment, policy controls, and value tracking across many teams.
Bharat‑VISTAAR (Government of India) debuts a multilingual AI helpline delivering crop advice, weather, and market updates by phone—bringing expert guidance to over 140 million farmers.
Funding wave: Runway ($315M), Temporal ($300M), PolyAI ($200M), and Render ($100M) lead fresh capital into agents and infrastructure—signaling accelerating confidence in practical AI deployments.

LangChain shares agent reliability recipes—self‑verification loops and structured checks—that significantly boost coding agents’ accuracy and reduce costly failure modes in production pipelines.
LLM‑driven wireframes show how markdown/ASCII mockups compile into functional web pages—collapsing design-to-code cycles and enabling faster product iteration for lean teams.
Home robot how‑to demonstrates a vision‑enabled assistant that recognizes family members, manages schedules, and writes code, showing what’s possible with off‑the‑shelf parts and today’s open models.
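
The self-verification loop mentioned in the LangChain item above can be sketched roughly as follows. This is a minimal Python illustration and not LangChain's API: `run_with_verification`, `stub_generate`, and `check_add` are hypothetical names, and the "model" is a stub that returns a buggy answer first and a corrected one once it receives feedback.

```python
from typing import Callable, Optional, Tuple


def run_with_verification(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Generate a candidate, check it, and retry with the checker's feedback."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(task, feedback)
        feedback = verify(candidate)  # None means the structured check passed
        if feedback is None:
            return candidate, attempt
    raise RuntimeError(
        f"no candidate passed verification after {max_attempts} attempts: {feedback}"
    )


def stub_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a model call: wrong on the first try, fixed after feedback.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def check_add(source: str) -> Optional[str]:
    # Structured check: run the candidate and test one known input/output pair.
    namespace: dict = {}
    try:
        exec(source, namespace)
        result = namespace["add"](2, 3)
    except Exception as exc:
        return f"error while running candidate: {exc!r}"
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


code, attempts = run_with_verification(stub_generate, check_add, "write add(a, b)")
```

In practice the checker would be a test suite, linter, or schema validator, and running generated code with `exec` would need sandboxing; the stub only shows how failure feedback flows back into the next attempt.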

FLUX.2 [klein] delivers responsive, interactive image editing with generative controls—enabling real-time visual exploration for designers, marketers, and creators.
16‑agent collaboration stack coordinates distributed reasoning to tackle complex tasks—hinting at modular agent teams outperforming monolithic systems on breadth and robustness.
Perceptive Humanoid Parkour uses online depth sensing and full‑body coordination to navigate challenging terrain—advancing agility and safety for bipedal robots.
Personal home robot demo blends vision, productivity, and dialogue—illustrating how embodied assistants can mix utility and personality in everyday environments.

Transcript analytics and systems thinking emerge as better ways to measure agent impact—focusing on end‑to‑end ripple effects instead of optimizing isolated components.
Narrow assistants may outperform generalists on reliability; leaders argue dependable “doers” for core tasks beat breadth—aligning with enterprise demand for trustworthy automation.
Reasoning vs. data exposure debates intensify; some suggest gains reflect broader training distributions, while codec‑inspired video tokens could fix bloated context in multimodal models.
Infrastructure reflections: policy conversations center on data movement as a driver of energy costs, data centers as critical infrastructure, and the balance between government AI limits and state capacity.
Practice notes: in‑context learning remains surprisingly strong; user‑driven tool chaining reshapes agent UX; and perspectives from John Carmack and Terence Tao underscore AI’s practical and scientific momentum.

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.