INAI • The Open AI Hub

📰 AI News Daily — 16 Nov 2025

TL;DR (Top 5 Highlights)

Google’s Gemini 3 is reportedly imminent and is said to beat key coding benchmarks, positioning it to challenge ChatGPT across search, productivity, and media.
AMD signed a multiyear AI chip deal with OpenAI that includes an option for OpenAI to buy up to 10% of AMD, adding competitive pressure on NVIDIA.
Microsoft confirmed a 27% stake in OpenAI and deep access to its IP through 2032, cementing its influence across cloud, chips, and model commercialization.
The infrastructure race escalated: OpenAI/Microsoft unveiled mega GPU clusters, Google committed $40B to Texas data centers, and NVIDIA-powered clouds such as CoreWeave and Nscale expanded.
Anthropic reported thwarting a major cyber-espionage attempt and warned of more sophisticated AI-enabled attacks by 2026, elevating security urgency.

🛠️ New Tools

The Station launched an open-world sandbox for autonomous science agents, enabling end‑to‑end experiments and analysis. It lowers barriers to self‑directed research and accelerates reproducible discovery loops.
AgentEvolver introduced three self‑improvement loops (self‑questioning, self‑navigating, and self‑attributing) to make agents more reliable with less human oversight. Expect steadier task completion and fewer brittle failures.
Google Colab now connects directly to VS Code, combining familiar local dev workflows with managed GPUs/TPUs. It shortens iteration cycles for researchers and students building and training models.
GitHub Copilot added a study companion that breaks down complex topics and keeps learners on track. Persistent, personalized guidance improves retention and reduces context-switching.
OpenAI’s ChatGPT added group chats for up to 20 participants, enabling real‑time collaboration, brainstorming, and summaries. Teams gain faster alignment and less meeting overhead.
Google’s Gemini Veo 3.1 lets creators add reference images to text prompts for precise video generation. It boosts creative control for advertisers and social teams while cutting production costs.

🤖 LLM Updates

OpenAI’s GPT‑5.1‑high added vision+text multimodality, improving reasoning over images and documents. Stronger cross‑modal comprehension broadens use cases in analysis, support, and content workflows.
OpenAI’s GPT‑5.1 Codex topped Anthropic’s Claude Sonnet 4.5 Thinking on SWE‑Bench at lower cost, signaling stronger code automation economics for teams shipping and maintaining complex software.
Google’s Gemini 3 reportedly surpassed 80% on SWE‑Bench Verified and is expected imminently. If confirmed, it resets the coding and reasoning leaderboard against OpenAI.
Google reportedly cut Gemini’s hallucination rate by 40% and expanded its context window to 1 million tokens. Better reliability and long‑document handling unlock safer enterprise and research deployments.
Baidu’s ERNIE 5.0 is a more polished step up from ERNIE 4.5, narrowing the gap with top US labs on practical tasks while still trailing on frontier benchmarks.
Open‑source momentum: MiniMax M2 led select public tests; Kimi K2 Thinking showed long‑horizon reasoning with efficient INT4 quantization; Sherlock‑Alpha neared Grok‑4 on LisanBench, suggesting RL gains in smaller models.

📑 Research & Papers

New findings show robots powered by LLMs can exhibit biased or hazardous behaviors. The work underscores urgent needs for transparency, auditing, and safety controls in embodied AI.
EZSpecificity achieved 91.7% accuracy predicting enzyme‑substrate interactions, promising faster drug discovery and synthetic biology by narrowing wet‑lab search spaces and costs.
Analyses suggest around 20% of ICLR peer reviews may be AI‑generated. Academic norms are shifting, raising questions about disclosure, evaluation quality, and reviewer incentives.
Interpretability and honesty studies explored how models can explain their own internal mechanisms and how simple training interventions can improve truthfulness. Both are practical steps toward more trustworthy systems.
Reports of a large autonomous AI‑enabled cyberattack highlight a new threat era. Security research is pivoting to defensive AI that detects, contains, and learns from adaptive adversaries.

🏢 Industry & Policy

AI infrastructure surged: OpenAI and Microsoft unveiled massive GPU clusters, Google committed $40B to Texas data centers, and NVIDIA’s ecosystem powered CoreWeave and Nscale. Expect compute to become cheaper and more abundant across regions.
AMD and OpenAI forged a multiyear chip pact with an option for OpenAI to buy up to 10% of AMD. It intensifies competition and diversifies supply beyond NVIDIA.
Microsoft secured a 27% stake in OpenAI and broad access to its IP through 2032, locking in strategic influence across models, Azure, and custom silicon and stabilizing product roadmaps for enterprises.
Google faces a class‑action lawsuit alleging Gemini AI secretly recorded private conversations in Gmail, Chat, and Meet. The case spotlights high‑stakes privacy governance for ambient assistants.
Apple tightened data‑sharing rules for AI apps, requiring explicit user consent ahead of its Siri overhaul. Privacy‑forward defaults set a higher bar for third‑party AI integrations.
AI startup funding remained torrential: Cursor ($2.3B), d‑Matrix ($275M), and Scribe ($75M) raised the largest rounds. Investor confidence favors tools that accelerate software delivery and operational efficiency.

📚 Tutorials & Guides

Google published a production AI agents playbook emphasizing CI/CD, evaluation harnesses, and agent‑to‑agent protocols—turning prototypes into reliable, maintainable systems at scale.
A visual AWS guide and an overview of eight RAG architectures showed how to balance latency, accuracy, and cost when building retrieval‑augmented applications.
Jane Street’s talk shared GPU training tactics—profiling, kernel optimizations, and memory discipline—to extract more performance from modern hardware without ballooning bills.
A practical guide on giving constructive, reasoned feedback helps teams steer model behavior, reducing vague prompts and improving iterative outcomes.
The RLHF Book opened discounted early access, offering practitioners a grounded overview of preference modeling, safety trade‑offs, and evaluation practices.
The free “Agents in Production” conference (OpenAI, Meta, Google speakers) promises hard‑won lessons on deployment, monitoring, and failure modes from real-world systems.

🎬 Showcases & Demos

At ParisVibeathon, teams built a voice‑driven proposal generator in under 10 hours using Gemini 2.5 Pro, ElevenLabs, and Qdrant—proof that orchestration of mature components now yields overnight MVPs.
OpenAI’s Sora app went public in select regions, surpassing one million downloads in five days. Ten‑second custom videos hint at mainstream creative workflows shifting to AI.
Google DeepMind unveiled SIMA 2, a generalist gaming agent powered by Gemini models. Transfer across games suggests broader potential for robotics, simulation, and autonomous systems.
Disney partnered with Animaj to slash animation timelines via AI in‑betweening. Studios can iterate faster while preserving creative direction, reshaping production economics.
Google rolled out Gemini‑powered shopping in Search and the app—natural‑language queries, real‑time inventory, and agentic checkout—streamlining holiday commerce and raising expectations for retail experiences.

💡 Discussions & Ideas

Yann LeCun criticized the field’s fixation on ever‑larger models and warned about regulatory capture curbing open‑source. The debate centers on innovation speed versus centralized control.
Some argue researcher time, not compute, is the true bottleneck. Better tooling, evaluations, and automation may unlock more progress than chasing bigger clusters alone.
A browser‑centric future is emerging, where the web acts as a universal virtual machine. Agents navigating pages could unify apps, data, and workflows under open standards.
Leaders like Satya Nadella and Alex Karp pressed for broad AI empowerment and US leadership, while others argued recent releases may actually lengthen AGI timelines.
Practitioners urged moving beyond YOLO to transformer‑based vision for robustness. The conversation highlights practical trade‑offs between legacy pipelines and modern architectures.
Engineers contrasted PyTorch’s deep systems work with GPT app‑building, underscoring that seemingly simple apps often hide complex orchestration, evaluation, and data plumbing.

Source Credits

Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.