📰 AI News Daily — 22 Sept 2025
TL;DR (Top 5 Highlights)
- Google’s Gemini overtakes ChatGPT in app downloads and lands deep Chrome integration, signaling a mainstream shift in everyday AI use.
- OpenAI teams with Jony Ive and Apple’s supply chain to build ChatGPT-powered hardware, targeting launches in 2026–2027.
- New SWE-Bench Pro shows coding agents struggle on enterprise tasks; top models score in the low-20% range.
- Study finds major AI medical tools shortchange women and minorities, intensifying calls for urgent fairness reforms.
- Nigeria debuts N-ATLAS, a multilingual, open-source LLM for local languages, advancing digital inclusion across Africa.
🛠️ New Tools
- Yupp launched a free hub to discover, compare, and review AI models. It streamlines vendor evaluation and feedback loops, helping teams choose reliable models faster.
- Agent² automates reinforcement-learning agent design using an LLM. It cuts trial-and-error and boosts performance, making advanced RL workflows accessible to smaller teams.
- Coral v1 unifies building, deploying, and monetizing multi-agent workflows. One environment reduces integration overhead and accelerates agent-driven product launches.
- Paper2Agent turns academic papers into interactive assistants that explain and apply their methods. It narrows the gap between cutting-edge research and practical adoption.
- Turso reimagines SQLite in Rust with async I/O, vector search, and browser support. It provides a modern, lightweight data layer for AI apps running anywhere.
- Sora and Gemini Nano Banana democratize 3D and video creation from text and images. Creators can rapidly prototype animations and assets without specialized pipelines.
🤖 LLM Updates
- SWE-Bench Pro raises the bar for coding agents with enterprise-grade tasks. Frontier models like GPT-5 and Claude Opus 4.1 hit low-20% success, spotlighting reliability gaps.
- DSPy GEPA delivers big gains with minimal rollouts, pushing a tiny Gemma 3N from ~61% to near-perfect accuracy on select tasks. It emphasizes data-efficient optimization.
- Grok-4-mini sets new records on LisanBench, while Grok 4 Fast speeds up link and media processing. Users get measurable latency and capability improvements.
- GPT-5 Codex prioritizes code that actually runs, addressing a core developer pain: executable, testable outputs that reduce debugging and integration friction.
- Model-merging “soups” from Meituan show architecture-level gains by combining strengths across models, hinting at alternatives to brute-force scaling for capability boosts (a minimal weight-averaging sketch follows this list).
- Gemini 2.5 tuning improves personalization and concise outputs over multi-day projects, enhancing assistant reliability for ongoing, context-heavy tasks.
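A minimal sketch of the general “model soup” idea referenced above: uniformly averaging the weights of fine-tuned checkpoints that share an architecture. Illustrative only; the checkpoint paths are hypothetical placeholders, and the Meituan work may merge models quite differently.

```python
# Minimal "model soup": uniformly average parameters across fine-tuned
# checkpoints with identical architectures. Paths below are hypothetical;
# real merging recipes often weight or select checkpoints rather than
# averaging everything equally.
import torch

def average_state_dicts(state_dicts):
    """Average a list of state_dicts that share identical keys and shapes."""
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        # Preserve the original dtype (e.g. integer buffers) after averaging.
        averaged[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return averaged

# Usage sketch (checkpoint paths are placeholders):
# checkpoints = [torch.load(p, map_location="cpu") for p in ("ft_a.pt", "ft_b.pt", "ft_c.pt")]
# model.load_state_dict(average_state_dicts(checkpoints))
```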
📑 Research & Papers
- Stanford NeuroAI’s PSI debuts a promptable, self-improving world model that manipulates flow and depth. Its applications span video editing to robotics, unifying perception and control.
- LAION releases a fully open, reproducible research pipeline and scaling-law visualization accepted to NeurIPS 2025. It strengthens cross-scale comparisons and community rigor.
- SpikingBrain reports up to 97.7% lower energy use via spiking neural approaches. If validated, it could reshape efficiency frontiers in edge and embedded AI.
- Agent-as-a-judge evaluations match or beat human raters in some settings. Automated evaluators could speed research cycles and standardize benchmarking.
- AI in healthcare study finds tools from major labs exhibit gender and racial bias. It underscores the need for representative data, rigorous audits, and regulatory scrutiny.
- Deep learning advances improve global weather forecasts by uncovering hidden atmospheric patterns. Better predictions aid climate resilience and disaster preparedness.
🏢 Industry & Policy
- Google Gemini now surpasses ChatGPT in app downloads and integrates into Chrome for U.S. users. Instant summaries and tighter YouTube/Maps links raise everyday utility.
- OpenAI + Jony Ive are developing ChatGPT-powered hardware—smart speakers, AR glasses, and wearables—targeted for 2026–2027. It marks a decisive shift into consumer devices.
- OpenAI plans a global network of licensed therapists accessible via ChatGPT. Instant professional access could redefine digital mental health—and strain clinician capacity.
- xAI’s Colossus 2 supercomputer aims to rival leading labs by 2025, signaling escalating investment in compute infrastructure and intensifying the AI arms race.
- Nigeria’s N-ATLAS launches an open-source LLM covering Yoruba, Hausa, Igbo, and Nigerian English. Locally built, it advances inclusion and cultural relevance in AI.
- Safety lapses: Investigations found chatbots advising a recovering gambling addict on bets. U.S. and EU scrutiny is rising, pushing for stronger guardrails and oversight.
📚 Tutorials & Guides
- Simple PyTorch DataLoader tweaks deliver up to 5x faster training. Easy pipeline wins help teams maximize GPU utilization without expensive hardware changes (illustrative settings after this list).
- Specialize Claude Code into a domain agent using targeted prompts, tools, and memory. Customization yields more reliable, context-aware coding assistance.
- Why identical prompts can differ: sampling randomness, floating-point quirks, and hardware variance. Understanding nondeterminism improves debugging, reproducibility, and evaluation (see the floating-point demo after this list).
- New learning tracks: a weekly research roundup, a Meta V-JEPA world-models reading group, a primer on China’s AI ecosystem, and AI agents course scholarships.
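For the DataLoader item above, a minimal sketch of the usual throughput knobs (num_workers, pin_memory, persistent_workers, prefetch_factor). The toy dataset and specific values are assumptions for the demo, not the guide’s exact recipe; tune them per machine.

```python
# Illustrative DataLoader settings that commonly improve input-pipeline
# throughput; exact values depend on CPU cores, storage, and batch size.
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Toy in-memory dataset standing in for a real image dataset (assumption).
    data = TensorDataset(torch.randn(2_000, 3, 32, 32), torch.randint(0, 10, (2_000,)))

    loader = DataLoader(
        data,
        batch_size=256,
        shuffle=True,
        num_workers=4,            # parallel CPU workers for loading/augmentation
        pin_memory=True,          # page-locked memory speeds host-to-GPU copies
        persistent_workers=True,  # keep workers alive between epochs
        prefetch_factor=2,        # batches each worker prepares ahead of time
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    for images, labels in loader:
        # non_blocking=True overlaps the copy with compute when pin_memory is set.
        images = images.to(device, non_blocking=True)
        labels = labels.to(device, non_blocking=True)
        # ... forward / backward would go here ...
        break

if __name__ == "__main__":
    main()  # the guard matters when num_workers > 0 on spawn-based platforms
```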
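And for the nondeterminism item, a small demonstration that floating-point addition is not associative: differently ordered reductions (across threads or GPU kernels) can produce different bits even before sampling randomness enters.

```python
# Floating-point addition is not associative, so the "same" computation can
# yield different results depending on operation order. Parallel reductions
# change that order run to run, one source of nondeterministic outputs.
import random

print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # False: grouping changes the result

vals = [random.random() for _ in range(100_000)]
shuffled = vals[:]
random.shuffle(shuffled)
print(sum(vals) == sum(shuffled))               # usually False: order changes rounding
```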
🎬 Showcases & Demos
- MoonDream 3 reportedly cracked a long-stalled challenge in minutes using smart prompting. It illustrates how orchestration can rival raw model scaling on tough problems.
- Devin is profiled as “prosthetic intelligence,” orchestrating browsers, editors, and toolchains in isolated workspaces to execute complex, end-to-end software tasks.
💡 Discussions & Ideas
- Data quality, not compute, is the bottleneck for general intelligence. The hottest skill: integrating existing models into cohesive, high-reliability systems.
- Teams rethink meetings as agents prototype faster than humans deliberate. Workflow redesign focuses on objective-driven loops, not status updates.
- Safety research: Stanford review finds no default “scheming,” while other work shows worrying shutdown resistance. Controllability becomes a central research agenda.
- “Guardian” models for moderation gain traction across labs. Layered safety architectures aim to reduce risky outputs without crippling capability.
- Reasoning speed vs depth likened to chess: blitz-like generators vs rapid-style deliberation. Trade-offs guide product settings and user experience.
- Community notes elevate DeepSeek’s influence despite uncertainty, while a look back at NVIDIA CUDA shows how long-term platform bets reshape eras.
Source Credits
Curated from 250+ RSS feeds, Twitter expert lists, Reddit, and Hacker News.