• Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know
    Jun 12 2025

    What to make of those headlines that AI can’t reason, seen by tens of millions? I cover the paper in layman’s terms, what it means and doesn’t mean, and what’s next.

    Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: https://storyblocks.com/AIExplained

    Plus o3-pro and whether it is my current most-recommended model.

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:57 - Viral Post + Headlines
    01:42 - Apple Paper Analysis
    08:34 - But they do Hallucinate
    10:43 - Not Supercomputers
    11:18 - o3 Pro and Recommendations


    13.7M Tweet: https://x.com/RubenHssd/status/1931389580105925115

    Apple Paper: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf

    Guardian Article: https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapse

    Lisan al Gaib post: https://x.com/scaling01/status/1931854370716426246

    Multiplication: https://x.com/yuntiandeng/status/1836114401213989366

    The Illusion of the Illusion of Thinking: https://drive.google.com/file/d/1Zx9ikRj0Enc3SB4wA9HlYIlpmO_8QiUO/view

    Marcus: https://www.theguardian.com/commentisfree/2025/jun/10/billion-dollar-ai-puzzle-break-down

    Prof Rao: https://x.com/rao2z/status/1927707640223719631

    AI Job Headlines: https://www.nytimes.com/2025/06/11/technology/ai-mechanize-jobs.html
    https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    Sky News Story: https://news.sky.com/story/can-we-trust-chatgpt-despite-it-hallucinating-answers-13380975

    Veo 3 Ad: https://x.com/Kalshi/status/1932891608388681791

    Altman Essay: https://blog.samaltman.com/

    o3 Original benchmarks: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8b6c44-acd6-43b3-b5c6-1a1d5c6c25e4_2486x1388.png

    https://pbs.twimg.com/media/GfQ0bfcXQAAQt13.jpg

    Alpha Evolve Video: https://www.youtube.com/watch?v=RH4hAgvYSzg

    https://simple-bench.com/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    14 mins
  • AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed
    Jun 6 2025

    There’s a new best language model, so let’s go through the ups and downs of Gemini 2.5 Pro 06-05. Record-breaking common sense, but dumb mistakes remain. And it’s not even their best model, which remains behind the scenes - Gemini 2.5 Ultra. Plus Sundar Pichai’s AGI date, an analysis of whether the current AI unemployment headlines are justified, and Elevenlabs v3.


    https://emergentmind.com


    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    02:04 - Gemini 2.5 Ultra
    03:34 - Benchmarks
    07:41 - AGI Date and Meaning Pichai
    09:13 - Jobs and AI Unemployment Fears
    15:28 - Elevenlabs v3

    Sundar Pichai Fridman: https://www.youtube.com/watch?v=9V6tWC4CdFQ

    Pichai More Jobs (until 2026 at least): https://www.techradar.com/pro/alphabet-ceo-sundar-pichai-says-ai-wont-lead-to-job-cuts-will-be-an-accelerator

    Gemini Comparison: https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/
    https://x.com/viathebrink/status/1930733154203292121

    https://simple-bench.com/

    White Collar Bloodbath: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
    https://fortune.com/2025/05/25/ai-entry-level-jobs-gen-z-careers-young-workers-linkedin/
    https://www.nytimes.com/2025/05/19/opinion/linkedin-ai-entry-level-jobs.html
    https://www.nytimes.com/2025/03/25/business/economy/white-collar-layoffs.html

    College Unemployment: https://www.newyorkfed.org/research/college-labor-market/#--:explore:unemployment

    New Scientist AI Hallucinations: https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/

    Duolingo: https://fortune.com/2025/05/24/duolingo-ai-first-employees-ceo-luis-von-ahn/
    Klarna: https://www.forbes.com/sites/quickerbettertech/2025/05/18/business-tech-news-klarna-reverses-on-ai-says-customers-like-talking-to-people/

    Sholto Douglas: https://www.reddit.com/r/ClaudeAI/comments/1ktt1rb/anthropics_sholto_douglas_says_by_202728_its/

    Figure 02: https://x.com/adcock_brett/status/1930693311771332853

    Elevenlabs v3: https://www.youtube.com/watch?v=zv_IoWIO5Ek

    Gemini Speech Generation: https://aistudio.google.com/generate-speech


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    17 mins
  • Claude 4: Full 120 Page Breakdown … Is it the Best New Model?
    May 22 2025

    Not only did I get early access and run my own tests, as per the title I read both the 120 page Claude 4 Opus and Claude 4 Sonnet System Card, and the 25 page report on ASL-3 being triggered, plus the 2 hour launch video, and surrounding coverage. Ft. coding tests, Simple, Twitter controversies, deep alignment coverage, spiritual bliss and much more!


    https://80000hours.org/aiexplained


    Chapters:
    00:00 - Introduction
    01:12 - 3 Quick Controversies
    02:42 - Benchmark Results
    04:20 - 120 page Card 20 Highlights
    10:07 - Coding Test
    11:27 - Model Welfare and Spiritual Bliss
    13:29 - ASL-3

    Claude Card: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf?s=09
    ASL 3: https://www-cdn.anthropic.com/807c59454757214bfd37592d6e048079cd7a7728.pdf

    Tweets: https://x.com/fish_kyle3/status/1925597284546629753

    https://x.com/EMostaque/status/1925624164527874452?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet


    Cursor Says State of the Art for Coding: https://x.com/cursor_ai/status/1925594428095561941


    Benchmarks: https://www.anthropic.com/news/claude-4



    19 mins
  • Google Takes No Prisoners Amid Torrent of AI Announcements
    May 21 2025

    Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being useable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is priced and GemmaVerse, SynthID Detector and Imagen 4. And even this intro is missing other announcements covered in the vid! And yes, there’ll be plenty of Veo 3 clips to enjoy…

    https://80000hours.org/aiexplained

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:48 - Veo 3
    02:10 - Gemini 2.5 Flash
    03:13 - Universal Assistant
    03:47 - Usage Skyrockets + OpenAI dig
    04:51 - Gemini Pro Deep Think
    06:21 - Overviews and AI Mode
    07:26 - Deep Research Updates (new) + Jules
    08:53 - Make and Deploy Apps with Gemini
    09:12 - Imagen 4
    10:00 - Gemini Diffusion
    11:46 - Try It On
    12:17 - SynthID Detector
    13:30 - GemmaVerse, SignGemma, Gemma3n, medGemma
    14:24 - Outro + Clips

    Event: https://www.youtube.com/watch?v=o8NiE3XMPrM
    Native Audio: https://aistudio.google.com/generate-speech
    Gemini Diffusion: https://deepmind.google/models/gemini-diffusion/#capabilities
    New Gemini 2.5 Flash: https://deepmind.google/models/gemini/flash/
    SignGemma (See end of this vid): https://www.youtube.com/watch?v=GjvgtwSOCao
    Deep Think: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#flash-improvements
    Google Parallel Sampling: https://www.patreon.com/posts/next-level-good-127441188

    Price Plans: https://blog.google/products/google-one/google-ai-ultra/
    Imagen 4 Benchmarks: https://deepmind.google/models/imagen/
    Jules: https://jules.google/
    SynthID Detector: https://blog.google/technology/ai/google-synthid-ai-content-detector/
    Veo 3 Benchmarks: https://deepmind.google/models/veo/evals/
    MedGemma: https://deepmind.google/models/gemma/medgemma/
    Build Apps: https://aistudio.google.com/apps


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    17 mins
  • AI Improves at Self-improving
    May 19 2025

    AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two previous interviews and 5 other papers. Plus a snippet on OpenAI’s new Codex system.

    Gray Swan: http://app.grayswan.ai/ai-explained

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:27 - AlphaEvolve
    05:23 - Limitation
    06:10 - Achievements
    08:21 - Future Improvements
    13:30 - Quirks
    16:34 - Final Thoughts

    AlphaEvolve release: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

    Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf

    Terence Tao Quote: https://mathstodon.xyz/@tao/114508029896631083

    Nature Article: https://www.nature.com/articles/s41586-022-05172-4
    MIT Article: https://www.technologyreview.com/2025/05/14/1116438/google-deepminds-new-ai-uses-large-language-models-to-crack-real-world-problems/
    AI Co-Scientist: https://arxiv.org/pdf/2502.18864

    OpenAI Codex: https://openai.com/index/introducing-codex/


    70% of Pull Requests: https://x.com/slow_developer/status/1920920456393028027

    Amodei Essay: https://www.darioamodei.com/essay/machines-of-loving-grace

    OpenAI Jason Wei Tweet: https://x.com/_jasonwei/status/1923091260354531612

    PromptBreeder: https://arxiv.org/pdf/2309.16797
    DrEureka: https://arxiv.org/pdf/2406.01967

    FT DeepMind: https://www.ft.com/content/4e497a91-670a-4f69-be4a-18e247daba3e



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    18 mins
  • o3 breaks (some) records, but AI becomes pay-to-win
    Apr 25 2025

    A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.

    https://app.grayswan.ai/ai-explained

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:33 - FictionLiveBench
    01:37 - PHYBench
    02:14 - SimpleBench
    02:54 - Virology Capabilities Test
    03:13 - Mathematics Performance
    04:29 - Vision Benchmarks
    05:43 - V* and how o3 works
    06:44 - Revenue and costs for you
    08:54 - Expensive RL and trade-offs
    09:40 - How to spend the OOMs
    13:27 - Gray Swan Arena

    Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/
    PHYBench: https://arxiv.org/pdf/2504.16074
    Virologytest: https://www.virologytest.ai/
    How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573
    Visual puzzles: https://neulab.github.io/VisualPuzzles/
    Fiction Bench: https://x.com/ficlive/status/1912863028141244850
    https://geobench.org/
    https://simple-bench.com/
    AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/
    USAMO: https://x.com/mbalunovic/status/1914398518896193747
    NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/
    Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/
    IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
    Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq
    Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq
    Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/
    GPU Trade-offs: https://x.com/sama/status/1915098951067554030
    RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls
    Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265
    2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030
    Model Size: https://x.com/slow_developer/status/1874554473256997201
    Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381
    Papers on Patreon: https://arxiv.org/pdf/2502.01839
    https://arxiv.org/pdf/2504.13837
    Chollet Quote: https://x.com/fchollet/status/1912934762580447447
    OpenSim: https://opensim.stanford.edu/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    15 mins
  • o3 and o4-mini - they’re great, but easy to over-hype
    Apr 16 2025

    Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…

    https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - o3 and o4-mini


    https://simple-bench.com/

    Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679

    System Card: https://openai.com/index/o3-o4-mini-system-card/

    Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/

    https://deepmind.google/technologies/gemini/pro/

    https://x.com/DeryaTR_/status/1912558350794961168

    https://x.com/polynoamial/status/1912564068168450396

    API Pricing: https://openai.com/api/pricing/

    https://aider.chat/docs/leaderboards/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    14 mins
  • ‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed
    Apr 16 2025

    This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.

    https://www.emergentmind.com/


    Chapters:
    00:00 - Introduction
    00:30 - Kling 2.0
    01:35 - GPT 4.1
    05:25 - o3 Build-up
    07:37 - ‘Product Company’
    09:31 - Safe Superintelligence
    10:54 - DolphinGemma
    13:16 - Data Dominance?


    Kling 2.0: https://app.klingai.com/global/release-notes


    Dolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09


    https://openai.com/index/gpt-4-1/


    OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihq


    Physical reasoning: https://x.com/a_karvonen/status/1911839968990814503


    Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626


    Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68k


    https://simple-bench.com/try-yourself


    https://aider.chat/docs/leaderboards/


    4.5: https://www.youtube.com/watch?v=6nJZopACRuQ


    Geospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/


    Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151

    Evals: https://www.youtube.com/watch?v=scsW6_2SPC4

    Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-ai

    https://x.com/sethsaler/status/1912188383457059301


    https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/

    https://ai.meta.com/blog/llama-4-multimodal-intelligence/

    https://deepmind.google/technologies/gemini/pro/

    https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

    https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

    OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490

    20 mins