OpenAI just shipped the Mythos killer (GPT 5.5)
TL;DR
In this video, David Ondrej reviews OpenAI's GPT 5.5, discussing its capabilities in web development and comparing it to previous models and competitors.
Full Transcript
Okay, so OpenAI just dropped GPT 5.5 minutes ago, so we're going to look at it. Supposedly, it's on the level of Mythos. I already have it in ChatGPT. Pro users got it first. On my team's account, I don't have it. So, we're going to look at how good it is
and test it inside of the Codex app to see what it can build, because supposedly it's insane at SVG graphics, 3D, and any type of web development and front end. So this is OpenAI's response to Claude Mythos. The difference is Anthropic's
Mythos is not available. GPT 5.5 is available now. So let's go through the article first. "Introducing GPT 5.5: a new class of intelligence for real work." Okay, so there's a 90-second video. Let's look at it. It is different in the sense that it understands what I'm trying to
tell it to do. I see. Okay. That's a big issue with Opus 4.7. Opus 4.7 does not understand what you're telling it to do. In fact, I would even say that Opus 4.7 is a clear sign of Anthropic's compute crunch. This is why
2026 could be the year of OpenAI: Anthropic is running out of compute. Opus 4.7 is the first model that Anthropic has released in the last two years where people think it's worse. This is completely unprecedented. In the last three weeks, Anthropic has
suffered massive reputation hits, first because they didn't release Mythos, and then there were user-spotted regressions in Opus 4.6, and then 4.7 dropped and the vibes are off. Sure, on the benchmarks it's better than 4.6, but if you used it, and I use it
every single day, I still prefer to use Opus 4.6. In fact, if I open up a new terminal, boom, and I open Claude Code. Enter. Guess what model is selected? 4.6 fast. Now, mainly it's because 4.7 doesn't have fast mode, but still,
4.6 just listens to your instructions. It just does what you want. 4.7 sometimes gives you tuning insight, but most of the time it's just... So, OpenAI has a great opportunity to not only catch up to Anthropic, but overtake them once again
because they have secured more compute. Dario Amodei, the CEO of Anthropic, played it very safe with how much compute Anthropic invested in, and because they had insane growth in the first quarter of 2026, now
we're seeing that Anthropic is running out of compute and users are spotting massive degradations, whether it's inside of Claude as the app or Claude Code, or just usage limits. People are hitting usage limits super fast. So, OpenAI has a real opportunity because
they invested way more money into compute, infrastructure, and data centers than Anthropic. And this year, they're going to have way more compute, which will allow them to make better models. So, let's see if GPT 5.5 is the first hint of this OpenAI comeback.
It comes up with potentially multiple options of how we could do it. And then... So obviously we will test that. This guy is an engineer from Ramp, which is the finance credit card company. But we will test that at the end. Okay. After we go through the benchmarks and the main info about the model, I'll jump
straight into Codex, which by the way got an update a few minutes ago. It was a little late because I was like, did they add it to Codex? Did they not add it to Codex? But anyways, you can see that the UI is a bit different. You can see GPT 5.5 here. Speed: fast. We're going to do
all kinds of tests. But first I want to learn what this model is about and how good it is. And in the meantime, I'm just going to copy the full page. I'm going to go into ChatGPT. I'm going to do page, just page XML tags. Definitely not pro mode. Let's do thinking, normal. And I'm going to say,
"Give me a concise summary of the most interesting points about this new GPT 5.5 model and especially what is unusual or new about this model compared to other cutting-edge AI model releases. Be very concise." Now, obviously, we are
using thinking with GPT 5.5 already selected here. So, we're going to have GPT 5.5 summarize this page about 5.5. And it did, I would say, a 98% job all by itself. And I buttoned some stuff up and it was done. It was able to... The problem with these testimonials or
these clips is that I think Opus 4.6 probably could have done them. She says she had a bunch of bugs. I don't know if GPT 5.4 or Opus 4.6 could have fixed these bugs. So, yeah, this is not really valuable. Let's read through this text to see what's
interesting here. It understands what you're trying to do faster. The gains are especially strong in agentic coding, computer use, knowledge work, and early scientific research. This is a direct jab at Anthropic. Look at this.
Larger, more capable models are often slower to serve, but GPT 5.5 matches GPT 5.4 per-token latency in real-world serving while performing at a much higher level of intelligence. It also
uses significantly fewer tokens to complete the same correct task. Another hit, another jab at Anthropic, because if you remember my Opus 4.7 video, it has a new tokenizer which burns more tokens for the same
task. So, OpenAI is flexing their compute here. "We're releasing GPT 5.5 with our strongest set of safeguards to date, designed to reduce misuse."
So, it's even more censored. Yikes. Today GPT 5.5 is rolling out to Plus, Pro, Business, and Enterprise users. So nothing for the free users, guys. This is why you need to pay for AI. I don't care if it's ChatGPT, Claude, or Perplexity. Just pay for some
account, okay, to use the latest and greatest models. Okay, some benchmarks. Let's look at them. Also, they included 5.5 Pro as well, which we can also test out. I have it, obviously; I'm on the Pro plan. We have Terminal-Bench, where it absolutely demolishes Opus 4.7. Expert
SWE. So this is very risky. This is dirty from OpenAI: they did not include SWE-bench Verified, where Opus is better. They just used some benchmark that Opus doesn't have a score for. This is very strange. GDPval, so this is
economically valuable tasks. 5.5 wins. OSWorld-Verified, it barely wins. Toolathlon, another one Opus doesn't have. BrowseComp. Okay, much better. FrontierMath, much better. Okay, so on math it
absolutely destroys Opus. And CyberGym, it's probably some cybersecurity stuff, it destroys. But this is a very small list of benchmarks. If you look at Opus, they included a way larger list in their release, way more benchmarks. So, shady from OpenAI, but let's keep going. Let's look at the
summary from 5.5 to see what's interesting here. It is positioned less as a smarter chatbot and more as an agentic work model. The big claim is not just better answers; it is better at taking messy tasks, planning, using this. Okay, the coding jump looks real, especially
long-horizon coding. Yeah, on SWE-bench Pro it is way behind. Well, not way behind, but quite behind Opus 4.7. Yeah, fewer tokens while doing better. GPT 5.5 improved the infra serving GPT 5.5. So these are hints of self-improvement, you
know, recursive self-improvement. Obviously not on the level of training the models, but we're getting there. Very long context is now materially better. GPT 5.5 supports 1 million tokens of context in the API and performs much better than GPT 5.4 or 512. Okay.
All right. That's a huge jump on MRCR. GPT 5.5 Pro seems to be aimed at high-accuracy professional work. Yeah, if you're in ChatGPT, Pro queries take 20 minutes to answer. So obviously that's not for typical tasks or everyday chatting. Scientific research capability:
I think this is just OpenAI wanting to have good PR like Google DeepMind. Cyber, biochemistry. API not... okay, this is interesting: API not available immediately. Let's check OpenRouter. GPT... Yeah, it's not here. Wow.
ChatGPT and Codex get it first. API access very soon. Damn. They just want to have more people using Codex, which again, we're going to test in a second. Now, because we just went through it, the bottom line: the unusual angle is that it's not marketed mainly as getting better at answering
questions. It's marketed as persistent tool calling. First of all, I have to say this answer was much better than 5.4. Something weird was happening with GPT 5.4 before, where the answers were hard to understand. This was a pretty easy
read and was nicely formatted. So conversationally, just from this one answer, I can say it's already feeling better than GPT 5.4. But what we care about is whether you can build anything with this model. So what I'm going to do is open a new project: existing folder.
There we go. Create a new folder, GPT 5.5. And let's go here. So obviously, the standard setting inside of Codex is full access. Don't have it on default permissions. Full access for sure. Then model: obviously use
GPT 5.5. There's no reason you would use any other model. I don't know why they even make them available. Use 5.5. Okay, now, medium or high are good base points. Extra high is for fixing bugs and big refactors. So I think we can start with medium and then probably go to high.
And then speed: make sure to pick fast because it's a lot faster inference. It consumes the limits a bit faster, but hey, I'm on a $200 a month plan. I don't care. So, let's see what people have been building with this. Some of the most impressive not societies
unicorn test. Okay. So, I'm just going to screenshot this TikZ code. I don't know what that is, but let's try to recreate it. "Recreate this unicorn exactly as on the image attached."
Boom. And I'm going to stop it. Let's do a new chat. I'm going to say, "Inside of this folder, create a unicorn subfolder." We're going to test a
lot. Okay. Native macOS retro labyrinth game. This is going to be the second test. Boom. And then we're going to do a 3D dungeon arena prototype. I'm going to push the model to its limits. But we're going to
start with this TikZ unicorn, or we can try SVG if it doesn't know that. Okay. Boom. Let's continue in here and let's see if it can make that. Here we can see the context: already 26k used. That's crazy. That's
crazy. Okay. First of all, I'm going to open the terminal. I'm going to launch Codex. Let's see if the update is in the CLI as well. It is. So, I'm going to update it. Boom. There we go. Bun
install Codex. I'm going to say, "Check my MacBook to see if I have a root-level AGENTS.md file somewhere. If so, tell me where." I want to check
because it was loaded with 26k tokens right away. So I don't know if that's OpenAI's default system prompt or if I have it somewhere on the computer. So I'm running this. Correct, root level. And in the meantime, we have the file here. It's checking some line limit. It's
following something. I don't know what it's following. I found one. I'm waiting. Okay. Maybe this one. AGENTS.md. All right. Let's launch this. What is
happening here? index.html, some error, fill attributes. Okay, it's fixing it. Let's open it up here. What is this? Close this. Boom. Okay, so it recreated the whole tweet. I don't know why there's no color. Is it fixing it now?
Yeah, it didn't fill out the colors. Okay, it caught it by itself. I didn't have to point it out. That's good. So, obviously, we're on medium and it still takes some time. Yeah, it's still running at 2 minutes 30 seconds. That's why I put it on medium and not high or extra high.
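The check being run here (is there an AGENTS.md file on the machine, and is it big enough to explain roughly 26k tokens of context loaded up front?) can also be approximated by hand. The sketch below is not what Codex does internally; it is a minimal Python illustration that walks a directory tree a few levels deep looking for files named AGENTS.md and estimates their size in tokens. The function names, the depth limit, and the 4-characters-per-token ratio are all assumptions made for illustration, and a real tokenizer will give different counts.

```python
import os
from pathlib import Path

# Rough heuristic (an assumption): about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def find_agents_files(root, max_depth=3):
    """Return paths of AGENTS.md files under root, descending at most max_depth levels."""
    root = Path(root).expanduser()
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        depth = len(Path(dirpath).relative_to(root).parts)
        if depth >= max_depth:
            dirnames[:] = []  # prune: do not descend any further
        if "AGENTS.md" in filenames:
            found.append(Path(dirpath) / "AGENTS.md")
    return sorted(found)


def estimate_tokens(path):
    """Very rough token estimate for a text file, based on character count."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return len(text) // CHARS_PER_TOKEN


if __name__ == "__main__":
    # Hypothetical usage: scan the home directory two levels deep.
    for agents_file in find_agents_files(Path.home(), max_depth=2):
        print(f"{agents_file}: ~{estimate_tokens(agents_file)} tokens")
```

If every file found this way is only a few hundred tokens, then the files alone probably do not account for a 26k-token starting context, which matches the conclusion reached in the video.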
What is going on here? So, we have the Codex one. This one, AGENTS.md. We can probably find it somewhere here. Here it is. Okay, this is good. I think this is good. But this is not 26,000 tokens, so that doesn't explain the answer. All right, we have
our f. Okay, this is nice. Wow. I forgot about the Codex updates for computer use. We didn't test that. So, I think we did correct that. Let's test that. Did it test the computer use? It might have used that to correct home
play, right? I think it used it and saw it itself. This is very convenient, by the way. You can just open this. It's built in here. Wait. This is good. Let me check the reference. Wow.
What, guys? This is nearly identical. Wait, wait a second. Wait a second. Wait. Full screen. Just a screenshot. Full screen. Yo,
what? Okay, first test. I am impressed. How did it nail it so accurately from a screenshot? Okay, that's impressive. All right, but let's not get too excited. I'm also going to test more things that
are more challenging than a unicorn. For example, a full game: use the image gen skill to generate reference UI. So, we were going to use... wow, we're going to combine this with Images 2.0. I was about to make a dedicated video on GPT Images
2.0. That's a hint. Make sure to subscribe for that. It's coming very soon. In fact, most of you are not subscribed. I'm going to pop up a graphic. I think 75% of you are not subscribed. So, if you want to see more videos on the cutting-edge AI tools and models, please subscribe. It takes a
couple of seconds. It's completely free to subscribe on YouTube and it motivates my team a lot. So take a few seconds and subscribe. Appreciate it a lot. Now, let's try to recreate it with the assets. So, we're going to use the image and, by the way, you can do that by
tagging the dollar sign. So, dollar sign, image, to use the skill. Wait, image gen. And I'm also going to say... okay, first of all, I'm going to say, create a new... is
it Doom? Create a new folder named /doomstyle game, same level as the /unicorn folder. Okay. And then we're going to try to copy this. Going to use this as a
reference image. Boom. Reference image for this. Use the image gen skill to generate reference UI sprites. Build the macOS app labyrinth game. Okay.
Use image. There we go. Image gen skill to your insp use the absolute fully remember game and use your this is going to be the computer use
for me in the terminal. Okay, boom. I'm also going to do medium because otherwise we're going to be here forever. Okay. So the workspace is empty except for the new folder. Okay. So the beauty of the Codex app is that we can pin it on the left. So we can go here, GPT 5.5. We can pin this to make it a favorite chat. And while this
is running, we can start a new thread. It has built-in Git worktrees, so we can work on multiple things at once without them interfering with each other. So we're going to try to build this. This is insane. What? No way.
This is insane. What? Okay, I need to make this. So, I'm going to put this as a reference, I guess. Good frame for a reference image. Something like this, I guess. All right, that was a good
snipe. Damn, I need to make this. This is wild. What a playable prototype code. Okay, where's the prompt? Okay, the game architecture.
Let's copy this. View, character, dialogue. Okay, so we're going to need open API. That's fine. Details. I'm going to put this as details. Create a new /3D
dungeon folder, root level, same as slash, and then build a full playable 3D prototype game, same visual
style as the reference screenshot. I'm going to say use, and then the image gen skill, to generate all the graphics assets and textures, and then do your
best work to implement all the details below. Get to work and build this 3D game end to end and then launch it. I'm going to do high for this. Let's kick this off. What was this? Chrome. Is
this our... Let's pin this as well. Is this our Doom? Damn. Generated assets. What's happening here? 500 lines. Can't drive a native SW directly, so I'm using it. What's happening here?
It uses SwiftUI. Labyrinth crawler with first-person corridor rendering. Okay, these are very nice graphics, but let's see how the actual game looks.
It opened this. It opens the image. It's useless. I don't know why it's opening the sprite. Okay, let's let it cook. I'm not going to interrupt it. Let's let it cook. In the meantime, let's test the SVG capabilities.
This is a living stained glass laboratory. And this is not an image generation, guys. This is SVG code. That's code, right? That's vector graphics. And okay, we have a prompt. Thank you, Emily, for giving that. All right. Well,
let's test that in Codex. I'm going to do new project. No. Here, new thread. Boom. I'm going to do medium for this. Create a new folder, SVG. Okay. And now let's copy this full prompt.
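To make the "this is code, not a generated image" point concrete: an SVG file is plain text describing shapes, so the same drawing can be rendered at any size without losing quality. The sketch below is a hypothetical and far simpler drawing than the stained-glass piece in the tweet; `make_svg` and its shapes are made up for illustration. It builds one tiny SVG at two very different pixel sizes and shows that only the `width` and `height` attributes change, while the shape definitions inside the `viewBox` coordinate system stay identical.

```python
def make_svg(size_px):
    """Return a small SVG document rendered at size_px by size_px pixels."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size_px}" height="{size_px}" viewBox="0 0 100 100">\n'
        # Shapes are described in viewBox units (0..100), not in pixels.
        '  <rect width="100" height="100" fill="#1b1b2f"/>\n'
        '  <circle cx="50" cy="50" r="40" fill="#e94560" stroke="#111" stroke-width="3"/>\n'
        '  <polygon points="50,15 85,75 15,75" fill="#f9d342" stroke="#111" stroke-width="3"/>\n'
        "</svg>\n"
    )


small = make_svg(64)    # icon-sized
large = make_svg(4096)  # poster-sized

# Everything after the opening <svg ...> tag is the same text in both versions.
same_shapes = small.splitlines()[1:] == large.splitlines()[1:]

if __name__ == "__main__":
    print(large)
    print("Shape definitions identical at both sizes:", same_shapes)
```

Saving either string to a `.svg` file and opening it in a browser shows the same picture; a PNG of the small version, by contrast, would have to be resampled to reach the larger size.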
It's from here. There we go. Let's send it in this new folder. Boom. Let's see if it can make this. This is pretty insane. This is vector graphics. This is not an AI-generated image. This is
code. Let's see how well it can make that. What do we have here? We have the Doom-style game. Did it launch it? Reference image, blah. How to
start this? Give me step by step. Be very concise. Or is it running? We can do Command-J here to open the terminal natively. We need to cd into the Doom-style
game folder and then do swift run ember, whatever, emberth. Hopefully we have all the dependencies. The macOS app should open. Why is it?
Okay, I don't know. It doesn't work over Codex, but let's try. Okay, we have a labyrinth, WASD, combat log. Okay, this is pretty bad. What's happening here? Is it full? It's not full screen.
Okay, to make it full screen... how do we play this? Okay, I don't know. I'm going to screenshot this and say, "Now the game is pretty unplayable. There are no controls, no
movement, nothing. And it looks nowhere near as good as it could look. Go over the textures and make sure everything has custom generated textures. Use your image generation capabilities to generate the assets and then place the assets
with code in a way a professional developer would. Get to work and fix this." I'm going to boost it up to high because it's a much more complex thing than the previous prompts. SVG. Let's check the SVG. Okay, let's look at it. It's not quite as detailed as the tweet
one, but this is still pretty insane. It's valid SVG. This is vector graphics, meaning it can be scaled infinitely, right? It's not a PNG where, if you zoom in, you lose the quality. So that's,
yeah, that's pretty good. So not quite as good. I think the tweet used Pro, though. Yeah, Pro. Okay. So we're not using the same amount of inference. This is not fair, because on ChatGPT Pro obviously it would run for 20 minutes and use test-time compute,
parallel test-time compute, to generate something much more impressive. So this one is good, but it's not a fair comparison. Let's focus on these two. Nice. I like these graphics. I like these graphics. Is it still running? Let's let it cook. What is this guy? Is
it symbol fixing output? Okay. So, I gave it a screenshot. I told it to fix it. Let's test another thing. We have another game. So, it can do really good web UI, it seems. All right. Well, this one I can't, you
know, see this one is not as impressive to recreate. It's really good graphics. So I'm going to show it to you as a tweet, but I'm not excited to recreate this one. It's a slide deck, and very well designed, might I say. Let's look at this.
"Incredible agentic leap with GPT 5.5, from autonomously solving a Rubik's cube in the AI browser, 10 moves, to searching Gmail." Okay, I guess we'll see how good it is in OpenClaw or Hermes Agent. So I guess
he provided that as a raster image and it generated an SVG of this. Nice. Let's check back on our Codex app to see how these games are doing. Okay, we have more graphics. So okay, it's taking long because image generation is taking
long. That's the thing. This is using the GPT Images 2.0 model, which just came out. There it is, 3 days ago. And again, I'm making a video on this tomorrow. So if you don't want to miss that, make sure
to subscribe. Okay. So, yeah, this is the best image model in the world. You can see that it can do all kinds of generations: graphics, textures, realism, posters, iPhone, art deco.
Yeah, this is going to be insane for game design. It's generating these graphics, but each generation takes some time. That's the thing. Okay. Okay. Now we have some monsters. So obviously, cobblestone textures,
it's fine. This would save you time if you're a game developer for sure, but it's not the thing that's going to shock you. But this is good. Generating this pixel art yourself would be, man, incredibly hard. You have to be a
talented artist to do that. If you're just a game dev, that's one of many skills you need. All right. Maybe telling it to do these textures was a bad idea because it's waiting on these API calls. Hm. This is beautiful. It used the Playwright skill and found some
issues and now it's fixing them by itself. Wow. See what's happening. Okay, we have it. I don't know if it's going to look as good as the tweet, but hey, we'll see, I guess. Okay, here I need to stop it
from creating textures. I'm going to say, "Stop creating textures and finish the game." Damn, it's testing the app to see how the
wind works. Look at it. It ran multiple tests, and it doesn't like how the wind is running, and then it fixes the app. Yeah, this is incredible. I wish I had this when I was building Vectal in the 2024
era. I was using Sonnet 3.5 and there was nothing: no Claude Code, no Codex, and the models definitely couldn't test your app. It's absolutely revolutionary, guys. There's never been a better time to build software. I don't care, you need to be building software, whether it's
custom tools internally for your team or for yourself, whether it's a SaaS, whether it's just, I don't know, anything: automations, cool scripts, Python scripts, anything that saves you time or makes you more productive, custom user interfaces for
yourself. Just build software. You really don't need to be a developer. It's a pure excuse. Everyone on my team now is building software, even people who have zero technical background. People
who come from media and content creation are building software because it's just so easy. You speak in plain English, and tools like Codex or Claude Code can just do it for you. And by the way, if you do want to master AI coding, then make sure to join the New
Society, because we just completely revamped the classroom. This is the single best way to be able to build anything with AI. In these three weeks, you're going to master AI coding. I can guarantee you that, even if you're a complete beginner. And the reason I can guarantee it is because every single
day I see a new post in the wins channel from people building a project for the first time, pushing to GitHub for the first time, learning how to use Claude Code and Codex, building new platforms, deploying their apps, and building businesses on top of it in just a matter of a few hours. So
I've seen hundreds of people do it. We have endless testimonials, endless reviews. So if they can do it, people with no development background, no technical background, in these three weeks, just by following these step-by-step modules, which I made super easy to
follow on purpose: one-minute, two-minute, three-minute modules that take you step by step through everything you need to know to build any app, to run it, to develop it with Claude Code, and to deploy it full stack in the simplest way possible. In these three weeks, you're
going to master AI coding. And if you go through these modules, I guarantee you will be able to build any software you want with the help of AI coding agents. So if you're serious about AI and you want to master coding, make sure to join the New Society. It's going to be the first link below the video. Now let's jump back to
Codex to see if it finished testing the app. Okay, I'm going to say, "Stop testing and just run the website." Okay, Codex is being overly perfectionistic. It did 10 different tests already. I would have
tested it myself. Okay. It was testing mobile here. Okay, nobody cares. We're on desktop. We don't have time for mobile. Both of these games finished. Everything finished at once, of course. All right. We have
our website here. Let's run it. Okay. Let's put it full screen to do it justice. Okay. Can we hide the chat or no? So, we have the flag. Okay. That's
nice. We got... this is very realistic. Turbulence, stiffness. Damn. I think it's not as visually pleasing as the tweet. Where is this tweet? Yeah, I think here the flag
looks better, but this is very good. Let's amp up the wind. I think the graphics and the physics
might be better. Well, not the graphics; the physics, I think, might be better than the tweet. But wow. Guys, this is crazy. Okay, let's look at the games. Let's close
this up. Boom. Close this. Boom. Boom. Boom. Open up this sidebar. 3D dungeon game, preview as well. All right, click the arena, lock your
aim, and cut through the first wave. It's a bit too dark here, and that's not good. I'm going to take a... wait. Okay, wait. I don't see anything. Never mind.
I'm going to screenshot this. "The game is way too dark. Fix this." All right, let's check this Doom-style game. Clear. Let's run this.
See where it is. Okay, here we are. Labyrinth, WASD, space. All right, let's enter the game. Where are we? How does this work? This game is sketch though, not going to lie.
Am I just bad at playing the lab? Okay, I understand the game now. I'm looking at the map. So, the game works. The graphics are very confusing. Some stone. It teleported me to the next level.
I see. That was going to be a boss. Space. Boom. Boom. Boom. Attack. I killed him. Nice. So yeah, the graphics are the problem here. The textures are good, but
it's just not intuitive at all. Let's go to the next level. Yeah, the game works, but it would need some time and care. What is this
bat? I killed him. Yeah, whatever. It's not the best game. Let's check the dungeon arena. Okay, here we are. This looks much better. How do you attack? Wait, wait. The graphics again are
with the sword. Clicking. I don't know how you attack. Okay, next level. Warrior vitality. Okay. With space. You have to go
through them like this. I don't know what the mechanics are. You need to keep your distance or something. Again, we would need to play around with the graphics. Obviously, this is much better than the previous game already because it's
3D. I can move around. Okay, I'm damaging them. All right, I'm about to die, probably. I'm playing it incorrectly. No, I don't know the range.
Yeah, I'm dead. But yeah, anyways, first impressions: it is a better model. The question is by how much. We're going to see, because I don't want to overhype it. A lot of people just overhype every model. And the real thing isn't benchmarks or day-one impressions.
It is how it fits in your workflows, in your automations, in your AI agents. If you start running it in OpenClaw or Hermes Agent, will it be good? We're going to see. It takes a couple of days to figure out, but I'm going to be testing it every single day and I'm
going to let you know. As I mentioned, I'm making a video on the GPT Images 2.0 model, which is the best image model in the world, and how to make videos with that. So, make sure to subscribe. That should be coming tomorrow. So subscribe to not miss