ChatGPT Image 1.5 vs. Gemini Nano Banana Pro: the ultimate AI image battle
ChatGPT just dropped Image 1.5 with a clear pitch: better instruction-following, sharper editing, more legible text. The model it has to dethrone is Google Gemini Nano Banana Pro, currently the strongest image model in the wild. So I ran nine head-to-head tests — accuracy, editing, text rendering, speed, style transfer — to settle it. Here's the verdict.
The contenders
ChatGPT Image 1.5 vs. Gemini Nano Banana Pro. Same prompts, same time of day, same number of tries. No cherry-picking.
Test 1: Simple object generation (speed)
A basic prompt — a coffee mug on a wooden table, soft window light. One model returned the image in under 6 seconds. The other took almost three times as long. Speed matters when you're iterating on ad concepts.
Test 2: Complex 6×6 grid (accuracy)
I asked for a 6×6 grid of distinct objects with specific attributes per cell. This is where most models fall apart — instruction-following at scale. One model nailed 34 of 36 cells. The other missed 11.
Test 3: Realistic newspaper ad from markdown
Markdown in, layered newspaper-style ad out. Photorealistic newsprint, accurate columns, real-feeling typography. One model produced something you could almost mistake for a scanned page. The other got the layout but missed the texture.
Test 4: Caloric deficit chart (text + accuracy)
Charts are the AI imaging final boss. Numbers on axes, labels in legends, lines that match the data. One model produced a clean, readable chart. The other shipped a chart with two mislabeled bars — useless in an actual presentation.
Tests 5-7: The editing battle (replace, add, age-shift)
Replacing the newspaper with a coffee mug, adding a pigeon to a park bench scene, making a man 20 years younger — all in-place edits without rerolling the full image. This is where everyday users feel the gap most. One model handles in-place edits with surgical precision; the other repaints half the image and you lose your composition.
Test 8: Movie poster challenge (text)
Big bold title, tagline, credits at the bottom — the hardest text-rendering test in image AI. One model produced legible, hierarchically correct text. The other still slips letters and warps kerning.
Test 9: Retro photo style transfer (the tie-breaker)
This is the test I stayed up for. Take a clean digital image and convert it to a printed retro photograph — the kind with lens glare, slight color shift, scratches, and the unmistakable physicality of paper. The winning side was genuinely impressive. The losing side gave me a filter, not a transformation.
Final verdict: which AI won?
I'm saving the full verdict for the video — but the cumulative score is decisive. One model wins on speed, accuracy, in-place editing, and style transfer. The other still has the edge in one specific category. Watch the test that breaks the tie at the 17:42 mark.
Beyond images: the AI Media Machine
Images alone don't make ads. You need text-to-image, image-to-video, voiceover, music, and a final-cut editor — wired together. That's exactly what the AI Media Machine does in 12 connected apps. Try it for $1, or book a free strategy call and we'll wire the whole thing up for your business.