

Published: December 20, 2025
Read time: 12 min
Author: studio formel

AI Image Generation Studios & Models: The Complete 2025 Guide

Every major AI image generation company and their models, updated for December 2025. Focus on Replicate-accessible models with image-to-image support.

The AI image generation landscape has exploded in 2025. What started with a handful of players has become a competitive ecosystem of studios, each with multiple models optimized for different use cases.

This guide covers every major AI image generation studio and their available models, with a focus on models accessible via Replicate that support image-to-image (reference image) workflows.


TL;DR: Quick Recommendations

Need the absolute best quality? FLUX.2 Max or Imagen 4 Ultra. Both excel at photorealism with fine detail rendering. FLUX.2 Max supports reference images; Imagen does not.

Need text in your images? Ideogram V3 Quality is the industry leader. It handles long sentences, logos, and precise positioning better than any competitor.

Need speed on a budget? FLUX.1 Schnell (<1 second) or Photon Flash (~3 seconds). Both are dramatically cheaper than premium models while maintaining good quality.

Need open source? Stable Diffusion 3.5 Large or Qwen-Image. Run locally with no API costs, massive LoRA ecosystem, full customization control.

Need character consistency? Nano Banana Pro (up to 14 reference images), FLUX.2 Flex (up to 10), or Runway Gen-4 with @character tagging.

Need Chinese text? Qwen-Image is the only model with commercial-grade Chinese typography.


Visual Comparison: Same Prompt, 44 Models

We ran the same prompt through every Replicate-accessible model to show how each interprets identical instructions.

Prompt: “A young woman in a navy linen blazer, leaning against a terracotta-colored wall. Direct sunlight casting sharp geometric shadows across her face. She wears small gold hoop earrings and a delicate gold chain necklace. Natural makeup, slicked-back hair. She looks off-camera with a calm, confident expression. Shot on medium format, editorial fashion photography, Vogue Italia aesthetic.”

Black Forest Labs

FLUX.2 Max (img2img) ↗ 2025
Highest fidelity. 4MP, ~30s. Best for hero images.

FLUX.2 Pro (img2img) ↗ 2025
Production standard. 2MP, ~10s. Best quality/speed balance.

FLUX.2 Flex (img2img) ↗ 2025
Up to 10 reference images. Best for character consistency.

FLUX.2 Dev (img2img) ↗ 2025
Research/non-commercial. Open weights for experimentation.

FLUX.1 Pro (img2img) ↗ 2024
Original flagship. 12B params, strong photorealism.

FLUX.1.1 Pro (img2img) ↗ 2024
6x faster than 1.0 Pro. Better image quality, diversity.

FLUX.1.1 Pro Ultra (img2img) ↗ 2024
4MP raw output. Up to 2K resolution for large prints.

FLUX.1 Dev (img2img) ↗ 2024
Open weights, guidance-distilled. Fine-tuning base.

FLUX.1 Schnell ↗ 2024
Apache 2.0 license. 4 steps, fastest FLUX variant.

Kontext Max (img2img) ↗ 2025
Text+image context. Highest quality editing model.

Kontext Pro (img2img) ↗ 2025
Fast context editing. Good balance of speed and quality.

FLUX Dev LoRA (img2img) ↗ 2024
Custom style training. Load your own LoRA weights.

FLUX Schnell LoRA ↗ 2024
Fast LoRA inference. 4-step with custom styles.

FLUX Krea Dev (img2img) ↗ 2024
Krea-optimized FLUX Dev. Enhanced for creative tools.

Google

Imagen 4 Ultra ↗ 2025
Highest quality. Fine fabric/water/fur rendering.

Imagen 4 ↗ 2025
Standard tier. Enhanced typography for posters/cards.

Imagen 4 Fast ↗ 2025
Speed-optimized Imagen 4. Quick iterations, good quality.

Imagen 3 ↗ 2024
Previous flagship. Strong photorealism, reliable output.

Imagen 3 Fast ↗ 2024
Speed variant of Imagen 3. Good for prototyping.

Nano Banana Pro (img2img) ↗ 2025
Gemini-based. Top benchmark performer, strong consistency.

Nano Banana (img2img) ↗ 2025
Standard Gemini variant. Good balance of speed and quality.

Gemini 2.5 Flash (img2img) ↗ 2025
Fast multimodal. Best for conversational image editing.

OpenAI

GPT Image 1.5 (img2img) ↗ 2025
Native ChatGPT integration. Best for iterative, conversational workflows.

DALL-E 3 ↗ 2023
Natural language prompting. Uses GPT-4 to expand prompts automatically.

DALL-E 2 ↗ 2022
Pioneer of img2img. Inpainting, outpainting, variations.

Stability AI

SD 3.5 Large (img2img) ↗ 2024
8B params, MMDiT architecture. Highest quality open-source option.

SD 3.5 Large Turbo (img2img) ↗ 2024
Distilled Large. 4-step generation for fast iterations.

SD 3.5 Medium (img2img) ↗ 2024
2.5B params. Quality/speed balance, lower resource usage.

SD 3 (img2img) ↗ 2024
First MMDiT model. Triplet text encoder for better prompt understanding.

Ideogram

V3 Quality (img2img) ↗ 2025
Best text rendering in the industry. Posters, logos, signage.

V3 Balanced (img2img) ↗ 2025
Quality/speed sweet spot. Good for most production work.

V3 Turbo (img2img) ↗ 2025
Fastest V3. Great for rapid prototyping with text.

V2a (img2img) ↗ 2024
Enhanced realism over V2. Better anatomy and composition.

V2a Turbo (img2img) ↗ 2024
Fast V2a for quick iterations. Budget-friendly.

V2 (img2img) ↗ 2024
Core V2 model. 1280x1280 resolution, solid text rendering.

V2 Turbo (img2img) ↗ 2024
Fast V2 variant. Lower cost, good for testing.

Alibaba

Qwen-Image (img2img) ↗ 2025
Best for Chinese text. Strong multi-language support.

ByteDance

Seedream 4.5 (img2img) ↗ 2025
Latest flagship. Fast generation, commercial-friendly license.

Seedream 4 (img2img) ↗ 2025
Production stable. Reliable for batch processing.

Seedream 3 ↗ 2024
Previous generation. Good value, lower cost.

Dreamina 3.1 ↗ 2024
Consumer-focused. Optimized for everyday use.

Luma AI

Photon (img2img) ↗ 2024
Fast, high-quality. From the Dream Machine team.

Photon Flash (img2img) ↗ 2024
Ultra-fast variant. Best for real-time applications.

Runway

Gen-4 Image (img2img) ↗ 2025
Reference-based generation. Strong style consistency.


Quick Reference: Studios with Replicate + Img2Img Support

| Studio | Flagship Model | Key Strength | Img2Img Models |
|---|---|---|---|
| Black Forest Labs | FLUX.2 Max | Photorealism, text rendering | 13 |
| Google DeepMind | Nano Banana Pro | Quality, consistency | 3 |
| OpenAI | GPT Image 1.5 | Conversational generation | 3 |
| Stability AI | SD 3.5 Large | Open source, customization | 6 |
| Ideogram | V3 Quality | Text in images | 8 |
| Alibaba (Qwen) | Qwen-Image | Chinese text rendering | 4 |
| ByteDance | Seedream 4.5 | Speed, commercial use | 5 |
| Luma AI | Photon | Fast generation | 2 |
| Runway | Gen-4 Image | Reference-based | 2 |

Black Forest Labs (FLUX)

Black Forest Labs was founded on August 1, 2024 by former Stability AI researchers who created Stable Diffusion. The founders previously researched AI at LMU Munich under Björn Ommer. The company achieved unicorn status within months and closed a $300M Series B in December 2025. Key partners include Adobe, Canva, Meta, and xAI (Grok integration).

Model Timeline: FLUX.1 (August 2024) → FLUX.1.1 Pro (October 2024) → FLUX.2 (November 2025)

Known for photorealism, accurate text rendering, and strict prompt adherence.

FLUX.2 vs FLUX.1: What Changed?

| | FLUX.1 | FLUX.2 |
|---|---|---|
| Parameters | 12B | 32B (with Mistral-3 VLM) |
| Max Resolution | 1-2 MP | 4 MP |
| Reference Images | Limited | Up to 10 |
| Typography | Good | Legible fine text, UI elements |

Key FLUX.2 Improvements: New VAE, 32K token context, better skin/fabric micro-details.

Model Variants

| Variant | Speed | Best For |
|---|---|---|
| Max | ~30s | Hero images, final production |
| Pro | ~5s | Professional workflows |
| Dev | ~2s | Development, fine-tuning |
| Schnell | <1s | Rapid iteration |

Kontext = text-based image editing (not generation)

FLUX.2 Series

FLUX.2 Max (img2img) ↗ 2025
Highest fidelity. 4MP, ~30s. Best for hero images.

FLUX.2 Pro (img2img) ↗ 2025
Fast (~5s), 8 reference images. Professional workflows.

FLUX.2 Flex (img2img) ↗ 2025
Optimized for img2img editing and style transfer.

FLUX.2 Dev (img2img) ↗ 2025
Open weights. Best for development and fine-tuning.

FLUX.1 Series

FLUX.1 Pro (img2img) ↗ 2024
12B params. Original flagship for production use.

FLUX.1.1 Pro (img2img) ↗ 2024
Improved quality over 1.0. Better prompt adherence.

FLUX.1.1 Pro Ultra (img2img) ↗ 2024
Highest res in 1.x series. Up to 4MP output.

FLUX.1 Dev (img2img) ↗ 2024
Open weights. 28 steps, ~2s. For fine-tuning.

FLUX.1 Schnell ↗ 2024
Fastest (<1s). 4 steps. For rapid prototyping.

FLUX Kontext Series

Kontext Max (img2img) ↗ 2025
Text-based editing. Transform style, clothing via prompts.

Kontext Pro (img2img) ↗ 2025
Faster Kontext. ~4s per edit. Maintains composition.

FLUX LoRA Variants

FLUX Dev LoRA (img2img) ↗ 2024
Custom style training. Load your own LoRA weights.

FLUX Schnell LoRA ↗ 2024
Fast LoRA inference. 4-step with custom styles.

FLUX Krea Dev (img2img) ↗ 2024
Krea-optimized FLUX Dev. Enhanced for creative tools.

Hardware Requirements: FLUX.1 dev requires ~24GB VRAM


Google DeepMind (Imagen / Gemini)

Google’s AI lab brings deep pockets and research talent to image generation. Imagen 4 is their flagship, but the real story is Nano Banana Pro—a Gemini-based model that quietly dominated anonymous benchmarks before being identified. For reference image workflows, Nano Banana Pro supports up to 14 input images, more than any competitor.

The team uses SynthID invisible watermarking to prevent deepfakes and supports text rendering in 7+ languages.

Model Timeline: Imagen (May 2022) → Imagen 2 (December 2023) → Imagen 3 (August 2024) → Imagen 4 (May 2025, Google I/O)

Google’s image generation spans the Imagen series and Gemini-based models (marketed as “Nano Banana” on LMArena’s anonymous benchmark).

Imagen Evolution

| | Imagen 2 | Imagen 3 | Imagen 4 |
|---|---|---|---|
| Typography | Basic | Improved | Significantly enhanced |
| Detail | Standard | Fewer artifacts | Fine fabrics, water, fur |
| Speed | Standard | Standard | Fast variant 10× faster |

Imagen Series

Imagen 4 Ultra ↗ 2025
Highest quality. Fine fabric/water/fur rendering.

Imagen 4 ↗ 2025
Standard tier. Enhanced typography for posters/cards.

Imagen 4 Fast ↗ 2025
Speed-optimized Imagen 4. Quick iterations, good quality.

Imagen 3 ↗ 2024
Previous flagship. Strong photorealism, reliable output.

Imagen 3 Fast ↗ 2024
Speed variant of Imagen 3. Good for prototyping.

Gemini Image Models

Nano Banana Pro (img2img) ↗ 2025
Gemini-based. Top benchmark performer, strong consistency.

Nano Banana (img2img) ↗ 2025
Standard Gemini variant. Good balance of speed and quality.

Gemini 2.5 Flash (img2img) ↗ 2025
Fast multimodal. Best for conversational image editing.

Key Capabilities

  • Character consistency across generations
  • Image blending and editing
  • Accurate text rendering in multiple languages
  • Up to 4K resolution

OpenAI (GPT Image / DALL-E)

OpenAI invented the category with DALL-E in 2021, but their real advantage today is conversational iteration. GPT Image 1.5 integrates directly with ChatGPT, letting you refine images through natural dialogue: “make the background warmer” or “add a second person on the left.” If your workflow involves back-and-forth refinement, this is uniquely powerful.

In March 2025, DALL-E 3 was replaced by GPT Image’s native multimodal generation in ChatGPT. All outputs include C2PA metadata for provenance tracking.

Model Timeline: Image GPT (June 2020) → DALL-E (January 2021) → DALL-E 2 (April 2022) → DALL-E 3 (October 2023) → GPT Image 1 (March 2025) → GPT Image 1.5 (December 2025)

OpenAI offers GPT Image models and the legacy DALL-E series.

GPT Image Models

GPT Image 1.5 (img2img) ↗ 2025
Native ChatGPT integration. Best for iterative, conversational workflows.

DALL-E Models

DALL-E 3 ↗ 2023
Natural language prompting. Uses GPT-4 to expand prompts automatically.

DALL-E 2 ↗ 2022
Pioneer of img2img. Inpainting, outpainting, variations.

Key Features

  • Native multimodal generation
  • Conversational refinement through chat
  • Context-aware iterations
  • C2PA metadata on all outputs

Stability AI (Stable Diffusion)

If you want to run models locally, train your own styles, or avoid recurring API costs, Stable Diffusion is the answer. It’s fully open source with a massive ecosystem of community fine-tunes, LoRAs, and tools. The trade-off: you’ll need a decent GPU (8GB+ VRAM) and some technical comfort. For maximum customization at minimum cost, nothing else comes close.

Stability AI revolutionized the industry in August 2022 by making model weights freely available. SD 3.5 Large represents their current flagship.

Model Timeline: SD 1.x (August 2022) → SD 2.0 (November 2022) → SDXL (July 2023) → SD 3 (February 2024) → SD 3.5 (October 2024)

Pioneered open-source image generation. Important for customization and fine-tuning.

SD Version Comparison

| | SD 1.5 | SDXL | SD 3.5 |
|---|---|---|---|
| Parameters | 983M | 3.5B | 8B |
| Architecture | UNet | UNet | Diffusion Transformer |
| Text Generation | Poor | Better | Best in series |
| VRAM | ~6GB | ~12GB | ~20GB |

Trade-offs: SD 3.5 is slower (1+ min) but has market-leading prompt adherence. Still struggles with hands.

Stable Diffusion 3.5 Series

SD 3.5 Large (img2img) ↗ 2024
8B params, MMDiT architecture. Highest quality open-source option.

SD 3.5 Large Turbo (img2img) ↗ 2024
Distilled Large. 4-step generation for fast iterations.

SD 3.5 Medium (img2img) ↗ 2024
2.5B params. Quality/speed balance, lower resource usage.

SD 3 (img2img) ↗ 2024
First MMDiT model. Triplet text encoder for better prompt understanding.

Key Advantages

  • Fully open source
  • Massive ecosystem of LoRAs and fine-tuned models
  • Run locally without API costs

Hardware Requirements

  • SDXL: 8GB+ VRAM
  • SD 3.5: 12GB+ VRAM

Ideogram

Need text in your images that actually looks right? Ideogram is the clear leader. Whether it’s a logo, poster, storefront sign, or book cover, V3 Quality renders long sentences, precise positioning, and complex typography that other models mangle. No other model comes close for text-heavy designs.

Founded by former Google Imagen researchers. Co-founder Jonathan Ho authored the foundational 2020 paper on diffusion models. First to render coherent text in images at launch.

Model Timeline: Ideogram 0.1 (August 2023) → Ideogram 1.0 (February 2024) → Ideogram 2.0 (August 2024) → Ideogram 3.0 (March 2025)

Stats: 7M+ creators, 600M+ images generated.

Leader in text rendering within images.

Version Evolution

| | 1.0 | 2.0 | 3.0 |
|---|---|---|---|
| Text Clarity | Good | Improved | Complex layouts |
| Styles | Basic | 20+ | 50+ presets |
| Key Feature | First coherent text | Realism + styles | Style references |

V3 Variants: Quality vs Balanced vs Turbo

| Variant | Speed | Cost | Use Case |
|---|---|---|---|
| Quality | ~9s | $0.09 | Final production |
| Balanced | ~4s | $0.06 | General use |
| Turbo | ~1s | $0.02 | Rapid iteration |

Ideogram V3 Series

V3 Quality (img2img) ↗ 2025
Best text rendering in the industry. Posters, logos, signage.

V3 Balanced (img2img) ↗ 2025
Quality/speed sweet spot. Good for most production work.

V3 Turbo (img2img) ↗ 2025
Fastest V3. Great for rapid prototyping with text.

Ideogram V2 Series

V2a (img2img) ↗ 2024
Enhanced realism over V2. Better anatomy and composition.

V2a Turbo (img2img) ↗ 2024
Fast V2a for quick iterations. Budget-friendly.

V2 (img2img) ↗ 2024
Core V2 model. 1280x1280 resolution, solid text rendering.

V2 Turbo (img2img) ↗ 2024
Fast V2 variant. Lower cost, good for testing.

Key Capabilities

  • Long text strings including sentences
  • Precise text positioning
  • Multilingual text support
  • Style references (up to 3 images)

Alibaba (Qwen-Image)

Targeting the Chinese market or need proper Chinese typography in your images? Qwen-Image is the only model that renders Chinese characters with commercial-grade accuracy. It’s also fully open source (Apache 2.0), so you can run it locally without API costs—making it a compelling Stable Diffusion alternative for bilingual workflows.

20B parameter MMDiT model with multi-line Chinese and English text layouts.

Model Timeline: Qwen2-VL (September 2024) → Qwen-Image (August 2025) → Qwen-Image-Edit (August 2025) → Qwen-Image-Layered (December 2025)

20 billion parameter model. First open-source model with accurate Chinese text rendering.

Image Generation Models

Qwen-Image (img2img) ↗ 2025
Best for Chinese text. Strong multi-language support.

Key Capabilities

  • Commercial-grade Chinese text rendering
  • Bilingual (English + Chinese)
  • Multi-line text layouts
  • Layered output for editing

ByteDance (Seedream)

TikTok’s parent company quietly built one of the best image generators. Seedream 4.5 combines exceptional speed (~3 seconds for 2K images), high benchmark scores (ELO 1,222), and commercial-friendly licensing. If you need to generate images at scale with predictable costs, Seedream deserves serious consideration.

Seedream 4.0 surpassed Gemini 2.5 Flash and OpenAI models on benchmarks. Generates 2K images in ~3 seconds with 94% text accuracy (Chinese and English).

Model Timeline: Seedream 2.0 (December 2024) → Seedream 3.0 (April 2025) → Seedream 4.0 (September 2025) → Seedream 4.5 (November 2025)

Doubao platform leads China’s AI market. Seedream models compete with GPT-4o and Midjourney.

Seedream Evolution

| | 3.0 | 4.0 | 4.5 |
|---|---|---|---|
| Max Resolution | 2K | 4K | 4K |
| Reference Images | Basic | Up to 10 | Up to 14 |
| Key Feature | 3s speed | ELO 1,222 | Story scenes |

4.5 New: Group generation mode for story scenes and character variations.

Seedream Series

Seedream 4.5 (img2img) ↗ 2025
Latest flagship. Fast generation, commercial-friendly license.

Seedream 4 (img2img) ↗ 2025
Production stable. Reliable for batch processing.

Seedream 3 ↗ 2024
Previous generation. Good value, lower cost.

Dreamina 3.1 ↗ 2024
Consumer-focused. Optimized for everyday use.

Key Capabilities

  • Speed: 2K images in ~3 seconds
  • 94% text accuracy (Chinese and English)
  • Optimized for commercial use
  • Up to 4K resolution (Seedream 4)

Luma AI (Photon)

Luma AI is primarily known for video (Dream Machine, Ray3), but Photon deserves attention for image generation. It’s exceptionally fast—Photon Flash runs at $0.002/image—and excels at character consistency with adjustable reference weights. A sleeper pick for high-volume workflows where cost and speed matter more than bleeding-edge quality.

Raised $1.07B total, including $900M Series C in 2025. Partnered with Adobe to integrate Ray3 into Firefly. 30M+ users.

Product Timeline: Dream Machine (June 2024) → Photon (November 2024) → Ray3 (September 2025)

Photon is their image generation model. Known for speed.

Photon vs Photon Flash

| | Photon | Photon Flash |
|---|---|---|
| Speed | ~11s | ~3s |
| Cost | $0.03 | $0.002 |
| Best For | Production | Iteration |

Key Features: Character consistency, multi-reference support, adjustable reference weights.

Image Models

Photon (img2img) ↗ 2024
Fast, high-quality. From the Dream Machine team.

Photon Flash (img2img) ↗ 2024
Ultra-fast variant. Best for real-time applications.

Key Capabilities

  • High generation speed (Luma claims 8x faster than comparable models)
  • High-resolution output
  • Image, style, and character reference support

Pricing

  • Free tier available
  • Subscriptions: $9.99 - $99.99/month

Runway

Runway is the choice for film and TV production—their tools appear in Everything Everywhere All at Once and Amazon’s House of David. Gen-4 Image excels at maintaining character identity across scenes using @character and @location tagging. If you’re building visual narratives that need consistent characters across multiple frames, this is purpose-built for that workflow.

Co-released Stable Diffusion in August 2022. Total funding: $544M.

Model Timeline: Stable Diffusion co-release (August 2022) → Gen-1/Gen-2 (February 2023) → Gen-3 (June 2024) → Act-One (October 2024) → Gen-4 (April 2025) → Gen-4.5 (December 2025)

Industry Use: Everything Everywhere All at Once, The Late Show with Stephen Colbert, Amazon’s House of David (350+ AI shots in Season 2).

Known primarily for video, Runway also offers image generation capabilities.

Gen-4 Image vs Turbo

| | Gen-4 Image | Gen-4 Turbo |
|---|---|---|
| Speed | Standard | 2.5× faster |
| 720p Cost | $0.05 | Lower |
| 1080p Cost | $0.08 | Lower |

Key Feature: Reference-based with 1-3 images. Tag with @character, @location for control.
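To show how tagging and references fit together, here is a small sketch of building such a request. The parameter names (`reference_images`, `reference_tags`) are illustrative assumptions; confirm them against the Gen-4 Image input schema on Replicate before use.

```python
# Sketch: pairing @tags in the prompt with reference images, in the
# style of Runway Gen-4. Parameter names are assumptions for
# illustration -- check the model's schema on Replicate.

def gen4_input(prompt: str, refs: dict[str, str]) -> dict:
    """Pair each reference image URL with the @tag used in the prompt."""
    return {
        "prompt": prompt,                        # prompt mentions the @tags
        "reference_tags": list(refs.keys()),     # e.g. ["character", "location"]
        "reference_images": list(refs.values()), # matching image URLs
    }

payload = gen4_input(
    "@character standing in @location at golden hour",
    {
        "character": "https://example.com/hero.jpg",
        "location": "https://example.com/terracotta-wall.jpg",
    },
)
```

The design point is that each tag in the prompt resolves to a specific reference image, which is what keeps a character's identity stable across frames.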

Image Models

Gen-4 Image (img2img) ↗ 2025
Reference-based generation. Strong style consistency.

Key Features

  • Reference image support (up to 3 images)
  • High-quality generation
  • Turbo variant 2.5x faster

Notable Others

The following studios are significant in the AI image generation landscape but either lack Replicate access or don’t offer img2img capabilities via Replicate.

Midjourney

  • Website: https://www.midjourney.com/
  • Replicate: N/A
  • Founded: August 2021
  • Headquarters: San Francisco, California
  • Founder: David Holz (previously founded Leap Motion)

Unlike other AI startups, Midjourney is not VC-funded and has been profitable since August 2022. Runs Discord’s largest server (21M+ members as of May 2025). Web interface launched August 2024.

V7 vs V6 Comparison

| | V6/V6.1 | V7 |
|---|---|---|
| Architecture | Previous gen | Completely rebuilt |
| Speed | ~35s | Draft: 4-5s (10× faster) |
| Hands/Anatomy | Struggled | Significantly improved |
| Text Clarity | Basic | Near-perfect |
| Personalization | 200+ images | 5 minutes |

V7 Key Features:

  • Draft Mode: 10× faster, half cost
  • Omni Reference (--oref): Blend styles, colors, lighting
  • Character Reference (--cref): Maintain identity across generations

When V6 is better: Stylized fictional world-building (V7 can feel “too clean”)

  • Availability: Discord bot, Web app, $10/month minimum

Leonardo AI

  • Website: https://leonardo.ai/
  • Replicate: N/A
  • Founded: December 2022
  • Headquarters: Sydney, Australia
  • Founders: JJ Fiasson, Ethan Smith, Jachin Bhasme
  • Acquired by: Canva (July 2024, ~$320M)

Originally focused on video game assets, Leonardo grew from 14,000 users (February 2023) to 19M users by end of 2023. Canva acquired Leonardo in July 2024; all 120 employees joined Canva. 1B+ images generated.

| Model | Key Features |
|---|---|
| Phoenix 1.0 Ultra | 5MP+ resolution |
| Phoenix 1.0 Fast | Speed-optimized |

Adobe Firefly

Adobe Firefly focuses on commercial safety, trained on Adobe Stock and public domain content. 13B+ images generated since launch; ~1.5B assets/month.

Model Timeline: Firefly Beta (March 2023) → Image Model 2 (October 2023) → Image Model 3 (April 2024) → Image Model 4/4 Ultra (April 2025)

| Model | Resolution |
|---|---|
| Image Model 4 Ultra | 2K |
| Image Model 4 | Standard |

  • Layered output (objects as editable layers)
  • Trained on licensed content (commercial-safe)
  • Adobe Creative Cloud integration

Recraft

Recraft V3 (codenamed “Red Panda”) achieved #1 on Hugging Face’s Text-to-Image Leaderboard with ELO 1172, outperforming DALL-E and Midjourney (October 2024).

Stats: 4M+ users, $5M+ ARR.

| Model | Replicate Link |
|---|---|
| Recraft V3 | https://replicate.com/recraft-ai/recraft-v3 |
| Recraft V3 SVG | https://replicate.com/recraft-ai/recraft-v3-svg |
| Recraft 20B | https://replicate.com/recraft-ai/recraft-20b |
| Recraft 20B SVG | https://replicate.com/recraft-ai/recraft-20b-svg |

  • Long text generation (sentences, paragraphs)
  • Vector (SVG) output
  • No img2img support on Replicate

NVIDIA (Edify / SANA)

NVIDIA Edify (renamed from Picasso in September 2024) is the enterprise platform. SANA, developed with MIT, is 20× smaller and 100× faster than FLUX-12B while generating up to 4K images.

Product Timeline: Picasso/Edify (2023) → Edify rename (September 2024) → SANA (November 2024) → SANA-Video (October 2025)

Partnerships: Getty Images, Shutterstock, Adobe.

| Model | Replicate Link |
|---|---|
| SANA | https://replicate.com/nvidia/sana |
| SANA Sprint 1.6B | https://replicate.com/nvidia/sana-sprint-1.6b |

  • Edify platform for enterprise (4K, custom training)
  • SANA models for research
  • No img2img support on Replicate

Frequently Asked Questions

What is image-to-image (img2img) generation?

Image-to-image generation lets you use existing images as reference inputs alongside your text prompt. Instead of generating from scratch, the model incorporates visual elements from your reference—like a product photo, a style example, or a character’s face—into the output. This is essential for maintaining consistency across marketing campaigns, product catalogs, and brand assets.
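In practice, an img2img request is an ordinary generation request with one extra field. A minimal sketch using Replicate's Python client follows; the `image_input` parameter name is an illustrative assumption, since every model defines its own input schema (some use `input_image`, others `reference_images`), so check the model page before use.

```python
# Sketch: a text-to-image request that adds reference images.
# The `image_input` key is an assumption for illustration -- each
# Replicate model publishes its own input schema.

def img2img_input(prompt: str, references: list[str]) -> dict:
    """Combine a text prompt with reference image URLs."""
    return {"prompt": prompt, "image_input": references}

def run_img2img(model_id: str, prompt: str, references: list[str]):
    """Execute on Replicate (requires REPLICATE_API_TOKEN in the env)."""
    import replicate  # pip install replicate
    return replicate.run(model_id, input=img2img_input(prompt, references))

payload = img2img_input(
    "The same gold hoop earrings on a marble surface, softer light",
    ["https://example.com/product-shot.jpg"],
)
```

The reference image anchors the product's geometry and materials while the prompt controls everything else, which is why this workflow matters for catalog consistency.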

Which AI image generator has the best quality in 2025?

For pure photorealism, FLUX.2 Max and Imagen 4 Ultra lead the pack. FLUX.2 Max excels at fine details like skin texture and fabric rendering, while Imagen 4 Ultra handles materials like water, fur, and metallic surfaces exceptionally well. The choice depends on whether you need img2img support (FLUX.2 Max has it; Imagen does not).

What’s the fastest AI image generator?

FLUX.1 Schnell generates images in under 1 second at ~$0.003/image. For slightly higher quality with similar speed, Photon Flash (~3 seconds, $0.002/image) and Ideogram V3 Turbo (~1 second, $0.02/image) are excellent choices. Seedream also generates 2K images in roughly 3 seconds.

Which model is best for generating text in images?

Ideogram V3 Quality is the industry leader for text rendering. It handles long sentences, logos, signage, and complex typography that other models mangle. For Chinese text specifically, Qwen-Image is the only model with commercial-grade Chinese typography.

Can I run these models locally?

Yes, but only open-source models. Stable Diffusion 3.5 and Qwen-Image (Apache 2.0 license) can run locally without API costs. You’ll need a GPU with 8GB+ VRAM for SDXL or 12GB+ for SD 3.5. FLUX.1 Dev and FLUX.1 Schnell also have open weights for local use.

What is Replicate?

Replicate is a cloud platform that hosts AI models with a simple pay-per-use API. You don’t need to manage infrastructure—just send requests and get results. Most models in this guide are accessible via Replicate, making it easy to test different options before committing to one.
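The request pattern is the same for every model in this guide. A minimal sketch with the official Python client (`pip install replicate`); `replicate.run()` takes an `"owner/model"` id plus an input dict and blocks until output is ready:

```python
# Sketch: the basic Replicate request pattern. Calling text_to_image()
# sends a real, billed API request and needs REPLICATE_API_TOKEN set.

def split_model_id(model_id: str) -> tuple[str, str]:
    """Split the 'owner/model' id format that replicate.run() expects."""
    owner, name = model_id.split("/", 1)
    return owner, name

def text_to_image(model_id: str, prompt: str):
    """One request, one result (a list of output file URLs/objects)."""
    import replicate  # pip install replicate
    return replicate.run(model_id, input={"prompt": prompt})

# Example call (commented out; it would hit the live API):
# images = text_to_image("black-forest-labs/flux-schnell",
#                        "gold chain necklace on linen, soft daylight")

owner, name = split_model_id("black-forest-labs/flux-schnell")
```

Because the interface is uniform, swapping models is usually a one-line change to the model id, which makes side-by-side testing cheap.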

How do I maintain character consistency across images?

Several models support multi-reference inputs: Nano Banana Pro (up to 14 images), FLUX.2 Flex (up to 10), Seedream 4.5 (up to 14), and Runway Gen-4 (up to 3 with @character tagging). These let you feed in reference photos of a character to maintain consistent features across generations.


Pricing Comparison (December 2025)

| Studio | Model | Approximate Cost per Image |
|---|---|---|
| Black Forest Labs | FLUX.1 schnell | ~$0.003 |
| Black Forest Labs | FLUX.1 pro | ~$0.05 |
| Google | Nano Banana | ~$0.04 |
| OpenAI | GPT Image 1 | ~$0.04 |
| Stability AI | SD 3.5 | Free (local) / ~$0.006 (API) |
| Alibaba | Qwen-Image | Free (open source) |
| ByteDance | Seedream 4.5 | ~$0.01 |
| Luma AI | Photon | ~$0.01-0.03 |
| Ideogram | V3 | ~$0.02 |
| Runway | Gen-4 Image | ~$0.03 |

Prices are approximate and may vary by resolution, tier, and volume.
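At volume, these per-image differences compound quickly. A quick sanity check, using a few of the approximate prices from the table above:

```python
# Cost comparison at volume, using approximate December 2025 per-image
# prices from the table above. Actual billing varies by resolution,
# tier, and volume discounts.

PRICE_PER_IMAGE = {
    "FLUX.1 schnell": 0.003,
    "FLUX.1 pro": 0.05,
    "Seedream 4.5": 0.01,
    "Ideogram V3": 0.02,
}

def monthly_cost(model: str, images_per_day: int, days: int = 30) -> float:
    """Estimated monthly spend in USD for a steady daily volume."""
    return round(PRICE_PER_IMAGE[model] * images_per_day * days, 2)

# At 500 images/day for a month, the gap is stark:
schnell = monthly_cost("FLUX.1 schnell", 500)  # 45.0
pro = monthly_cost("FLUX.1 pro", 500)          # 750.0
```

For iteration-heavy workflows, routing drafts through a cheap model and reserving premium models for finals can cut the bill by an order of magnitude.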


Summary by Use Case

Photorealism with Img2Img

FLUX.2 Max/Pro/Flex, Nano Banana Pro, Seedream 4.5

Text in Images with Img2Img

Ideogram V3, Qwen-Image

Speed with Img2Img

FLUX.1 Dev, Luma Photon, Seedream 4

Open Source with Img2Img

Stable Diffusion 3.5, FLUX.1 Dev, Qwen-Image

Character/Style Reference

Luma Photon, Runway Gen-4, FLUX Kontext, Ideogram Character

Chinese Market

Qwen-Image, Seedream 4.5


This guide is updated regularly as new models are released. Last update: December 2025.



About studio formel

studio formel is an AI-powered creative platform built specifically for jewelry brands. We combine systematic research on AI generation with a flexible asset management system, helping jewelry sellers create professional images, videos, and ads at scale.

Learn more about our platform →

Ready to transform your jewelry photography?

Join jewelry brands using AI to create professional product images in seconds.

Get early access

Topics

AI image generation Replicate FLUX Stable Diffusion models