

Published: December 20, 2025
Read time: 12 min
Author: studio formel

AI Image Generation Studios & Models: The Complete 2025 Guide

Every major AI image generation company and their models, updated for December 2025. Focus on Replicate-accessible models with image-to-image support.

The AI image generation landscape has exploded in 2025. What started with a handful of players has become a competitive ecosystem of studios, each with multiple models optimized for different use cases.

This guide covers every major AI image generation studio and their available models, with a focus on models accessible via Replicate that support image-to-image (reference image) workflows.


TL;DR: Quick Recommendations

Need the absolute best quality? FLUX.2 Max or Imagen 4 Ultra. Both excel at photorealism with fine detail rendering. FLUX.2 Max supports reference images; Imagen does not.

Need text in your images? Ideogram V3 Quality is the industry leader. It handles long sentences, logos, and precise positioning better than any competitor.

Need speed on a budget? FLUX.1 Schnell (<1 second) or Photon Flash (~3 seconds). Both are dramatically cheaper than premium models while maintaining good quality.

Need open source? Stable Diffusion 3.5 Large or Qwen-Image. Run locally with no API costs, massive LoRA ecosystem, full customization control.

Need character consistency? Nano Banana Pro (up to 14 reference images), FLUX.2 Flex (up to 10), or Runway Gen-4 with @character tagging.

Need Chinese text? Qwen-Image is the only model with commercial-grade Chinese typography.


Visual Comparison: Same Prompt, 44 Models

We ran the same prompt through every Replicate-accessible model to show how each interprets identical instructions.

Prompt: “A young woman in a navy linen blazer, leaning against a terracotta-colored wall. Direct sunlight casting sharp geometric shadows across her face. She wears small gold hoop earrings and a delicate gold chain necklace. Natural makeup, slicked-back hair. She looks off-camera with a calm, confident expression. Shot on medium format, editorial fashion photography, Vogue Italia aesthetic.”

Black Forest Labs

FLUX.2 Max (img2img) ↗ 2025
Highest fidelity. 4MP, ~30s. Best for hero images.

FLUX.2 Pro (img2img) ↗ 2025
Production standard. 2MP, ~10s. Best quality/speed balance.

FLUX.2 Flex (img2img) ↗ 2025
Up to 10 reference images. Best for character consistency.

FLUX.2 Dev (img2img) ↗ 2025
Research/non-commercial. Open weights for experimentation.

FLUX.1 Pro (img2img) ↗ 2024
Original flagship. 12B params, strong photorealism.

FLUX.1.1 Pro (img2img) ↗ 2024
6x faster than 1.0 Pro. Better image quality, diversity.

FLUX.1.1 Pro Ultra (img2img) ↗ 2024
4MP raw output. Up to 2K resolution for large prints.

FLUX.1 Dev (img2img) ↗ 2024
Open weights, guidance-distilled. Fine-tuning base.

FLUX.1 Schnell ↗ 2024
Apache 2.0 license. 4 steps, fastest FLUX variant.

Kontext Max (img2img) ↗ 2025
Text+image context. Highest quality editing model.

Kontext Pro (img2img) ↗ 2025
Fast context editing. Good balance of speed and quality.

FLUX Dev LoRA (img2img) ↗ 2024
Custom style training. Load your own LoRA weights.

FLUX Schnell LoRA ↗ 2024
Fast LoRA inference. 4-step with custom styles.

FLUX Krea Dev (img2img) ↗ 2024
Krea-optimized FLUX Dev. Enhanced for creative tools.

Google

Imagen 4 Ultra ↗ 2025
Highest quality. Fine fabric/water/fur rendering.

Imagen 4 ↗ 2025
Standard tier. Enhanced typography for posters/cards.

Imagen 4 Fast ↗ 2025
Speed-optimized Imagen 4. Quick iterations, good quality.

Imagen 3 ↗ 2024
Previous flagship. Strong photorealism, reliable output.

Imagen 3 Fast ↗ 2024
Speed variant of Imagen 3. Good for prototyping.

Nano Banana Pro (img2img) ↗ 2025
Gemini-based. Top benchmark performer, strong consistency.

Nano Banana (img2img) ↗ 2025
Standard Gemini variant. Good balance of speed and quality.

Gemini 2.5 Flash (img2img) ↗ 2025
Fast multimodal. Best for conversational image editing.

OpenAI

GPT Image 1.5 (img2img) ↗ 2025
Native ChatGPT integration. Best for iterative, conversational workflows.

DALL-E 3 ↗ 2023
Natural language prompting. Uses GPT-4 to expand prompts automatically.

DALL-E 2 ↗ 2022
Pioneer of img2img. Inpainting, outpainting, variations.

Stability AI

SD 3.5 Large (img2img) ↗ 2024
8B params, MMDiT architecture. Highest quality open-source option.

SD 3.5 Large Turbo (img2img) ↗ 2024
Distilled Large. 4-step generation for fast iterations.

SD 3.5 Medium (img2img) ↗ 2024
2.5B params. Quality/speed balance, lower resource usage.

SD 3 (img2img) ↗ 2024
First MMDiT model. Triplet text encoder for better prompt understanding.

Ideogram

V3 Quality (img2img) ↗ 2025
Best text rendering in the industry. Posters, logos, signage.

V3 Balanced (img2img) ↗ 2025
Quality/speed sweet spot. Good for most production work.

V3 Turbo (img2img) ↗ 2025
Fastest V3. Great for rapid prototyping with text.

V2a (img2img) ↗ 2024
Enhanced realism over V2. Better anatomy and composition.

V2a Turbo (img2img) ↗ 2024
Fast V2a for quick iterations. Budget-friendly.

V2 (img2img) ↗ 2024
Core V2 model. 1280x1280 resolution, solid text rendering.

V2 Turbo (img2img) ↗ 2024
Fast V2 variant. Lower cost, good for testing.

Alibaba

Qwen-Image (img2img) ↗ 2025
Best for Chinese text. Strong multi-language support.

ByteDance

Seedream 4.5 (img2img) ↗ 2025
Latest flagship. Fast generation, commercial-friendly license.

Seedream 4 (img2img) ↗ 2025
Production stable. Reliable for batch processing.

Seedream 3 ↗ 2024
Previous generation. Good value, lower cost.

Dreamina 3.1 ↗ 2024
Consumer-focused. Optimized for everyday use.

Luma AI

Photon (img2img) ↗ 2024
Fast, high-quality. From the Dream Machine team.

Photon Flash (img2img) ↗ 2024
Ultra-fast variant. Best for real-time applications.

Runway

Gen-4 Image (img2img) ↗ 2025
Reference-based generation. Strong style consistency.


Quick Reference: Studios with Replicate + Img2Img Support

| Studio | Flagship Model | Key Strength | Img2Img Models |
|---|---|---|---|
| Black Forest Labs | FLUX.2 Max | Photorealism, text rendering | 13 |
| Google DeepMind | Nano Banana Pro | Quality, consistency | 3 |
| OpenAI | GPT Image 1.5 | Conversational generation | 3 |
| Stability AI | SD 3.5 Large | Open source, customization | 6 |
| Ideogram | V3 Quality | Text in images | 8 |
| Alibaba (Qwen) | Qwen-Image | Chinese text rendering | 4 |
| ByteDance | Seedream 4.5 | Speed, commercial use | 5 |
| Luma AI | Photon | Fast generation | 2 |
| Runway | Gen-4 Image | Reference-based | 2 |

Black Forest Labs (FLUX)

Black Forest Labs was founded on August 1, 2024 by former Stability AI researchers who created Stable Diffusion. The founders previously researched AI at LMU Munich under Björn Ommer. The company achieved unicorn status within months and closed a $300M Series B in December 2025. Key partners include Adobe, Canva, Meta, and xAI (Grok integration).

Model Timeline: FLUX.1 (August 2024) → FLUX.1.1 Pro (October 2024) → FLUX.2 (November 2025)

Known for photorealism, accurate text rendering, and strict prompt adherence.

FLUX.2 vs FLUX.1: What Changed?

| | FLUX.1 | FLUX.2 |
|---|---|---|
| Parameters | 12B | 32B (with Mistral-3 VLM) |
| Max Resolution | 1-2 MP | 4 MP |
| Reference Images | Limited | Up to 10 |
| Typography | Good | Legible fine text, UI elements |

Key FLUX.2 Improvements: New VAE, 32K token context, better skin/fabric micro-details.

Model Variants

| Variant | Speed | Best For |
|---|---|---|
| Max | ~30s | Hero images, final production |
| Pro | ~5s | Professional workflows |
| Dev | ~2s | Development, fine-tuning |
| Schnell | <1s | Rapid iteration |

Kontext = text-based image editing (not generation)

FLUX.2 Series

FLUX.2 Max (img2img) ↗ 2025
Highest fidelity. 4MP, ~30s. Best for hero images.

FLUX.2 Pro (img2img) ↗ 2025
Fast (~5s), 8 reference images. Professional workflows.

FLUX.2 Flex (img2img) ↗ 2025
Optimized for img2img editing and style transfer.

FLUX.2 Dev (img2img) ↗ 2025
Open weights. Best for development and fine-tuning.

FLUX.1 Series

FLUX.1 Pro (img2img) ↗ 2024
12B params. Original flagship for production use.

FLUX.1.1 Pro (img2img) ↗ 2024
Improved quality over 1.0. Better prompt adherence.

FLUX.1.1 Pro Ultra (img2img) ↗ 2024
Highest res in 1.x series. Up to 4MP output.

FLUX.1 Dev (img2img) ↗ 2024
Open weights. 28 steps, ~2s. For fine-tuning.

FLUX.1 Schnell ↗ 2024
Fastest (<1s). 4 steps. For rapid prototyping.

FLUX Kontext Series

Kontext Max (img2img) ↗ 2025
Text-based editing. Transform style, clothing via prompts.

Kontext Pro (img2img) ↗ 2025
Faster Kontext. ~4s per edit. Maintains composition.

FLUX LoRA Variants

FLUX Dev LoRA (img2img) ↗ 2024
Custom style training. Load your own LoRA weights.

FLUX Schnell LoRA ↗ 2024
Fast LoRA inference. 4-step with custom styles.

FLUX Krea Dev (img2img) ↗ 2024
Krea-optimized FLUX Dev. Enhanced for creative tools.

Hardware Requirements: FLUX.1 dev requires ~24GB VRAM


Google DeepMind (Imagen / Gemini)

Google’s AI lab brings deep pockets and research talent to image generation. Imagen 4 is their flagship, but the real story is Nano Banana Pro—a Gemini-based model that quietly dominated anonymous benchmarks before being identified. For reference image workflows, Nano Banana Pro supports up to 14 input images, more than any competitor.

The team uses SynthID invisible watermarking to prevent deepfakes and supports text rendering in 7+ languages.

Model Timeline: Imagen (May 2022) → Imagen 2 (December 2023) → Imagen 3 (August 2024) → Imagen 4 (May 2025, Google I/O)

Google’s image generation spans the Imagen series and Gemini-based models (marketed as “Nano Banana” on LMArena’s anonymous benchmark).

Imagen Evolution

| | Imagen 2 | Imagen 3 | Imagen 4 |
|---|---|---|---|
| Typography | Basic | Improved | Significantly enhanced |
| Detail | Standard | Fewer artifacts | Fine fabrics, water, fur |
| Speed | Standard | Standard | Fast variant 10× faster |

Imagen Series

Imagen 4 Ultra ↗ 2025
Highest quality. Fine fabric/water/fur rendering.

Imagen 4 ↗ 2025
Standard tier. Enhanced typography for posters/cards.

Imagen 4 Fast ↗ 2025
Speed-optimized Imagen 4. Quick iterations, good quality.

Imagen 3 ↗ 2024
Previous flagship. Strong photorealism, reliable output.

Imagen 3 Fast ↗ 2024
Speed variant of Imagen 3. Good for prototyping.

Gemini Image Models

Nano Banana Pro (img2img) ↗ 2025
Gemini-based. Top benchmark performer, strong consistency.

Nano Banana (img2img) ↗ 2025
Standard Gemini variant. Good balance of speed and quality.

Gemini 2.5 Flash (img2img) ↗ 2025
Fast multimodal. Best for conversational image editing.

Key Capabilities

  • Character consistency across generations
  • Image blending and editing
  • Accurate text rendering in multiple languages
  • Up to 4K resolution

OpenAI (GPT Image / DALL-E)

OpenAI invented the category with DALL-E in 2021, but their real advantage today is conversational iteration. GPT Image 1.5 integrates directly with ChatGPT, letting you refine images through natural dialogue: “make the background warmer” or “add a second person on the left.” If your workflow involves back-and-forth refinement, this is uniquely powerful.

In March 2025, DALL-E 3 was replaced by GPT Image’s native multimodal generation in ChatGPT. All outputs include C2PA metadata for provenance tracking.

Model Timeline: Image GPT (June 2020) → DALL-E (January 2021) → DALL-E 2 (April 2022) → DALL-E 3 (October 2023) → GPT Image 1 (March 2025) → GPT Image 1.5 (December 2025)

OpenAI offers GPT Image models and the legacy DALL-E series.

GPT Image Models

GPT Image 1.5 (img2img) ↗ 2025
Native ChatGPT integration. Best for iterative, conversational workflows.

DALL-E Models

DALL-E 3 ↗ 2023
Natural language prompting. Uses GPT-4 to expand prompts automatically.

DALL-E 2 ↗ 2022
Pioneer of img2img. Inpainting, outpainting, variations.

Key Features

  • Native multimodal generation
  • Conversational refinement through chat
  • Context-aware iterations
  • C2PA metadata on all outputs

Stability AI (Stable Diffusion)

If you want to run models locally, train your own styles, or avoid recurring API costs, Stable Diffusion is the answer. It’s fully open source with a massive ecosystem of community fine-tunes, LoRAs, and tools. The trade-off: you’ll need a decent GPU (8GB+ VRAM) and some technical comfort. For maximum customization at minimum cost, nothing else comes close.

Stability AI revolutionized the industry in August 2022 by making model weights freely available. SD 3.5 Large represents their current flagship.

Model Timeline: SD 1.x (August 2022) → SD 2.0 (November 2022) → SDXL (July 2023) → SD 3 (February 2024) → SD 3.5 (October 2024)

Pioneered open-source image generation. Important for customization and fine-tuning.

SD Version Comparison

| | SD 1.5 | SDXL | SD 3.5 |
|---|---|---|---|
| Parameters | 983M | 3.5B | 8B |
| Architecture | UNet | UNet | Diffusion Transformer |
| Text Generation | Poor | Better | Best in series |
| VRAM | ~6GB | ~12GB | ~20GB |

Trade-offs: SD 3.5 is slower (1+ min) but has market-leading prompt adherence. Still struggles with hands.

Stable Diffusion 3.5 Series

SD 3.5 Large (img2img) ↗ 2024
8B params, MMDiT architecture. Highest quality open-source option.

SD 3.5 Large Turbo (img2img) ↗ 2024
Distilled Large. 4-step generation for fast iterations.

SD 3.5 Medium (img2img) ↗ 2024
2.5B params. Quality/speed balance, lower resource usage.

SD 3 (img2img) ↗ 2024
First MMDiT model. Triplet text encoder for better prompt understanding.

Key Advantages

  • Fully open source
  • Massive ecosystem of LoRAs and fine-tuned models
  • Run locally without API costs

Hardware Requirements

  • SDXL: 8GB+ VRAM
  • SD 3.5: 12GB+ VRAM

Ideogram

Need text in your images that actually looks right? Ideogram is the clear leader. Whether it’s a logo, poster, storefront sign, or book cover, V3 Quality renders long sentences, precise positioning, and complex typography that other models mangle. No other model comes close for text-heavy designs.

Founded by former Google Imagen researchers. Co-founder Jonathan Ho authored the foundational 2020 paper on diffusion models. First to render coherent text in images at launch.

Model Timeline: Ideogram 0.1 (August 2023) → Ideogram 1.0 (February 2024) → Ideogram 2.0 (August 2024) → Ideogram 3.0 (March 2025)

Stats: 7M+ creators, 600M+ images generated.

Leader in text rendering within images.

Version Evolution

| | 1.0 | 2.0 | 3.0 |
|---|---|---|---|
| Text Clarity | Good | Improved | Complex layouts |
| Styles | Basic | 20+ | 50+ presets |
| Key Feature | First coherent text | Realism + styles | Style references |

V3 Variants: Quality vs Balanced vs Turbo

| Variant | Speed | Cost | Use Case |
|---|---|---|---|
| Quality | ~9s | $0.09 | Final production |
| Balanced | ~4s | $0.06 | General use |
| Turbo | ~1s | $0.02 | Rapid iteration |

Ideogram V3 Series

V3 Quality (img2img) ↗ 2025
Best text rendering in the industry. Posters, logos, signage.

V3 Balanced (img2img) ↗ 2025
Quality/speed sweet spot. Good for most production work.

V3 Turbo (img2img) ↗ 2025
Fastest V3. Great for rapid prototyping with text.

Ideogram V2 Series

V2a (img2img) ↗ 2024
Enhanced realism over V2. Better anatomy and composition.

V2a Turbo (img2img) ↗ 2024
Fast V2a for quick iterations. Budget-friendly.

V2 (img2img) ↗ 2024
Core V2 model. 1280x1280 resolution, solid text rendering.

V2 Turbo (img2img) ↗ 2024
Fast V2 variant. Lower cost, good for testing.

Key Capabilities

  • Long text strings including sentences
  • Precise text positioning
  • Multilingual text support
  • Style references (up to 3 images)

Alibaba (Qwen-Image)

Targeting the Chinese market or need proper Chinese typography in your images? Qwen-Image is the only model that renders Chinese characters with commercial-grade accuracy. It’s also fully open source (Apache 2.0), so you can run it locally without API costs—making it a compelling Stable Diffusion alternative for bilingual workflows.

20B parameter MMDiT model with multi-line Chinese and English text layouts.

Model Timeline: Qwen2-VL (September 2024) → Qwen-Image (August 2025) → Qwen-Image-Edit (August 2025) → Qwen-Image-Layered (December 2025)

20 billion parameter model. First open-source model with accurate Chinese text rendering.

Image Generation Models

Qwen-Image (img2img) ↗ 2025
Best for Chinese text. Strong multi-language support.

Key Capabilities

  • Commercial-grade Chinese text rendering
  • Bilingual (English + Chinese)
  • Multi-line text layouts
  • Layered output for editing

ByteDance (Seedream)

TikTok’s parent company quietly built one of the best image generators. Seedream 4.5 combines exceptional speed (~3 seconds for 2K images), high benchmark scores (ELO 1,222), and commercial-friendly licensing. If you need to generate images at scale with predictable costs, Seedream deserves serious consideration.

Seedream 4.0 surpassed Gemini 2.5 Flash and OpenAI models on benchmarks. Generates 2K images in ~3 seconds with 94% text accuracy (Chinese and English).

Model Timeline: Seedream 2.0 (December 2024) → Seedream 3.0 (April 2025) → Seedream 4.0 (September 2025) → Seedream 4.5 (November 2025)

Doubao platform leads China’s AI market. Seedream models compete with GPT-4o and Midjourney.

Seedream Evolution

| | 3.0 | 4.0 | 4.5 |
|---|---|---|---|
| Max Resolution | 2K | 4K | 4K |
| Reference Images | Basic | Up to 10 | Up to 14 |
| Key Feature | 3s speed | ELO 1,222 | Story scenes |

4.5 New: Group generation mode for story scenes and character variations.

Seedream Series

Seedream 4.5 (img2img) ↗ 2025
Latest flagship. Fast generation, commercial-friendly license.

Seedream 4 (img2img) ↗ 2025
Production stable. Reliable for batch processing.

Seedream 3 ↗ 2024
Previous generation. Good value, lower cost.

Dreamina 3.1 ↗ 2024
Consumer-focused. Optimized for everyday use.

Key Capabilities

  • Speed: 2K images in ~3 seconds
  • 94% text accuracy (Chinese and English)
  • Optimized for commercial use
  • Up to 4K resolution (Seedream 4)

Luma AI (Photon)

Luma AI is primarily known for video (Dream Machine, Ray3), but Photon deserves attention for image generation. It’s exceptionally fast—Photon Flash runs at $0.002/image—and excels at character consistency with adjustable reference weights. A sleeper pick for high-volume workflows where cost and speed matter more than bleeding-edge quality.

Raised $1.07B total, including $900M Series C in 2025. Partnered with Adobe to integrate Ray3 into Firefly. 30M+ users.

Product Timeline: Dream Machine (June 2024) → Photon (November 2024) → Ray3 (September 2025)

Photon is their image generation model. Known for speed.

Photon vs Photon Flash

| | Photon | Photon Flash |
|---|---|---|
| Speed | ~11s | ~3s |
| Cost | $0.03 | $0.002 |
| Best For | Production | Iteration |

Key Features: Character consistency, multi-reference support, adjustable reference weights.

Image Models

Photon (img2img) ↗ 2024
Fast, high-quality. From the Dream Machine team.

Photon Flash (img2img) ↗ 2024
Ultra-fast variant. Best for real-time applications.

Key Capabilities

  • High generation speed (Luma claims 8x faster than comparable models)
  • High-resolution output
  • Image, style, and character reference support

Pricing

  • Free tier available
  • Subscriptions: $9.99 - $99.99/month

Runway

Runway is the choice for film and TV production—their tools appear in Everything Everywhere All at Once and Amazon’s House of David. Gen-4 Image excels at maintaining character identity across scenes using @character and @location tagging. If you’re building visual narratives that need consistent characters across multiple frames, this is purpose-built for that workflow.

Co-released Stable Diffusion in August 2022. Total funding: $544M.

Model Timeline: Stable Diffusion co-release (August 2022) → Gen-1/Gen-2 (February 2023) → Gen-3 (June 2024) → Act-One (October 2024) → Gen-4 (April 2025) → Gen-4.5 (December 2025)

Industry Use: Everything Everywhere All at Once, The Late Show with Stephen Colbert, Amazon’s House of David (350+ AI shots in Season 2).

Known primarily for video, Runway also offers image generation capabilities.

Gen-4 Image vs Turbo

| | Gen-4 Image | Gen-4 Turbo |
|---|---|---|
| Speed | Standard | 2.5× faster |
| 720p Cost | $0.05 | Lower |
| 1080p Cost | $0.08 | Lower |

Key Feature: Reference-based with 1-3 images. Tag with @character, @location for control.
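To show how tagging and references fit together, here is a small sketch of building such a request. The parameter names (`reference_images`, `reference_tags`) are illustrative assumptions; confirm them against the Gen-4 Image input schema on Replicate before use.

```python
# Sketch: pairing @tags in the prompt with reference images, in the
# style of Runway Gen-4. Parameter names are assumptions for
# illustration -- check the model's schema on Replicate.

def gen4_input(prompt: str, refs: dict[str, str]) -> dict:
    """Pair each reference image URL with the @tag used in the prompt."""
    return {
        "prompt": prompt,                        # prompt mentions the @tags
        "reference_tags": list(refs.keys()),     # e.g. ["character", "location"]
        "reference_images": list(refs.values()), # matching image URLs
    }

payload = gen4_input(
    "@character standing in @location at golden hour",
    {
        "character": "https://example.com/hero.jpg",
        "location": "https://example.com/terracotta-wall.jpg",
    },
)
```

The design point is that each tag in the prompt resolves to a specific reference image, which is what keeps a character's identity stable across frames.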

Image Models

Gen-4 Image (img2img) ↗ 2025
Reference-based generation. Strong style consistency.

Key Features

  • Reference image support (up to 3 images)
  • High-quality generation
  • Turbo variant 2.5x faster

Notable Others

The following studios are significant in the AI image generation landscape but either lack Replicate access or don’t offer img2img capabilities via Replicate.

Midjourney

  • Website: https://www.midjourney.com/
  • Replicate: N/A
  • Founded: August 2021
  • Headquarters: San Francisco, California
  • Founder: David Holz (previously founded Leap Motion)

Unlike other AI startups, Midjourney is not VC-funded and has been profitable since August 2022. Runs Discord’s largest server (21M+ members as of May 2025). Web interface launched August 2024.

V7 vs V6 Comparison

| | V6/V6.1 | V7 |
|---|---|---|
| Architecture | Previous gen | Completely rebuilt |
| Speed | ~35s | Draft: 4-5s (10× faster) |
| Hands/Anatomy | Struggled | Significantly improved |
| Text Clarity | Basic | Near-perfect |
| Personalization | 200+ images | 5 minutes |

V7 Key Features:

  • Draft Mode: 10× faster, half cost
  • Omni Reference (--oref): Blend styles, colors, lighting
  • Character Reference (--cref): Maintain identity across generations

When V6 is better: Stylized fictional world-building (V7 can feel “too clean”)

  • Availability: Discord bot, Web app, $10/month minimum

Leonardo AI

  • Website: https://leonardo.ai/
  • Replicate: N/A
  • Founded: December 2022
  • Headquarters: Sydney, Australia
  • Founders: JJ Fiasson, Ethan Smith, Jachin Bhasme
  • Acquired by: Canva (July 2024, ~$320M)

Originally focused on video game assets, Leonardo grew from 14,000 users (February 2023) to 19M users by end of 2023. Canva acquired Leonardo in July 2024; all 120 employees joined Canva. 1B+ images generated.

| Model | Key Features |
|---|---|
| Phoenix 1.0 Ultra | 5MP+ resolution |
| Phoenix 1.0 Fast | Speed-optimized |

Adobe Firefly

Adobe Firefly focuses on commercial safety, trained on Adobe Stock and public domain content. 13B+ images generated since launch; ~1.5B assets/month.

Model Timeline: Firefly Beta (March 2023) → Image Model 2 (October 2023) → Image Model 3 (April 2024) → Image Model 4/4 Ultra (April 2025)

| Model | Resolution |
|---|---|
| Image Model 4 Ultra | 2K |
| Image Model 4 | Standard |

  • Layered output (objects as editable layers)
  • Trained on licensed content (commercial-safe)
  • Adobe Creative Cloud integration

Recraft

Recraft V3 (codenamed “Red Panda”) achieved #1 on Hugging Face’s Text-to-Image Leaderboard with ELO 1172, outperforming DALL-E and Midjourney (October 2024).

Stats: 4M+ users, $5M+ ARR.

| Model | Replicate Link |
|---|---|
| Recraft V3 | https://replicate.com/recraft-ai/recraft-v3 |
| Recraft V3 SVG | https://replicate.com/recraft-ai/recraft-v3-svg |
| Recraft 20B | https://replicate.com/recraft-ai/recraft-20b |
| Recraft 20B SVG | https://replicate.com/recraft-ai/recraft-20b-svg |

  • Long text generation (sentences, paragraphs)
  • Vector (SVG) output
  • No img2img support on Replicate

NVIDIA (Edify / SANA)

NVIDIA Edify (renamed from Picasso in September 2024) is the enterprise platform. SANA, developed with MIT, is 20× smaller and 100× faster than FLUX-12B while generating up to 4K images.

Product Timeline: Picasso/Edify (2023) → Edify rename (September 2024) → SANA (November 2024) → SANA-Video (October 2025)

Partnerships: Getty Images, Shutterstock, Adobe.

| Model | Replicate Link |
|---|---|
| SANA | https://replicate.com/nvidia/sana |
| SANA Sprint 1.6B | https://replicate.com/nvidia/sana-sprint-1.6b |

  • Edify platform for enterprise (4K, custom training)
  • SANA models for research
  • No img2img support on Replicate

Frequently Asked Questions

What is image-to-image (img2img) generation?

Image-to-image generation lets you use existing images as reference inputs alongside your text prompt. Instead of generating from scratch, the model incorporates visual elements from your reference—like a product photo, a style example, or a character’s face—into the output. This is essential for maintaining consistency across marketing campaigns, product catalogs, and brand assets.
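In practice, an img2img request is an ordinary generation request with one extra field. A minimal sketch using Replicate's Python client follows; the `image_input` parameter name is an illustrative assumption, since every model defines its own input schema (some use `input_image`, others `reference_images`), so check the model page before use.

```python
# Sketch: a text-to-image request that adds reference images.
# The `image_input` key is an assumption for illustration -- each
# Replicate model publishes its own input schema.

def img2img_input(prompt: str, references: list[str]) -> dict:
    """Combine a text prompt with reference image URLs."""
    return {"prompt": prompt, "image_input": references}

def run_img2img(model_id: str, prompt: str, references: list[str]):
    """Execute on Replicate (requires REPLICATE_API_TOKEN in the env)."""
    import replicate  # pip install replicate
    return replicate.run(model_id, input=img2img_input(prompt, references))

payload = img2img_input(
    "The same gold hoop earrings on a marble surface, softer light",
    ["https://example.com/product-shot.jpg"],
)
```

The reference image anchors the product's geometry and materials while the prompt controls everything else, which is why this workflow matters for catalog consistency.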

Which AI image generator has the best quality in 2025?

For pure photorealism, FLUX.2 Max and Imagen 4 Ultra lead the pack. FLUX.2 Max excels at fine details like skin texture and fabric rendering, while Imagen 4 Ultra handles materials like water, fur, and metallic surfaces exceptionally well. The choice depends on whether you need img2img support (FLUX.2 Max has it; Imagen does not).

What’s the fastest AI image generator?

FLUX.1 Schnell generates images in under 1 second at ~$0.003/image. For slightly higher quality with similar speed, Photon Flash (~3 seconds, $0.002/image) and Ideogram V3 Turbo (~1 second, $0.02/image) are excellent choices. Seedream also generates 2K images in roughly 3 seconds.

Which model is best for generating text in images?

Ideogram V3 Quality is the industry leader for text rendering. It handles long sentences, logos, signage, and complex typography that other models mangle. For Chinese text specifically, Qwen-Image is the only model with commercial-grade Chinese typography.

Can I run these models locally?

Yes, but only open-source models. Stable Diffusion 3.5 and Qwen-Image (Apache 2.0 license) can run locally without API costs. You’ll need a GPU with 8GB+ VRAM for SDXL or 12GB+ for SD 3.5. FLUX.1 Dev and FLUX.1 Schnell also have open weights for local use.

What is Replicate?

Replicate is a cloud platform that hosts AI models with a simple pay-per-use API. You don’t need to manage infrastructure—just send requests and get results. Most models in this guide are accessible via Replicate, making it easy to test different options before committing to one.
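The request pattern is the same for every model in this guide. A minimal sketch with the official Python client (`pip install replicate`); `replicate.run()` takes an `"owner/model"` id plus an input dict and blocks until output is ready:

```python
# Sketch: the basic Replicate request pattern. Calling text_to_image()
# sends a real, billed API request and needs REPLICATE_API_TOKEN set.

def split_model_id(model_id: str) -> tuple[str, str]:
    """Split the 'owner/model' id format that replicate.run() expects."""
    owner, name = model_id.split("/", 1)
    return owner, name

def text_to_image(model_id: str, prompt: str):
    """One request, one result (a list of output file URLs/objects)."""
    import replicate  # pip install replicate
    return replicate.run(model_id, input={"prompt": prompt})

# Example call (commented out; it would hit the live API):
# images = text_to_image("black-forest-labs/flux-schnell",
#                        "gold chain necklace on linen, soft daylight")

owner, name = split_model_id("black-forest-labs/flux-schnell")
```

Because the interface is uniform, swapping models is usually a one-line change to the model id, which makes side-by-side testing cheap.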

How do I maintain character consistency across images?

Several models support multi-reference inputs: Nano Banana Pro (up to 14 images), FLUX.2 Flex (up to 10), Seedream 4.5 (up to 14), and Runway Gen-4 (up to 3 with @character tagging). These let you feed in reference photos of a character to maintain consistent features across generations.


Pricing Comparison (December 2025)

| Studio | Model | Approximate Cost per Image |
|---|---|---|
| Black Forest Labs | FLUX.1 schnell | ~$0.003 |
| Black Forest Labs | FLUX.1 pro | ~$0.05 |
| Google | Nano Banana | ~$0.04 |
| OpenAI | GPT Image 1 | ~$0.04 |
| Stability AI | SD 3.5 | Free (local) / ~$0.006 (API) |
| Alibaba | Qwen-Image | Free (open source) |
| ByteDance | Seedream 4.5 | ~$0.01 |
| Luma AI | Photon | ~$0.01-0.03 |
| Ideogram | V3 | ~$0.02 |
| Runway | Gen-4 Image | ~$0.03 |

Prices are approximate and may vary by resolution, tier, and volume.
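At volume, these per-image differences compound quickly. A quick sanity check, using a few of the approximate prices from the table above:

```python
# Cost comparison at volume, using approximate December 2025 per-image
# prices from the table above. Actual billing varies by resolution,
# tier, and volume discounts.

PRICE_PER_IMAGE = {
    "FLUX.1 schnell": 0.003,
    "FLUX.1 pro": 0.05,
    "Seedream 4.5": 0.01,
    "Ideogram V3": 0.02,
}

def monthly_cost(model: str, images_per_day: int, days: int = 30) -> float:
    """Estimated monthly spend in USD for a steady daily volume."""
    return round(PRICE_PER_IMAGE[model] * images_per_day * days, 2)

# At 500 images/day for a month, the gap is stark:
schnell = monthly_cost("FLUX.1 schnell", 500)  # 45.0
pro = monthly_cost("FLUX.1 pro", 500)          # 750.0
```

For iteration-heavy workflows, routing drafts through a cheap model and reserving premium models for finals can cut the bill by an order of magnitude.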


Summary by Use Case

Photorealism with Img2Img

FLUX.2 Max/Pro/Flex, Nano Banana Pro, Seedream 4.5

Text in Images with Img2Img

Ideogram V3, Qwen-Image

Speed with Img2Img

FLUX.1 Dev, Luma Photon, Seedream 4

Open Source with Img2Img

Stable Diffusion 3.5, FLUX.1 Dev, Qwen-Image

Character/Style Reference

Luma Photon, Runway Gen-4, FLUX Kontext, Ideogram Character

Chinese Market

Qwen-Image, Seedream 4.5


This guide is updated regularly as new models are released. Last update: December 2025.



About studio formel

studio formel is an AI-powered creative platform built specifically for jewelry brands. We combine systematic research on AI generation with a flexible asset management system, helping jewelry sellers create professional images, videos, and ads at scale.

Learn more about our platform →

Ready to transform your jewelry photography?

Join jewelry brands using AI to create professional product images in seconds.

Get early access

Topics

AI image generation Replicate FLUX Stable Diffusion models