The AI image generation landscape has exploded in 2025. What started with a handful of players has become a competitive ecosystem of studios, each with multiple models optimized for different use cases.
This guide covers every major AI image generation studio and their available models, with a focus on models accessible via Replicate that support image-to-image (reference image) workflows.
TL;DR: Quick Recommendations
Need the absolute best quality? FLUX.2 Max or Imagen 4 Ultra. Both excel at photorealism with fine detail rendering. FLUX.2 Max supports reference images; Imagen does not.
Need text in your images? Ideogram V3 Quality is the industry leader. It handles long sentences, logos, and precise positioning better than any competitor.
Need speed on a budget? FLUX.1 Schnell (<1 second) or Photon Flash (~3 seconds). Both are dramatically cheaper than premium models while maintaining good quality.
Need open source? Stable Diffusion 3.5 Large or Qwen-Image. Run locally with no API costs, massive LoRA ecosystem, full customization control.
Need character consistency? Nano Banana Pro (up to 14 reference images), FLUX.2 Flex (up to 10), or Runway Gen-4 with @character tagging.
Need Chinese text? Qwen-Image is the only model with commercial-grade Chinese typography.
Visual Comparison: Same Prompt, 44 Models
We ran the same prompt through every Replicate-accessible model to show how each interprets identical instructions.
Prompt: “A young woman in a navy linen blazer, leaning against a terracotta-colored wall. Direct sunlight casting sharp geometric shadows across her face. She wears small gold hoop earrings and a delicate gold chain necklace. Natural makeup, slicked-back hair. She looks off-camera with a calm, confident expression. Shot on medium format, editorial fashion photography, Vogue Italia aesthetic.”
Black Forest Labs
- FLUX.2 Max (img2img): Highest fidelity. 4MP, ~30s. Best for hero images.
- FLUX.2 Pro (img2img): Production standard. 2MP, ~10s. Best quality/speed balance.
- FLUX.2 Flex (img2img): Up to 10 reference images. Best for character consistency.
- FLUX.2 Dev (img2img): Research/non-commercial. Open weights for experimentation.
- FLUX.1 Pro (img2img): Original flagship. 12B params, strong photorealism.
- FLUX 1.1 Pro (img2img): 6x faster than 1.0 Pro. Better image quality, diversity.
- FLUX 1.1 Pro Ultra (img2img): 4MP raw output. Up to 2K resolution for large prints.
- FLUX.1 Dev (img2img): Open weights, guidance-distilled. Fine-tuning base.
- FLUX.1 Schnell: Apache 2.0 license. 4 steps, fastest FLUX variant.
- FLUX Kontext Max (img2img): Text+image context. Highest quality editing model.
- FLUX Kontext Pro (img2img): Fast context editing. Good balance of speed and quality.
- FLUX Dev LoRA (img2img): Custom style training. Load your own LoRA weights.
- FLUX Schnell LoRA: Fast LoRA inference. 4-step with custom styles.
- FLUX Krea Dev (img2img): Krea-optimized FLUX Dev. Enhanced for creative tools.

Google DeepMind
- Imagen 4 Ultra: Highest quality. Fine fabric/water/fur rendering.
- Imagen 4: Standard tier. Enhanced typography for posters/cards.
- Imagen 4 Fast: Speed-optimized Imagen 4. Quick iterations, good quality.
- Imagen 3: Previous flagship. Strong photorealism, reliable output.
- Imagen 3 Fast: Speed variant of Imagen 3. Good for prototyping.
- Nano Banana Pro (img2img): Gemini-based. Top benchmark performer, strong consistency.
- Nano Banana (img2img): Standard Gemini variant. Good balance of speed and quality.
- Gemini Flash (img2img): Fast multimodal. Best for conversational image editing.
OpenAI
- GPT Image 1.5 (img2img): Native ChatGPT integration. Best for iterative, conversational workflows.
- DALL-E 3: Natural language prompting. Uses GPT-4 to expand prompts automatically.
- DALL-E 2: Pioneer of img2img. Inpainting, outpainting, variations.
Stability AI
- Stable Diffusion 3.5 Large (img2img): 8B params, MMDiT architecture. Highest quality open-source option.
- Stable Diffusion 3.5 Large Turbo (img2img): Distilled Large. 4-step generation for fast iterations.
- Stable Diffusion 3.5 Medium (img2img): 2.5B params. Quality/speed balance, lower resource usage.
- Stable Diffusion 3 (img2img): First MMDiT model. Three text encoders for better prompt understanding.
Ideogram
- Ideogram V3 Quality (img2img): Best text rendering in the industry. Posters, logos, signage.
- Ideogram V3 Balanced (img2img): Quality/speed sweet spot. Good for most production work.
- Ideogram V3 Turbo (img2img): Fastest V3. Great for rapid prototyping with text.
- Ideogram V2a (img2img): Enhanced realism over V2. Better anatomy and composition.
- Ideogram V2a Turbo (img2img): Fast V2a for quick iterations. Budget-friendly.
- Ideogram V2 (img2img): Core V2 model. 1280x1280 resolution, solid text rendering.
- Ideogram V2 Turbo (img2img): Fast V2 variant. Lower cost, good for testing.
Alibaba
- Qwen-Image (img2img): Best for Chinese text. Strong multi-language support.
ByteDance
- Seedream 4.5 (img2img): Latest flagship. Fast generation, commercial-friendly license.
- Seedream 4.0 (img2img): Production stable. Reliable for batch processing.
- Seedream 3.0: Previous generation. Good value, lower cost.
- Consumer-focused variant. Optimized for everyday use.
Luma AI
- Photon (img2img): Fast, high-quality. From the Dream Machine team.
- Photon Flash (img2img): Ultra-fast variant. Best for real-time applications.
Runway
- Gen-4 Image (img2img): Reference-based generation. Strong style consistency.
Quick Reference: Studios with Replicate + Img2Img Support
| Studio | Flagship Model | Key Strength | Img2Img Models |
|---|---|---|---|
| Black Forest Labs | FLUX.2 Max | Photorealism, text rendering | 13 |
| Google DeepMind | Nano Banana Pro | Quality, consistency | 3 |
| OpenAI | GPT Image 1.5 | Conversational generation | 3 |
| Stability AI | SD 3.5 Large | Open source, customization | 6 |
| Ideogram | V3 Quality | Text in images | 8 |
| Alibaba (Qwen) | Qwen-Image | Chinese text rendering | 4 |
| ByteDance | Seedream 4.5 | Speed, commercial use | 5 |
| Luma AI | Photon | Fast generation | 2 |
| Runway | Gen-4 Image | Reference-based | 2 |
Black Forest Labs (FLUX)
- Website: https://blackforestlabs.ai/
- Replicate: https://replicate.com/black-forest-labs
- Founded: August 2024
- Headquarters: Freiburg im Breisgau, Germany
- Founders: Robin Rombach, Andreas Blattmann, Patrick Esser, Dominik Lorenz
- Valuation: $3.25B (December 2025)
Black Forest Labs was founded on August 1, 2024 by former Stability AI researchers who created Stable Diffusion. The founders previously researched AI at LMU Munich under Björn Ommer. The company achieved unicorn status within months and closed a $300M Series B in December 2025. Key partners include Adobe, Canva, Meta, and xAI (Grok integration).
Model Timeline: FLUX.1 (August 2024) → FLUX.1.1 Pro (October 2024) → FLUX.2 (November 2025)
Known for photorealism, accurate text rendering, and strict prompt adherence.
FLUX.2 vs FLUX.1: What Changed?
| | FLUX.1 | FLUX.2 |
|---|---|---|
| Parameters | 12B | 32B (with Mistral-3 VLM) |
| Max Resolution | 1-2 MP | 4 MP |
| Reference Images | Limited | Up to 10 |
| Typography | Good | Legible fine text, UI elements |
Key FLUX.2 Improvements: New VAE, 32K token context, better skin/fabric micro-details.
Model Variants
| Variant | Speed | Best For |
|---|---|---|
| Max | ~30s | Hero images, final production |
| Pro | ~5s | Professional workflows |
| Dev | ~2s | Development, fine-tuning |
| Schnell | <1s | Rapid iteration |
Kontext = text-based image editing (not generation)
FLUX.2 Series
- FLUX.2 Max (img2img): Highest fidelity. 4MP, ~30s. Best for hero images.
- FLUX.2 Pro (img2img): Fast (~5s), 8 reference images. Professional workflows.
- FLUX.2 Flex (img2img): Optimized for img2img editing and style transfer.
- FLUX.2 Dev (img2img): Open weights. Best for development and fine-tuning.
FLUX.1 Series
- FLUX.1 Pro (img2img): 12B params. Original flagship for production use.
- FLUX 1.1 Pro (img2img): Improved quality over 1.0. Better prompt adherence.
- FLUX 1.1 Pro Ultra (img2img): Highest res in 1.x series. Up to 2MP output.
- FLUX.1 Dev (img2img): Open weights. 28 steps, ~2s. For fine-tuning.
- FLUX.1 Schnell: Fastest (<1s). 4 steps. For rapid prototyping.
FLUX Kontext Series
- FLUX Kontext Max (img2img): Text-based editing. Transform style, clothing via prompts.
- FLUX Kontext Pro (img2img): Faster Kontext. ~4s per edit. Maintains composition.
FLUX LoRA Variants
- FLUX Dev LoRA (img2img): Custom style training. Load your own LoRA weights.
- FLUX Schnell LoRA: Fast LoRA inference. 4-step with custom styles.
- FLUX Krea Dev (img2img): Krea-optimized FLUX Dev. Enhanced for creative tools.
Hardware Requirements: FLUX.1 dev requires ~24GB VRAM
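The ~24GB figure follows from simple arithmetic: parameter count times bytes per parameter. A back-of-envelope helper (illustrative only; real usage is higher once activations and the text encoder are loaded, and lower with quantization or CPU offloading):

```python
def estimate_weight_vram_gb(params_billions: float, bytes_per_param: int) -> float:
    """Rough VRAM needed just to hold the model weights, in decimal GB.

    1e9 params * N bytes/param = N GB, so the math reduces to a product.
    """
    return params_billions * bytes_per_param

# FLUX.1 dev: 12B params at bf16 (2 bytes/param) -> ~24 GB of weights
print(estimate_weight_vram_gb(12, 2))  # 24.0
# SD 3.5 Large: 8B params at fp16 -> ~16 GB
print(estimate_weight_vram_gb(8, 2))   # 16.0
```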
Google DeepMind (Imagen / Gemini)
Google’s AI lab brings deep pockets and research talent to image generation. Imagen 4 is their flagship, but the real story is Nano Banana Pro—a Gemini-based model that quietly dominated anonymous benchmarks before being identified. For reference image workflows, Nano Banana Pro supports up to 14 input images, more than any competitor.
- Website: https://deepmind.google/
- Replicate: https://replicate.com/google
- Parent Company: Alphabet Inc.
- CEO: Demis Hassabis
- Merged: April 2023 (Google Brain + DeepMind)
Google applies SynthID invisible watermarking to generated images so AI-generated content can be identified, and supports text rendering in 7+ languages.
Model Timeline: Imagen (May 2022) → Imagen 2 (December 2023) → Imagen 3 (August 2024) → Imagen 4 (May 2025, Google I/O)
Google’s image generation spans the Imagen series and Gemini-based models (marketed as “Nano Banana” on LMArena’s anonymous benchmark).
Imagen Evolution
| | Imagen 2 | Imagen 3 | Imagen 4 |
|---|---|---|---|
| Typography | Basic | Improved | Significantly enhanced |
| Detail | Standard | Fewer artifacts | Fine fabrics, water, fur |
| Speed | Standard | Standard | Fast variant 10× faster |
Imagen Series

- Imagen 4 Ultra: Highest quality. Fine fabric/water/fur rendering.
- Imagen 4: Standard tier. Enhanced typography for posters/cards.
- Imagen 4 Fast: Speed-optimized Imagen 4. Quick iterations, good quality.
- Imagen 3: Previous flagship. Strong photorealism, reliable output.
- Imagen 3 Fast: Speed variant of Imagen 3. Good for prototyping.
Gemini Image Models
- Nano Banana Pro (img2img): Gemini-based. Top benchmark performer, strong consistency.
- Nano Banana (img2img): Standard Gemini variant. Good balance of speed and quality.
- Gemini Flash (img2img): Fast multimodal. Best for conversational image editing.
Key Capabilities
- Character consistency across generations
- Image blending and editing
- Accurate text rendering in multiple languages
- Up to 4K resolution
OpenAI (GPT Image / DALL-E)
OpenAI invented the category with DALL-E in 2021, but their real advantage today is conversational iteration. GPT Image 1.5 integrates directly with ChatGPT, letting you refine images through natural dialogue: “make the background warmer” or “add a second person on the left.” If your workflow involves back-and-forth refinement, this is uniquely powerful.
- Website: https://openai.com/
- Replicate: https://replicate.com/openai
- Founded: December 2015
- Headquarters: San Francisco, California
- CEO: Sam Altman
In March 2025, DALL-E 3 was replaced by GPT Image’s native multimodal generation in ChatGPT. All outputs include C2PA metadata for provenance tracking.
Model Timeline: Image GPT (June 2020) → DALL-E (January 2021) → DALL-E 2 (April 2022) → DALL-E 3 (October 2023) → GPT Image 1 (March 2025) → GPT Image 1.5 (December 2025)
OpenAI offers GPT Image models and the legacy DALL-E series.
GPT Image Models
- GPT Image 1.5 (img2img): Native ChatGPT integration. Best for iterative, conversational workflows.
DALL-E Models

- DALL-E 3: Natural language prompting. Uses GPT-4 to expand prompts automatically.
- DALL-E 2: Pioneer of img2img. Inpainting, outpainting, variations.
Key Features
- Native multimodal generation
- Conversational refinement through chat
- Context-aware iterations
- C2PA metadata on all outputs
Stability AI (Stable Diffusion)
If you want to run models locally, train your own styles, or avoid recurring API costs, Stable Diffusion is the answer. It’s fully open source with a massive ecosystem of community fine-tunes, LoRAs, and tools. The trade-off: you’ll need a decent GPU (8GB+ VRAM) and some technical comfort. For maximum customization at minimum cost, nothing else comes close.
- Website: https://stability.ai/
- Replicate: https://replicate.com/stability-ai
- Founded: 2020
- Headquarters: London, UK
- Founder: Emad Mostaque (resigned March 2024)
- Current CEO: Prem Akkaraju (appointed June 2024)
Stability AI revolutionized the industry in August 2022 by making model weights freely available. SD 3.5 Large represents their current flagship.
Model Timeline: SD 1.x (August 2022) → SD 2.0 (November 2022) → SDXL (July 2023) → SD 3 (February 2024) → SD 3.5 (October 2024)
Pioneered open-source image generation. Important for customization and fine-tuning.
SD Version Comparison
| | SD 1.5 | SDXL | SD 3.5 |
|---|---|---|---|
| Parameters | 983M | 3.5B | 8B |
| Architecture | UNet | UNet | Diffusion Transformer |
| Text Generation | Poor | Better | Best in series |
| VRAM | ~6GB | ~12GB | ~20GB |
Trade-offs: SD 3.5 is slower (1+ min) but has market-leading prompt adherence. Still struggles with hands.
Stable Diffusion 3.5 Series
- Stable Diffusion 3.5 Large (img2img): 8B params, MMDiT architecture. Highest quality open-source option.
- Stable Diffusion 3.5 Large Turbo (img2img): Distilled Large. 4-step generation for fast iterations.
- Stable Diffusion 3.5 Medium (img2img): 2.5B params. Quality/speed balance, lower resource usage.
- Stable Diffusion 3 (img2img): First MMDiT model. Three text encoders for better prompt understanding.
Key Advantages
- Fully open source
- Massive ecosystem of LoRAs and fine-tuned models
- Run locally without API costs
Hardware Requirements
- SDXL: 8GB+ VRAM
- SD 3.5: 12GB+ VRAM
Ideogram
Need text in your images that actually looks right? Ideogram is the clear leader. Whether it’s a logo, poster, storefront sign, or book cover, V3 Quality renders long sentences, precise positioning, and complex typography that other models mangle. No other model comes close for text-heavy designs.
- Website: https://ideogram.ai/
- Replicate: https://replicate.com/ideogram-ai
- Founded: 2022
- Headquarters: Toronto, Canada
- Founders: Mohammad Norouzi (CEO), William Chan, Jonathan Ho, Chitwan Saharia
Founded by former Google Imagen researchers. Co-founder Jonathan Ho authored the foundational 2020 paper on diffusion models. First to render coherent text in images at launch.
Model Timeline: Ideogram 0.1 (August 2023) → Ideogram 1.0 (February 2024) → Ideogram 2.0 (August 2024) → Ideogram 3.0 (March 2025)
Stats: 7M+ creators, 600M+ images generated.
Leader in text rendering within images.
Version Evolution
| | 1.0 | 2.0 | 3.0 |
|---|---|---|---|
| Text Clarity | Good | Improved | Complex layouts |
| Styles | Basic | 20+ | 50+ presets |
| Key Feature | First coherent text | Realism + styles | Style references |
V3 Variants: Quality vs Balanced vs Turbo
| Variant | Speed | Cost | Use Case |
|---|---|---|---|
| Quality | ~9s | $0.09 | Final production |
| Balanced | ~4s | $0.06 | General use |
| Turbo | ~1s | $0.02 | Rapid iteration |
Ideogram V3 Series
- Ideogram V3 Quality (img2img): Best text rendering in the industry. Posters, logos, signage.
- Ideogram V3 Balanced (img2img): Quality/speed sweet spot. Good for most production work.
- Ideogram V3 Turbo (img2img): Fastest V3. Great for rapid prototyping with text.
Ideogram V2 Series
- Ideogram V2a (img2img): Enhanced realism over V2. Better anatomy and composition.
- Ideogram V2a Turbo (img2img): Fast V2a for quick iterations. Budget-friendly.
- Ideogram V2 (img2img): Core V2 model. 1280x1280 resolution, solid text rendering.
- Ideogram V2 Turbo (img2img): Fast V2 variant. Lower cost, good for testing.
Key Capabilities
- Long text strings including sentences
- Precise text positioning
- Multilingual text support
- Style references (up to 3 images)
Alibaba (Qwen-Image)
Targeting the Chinese market or need proper Chinese typography in your images? Qwen-Image is the only model that renders Chinese characters with commercial-grade accuracy. It’s also fully open source (Apache 2.0), so you can run it locally without API costs—making it a compelling Stable Diffusion alternative for bilingual workflows.
- Website: https://qwenlm.github.io/
- Replicate: https://replicate.com/qwen
- Parent Company: Alibaba Group (founded 1999)
- Division: Alibaba Cloud / Qwen Team
- License: Apache 2.0 (open source)
20B parameter MMDiT model with multi-line Chinese and English text layouts.
Model Timeline: Qwen2-VL (September 2024) → Qwen-Image (August 2025) → Qwen-Image-Edit (August 2025) → Qwen-Image-Layered (December 2025)
20 billion parameter model. First open-source model with accurate Chinese text rendering.
Image Generation Models
- Qwen-Image (img2img): Best for Chinese text. Strong multi-language support.
Key Capabilities
- Commercial-grade Chinese text rendering
- Bilingual (English + Chinese)
- Multi-line text layouts
- Layered output for editing
ByteDance (Seedream)
TikTok’s parent company quietly built one of the best image generators. Seedream 4.5 combines exceptional speed (~3 seconds for 2K images), high benchmark scores (ELO 1,222), and commercial-friendly licensing. If you need to generate images at scale with predictable costs, Seedream deserves serious consideration.
- Website: https://www.bytedance.com/
- Replicate: https://replicate.com/bytedance
- Founded: 2012
- Headquarters: Beijing, China
- Products: TikTok, Doubao (100M+ MAU), Jimeng
Seedream 4.0 surpassed Gemini 2.5 Flash and OpenAI models on benchmarks. Generates 2K images in ~3 seconds with 94% text accuracy (Chinese and English).
Model Timeline: Seedream 2.0 (December 2024) → Seedream 3.0 (April 2025) → Seedream 4.0 (September 2025) → Seedream 4.5 (November 2025)
Doubao platform leads China’s AI market. Seedream models compete with GPT-4o and Midjourney.
Seedream Evolution
| | 3.0 | 4.0 | 4.5 |
|---|---|---|---|
| Max Resolution | 2K | 4K | 4K |
| Reference Images | Basic | Up to 10 | Up to 14 |
| Key Feature | 3s speed | ELO 1,222 | Story scenes |
4.5 New: Group generation mode for story scenes and character variations.
Seedream Series
- Seedream 4.5 (img2img): Latest flagship. Fast generation, commercial-friendly license.
- Seedream 4.0 (img2img): Production stable. Reliable for batch processing.
- Seedream 3.0: Previous generation. Good value, lower cost.
- Consumer-focused variant. Optimized for everyday use.
Key Capabilities
- Speed: 2K images in ~3 seconds
- 94% text accuracy (Chinese and English)
- Optimized for commercial use
- Up to 4K resolution (Seedream 4)
Luma AI (Photon)
Luma AI is primarily known for video (Dream Machine, Ray3), but Photon deserves attention for image generation. It’s exceptionally fast—Photon Flash runs at $0.002/image—and excels at character consistency with adjustable reference weights. A sleeper pick for high-volume workflows where cost and speed matter more than bleeding-edge quality.
- Website: https://lumalabs.ai/
- Replicate: https://replicate.com/luma
- Founded: September 2021
- Headquarters: Palo Alto, California
- Founders: Amit Jain, Alex Yu, Alberto Taiuti
- Valuation: $4B+ (November 2025)
Raised $1.07B total, including $900M Series C in 2025. Partnered with Adobe to integrate Ray3 into Firefly. 30M+ users.
Product Timeline: Dream Machine (June 2024) → Photon (November 2024) → Ray3 (September 2025)
Stats: 30M+ users.
Photon is their image generation model. Known for speed.
Photon vs Photon Flash
| | Photon | Photon Flash |
|---|---|---|
| Speed | ~11s | ~3s |
| Cost | $0.03 | $0.002 |
| Best For | Production | Iteration |
Key Features: Character consistency, multi-reference support, adjustable reference weights.
Image Models
- Photon (img2img): Fast, high-quality. From the Dream Machine team.
- Photon Flash (img2img): Ultra-fast variant. Best for real-time applications.
Key Capabilities
- High generation speed (8x faster than competitors)
- High-resolution output
- Image, style, and character reference support
Pricing
- Free tier available
- Subscriptions: $9.99 - $99.99/month
Runway
Runway is the choice for film and TV production—their tools appear in Everything Everywhere All at Once and Amazon’s House of David. Gen-4 Image excels at maintaining character identity across scenes using @character and @location tagging. If you’re building visual narratives that need consistent characters across multiple frames, this is purpose-built for that workflow.
- Website: https://runwayml.com/
- Replicate: https://replicate.com/runwayml
- Founded: 2018
- Headquarters: New York City
- Founders: Cristóbal Valenzuela, Alejandro Matamala, Anastasis Germanidis
- Valuation: $3B+ (April 2025)
Co-released Stable Diffusion in August 2022. Total funding: $544M.
Model Timeline: Stable Diffusion co-release (August 2022) → Gen-1/Gen-2 (February 2023) → Gen-3 (June 2024) → Act-One (October 2024) → Gen-4 (April 2025) → Gen-4.5 (December 2025)
Industry Use: Everything Everywhere All at Once, The Late Show with Stephen Colbert, Amazon’s House of David (350+ AI shots in Season 2).
Known primarily for video, Runway also offers image generation capabilities.
Gen-4 Image vs Turbo
| | Gen-4 Image | Gen-4 Turbo |
|---|---|---|
| Speed | Standard | 2.5× faster |
| 720p Cost | $0.05 | Lower |
| 1080p Cost | $0.08 | Lower |
Key Feature: Reference-based with 1-3 images. Tag with @character, @location for control.
Image Models
- Gen-4 Image (img2img): Reference-based generation. Strong style consistency.
- Gen-4 Image Turbo (img2img): 2.5× faster variant at lower cost.
Key Features
- Reference image support (up to 3 images)
- High-quality generation
- Turbo variant 2.5x faster
Notable Others
The following studios are significant in the AI image generation landscape but either lack Replicate access or don’t offer img2img capabilities via Replicate.
Midjourney
- Website: https://www.midjourney.com/
- Replicate: N/A
- Founded: August 2021
- Headquarters: San Francisco, California
- Founder: David Holz (previously founded Leap Motion)
Unlike other AI startups, Midjourney is not VC-funded and has been profitable since August 2022. Runs Discord’s largest server (21M+ members as of May 2025). Web interface launched August 2024.
V7 vs V6 Comparison
| | V6/V6.1 | V7 |
|---|---|---|
| Architecture | Previous gen | Completely rebuilt |
| Speed | ~35s | Draft: 4-5s (10× faster) |
| Hands/Anatomy | Struggled | Significantly improved |
| Text Clarity | Basic | Near-perfect |
| Personalization | Rate 200+ images | ~5 minutes |
V7 Key Features:
- Draft Mode: 10× faster, half cost
- Omni Reference (--oref): Blend styles, colors, lighting
- Character Reference (--cref): Maintain identity across generations
When V6 is better: Stylized fictional world-building (V7 can feel “too clean”)
- Availability: Discord bot, Web app, $10/month minimum
Leonardo AI
- Website: https://leonardo.ai/
- Replicate: N/A
- Founded: December 2022
- Headquarters: Sydney, Australia
- Founders: JJ Fiasson, Ethan Smith, Jachin Bhasme
- Acquired by: Canva (July 2024, ~$320M)
Originally focused on video game assets, Leonardo grew from 14,000 users (February 2023) to 19M users by end of 2023. Canva acquired Leonardo in July 2024; all 120 employees joined Canva. 1B+ images generated.
| Model | Key Features |
|---|---|
| Phoenix 1.0 Ultra | 5MP+ resolution |
| Phoenix 1.0 Fast | Speed-optimized |
Adobe Firefly
- Website: https://www.adobe.com/products/firefly.html
- Replicate: N/A
- Parent Company: Adobe Inc. (founded 1982)
- Launched: March 2023 (beta)
Adobe Firefly focuses on commercial safety, trained on Adobe Stock and public domain content. 13B+ images generated since launch; ~1.5B assets/month.
Model Timeline: Firefly Beta (March 2023) → Image Model 2 (October 2023) → Image Model 3 (April 2024) → Image Model 4/4 Ultra (April 2025)
| Model | Resolution |
|---|---|
| Image Model 4 Ultra | 2K |
| Image Model 4 | Standard |
- Layered output (objects as editable layers)
- Trained on licensed content (commercial-safe)
- Adobe Creative Cloud integration
Recraft
- Website: https://www.recraft.ai/
- Replicate: https://replicate.com/recraft-ai
- Founded: 2022
- Headquarters: San Francisco / London
- Founder: Anna Veronika Dorogush (co-created CatBoost at Yandex)
- Total Funding: $42M ($30M Series B in May 2025)
Recraft V3 (codenamed “Red Panda”) achieved #1 on Hugging Face’s Text-to-Image Leaderboard with ELO 1172, outperforming DALL-E and Midjourney (October 2024).
Stats: 4M+ users, $5M+ ARR.
| Model | Replicate Link |
|---|---|
| Recraft V3 | https://replicate.com/recraft-ai/recraft-v3 |
| Recraft V3 SVG | https://replicate.com/recraft-ai/recraft-v3-svg |
| Recraft 20B | https://replicate.com/recraft-ai/recraft-20b |
| Recraft 20B SVG | https://replicate.com/recraft-ai/recraft-20b-svg |
- Long text generation (sentences, paragraphs)
- Vector (SVG) output
- No img2img support on Replicate
NVIDIA (Edify / SANA)
- Website: https://www.nvidia.com/
- Replicate: https://replicate.com/nvidia
- Founded: 1993
- Headquarters: Santa Clara, California
- CEO: Jensen Huang
NVIDIA Edify (renamed from Picasso in September 2024) is the enterprise platform. SANA, developed with MIT, is 20× smaller and 100× faster than FLUX-12B while generating up to 4K images.
Product Timeline: Picasso/Edify (2023) → Edify rename (September 2024) → SANA (November 2024) → SANA-Video (October 2025)
Partnerships: Getty Images, Shutterstock, Adobe.
| Model | Replicate Link |
|---|---|
| SANA | https://replicate.com/nvidia/sana |
| SANA Sprint 1.6B | https://replicate.com/nvidia/sana-sprint-1.6b |
- Edify platform for enterprise (4K, custom training)
- SANA models for research
- No img2img support on Replicate
Frequently Asked Questions
What is image-to-image (img2img) generation?
Image-to-image generation lets you use existing images as reference inputs alongside your text prompt. Instead of generating from scratch, the model incorporates visual elements from your reference—like a product photo, a style example, or a character’s face—into the output. This is essential for maintaining consistency across marketing campaigns, product catalogs, and brand assets.
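In practice, an img2img request is just a text-to-image request with a reference image attached and a knob controlling how closely the output should follow it. A minimal payload sketch — note that the field names (`image`, `prompt_strength`) vary from model to model on Replicate, so the names below are illustrative and should be checked against each model's input schema:

```python
def build_img2img_input(prompt: str, reference_url: str, strength: float = 0.8) -> dict:
    """Assemble a generic img2img input payload.

    Field names are illustrative: some models call the reference 'image',
    others 'input_image' or 'image_prompt'. Check the model's schema.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return {
        "prompt": prompt,
        "image": reference_url,       # the reference image to start from
        "prompt_strength": strength,  # how far to deviate from the reference
    }

payload = build_img2img_input(
    "the same necklace on a marble surface, soft studio lighting",
    "https://example.com/product.jpg",
)
```

With the official client this payload would then be passed as `input=payload` to the model of your choice.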
Which AI image generator has the best quality in 2025?
For pure photorealism, FLUX.2 Max and Imagen 4 Ultra lead the pack. FLUX.2 Max excels at fine details like skin texture and fabric rendering, while Imagen 4 Ultra handles materials like water, fur, and metallic surfaces exceptionally well. The choice depends on whether you need img2img support (FLUX.2 Max has it; Imagen does not).
What’s the fastest AI image generator?
FLUX.1 Schnell generates images in under 1 second at ~$0.003/image. For slightly higher quality with similar speed, Photon Flash (~3 seconds, $0.002/image) and Ideogram V3 Turbo (~1 second, $0.02/image) are excellent choices. Seedream also generates 2K images in roughly 3 seconds.
Which model is best for generating text in images?
Ideogram V3 Quality is the industry leader for text rendering. It handles long sentences, logos, signage, and complex typography that other models mangle. For Chinese text specifically, Qwen-Image is the only model with commercial-grade Chinese typography.
Can I run these models locally?
Yes, but only open-source models. Stable Diffusion 3.5 and Qwen-Image (Apache 2.0 license) can run locally without API costs. You’ll need a GPU with 8GB+ VRAM for SDXL or 12GB+ for SD 3.5. FLUX.1 Dev and FLUX.1 Schnell also have open weights for local use.
What is Replicate?
Replicate is a cloud platform that hosts AI models with a simple pay-per-use API. You don’t need to manage infrastructure—just send requests and get results. Most models in this guide are accessible via Replicate, making it easy to test different options before committing to one.
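For example, a single generation is one authenticated POST to Replicate's predictions endpoint. The sketch below builds the request with only the standard library and leaves the network call commented out (the model slug and the `Prefer: wait` header follow Replicate's public HTTP API, but verify against their current docs; you need a `REPLICATE_API_TOKEN`):

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/models/black-forest-labs/flux-schnell/predictions"

def build_request(prompt: str) -> urllib.request.Request:
    """Construct (but do not send) a Replicate prediction request."""
    body = json.dumps({"input": {"prompt": prompt}}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
            "Content-Type": "application/json",
            # Ask Replicate to hold the connection until the result is ready
            "Prefer": "wait",
        },
        method="POST",
    )

req = build_request("gold hoop earrings on a linen cloth, soft daylight")
# response = urllib.request.urlopen(req)   # requires a valid API token
# urls = json.load(response)["output"]     # typically a list of image URLs
```

The official `replicate` Python client wraps the same flow in one call (`replicate.run(model, input=...)`) if you prefer not to handle HTTP yourself.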
How do I maintain character consistency across images?
Several models support multi-reference inputs: Nano Banana Pro (up to 14 images), FLUX.2 Flex (up to 10), Seedream 4.5 (up to 14), and Runway Gen-4 (up to 3 with @character tagging). These let you feed in reference photos of a character to maintain consistent features across generations.
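Since each model enforces a different hard cap, it helps to encode the limits once and trim reference lists before sending a request. A small sketch using the caps quoted above (point-in-time figures from this guide; the model keys are illustrative labels, not official API slugs):

```python
# Reference-image caps per model, as quoted in this guide (subject to change)
MAX_REFERENCE_IMAGES = {
    "nano-banana-pro": 14,
    "flux-2-flex": 10,
    "seedream-4.5": 14,
    "runway-gen-4": 3,
}

def trim_references(model: str, refs: list[str]) -> list[str]:
    """Enforce the model's hard cap by keeping the first N reference images."""
    cap = MAX_REFERENCE_IMAGES.get(model)
    if cap is None:
        raise KeyError(f"unknown model: {model}")
    return refs[:cap]

refs = [f"https://example.com/character_{i}.jpg" for i in range(20)]
print(len(trim_references("flux-2-flex", refs)))   # 10
print(len(trim_references("runway-gen-4", refs)))  # 3
```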
Pricing Comparison (December 2025)
| Studio | Model | Approximate Cost per Image |
|---|---|---|
| Black Forest Labs | FLUX.1 schnell | ~$0.003 |
| Black Forest Labs | FLUX.1 pro | ~$0.05 |
| Google | Nano Banana | ~$0.04 |
| OpenAI | GPT Image 1 | ~$0.04 |
| Stability AI | SD 3.5 | Free (local) / ~$0.006 (API) |
| Alibaba | Qwen-Image | Free (open source) |
| ByteDance | Seedream 4.5 | ~$0.01 |
| Luma AI | Photon | ~$0.01-0.03 |
| Ideogram | V3 | ~$0.02 |
| Runway | Gen-4 Image | ~$0.03 |
Prices are approximate and may vary by resolution, tier, and volume.
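Per-image pricing scales linearly with batch size, so the spread between models compounds quickly at volume. A quick worked comparison using the approximate prices from the table above:

```python
# Approximate USD per image, copied from the pricing table above
PRICE_PER_IMAGE = {
    "flux-1-schnell": 0.003,
    "flux-1-pro": 0.05,
    "seedream-4.5": 0.01,
    "ideogram-v3": 0.02,
}

def batch_cost(model: str, n_images: int) -> float:
    """Estimated cost of generating n_images with the given model."""
    return round(PRICE_PER_IMAGE[model] * n_images, 2)

# A 1,000-image product catalog:
print(batch_cost("flux-1-schnell", 1000))  # 3.0
print(batch_cost("flux-1-pro", 1000))      # 50.0
```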
Summary by Use Case
Photorealism with Img2Img
FLUX.2 Max/Pro/Flex, Nano Banana Pro, Seedream 4.5
Text in Images with Img2Img
Ideogram V3, Qwen-Image
Speed with Img2Img
FLUX.1 Dev, Luma Photon, Seedream 4
Open Source with Img2Img
Stable Diffusion 3.5, FLUX.1 Dev, Qwen-Image
Character/Style Reference
Luma Photon, Runway Gen-4, FLUX Kontext, Ideogram Character
Chinese Market
Qwen-Image, Seedream 4.5
This guide is updated regularly as new models are released. Last update: December 2025.
Related Articles
- Which AI Model Works Best for Jewelry Photography? — Practical recommendations based on our testing
- The Complete Guide to Jewelry Photography — Every shot type you need for e-commerce
- Head-to-Head Model Comparison (Research) — 270 pairwise evaluations of top models
About studio formel
studio formel is an AI-powered creative platform built specifically for jewelry brands. We combine systematic research on AI generation with a flexible asset management system, helping jewelry sellers create professional images, videos, and ads at scale.