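Stable Diffusion 3.5 is out: more realistic images, better performance, and a focus on diverse outputs and ease of use.
Stability AI yesterday released its new Stable Diffusion 3.5 family of AI image models. Compared with the earlier 3.0 release, the upgrade markedly improves image realism, prompt adherence, and text rendering.
Like SD 3.0, Stable Diffusion 3.5 comes in three versions: Large (8B), Large Turbo (8B), and Medium (2.6B). All of them can be customized to users' needs, run on consumer hardware, and are available under the Stability AI Community License.
Put simply, the upgrade makes it easier for anyone to generate realistic AI images. In a press release, Stability AI acknowledged that the Medium model released in June this year "did not fully meet our standards or our communities' expectations."
The company went on to explain: "After listening to the valuable community feedback, instead of a quick fix, we took the time to further develop a version that advances our mission to transform visual media."
Our AI editor Ryan Morrison has tested version 3.5. He considers it a significant upgrade that may even surpass the recently released Flux 1.1 Pro.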
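What's new in Stable Diffusion 3.5?
Stability AI says the new models focus on customizability, efficient performance, and diverse outputs. "Stable Diffusion 3.5 is our most powerful model yet, and it reflects our commitment to giving creators widely accessible, advanced tools," a company spokesperson explained.
In practice, this means images can be fine-tuned in detail, the models run "out of the box" on consumer hardware, and the generated images are more distinctive.
Ryan Morrison ran a quick test of Stable Diffusion 3.5 Large and found that it generates quickly, follows prompts accurately, and offers strong style control. It is a significant step up from 3.0, and from the 3.0 Medium model in particular.
The new version also adds more style options, including photography and painting, and a specific style such as boho or fashion can be requested with tags in the prompt. Highlighting keywords in the prompt also steers the model in a particular direction.
The company's own analysis states: "Stable Diffusion 3.5 Large leads the market in prompt adherence and rivals much larger models in image quality."
"Stable Diffusion 3.5 Large Turbo offers the fastest inference in its class while remaining highly competitive in image quality and prompt adherence, even compared with non-distilled models of similar size."
"Stable Diffusion 3.5 Medium performs strongly among medium-sized models, balancing prompt adherence and image quality, which makes it an ideal choice for efficient, high-quality results."
The models are free for non-commercial use, including research projects, and for small and medium-sized businesses with annual revenue of up to $1 million. Businesses above that revenue threshold need an enterprise license.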
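GitHub: https://github.com/Stability-AI/sd3.5
stable-diffusion-3.5-large
Hugging Face: stabilityai/stable-diffusion-3.5-large
Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improvements in image quality, typography, complex prompt understanding, and resource efficiency.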
├── text_encoders/
│ ├── README.md
│ ├── clip_g.safetensors
│ ├── clip_l.safetensors
│ ├── t5xxl_fp16.safetensors
│ └── t5xxl_fp8_e4m3fn.safetensors
│
├── README.md
├── LICENSE
├── sd3_large.safetensors
├── SD3.5L_example_workflow.json
└── sd3_large_demo.png
** File structure below is for diffusers integration**
├── scheduler/
├── text_encoder/
├── text_encoder_2/
├── text_encoder_3/
├── tokenizer/
├── tokenizer_2/
├── tokenizer_3/
├── transformer/
├── vae/
└── model_index.json
Quick start
pip install -U diffusers
import torch
from diffusers import StableDiffusion3Pipeline

# Load the full pipeline in bfloat16 and move it to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Generate one image and save it
image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")
Even the 24 GB card I have on hand ran out of memory /(ㄒoㄒ)/~~
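A back-of-the-envelope estimate suggests why a 24 GB card can run out of memory here. The sketch below multiplies approximate parameter counts by 2 bytes per bfloat16 weight. Only the 8B transformer figure comes from the article; the text-encoder counts are rough assumptions, and activations and the VAE are ignored entirely.

```python
# Approximate parameter counts, in billions.
# Only the 8B transformer figure is from the article; the rest are rough assumptions.
PARAMS_BILLIONS = {
    "transformer (MMDiT)": 8.0,
    "T5-XXL text encoder": 4.7,
    "CLIP-G text encoder": 0.7,
    "CLIP-L text encoder": 0.12,
}
BYTES_PER_PARAM_BF16 = 2


def weight_gb(params_billions, bytes_per_param):
    """Weight memory in GB: (billions of params) * (bytes per param)."""
    return params_billions * bytes_per_param


footprint_gb = {
    name: weight_gb(count, BYTES_PER_PARAM_BF16)
    for name, count in PARAMS_BILLIONS.items()
}
total_gb = sum(footprint_gb.values())

for name, gb in footprint_gb.items():
    print(f"{name}: about {gb:.1f} GB")
print(f"total weights: about {total_gb:.1f} GB")
```

Under these assumptions the weights alone come to roughly 27 GB, which already exceeds 24 GB before any activations are allocated.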
Quantizing the model with diffusers reduces VRAM usage so that it fits on GPUs with less VRAM
pip install bitsandbytes
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
import torch
model_id = "stabilityai/stable-diffusion-3.5-large"
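# 4-bit NF4 quantization settings for the transformer weights
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
# Load only the transformer, quantized to 4-bit
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)
# Build the pipeline around the quantized transformer
pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
# Keep components in system RAM and move each to the GPU only while it runs
pipeline.enable_model_cpu_offload()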
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    num_inference_steps=28,
    guidance_scale=4.5,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
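Note: this needs at least 12 GB of VRAM, and because of enable_model_cpu_offload it also helps to have plenty of system RAM. My T4 environment never came up because it ran out of RAM. A P100 environment can run it (RAM 21.2 GB of 29 GB, VRAM 11 GB of 16 GB), so a dedicated GPU platform is recommended.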
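To get a feel for what NF4 buys, here is a rough estimate of weight memory. It assumes the 8B parameter count quoted earlier, 4 bits per weight, and one 32-bit scale per block of 64 weights (an assumption about the bitsandbytes defaults; layers that stay unquantized are ignored). The T5-XXL parameter count is likewise an approximation, and the function names are illustrative only.

```python
def bf16_weight_gb(params_billions):
    """Weight memory in GB at 2 bytes per parameter."""
    return params_billions * 2


def nf4_weight_gb(params_billions, block_size=64, scale_bits=32):
    """Weight memory in GB at 4 bits per parameter plus one scale per block."""
    bits_per_param = 4 + scale_bits / block_size
    return params_billions * bits_per_param / 8


TRANSFORMER_BILLIONS = 8.0  # from the article
T5_XXL_BILLIONS = 4.7       # rough assumption

transformer_bf16_gb = bf16_weight_gb(TRANSFORMER_BILLIONS)
transformer_nf4_gb = nf4_weight_gb(TRANSFORMER_BILLIONS)
t5_bf16_gb = bf16_weight_gb(T5_XXL_BILLIONS)

# With model CPU offload, components visit the GPU one at a time, so peak
# weight memory is roughly that of the largest single component.
peak_offload_gb = max(transformer_nf4_gb, t5_bf16_gb)

print(f"transformer: {transformer_bf16_gb:.1f} GB in bf16, "
      f"{transformer_nf4_gb:.1f} GB in NF4")
print(f"T5-XXL in bf16: {t5_bf16_gb:.1f} GB")
print(f"estimated peak with offload: {peak_offload_gb:.1f} GB plus activations")
```

Under these assumptions the bfloat16 T5 encoder, not the quantized transformer, dominates the peak. That is in the same ballpark as the 11 GB of VRAM reported in the note above once activations are added, though the match should be read as suggestive rather than exact.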
stable-diffusion-3.5-large-turbo
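Hugging Face: stabilityai/stable-diffusion-3.5-large-turbo
Stable Diffusion 3.5 Large Turbo is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that uses Adversarial Diffusion Distillation (ADD). It improves image quality, typography, complex prompt understanding, and resource efficiency, with a focus on fewer inference steps.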
├── text_encoders/ (text_encoder/text_encoder_1/text_encoder_2 are for diffusers)
│ ├── README.md
│ ├── clip_g.safetensors
│ ├── clip_l.safetensors
│ ├── t5xxl_fp16.safetensors
│ └── t5xxl_fp8_e4m3fn.safetensors
│
├── README.md
├── LICENSE
├── sd3_large_turbo.safetensors
├── SD3.5L_Turbo_example_workflow.json
└── sd3_large_turbo_demo.png
** File structure below is for diffusers integration**
├── scheduler/
├── text_encoder/
├── text_encoder_2/
├── text_encoder_3/
├── tokenizer/
├── tokenizer_2/
├── tokenizer_3/
├── transformer/
├── vae/
└── model_index.json
pip install -U diffusers
import torch
from diffusers import StableDiffusion3Pipeline

# Load the Turbo pipeline in bfloat16 and move it to the GPU
pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large-turbo", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# The distilled model needs only 4 steps and no guidance
image = pipe(
    "A capybara holding a sign that reads Hello Fast World",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("capybara.png")
It did not start on my 90-series card either
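The Turbo settings differ from the Large example in two ways: 4 steps instead of 28, and guidance_scale=0.0 instead of 3.5. The sketch below shows the standard classifier-free guidance formula and counts denoiser evaluations. It assumes the usual convention, which diffusers pipelines are understood to follow, that guidance is applied only when the scale is greater than 1. The function names are illustrative and not part of any library.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: push the prediction away from the
    unconditional output and toward the prompt-conditioned one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def denoiser_evaluations(num_steps, guidance_scale):
    """Each step evaluates the denoiser once, or twice when guidance is
    active (one conditional and one unconditional prediction)."""
    per_step = 2 if guidance_scale > 1.0 else 1
    return num_steps * per_step


large_evals = denoiser_evaluations(num_steps=28, guidance_scale=3.5)
turbo_evals = denoiser_evaluations(num_steps=4, guidance_scale=0.0)

print(f"Large: {large_evals} evaluations, Turbo: {turbo_evals} evaluations")
print(f"ratio: {large_evals / turbo_evals:.0f}x")
```

In practice a pipeline may batch the conditional and unconditional predictions into a single call, so this counts work rather than wall-clock calls. It ignores text encoding and VAE decoding, which cost the same in both setups.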
Reducing VRAM usage so the model fits on low-VRAM GPUs
pip install bitsandbytes
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
from transformers import T5EncoderModel
import torch
model_id = "stabilityai/stable-diffusion-3.5-large-turbo"
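# 4-bit NF4 quantization settings for the transformer weights
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
# Load only the transformer, quantized to 4-bit
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)
# Load a pre-quantized NF4 T5 text encoder
t5_nf4 = T5EncoderModel.from_pretrained("diffusers/t5-nf4", torch_dtype=torch.bfloat16)
# Build the pipeline around the quantized transformer and T5 encoder
pipeline = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    text_encoder_3=t5_nf4,
    torch_dtype=torch.bfloat16
)
# Keep components in system RAM and move each to the GPU only while it runs
pipeline.enable_model_cpu_offload()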
prompt = "A whimsical and creative image depicting a hybrid creature that is a mix of a waffle and a hippopotamus, basking in a river of melted butter amidst a breakfast-themed landscape. It features the distinctive, bulky body shape of a hippo. However, instead of the usual grey skin, the creature's body resembles a golden-brown, crispy waffle fresh off the griddle. The skin is textured with the familiar grid pattern of a waffle, each square filled with a glistening sheen of syrup. The environment combines the natural habitat of a hippo with elements of a breakfast table setting, a river of warm, melted butter, with oversized utensils or plates peeking out from the lush, pancake-like foliage in the background, a towering pepper mill standing in for a tree. As the sun rises in this fantastical world, it casts a warm, buttery glow over the scene. The creature, content in its butter river, lets out a yawn. Nearby, a flock of birds take flight"
image = pipeline(
    prompt=prompt,
    num_inference_steps=4,
    guidance_scale=0.0,
    max_sequence_length=512,
).images[0]
image.save("whimsical.png")
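Note: this needs at least 12 GB of VRAM, and because of enable_model_cpu_offload it also helps to have plenty of system RAM. My T4 environment never came up because it ran out of RAM. A P100 environment can run it (RAM 27.7 GB of 29 GB, VRAM 8.3 GB of 16 GB), so a dedicated GPU platform is recommended.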
Finally
The SD family doubles as a benchmarking tool:
- T0: SD3.5
- T1: SD3
- T2: SDXL
- T3: SD 1.5/2.1
Which tier can your computer handle? Is it time for an upgrade?