Stability AI launches Stable Diffusion XL 1.0, a text-to-image model

Stability AI has announced Stable Diffusion XL 1.0, a text-to-image model that the company describes as its "most advanced" version to date.

Stability AI says that SDXL 1.0 produces more vivid and accurate colors, with improved contrast, lighting, and shadows, and can generate one-megapixel images (1024×1024). Generated images can also be post-edited directly on the web page.


Prompts can also be simpler than before. This is because the base model of SDXL 1.0 has 3.5 billion parameters, giving it stronger language understanding, whereas the base version of Stable Diffusion has only about 1 billion parameters. As a result, SDXL 1.0 has become one of the largest open image models currently available.

The Stability AI blog presents more technical details of SDXL 1.0. First, the model breaks new ground in both scale and architecture: it innovatively combines a base model with a refiner model, with 3.5 billion parameters in the base model and 6.6 billion in the combined two-model pipeline.



Emad Mostaque, founder of Stability AI, said that a larger parameter count lets the model understand more concepts and be taught deeper knowledge. RLHF (reinforcement learning from human feedback) enhancement was also carried out starting with the SDXL 0.9 version.

This is why SDXL 1.0 now works well with short prompts and can distinguish between "the Red Square" (the famous landmark) and "a red square" (the shape).

In the synthesis process, the base model first generates a (noisy) latent, which the refiner model then denoises in a final step.

The base model can also be used as a standalone module. Combining the two models yields higher-quality images without consuming more computing resources.
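To make the two-stage process concrete, here is a minimal sketch using the Hugging Face diffusers library (an assumption; this is not the repository's own streamlit demo described below). The model IDs are the public SDXL 1.0 repositories on Hugging Face; the 0.8 split between base and refiner steps is just an illustrative choice.

import torch
from diffusers import DiffusionPipeline

# Base model produces a latent; the refiner denoises the final steps.
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,  # share components to save memory
    vae=base.vae,
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photo of an astronaut riding a horse on the moon"
# run the first 80% of the denoising steps on the base model, output a latent
latents = base(prompt=prompt, num_inference_steps=40,
               denoising_end=0.8, output_type="latent").images
# hand the latent to the refiner for the remaining 20% of the steps
image = refiner(prompt=prompt, num_inference_steps=40,
                denoising_start=0.8, image=latents).images[0]
image.save("astronaut.png")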

Test results:



Install:

1. Clone the repo

git clone git@github.com:Stability-AI/generative-models.git

cd generative-models

2. Set up a virtual environment

This is assuming you've navigated to the generative-models root after the clone.

NOTE: This was tested under python3.8 and python3.10. For other python versions you may have version conflicts.

PyTorch 1.13

# install required packages from pypi

python3 -m venv .pt13

source .pt13/bin/activate

pip3 install -r requirements/pt13.txt

PyTorch 2.0

# install required packages from pypi

python3 -m venv .pt2

source .pt2/bin/activate

pip3 install -r requirements/pt2.txt

3. Install sgm

pip3 install .

4. Install sdata for training

pip3 install -e git+https://github.com/Stability-AI/datapipelines.git@main#egg=sdata

Packaging

This repository uses the PEP 517-compliant packaging method hatch.

To build distributable wheels, install hatch and then run hatch build (specifying -t wheel will skip building sdist, which is not necessary).

pip install hatch

hatch build -t wheel

You will find the built package in dist/. You can install wheels with pip install dist/*.whl.

Note that the package does not currently specify its dependencies; depending on your use case and PyTorch version, you will need to install the required packages manually.

Inference

We provide a streamlit demo for text-to-image and image-to-image sampling in scripts/demo/sampling.py. We provide the hash of the complete file as well as the hash of only the tensors stored in the file (see the model spec for a script to evaluate this). The following models are currently supported; a minimal sketch for verifying a file hash follows the model list:

· SDXL-base-1.0

File Hash (sha256): 31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b

Tensordata Hash (sha256): 0xd7a9105a900fd52748f20725fe52fe52b507fd36bee4fc107b1550a26e6ee1d7

· SDXL-Refiner-1.0

File Hash (sha256): 7440042bbdc8a24813002c09b6b69b64dc90fded4472613437b7f55f9b7d9c5f

Tensordata Hash (sha256): 0x1a77d21bebc4b4de78c474a90cb74dc0d2217caf4061971dbfa75ad406b75d81

· SDXL-base-0.9

· SDXL-Refiner-0.9

· SD-2.1-512

· SD-2.1-768
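To check a downloaded checkpoint against the file hashes listed above, here is a minimal sketch in plain Python (the checkpoint path is an assumed example):

import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    # stream the file so large checkpoints do not need to fit in memory
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# expected value taken from the SDXL-base-1.0 file hash listed above
expected = "31e35c80fc4829d14f90153f4c74cd59c90b779f6afe05a74cd6120b893f7e5b"
print(sha256_of_file("checkpoints/sd_xl_base_1.0.safetensors") == expected)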

SDXL weights:

SDXL-1.0: SDXL-1.0 weights are available (under the CreativeML Open RAIL++-M license) here:

· Base model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/

· Refiner model: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/

SDXL-0.9: The SDXL-0.9 weights are available under a research license. If you would like to access these models for research, please apply using one of the links below: SDXL-base-0.9 models or SDXL-refiner-0.9. You can apply via either link; once approved, you will have access to both. Please log in to your Hugging Face account with your organizational email to request access.

After getting the weights, put them into checkpoints/. Next, use

streamlit run scripts/demo/sampling.py --server.port <your_port>

Invisible Watermark Detection

Images generated with our code use the invisible-watermark library to embed an invisible watermark into the model output. We also provide a script to easily detect that watermark. Note that this watermark differs from the ones in previous Stable Diffusion 1.x/2.x releases.

To run the script, you either need a working installation as above, or you can try an experimental install that uses only a minimal number of packages:

python -m venv .detect

source .detect/bin/activate


pip install "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25"

pip install --no-deps invisible-watermark

With a working installation, the script can be used in the following way (don't forget to activate your virtual environment first, e.g. source .pt13/bin/activate):

# test a single file

python scripts/demo/detect.py <your filename here>

# test multiple files at once

python scripts/demo/detect.py <filename 1> <filename 2> ... <filename n>

# test all files in a specific folder

python scripts/demo/detect.py <your folder name here>/*

Training:

We provide an example training configuration in configs/example_training. To start training, run

python main.py --base configs/<config1.yaml> configs/<config2.yaml>

The configurations are merged from left to right (later configurations will override the same value). This can be used to combine models, training and data configuration. However, all of these can also be defined in a single configuration. For example, to run class-conditional pixel-based diffusion model training on MNIST, run

python main.py --base configs/example_training/toy/mnist_cond.yaml

Note 1: Using the non-toy configurations configs/example_training/imagenet-f8_cond.yaml, configs/example_training/txt2img-clipl.yaml and configs/example_training/txt2img-clipl-legacy-ucg-training.yaml for training requires edits depending on the dataset used (which is expected to be stored in the webdataset format). To find the parts that need to be adapted, search for comments containing USER: in the respective configuration.

Note 2: This repository supports both PyTorch 1.13 and PyTorch 2 for training generative models. However, autoencoder training, e.g. configs/example_training/autoencoder/kl-f4/imagenet-attnfree-logvar.yaml, is only supported on PyTorch 1.13.

Note 3: Training the latent generative models (e.g. configs/example_training/imagenet-f8_cond.yaml) requires retrieving the checkpoint from Hugging Face and replacing the CKPT_PATH placeholder in the configuration. Do the same for the provided text-to-image configurations.

Build a new diffusion model

Conditioner

The GeneralConditioner is configured through conditioner_config. Its only attribute is emb_models, a list of different embedders (all inheriting from AbstractEmbModel) used to condition the generative model. All embedders should define whether they are trainable (is_trainable, default False), a classifier-free guidance dropout rate (ucg_rate, default 0), and an input key (input_key), e.g. txt for text conditioning or cls for class conditioning. When computing the conditioning, the embedder gets batch[input_key] as input. We currently support two- to four-dimensional conditionings, and conditionings from different embedders are concatenated appropriately. Note that the order of the embedders in conditioner_config is important.
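As an illustration only (not copied from the repository), a conditioner_config might look like the following, written here as a Python dict mirroring the YAML layout; the embedder class path is an assumption, check the example configs in configs/example_training for the exact targets:

conditioner_config = {
    "target": "sgm.modules.GeneralConditioner",
    "params": {
        "emb_models": [
            {   # text conditioning: this embedder reads batch["txt"]
                "is_trainable": False,  # default False
                "ucg_rate": 0.1,        # classifier-free guidance dropout rate
                "input_key": "txt",
                # illustrative class path, see the example configs
                "target": "sgm.modules.encoders.modules.FrozenCLIPEmbedder",
            },
            # order matters: embedder outputs are concatenated in list order
        ]
    },
}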

Network

The neural network is set through network_config. This used to be called unet_config, which is no longer general enough, since we plan to experiment with transformer-based diffusion backbones.

Loss

The loss is configured through loss_config. For standard diffusion model training, you have to set sigma_sampler_config.
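A hedged sketch of such a loss_config, in the same dict-mirroring-YAML style; both class paths are assumptions, check configs/example_training for the exact targets:

loss_config = {
    "target": "sgm.modules.diffusionmodules.loss.StandardDiffusionLoss",  # assumed target
    "params": {
        # required for standard diffusion training: how noise levels are sampled
        "sigma_sampler_config": {
            "target": "sgm.modules.diffusionmodules.sigma_sampling.EDMSampling",  # assumed target
            "params": {"p_mean": -1.2, "p_std": 1.2},
        },
    },
}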

Sampler configuration

As mentioned above, the sampler is independent of the model. In sampler_config, we set the type of numerical solver, the number of steps, the type of discretization, and, for example, guidance wrappers for classifier-free guidance.
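Again as an illustrative sketch (class paths and values are assumptions, not taken from the repository), a sampler_config could combine solver, step count, discretization, and a classifier-free guidance wrapper like this:

sampler_config = {
    "target": "sgm.modules.diffusionmodules.sampling.EulerEDMSampler",  # numerical solver (assumed)
    "params": {
        "num_steps": 40,
        "discretization_config": {
            "target": "sgm.modules.diffusionmodules.discretizer.LegacyDDPMDiscretization",  # assumed
        },
        "guider_config": {
            "target": "sgm.modules.diffusionmodules.guiders.VanillaCFG",  # classifier-free guidance (assumed)
            "params": {"scale": 7.5},
        },
    },
}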

Dataset processing

For large-scale training, we recommend using our data pipelines project. It is included in the requirements and installed automatically when following the Installation section above. Small map-style datasets should be defined in the repository (e.g. MNIST, CIFAR-10, ...) and return a dictionary of data keys/values, e.g.,

example = {"jpg": x,  # this is a tensor -1...1 chw
           "txt": "a beautiful image"}

We expect images in -1...1, channel-first format.
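A minimal sketch of such a map-style dataset (assumed, not part of the repository):

import torch
from torch.utils.data import Dataset

class ToyImageTextDataset(Dataset):
    """Returns dicts with an image tensor in -1...1, chw, and a caption."""

    def __init__(self, images, captions):
        self.images = images      # list of tensors, each (C, H, W) in [-1, 1]
        self.captions = captions  # list of caption strings

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return {"jpg": self.images[idx],
                "txt": self.captions[idx]}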


According to the official introduction, SDXL 1.0 can run on a consumer-grade GPU with 8 GB of VRAM, or in the cloud. Fine-tuning has also been improved: custom LoRAs or checkpoints can be generated from SDXL 1.0.

The Stability AI team is also building a new generation of task-specific structure, style, and composition controls, with T2I/ControlNet specialized for SDXL.


