
It seems Midjourney generates better results than SD or DALL-E.

And while we're at it, what's with the "hyper resolution" and "4K, detailed" adjectives that get thrown around left and right?



Those are prompt engineering keywords. SD is far more reliant on tinkering with the prompt than Midjourney.

https://moritz.pm/posts/parameters
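For example, here's a minimal sketch of what those keywords look like in practice with Hugging Face's diffusers library (the checkpoint ID and keyword list are just illustrative, nothing canonical):

    import torch
    from diffusers import StableDiffusionPipeline

    # Any SD checkpoint works here; v1-5 is just an example.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Same subject, with and without the "quality" tags people sprinkle in.
    plain = pipe("a unicorn in a forest").images[0]
    tuned = pipe(
        "a unicorn in a forest, 4k, highly detailed, sharp focus",
        negative_prompt="blurry, low quality, deformed",
        guidance_scale=7.5,
    ).images[0]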


Midjourney needs a lot of prompt engineering too, and so does DALL-E. If you treat the prompt as an opportunity to describe what you want to see, the results are often disappointing. It works better to think backwards from how the model was trained: what sorts of web-caption words did it likely see alongside training images with the features you're hoping it will generate? It's a process of learning to ask the model for things it's able to produce, in its own image language.
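Roughly, the difference looks like this (both prompts are made up, purely to illustrate the framing):

    # Describing the goal, the way you'd brief a human:
    naive = "I want a picture of a cozy cabin at night that feels magical"

    # Mimicking the kind of caption the model likely saw during training:
    caption_style = ("cozy log cabin at night, warm window light, snow, "
                     "matte painting, cinematic lighting, artstation")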


The metadata and file names of the images in the training data set are also inputs during model training. These keywords are common tags across images that share certain characteristics, so in the same way the model knows what a unicorn looks like, it also knows what a "4k" unicorn looks like compared to a "hyper res" one.


Midjourney uses SD under the hood (you can see it in their license), but they augment the model in various ways.


The results in Midjourney are significantly better than in SD. I find it much easier to get to a good result in MJ, and I've been trying to understand why. Any more insight you could share?


Good engineering. Midjourney likely has a lot going on under the hood before your prompt actually reaches Stable Diffusion. As an example, check out this research paper [0], which adds prompt chaining to GPT-3 so you can "correct" its outputs before they get back to the user. There's also no rule that says you can only make one call to SD; MJ likely bounces the picture through a pipeline they've tuned so the generated image looks more reasonable.

[0]: https://arxiv.org/abs/2110.01691
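Nobody outside Midjourney knows their actual pipeline, but a naive two-pass version of that idea with diffusers might look like this (the second pass and the strength value are pure guesswork):

    import torch
    from diffusers import (StableDiffusionPipeline,
                           StableDiffusionImg2ImgPipeline)

    prompt = "a lighthouse at dusk, dramatic sky"

    # Pass 1: plain text-to-image.
    txt2img = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    draft = txt2img(prompt).images[0]

    # Pass 2 (hypothetical): feed the draft back through img2img at low
    # strength to refine details while keeping the composition.
    img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    refined = img2img(prompt=prompt, image=draft, strength=0.35).images[0]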


Midjourney takes their base models and does further training/guidance on them to bring out intentional aesthetic qualities. One of their main goals is to ensure that their "default" style is beautiful no matter how simple the user's prompt is.


Opinionated prompt suffixes injected in the background, varying based on user input, plus post-processing pipelines.
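Something like this toy sketch, presumably (everything here is guesswork about what a service like MJ might do, not their actual code):

    # Hypothetical "house style" suffixes appended behind the user's back.
    HOUSE_STYLE = ", highly detailed, dramatic lighting"
    PORTRAIT_EXTRA = ", 85mm portrait, shallow depth of field"

    def decorate_prompt(user_prompt: str) -> str:
        suffix = HOUSE_STYLE
        if "portrait" in user_prompt.lower():
            suffix += PORTRAIT_EXTRA
        return user_prompt + suffix

    print(decorate_prompt("portrait of an old sailor"))
    # portrait of an old sailor, highly detailed, dramatic lighting,
    # 85mm portrait, shallow depth of field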


Midjourney is doing "secret sauce" post-processing to enhance the image returned from the model, whereas SD just gives you back what the model spits out. That's how I understand it, at least.
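If that's true, even something as simple as a generic sharpen-and-saturate step with Pillow would count (this is a guess at the kind of thing, not MJ's real post-processing):

    from PIL import Image, ImageEnhance, ImageFilter

    # Hypothetical cleanup pass applied after generation.
    img = Image.open("sd_output.png")
    img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=120))
    img = ImageEnhance.Color(img).enhance(1.15)  # mild saturation bump
    img.save("post_processed.png")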



