
It seems Midjourney generates better results than SD or DALL-E.

And while we're at it, what's with the "hyper resolution" and "4K, detailed" adjectives that get thrown around left and right?



Those are prompt engineering keywords. SD is far more reliant on tinkering with the prompt than Midjourney.

https://moritz.pm/posts/parameters
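For example, here's a minimal sketch of what those keywords look like in practice with Hugging Face's diffusers library (the checkpoint ID and keyword list are just illustrative, nothing canonical):

    import torch
    from diffusers import StableDiffusionPipeline

    # Any SD checkpoint works here; v1-5 is just an example.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Same subject, with and without the "quality" tags people sprinkle in.
    plain = pipe("a unicorn in a forest").images[0]
    tuned = pipe(
        "a unicorn in a forest, 4k, highly detailed, sharp focus",
        negative_prompt="blurry, low quality, deformed",
        guidance_scale=7.5,
    ).images[0]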


Midjourney needs a lot of prompt engineering too, and so does DALL-E. If you treat the prompt as an opportunity to describe what you want to see, the results are often disappointing. It works better to think backwards from how the model was trained: what sorts of web-caption words did it likely see alongside training images with the features you're hoping it will generate? It's a process of learning to ask the model for things it's able to produce, in its own image language.
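Roughly, the difference looks like this (both prompts are made up, purely to illustrate the framing):

    # Describing the goal, the way you'd brief a human:
    naive = "I want a picture of a cozy cabin at night that feels magical"

    # Mimicking the kind of caption the model likely saw during training:
    caption_style = ("cozy log cabin at night, warm window light, snow, "
                     "matte painting, cinematic lighting, artstation")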


The metadata and file names of the images in the training data set are also inputs during model training. These keywords are common tags across images that share certain characteristics, so in the same way the model knows what a unicorn looks like, it also knows what a "4k" unicorn looks like compared to a "hyper res" one.


Midjourney uses SD under the hood (you can see it in their license), but they augment the model in various ways.


The results in Midjourney are significantly better than in SD. I find it much easier to get to a good result in MJ, and I've been trying to understand why. Any more insight you could share?


Good engineering. Midjourney likely has a lot going on under the hood before your prompt actually reaches Stable Diffusion. As an example, check out this research paper [0], which adds prompt chaining to GPT-3 so you can "correct" its outputs before they get back to the user. There's also no rule that says you can only make one call to SD; MJ likely bounces the picture through a pipeline they've tuned so the generated image looks more reasonable.

[0]: https://arxiv.org/abs/2110.01691
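Nobody outside Midjourney knows their actual pipeline, but a naive two-pass version of that idea with diffusers might look like this (the second pass and the strength value are pure guesswork):

    import torch
    from diffusers import (StableDiffusionPipeline,
                           StableDiffusionImg2ImgPipeline)

    prompt = "a lighthouse at dusk, dramatic sky"

    # Pass 1: plain text-to-image.
    txt2img = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    draft = txt2img(prompt).images[0]

    # Pass 2 (hypothetical): feed the draft back through img2img at low
    # strength to refine details while keeping the composition.
    img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    refined = img2img(prompt=prompt, image=draft, strength=0.35).images[0]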


Midjourney takes their base models and does further training/guidance on them to bring out intentional aesthetic qualities. One of their main goals is to ensure that their "default" style is beautiful no matter how simple the user's prompt is.


Opinionated prompt suffixes injected in the background, varying based on user input, plus post-processing pipelines.
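Something like this toy sketch, presumably (everything here is guesswork about what a service like MJ might do, not their actual code):

    # Hypothetical "house style" suffixes appended behind the user's back.
    HOUSE_STYLE = ", highly detailed, dramatic lighting"
    PORTRAIT_EXTRA = ", 85mm portrait, shallow depth of field"

    def decorate_prompt(user_prompt: str) -> str:
        suffix = HOUSE_STYLE
        if "portrait" in user_prompt.lower():
            suffix += PORTRAIT_EXTRA
        return user_prompt + suffix

    print(decorate_prompt("portrait of an old sailor"))
    # portrait of an old sailor, highly detailed, dramatic lighting,
    # 85mm portrait, shallow depth of field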


Midjourney is doing "secret sauce" post-processing to enhance the image returned from the model, whereas SD just gives you back what the model spits out. That's how I understand it, at least.
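If that's true, even something as simple as a generic sharpen-and-saturate step with Pillow would count (this is a guess at the kind of thing, not MJ's real post-processing):

    from PIL import Image, ImageEnhance, ImageFilter

    # Hypothetical cleanup pass applied after generation.
    img = Image.open("sd_output.png")
    img = img.filter(ImageFilter.UnsharpMask(radius=2, percent=120))
    img = ImageEnhance.Color(img).enhance(1.15)  # mild saturation bump
    img.save("post_processed.png")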



