These are contradictory statements. “From thin air” and “no database” are in direct conflict with “trained on 5 billion images”.
They are not, at least not if we are being precise and accurate with our terms.
The neural network architecture used to generate these images has no assistance from a database at runtime. A database was used to train the neural network, but then again, the same could be said of any human artist: a lifetime’s “database” of lived imagery was indispensable to their growth.
The difference between the two is significant if you consider the ramifications of “>5 billion training images result in a <2GB checkpoint file.” Even if you did set out to program a photobashing/referencing/art-tracing robot, it wouldn’t be even remotely possible, technologically, to achieve the necessary image compression rates. The AI can only do what it does because, much like a human, it has learned the most generalizable features and how to adapt them to different contexts, and it has forgotten all the rest.
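To put numbers on the compression claim, here’s the back-of-the-envelope arithmetic, using the >5 billion images / <2GB figures quoted in this thread (Python just for the arithmetic):

```
# Rough upper bound on checkpoint space available per training image.
checkpoint_bytes = 2 * 1024**3      # < 2 GB checkpoint (upper bound)
training_images = 5_000_000_000     # > 5 billion training images (lower bound)

bytes_per_image = checkpoint_bytes / training_images
print(f"{bytes_per_image:.2f} bytes per training image")   # ~0.43 bytes

# Even a heavily compressed JPEG is on the order of 100 KB, i.e. hundreds
# of thousands of times more than the weight budget available per image.
print(f"shortfall vs. a 100 KB JPEG: {100 * 1024 / bytes_per_image:,.0f}x")
```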
If you ask an A.I. to paint something involving Sunburst as a subject, the A.I. needs to know what “Sunburst” is, and to do so it has to look at references and make comparisons, defining as closely as possible which elements constitute the subject (i.e., Sunburst).
Let me ask you: how do you know what Sunburst is, and how would you know how to draw him (even sketchily) without a reference? Would it be because you’ve seen (“trained on”) images of him before? And would it be true that even if you, for example, didn’t have a precise memory of what his goatee or his cloak pendant looked like, you could fake it with something that bore a passing resemblance?
Although the comparison of neural networks to the human brain is a tired cliché, it’s a more accurate comparison than the hardcoded expert systems of old that you seem to be referring to.
You’d have to be an illogical fool to believe those two don’t have any correlation. We artists keep volumes of images that we like, showing poses, approaches, concepts, details, or applications of different tools we’d like to study and use in our own work… and then we modify how we make art in the future based on what we’ve discovered. It can also happen subconsciously. This influence doesn’t have to be tracing or direct copy-pasting to exist. We do know there’s a difference between references, inspiration, and tracing.
I’m not sure if you mean to gatekeep the term “we artists” from people like me who have touched an AI generator, but that’s certainly how it comes across, especially with the insult. I’ve labored over my own art in pre-AI times, and I’m no authority on the subject; I don’t automatically take you to be one, either.
Out of thin air means using zero references of any kind to create the image. There are artists who can actually do this, but they are rare and extremely experienced.
There is no artist who can draw Sunburst without ever having seen Sunburst, or without having had him described to them. If “drawing from reference” means having access to an accurate rendition of the subject at the time of drawing, then generative diffusion models are not guilty of it (unless you pursue img2img). If “drawing from reference” means ever having seen an accurate picture of the subject (or simply having seen imagery related to the individual terms constituting the prompt), then the AI is only as guilty of it as any human artist.
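To make the img2img caveat concrete: img2img is the one mode where the model is handed an actual picture at generation time. A minimal sketch using the Hugging Face diffusers library (the checkpoint name, file names, and parameter values here are illustrative, not from this thread):

```
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# txt2img starts from pure noise; img2img instead starts from a user-supplied
# image, so only in this mode does the model "see" a reference while drawing.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # illustrative checkpoint
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("reference.png").convert("RGB")  # hypothetical file
result = pipe(
    prompt="a stallion with a goatee and a cloak pendant, digital art",
    image=init_image,
    strength=0.75,        # how far the output may drift from the reference
    guidance_scale=7.5,
).images[0]
result.save("output.png")
```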
Believe it or not, “from thin air” is a pretty good approximation of how generative diffusion models work.
What I mean by this specifically is, there is no database that these AI consult, nor do they have the capacity for any sort of implicit database.
Stable Diffusion was trained on >5 billion images, after all, and properly pruned, its model checkpoint is less than 2GB in size.
These are contradictory statements.
“From thin air” and “no database” are in direct conflict with “trained on 5 billion images”. If you ask an A.I. to paint something involving Sunburst as a subject, the A.I. needs to know what “Sunburst” is, and to do so it has to look at references and make comparisons, defining as closely as possible which elements constitute the subject (i.e., Sunburst). It needs to know how to not just approximate, but to directly cycle through relevant and related content for improved accuracy.
This repeats ad infinitum with every other element the system is trying to form. It’s how facial recognition works: it draws up logical comparisons to get as close to a 100% match of features as it can, and within reason confirms exactness to define the subject. It’s why paintings of faces and statues can ping facial recognition: the system identifies elements that match the configuration of human-esque features, i.e., “a face”.
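The comparison step described above is, in modern systems, usually a distance check between learned feature vectors rather than explicit rule matching. A minimal sketch, assuming a hypothetical embed() that maps a face crop to a feature vector:

```
import numpy as np

def embed(face_crop: np.ndarray) -> np.ndarray:
    """Hypothetical feature extractor: maps a face crop to a feature vector.
    In practice this is a trained face-embedding network, not a rule set."""
    raise NotImplementedError

def same_face(img_a: np.ndarray, img_b: np.ndarray, threshold: float = 0.6) -> bool:
    # Cosine similarity between feature vectors; close to 1.0 means the
    # configurations of features match. Anything face-shaped -- a painting,
    # a statue -- can clear this check, which is why they "ping" recognition.
    a, b = embed(img_a), embed(img_b)
    sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sim >= threshold
```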
Once the subject is recognized, the A.I. needs to replicate those elements simply for accuracy’s sake; otherwise you get chaos, that is, little or no consistent resemblance to anything.
@Chopsticks
and I don’t think it’s because one artist knowingly referenced or traced another; for example, what do you make of these two pieces?
You’d have to be an illogical fool to believe those two don’t have any correlation.
We artists keep volumes of images that we like, showing poses, approaches, concepts, details, or applications of different tools we’d like to study and use in our own work… and then we modify how we make art in the future based on what we’ve discovered. It can also happen subconsciously.
This influence doesn’t have to be tracing or direct copy-pasting to exist. We do know there’s a difference between references, inspiration, and tracing.
Out of thin air means using zero references of any kind to create the image. There are artists who can actually do this, but they are rare and extremely experienced.
@Chopsticks
Believe it or not, “from thin air” is a pretty good approximation of how generative diffusion models work. It doesn’t follow a humanly intuitive process of explicitly referencing prior work, but is instead just a very complicated statistical predictor that has “forgotten” (as it never memorized) the specifics of every piece of art it was trained on.
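As a toy sketch of that statistical-predictor loop (the denoiser predict_noise() is hypothetical here, and real samplers use more careful schedules):

```
import numpy as np

def predict_noise(x: np.ndarray, sigma: float) -> np.ndarray:
    """Hypothetical trained network: predicts the noise in x at noise level
    sigma. All of its 'knowledge' lives in the weights -- there is no database."""
    raise NotImplementedError

def generate(shape=(64, 64, 3), steps=50, seed=0):
    rng = np.random.default_rng(seed)
    sigmas = np.linspace(1.0, 0.0, steps + 1)   # simple linear noise schedule
    x = rng.standard_normal(shape) * sigmas[0]  # start from pure noise: thin air
    for i in range(steps):
        eps = predict_noise(x, sigmas[i])       # statistical prediction, no lookup
        x0_est = x - sigmas[i] * eps            # current best guess of the image
        x = x0_est + sigmas[i + 1] * eps        # re-noise to the next, lower level
    return x                                    # sigma == 0: the finished image
```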
What I mean by this specifically is, there is no database that these AI consult, nor do they have the capacity for any sort of implicit database. Stable Diffusion was trained on >5 billion images, after all, and properly pruned, its model checkpoint is less than 2GB in size. That corresponds to less than half a byte of information per image it’s learned from. Unless an image was present many thousands of times in its training set (like the Mona Lisa or The Starry Night), it’s just not possible for the neural network to know enough to reference specific images.
If it winds up aping familiar poses or elements, it’s just because these are clichés, or because there are only so many ways to draw a certain thing. I’ve had déjà vu plenty of times while looking at human artwork, and I don’t think it’s because one artist knowingly referenced or traced another; for example, what do you make of these two pieces?
@Paracompact
I didn’t say anything about photobashing, though I did mean to say “for reference”.
The hair whip was what I initially noticed, and overall it made it look like Sunset with a beard and glasses.
It still has to grab something for reference, though; it doesn’t pull it from thin air. I can occasionally pick out which specific images it used for certain items and poses.
@Chopsticks
Not really how the AI works; it doesn’t photobash like that. It does have a definite bias toward feminine eyes and muzzles in general, though, given how many more mares than stallions appeared in its training data.
Might as well go and fix the eyes up myself, though.