One thing I noticed is that coming up with good prompts is a skill unto itself. Not the same skill as drawing, mind you. It’s more like being a director and a key grip. I’ve been using it for something a bit more ambitious: I’ll find a base image, then tweak the prompts to make a series of images that look like they’re from the same “set” and that tell a story. Assembling a series like this is more like film editing, and the big thing I noticed there is that it’s basically impossible to prevent continuity errors.
I think AI drawing gets as far as it does because a single static image is way less complicated than a song, which is a highly technical thing with a lot of moving parts. The other thing is that basic art literacy is fairly accessible, and it’s relatively easy to translate that literacy into a prompt. In fact, you don’t even need art literacy; a working understanding of a booru’s tagging system is more than enough. Most people do not have basic music literacy. I’ve taught myself composition and music theory, and I feel like by the time a person has the knowledge to write good prompts, they would have an easier time just firing up a DAW and applying their knowledge directly.
Another big thing is efficiency. I can look at a generated image and instantly know if the quality is acceptable. With a song, you have to listen to the whole thing before you can judge it. I imagine trying to create a bug-free AI song is far more frustrating than creating a single image.
Even then, I doubt it’s going to have an effect Kevin MacLeod hasn’t already had.