The strong bias toward 3D really hurts its 2D style coherence. You can tell the mouths are trying to move based on like… an underlying 3D form.
It’s still fascinating how a model not specifically finetuned on ponies can get this close to what was requested. I doubt this will ever have an open source variant, but it would be fun to see what could come of it if it were.