Holding an umbrella


SDXL does action poorly. Take a punch to the face: the fist connecting with the face and the person being hit showing a proper reaction, either in a comic book style or realistic. DALL-E 3 can do this to a certain extent, at least much better than SD.


Action is a limitation of all models due to training. It’s hard to create action poses with motion blur when the model was trained on high-quality images free of blur. We need an SDXL model trained specifically on action images.


I've heard it speculated Dall-E chooses different models to feed its prompts to depending on what the prompt is. I don't know if it's official or proven.


DALL-E is comically bad. I got a subscription for it and had to cancel because it just would not listen. ChatGPT even agreed that it was embarrassing how many times it would falsely flag a prompt.




All training images are generally stationary. So there’s nothing to caption.




You’re talking about something different than I am. I’m saying models are not currently trained on images with motion. You’re saying they can be. Please try actively reading what I’m saying and reply to my thoughts or start a new thread.




This post was about SDXL.




I still don’t understand why you’re talking to me about something I never said.


I will tend to believe your opinion on this u/HotNCuteBoxing


Quick glance at their profile, and…yep. Will defer to their opinion lol


[I think someone heard your challenge and accepted.](https://civitai.com/models/487118/punch-in-the-face?modelVersionId=541681)


Hahaha. I appreciate the effort.


Empty swimming pools, people holding guns, people smoking cigarettes, animals driving a 1992 Chevy Blazer, QWERTY keyboards, monster trucks made of silicone breast implants etc etc etc


Guns are the most baffling to me. There are so many action scenes and such we could make, but no gun ever looks good, straight and at the right angle. Sooooo many porn options but no guns.


It's because every gun in the training data is labelled "gun". It's vague and nondescriptive. A gun can mean a pea shooter, a revolver, an SMG, a sniper rifle, a staple gun, or many others. All of these look radically different. It's always a captioning problem. OpenAI proved this. It was further established with PonyXL.


Mechanical objects are harder than organic shapes. People have no straight lines, and their shapes and proportions vary naturally. On the other hand, for any specific angle there's only one right way to draw a Glock 17. Any variation becomes a visible defect. There are also probably lots of very inaccurate illustrations of guns in the training data, since lots of artists suck at drawing them or are unconcerned about technical accuracy.


That’s because AI is smart. It knows what *really* matters.


"make love not war" - Stable Diffusion


Objectification and dehumanization hardly qualify as "love".


I tried to make a LoRA for guns and smoking both. The results are hit or miss.


For the animal driving a car and the monster truck, you could probably get that to work using the prompt scheduling syntax. I once tried making pictures of gingerbread men with guns fighting in a trench made of cake. It didn't work as a basic prompt, but if I started with a picture of a soldier and switched the prompt to a gingerbread man at the right point, I was able to get what I wanted.
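

For anyone who hasn't used prompt scheduling: assuming the comment above means the AUTOMATIC1111-style `[from:to:when]` form, the sampler uses `from` for the early steps and `to` for the rest. Below is a rough Python sketch of how such a prompt resolves at each step. It is an illustration only, not the web UI's actual parser: `prompt_at_step` is a made-up helper, the example prompt and the 0.5 switch point are assumptions, and nested or more exotic forms are not handled.

```python
import re

# Matches only the simple, non-nested form [from:to:when].
_EDIT = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):([0-9]*\.?[0-9]+)\]")


def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Return the prompt text in effect at a 0-based sampling step (hypothetical helper)."""

    def pick(match: re.Match) -> str:
        before, after = match.group(1), match.group(2)
        when = float(match.group(3))
        # Assumption: values below 1 are a fraction of the total steps,
        # anything else is treated as an absolute step number.
        switch_step = round(when * total_steps) if when < 1 else int(when)
        return before if step < switch_step else after

    return _EDIT.sub(pick, prompt)


if __name__ == "__main__":
    scheduled = "photo of a [soldier:gingerbread man:0.5] in a trench made of cake"
    for step in (0, 9, 10, 19):
        print(step, prompt_at_step(scheduled, step, total_steps=20))
```

The idea is that the early steps lock in the composition (a soldier in a trench) and the later steps fill in the details as a gingerbread man, so where to put the switch point is something you find by trial and error.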


upside down people


Can't make Mussolini 🙃


Flip the laptop around


pls don’t be mean to australians


Yeah, someone doing a cartwheel is very difficult for all image AI. I have tried it on quite a few, and only DALL-E 3 came close.


https://preview.redd.it/09s77q2twl3d1.jpeg?width=1024&format=pjpg&auto=webp&s=62d1d8af454142ee7b6e3a6643d3acaade2af5fc DALL-E 3


Can't even do thumbs down.


An archer shooting a bow. I've never seen **any** image generation model so far get the alignment of the arrow and the bowstring right even remotely consistently. (The fact that both of these are really thin probably doesn't help with encoding)


Oh yeah, I've tried to generate some archery shots and it's hilarious. If you specify a compound it can almost look convincing at a glance, until you realise the cables are going all over the place.


Anything where the subject has to interact with something. Holding or grabbing objects in non-weird ways, using a leash correctly, using guns or swords, using stones to make a fire, unusual stuff like going swimming in a hazmat suit, etc.


Everything about complex designs. I think it’s just great for rendering and melting things together ultra-realistically, but it doesn’t do design properly.


People lying down, especially when trying to get someone lying on their back, seen from the side, at about the same height as the surface. This usually just produces nightmare fuel.


Not an issue with lineart, but if you're not using lineart you have a very, very low chance of getting a person lying down correctly. It can happen, but it's a mess. You should be able to get people lying down if you provide a decent lineart and use PyraCanny in Fooocus or some kind of canny thing in your favorite AI program.


Pretty much every time I try to show a woman lying down it changes it to a child regardless of negative prompts 🤦 And don't try to have a grown woman with a teddy bear lol


Time to pick a different checkpoint.


The PonyXL models know how to do this quite well. You could use one of them to get the initial image and then use another checkpoint to refine it.


Outfit consistency. If I am using a LoRA, small parts of the outfit change.


Hardest are related to weapons: trident, archer, hammer warrior, etc (and people with 3 eyes or just one eye)


I have difficulties creating literally everything apart from portraits, creatures, close-ups or landscapes. Any action (like opening a door, holding a weapon, texting on a phone, ...) leads me to Inpaint, ControlNet and Photoshop work that can last for hours to get fixed.


Sexualized content. Mind you, I'm only using checkpoints and loras from religious sites... but *surely* there must be a way to see some risqué imagery.


I need a Mormon bubble porn Lora asap


I made a LoRA to show some ankle, I got you covered.


Hot in 1890


Literally anything that isn’t common. For example, a while ago I wanted a couple of photorealistic illustrations for a novel of mine where the protagonist is a 6’6” tall woman standing next to the co-protagonist, a man of more “normal” stature (perhaps 6’), and it was impossible. It would render almost everything else pretty well (their other physical features, clothes, background, etc.), but the woman would consistently be *shorter* or, at best, the same height as him. I assume it’s because there are so few pictures of unusually tall women standing next to average-height men that the models simply don’t know how to visualize it. The only workaround would be to “trick” the model by describing a midget-like man instead, but that would usually draw him with the limb proportions typical of people with dwarfism. I assume the models just don’t have enough imagery of what I actually asked for. More generally, anything that isn’t common in nature. The other day a guy complained in the ChatGPT subreddit that he couldn’t make DALL-E draw him a spider with two legs; they all came with six or so. Similarly, I assume the model just doesn’t have any two-legged spiders in its training data to get an idea of how that looks.


Did you try using controlnet and regional prompting?


It definitely knows "spider" and "legs", and it does OK-ish with numbers, but it didn't realize that the six spindly things that are part of "spider" are "legs" that you could count and number. It probably sees them as part and parcel of "spider."


> Literally anything that isn’t common.

Tell me about it. Ask it for an image of Thomas the Tank Engine all cheeked up and it just doesn't know what the fuck to do.


Could train a lora with 70+ pictures I think. Would probably need more images than usual though to break the model bias. 200+ quality images and it might be easy.


Most of your stable diffusion models have no way to specify height. For this you need to draw it first or make a reference picture.


Consistent hands


Particular species of dinosaurs. Mythological creatures like griffins or centaurs. Scenes with a dozen people in them, each reacting to the situation in a different, but appropriate way (like The Last Supper). A child hanging upside down from monkey bars. Video game buildings from a top-down, front-facing point of view, not isometric.


https://preview.redd.it/nn0gbvsbyk3d1.png?width=1024&format=png&auto=webp&s=69995fbca8b998ee5e36795bebb37a1d0d11423d and they think AI is going to kill us all


An upside-down umbrella. Edit: This is my benchmark for AI being able to use reasoning to turn an object upside down when it’s almost certainly not being trained on images of upside-down umbrellas. GPT-4o got close but still looked really wonky.


How did you get access to the 4o image generator? The public version of 4o still uses DALL-E.


Ok, whatever is currently running on 4o, then.


many, many things, which I will not say


Undressing or naked or half naked people with the rest of their clothes on the floor. Both SD1.5 and SDXL can do, with the right models (and not some obscure LoRA or checkpoint or something):

- Naked people
- Clothed people
- Half naked people (e.g. swimwear, underwear, etc.)

But it's almost impossible to create realistic images of people in the process of putting on their clothes or taking them off, or, say, a picture of a naked couple making out with their clothes on the floor. I've even tried training LoRAs and TIs (1.5 only, didn't have the time and resources for SDXL ones) for these concepts to no avail.


Characters standing with their back towards the viewer. Possible but a lot of janky results.


I have 99% good results with the "showing back" prompt.


“Their back is to the camera” or “view from behind” also works well


I make such photos all the time. Just use «photo from behind» and «looking into the distance/horizon» or «looking away»




Here you go, with Ideogram : https://preview.redd.it/mcr2p4vlzj3d1.png?width=1874&format=png&auto=webp&s=d6f0b42ccdfaf11515777f6535ec88b9841e0226 (Sorry for the double post, replied to the wrong one 😅)


​ https://preview.redd.it/p65sj3ds0k3d1.jpeg?width=1536&format=pjpg&auto=webp&s=ea0be0e7fa0624def20ce0d24604f9b7694ea292


wtf 🤯


Can confirm. I have tried with DALL-E, Midjourney, and SD1.5/SDXL, and none of them generated square wheels. wtf


Ideogram can do it apparently, first try with prompt "a bicycle with square wheels", style "illustration" : https://preview.redd.it/3zl93qhhzj3d1.png?width=1874&format=png&auto=webp&s=bb0885c24b1ad2b511e37b176455785ef6d48e77


To me the bigger wtf is how the hell Ideogram is actually able to do it. In the ontologies that these models have learned a square wheel is probably almost as much of an oxymoron as a square circle.


Yeah it's interesting. At least square wheel is contradictory enough to be a known phrase that appears in pop culture. I'd be more surprised if they could do something like "triangle wheels".


The Ideogram team came from Google, where they were at the forefront of pioneering diffusion work, so they're good and it's only going to get better. They're the real deal.


Rendering a broken cup is extremely difficult. Even ControlNet isn't much help, and it comes out looking very unnatural not only with SD but also with other AI image generation services.


Closed eyes. It's actually an easy concept, but I can't find any models that do it (maybe I haven't explored enough). No matter what I put in the prompt it's always open eyes, looking straight at me like she wants me so bad.


Instead of "closed eyes" use "sleeping eyes". Sometimes a combination of the two works better also. Like using BonoboXL "sleeping eyes" works almost every time, but using EpicrealismXL I find I need to use both to get it consistent.


Well the eyes are closed but she's lying in bed now. Is that some kind of sign?


Haha did you describe an action first? This one was "a photo of a girl standing at a bus stop, sleeping eyes" https://imgur.com/a/eXUgrR0


Yes, this is exactly what I wanted to do. Which model did you use?


That one was epicrealism v6


👍 Thanks


https://i.imgur.com/fnFfJYk.png Which one?


I was using the SDXL version. This one here https://civitai.com/models/277058?modelVersionId=484695


Thank you.


use "reading a book" and the eyes never open


That works, but not for the scenario I want. However, the other guy's tip of using "sleeping eyes" and "closed eyes" together works most of the time.


opened eyes in neg also




"Cutoff" content, like part of a person's arm hidden inside a box while their hand is still visible. AI can do that, but there are always errors.


I find "intangible" in the negatives helps somewhat with that.


Katana and archery, two of my favorite sports


Tools such as wrenches and real power tools.


I’m unable to get a three headed dragon wearing cowboy hats. SDXL/SD3 keeps generating three separate dragons. Dall-e seems to follow the prompt fine. Same with a samurai riding a sea horse. SDXL/SD3 keeps generating a horse.


anybody talking on a telephone. SD3 can't do this either


"A man is being loud and obnoxious in a coffee date, the woman is bored and unamused, maybe even annoyed, and it's apparent in her face." That, or any other variation of it, NEVER worked for me, ANYWHERE, on ANY MODEL, not just SDXL. Somehow "coffee date" ALWAYS implies an image of BOTH smiling.


Don't expect A.I. to understand the nuances of human language. Just replace "coffee date" with "in a coffee shop", and describe each of the subjects as you want them to appear in the image. https://preview.redd.it/vflgpay67n3d1.jpeg?width=1216&format=pjpg&auto=webp&s=d3555a8dd01d23fe3251e3f0f1dcc91e0b9ee77a Photo of a young man and a young woman in a coffee shop. The man is talking loudly and animated. The woman is bored.


Designing simple yet elegant wood furniture. There's always something wrong with the perspective or it is missing legs. Some of the ideas are great though. (But I'm also still learning to use all the tools, so there's that.)


Carrying a person... I recently tried to create an image of a superhero carrying someone and nope, couldn't do it.


txt to 3d stl


Guitar smashing. Probably smashing *anything* that's not normally, well... smashed. But SD/SDXL just cannot fathom a way to interact with a guitar other than one hand by the pickups, one hand on the neck. Also, really weirdly, athletic footwear. SDXL seems to have zero clue what softball/football/soccer/baseball cleats or spikes are, gives hilariously inappropriate results if you use the Commonwealth term "football boots", and mostly draws either basketball hi-tops or plimsolls if you prompt "track shoes" or "running shoes".


Predefined Text on a banana that is in some scenery


Anything under water is difficult


This is totally random, but I had the hardest time trying to get any yellow fruit on a yellow background today. I wanted to stay with the same aesthetic as the other images in the series, so I didn't venture out of my family of checkpoints, but every other color worked just fine... I just ended up with a banana on a black background, or a lemon on an orange background. Weird quirk 🤷🏻‍♂️ https://preview.redd.it/a1ztx0ajil3d1.png?width=8192&format=png&auto=webp&s=d86cb373b7bb3db2e4291429a70db6a881973f4d


In case it might be helpful ([For Real XL v0.5](https://civitai.com/models/432244/forrealxl)) https://preview.redd.it/rf941o88do3d1.png?width=896&format=png&auto=webp&s=81f41196e9be6511b5db58a0fb7a078feb9fe34d


I actually figured out what I was doing wrong. I was messing with the color tones by passing a black image in and even at 1.00 denoise, it was still somehow impacting only the background of only yellow images. Weird.








Holding cards, like in a game of poker. Usually one very distorted card appears, but never a full hand of cards.


Buildings with shifted perspective. All our attempts ended up wonky.


Industrial complex




Doing Spartan helmets without the fur on top


AI is very good at reaching 90% due to statistical techniques, but the last 10% is always much harder. There is always going to be some hybrid human/AI collaboration to some extent, with stuff like inpainting and ControlNet.


Any industrial equipment. Boxes and shipping containers apparently are tagged often enough to give it at least *some* ideas (or maybe they're simple enough that errors don't come up), but a conveyor belt? An extruder? A drill? Nothing sensible.


normal hands every time


Mirrors. Try to make a girl doing her makeup: it has no idea how she would be reflected.


Hair covering a person's face Sadako style.


Historic North American Aboriginal peoples, their clothing, dwellings, way of life. Buffalo hunting, teepees, canoeing, kayaking, etc. Prairie life homesteading, wagons, horses, cowboys and RNWMP. I'd also like to see the ability to accurately depict turn of the century coal mining and realistic steam trains.


So far no AI can illustrate the trolley problem.


Beastmen in anything but anime style. https://preview.redd.it/0s6x1n860o3d1.jpeg?width=1840&format=pjpg&auto=webp&s=825b0f752f6732f21f6e2470b9ae85ccff171a26


Warhammer 40k, does not matter what subject. Nothing looks good.


Clothes if they’re not being worn and are not neatly folded. It doesn’t understand what a casually strewn or hanging garment should look like


It can't draw a crab


Legit spent half an hour trying to get it to generate someone floating face-down in a lake. :(


A {funny|cute} barn owl {holding|waving with} a red griddle pan. That's InvokeAI syntax for dynamic prompts. And yes, I'm trying to get that on a birthday card *lol*
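

In case the `{a|b}` notation is unfamiliar: each braced group is a set of alternatives, and one is picked per generated prompt. Here is a small Python sketch that lists every prompt such a template can expand to. It is a stand-in for illustration, not InvokeAI's own implementation: `expand_variants` is a hypothetical helper, it handles only flat (non-nested) groups, and it enumerates all combinations, whereas a real tool may sample them randomly instead.

```python
import itertools
import re

# Matches one flat {option|option|...} group; nested braces are not supported.
_VARIANT = re.compile(r"\{([^{}]*)\}")


def expand_variants(template: str) -> list[str]:
    """Return every prompt obtained by choosing one option per {a|b} group."""
    # With one capture group, split() alternates literal text and group contents.
    parts = _VARIANT.split(template)
    literals = parts[0::2]
    groups = [part.split("|") for part in parts[1::2]]

    prompts = []
    for combo in itertools.product(*groups):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        prompts.append("".join(pieces))
    return prompts


if __name__ == "__main__":
    for prompt in expand_variants(
        "A {funny|cute} barn owl {holding|waving with} a red griddle pan."
    ):
        print(prompt)
```

Two groups of two options give four prompts in total, so the template above is a cheap way to queue up every owl variant in one batch.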


Power armor. Now how am I supposed to touch up my SamusxMaster Chief hentai?


photorealistic food


Anything that challenges a common concept. AI can do a person holding an apple but can't do it the other way around.




Mermaids are so full of fail. There are a few cliche mermaid poses/looks/styles that it does well, but if you try to get it to do anything nuanced it will give you monsters. I just sketch and inpaint "fish tail" now rather than prompt for mermaids.


A triangle. A simple line drawing of a triangle. https://preview.redd.it/7v283hr0yk3d1.jpeg?width=474&format=pjpg&auto=webp&s=07d9f505b6cf7fdab16bc1f669f99382f02f6652


"simple black line drawing of a perfect triangle, geometric, symmetrical, white background", [Mohawk v2](https://civitai.com/models/144952/mohawk) https://preview.redd.it/87hhoqizeo3d1.png?width=1024&format=png&auto=webp&s=33b6198a4631aebbb8e1e0c664aa62843bb70e75