Image Prompt Generator

Check out my Hugging Face Space for the Image Prompt Generator (a.k.a. Image Fungineer).

Get amazing images from simple one- to three-word prompts.

Models like Midjourney, Stable Diffusion and DALL-E 2 are capable of generating beautiful images, but they require detailed prompts that can be time-consuming to imagine and write. The Fungineer takes just a few words and prompt-engineers them into something far more imaginative. Each example below shows a simple prompt followed by its re-imagined version:

“Marble Statue”
A mesmerizing image depicting a majestic marble statue suspended in the air, surrounded by a dazzling display of swirling, sunlit gasps of air. The statue, suspended in mid-air, is adorned with intricate, intricate designs, from leaves to veins, glowing with vibrant colors and contrasting patterns. The surrounding area is bathed in an ethereal, ethereal light, casting dappled shadows and highlighting the intricate details of the sculpture. wide aperture, standard lens, medium depth of field
“basketball”
An imaginative image of a basketball player leaping into the air as they strike the ball. The player, dressed in an off-the-shoulder outfit, is adorned with a dazzling array of colorful and metallic studs, displaying their incredible speed and power. Their body is draped in a dazzling array of high-tech equipment, including a power rack, a motion-sensitive visor, and a ball gown. The background features a breathtaking cityscape filled with people in colorful attire, such as colorful sneakers, and a dramatic cityscape filled with skyscrapers. wide-angle lens, wide-angle lens, high contrast

More examples can be found on the Hugging Face Space.

There are two models: one with 355 million parameters and one with 7 billion parameters. Both were trained on a dataset built by using ChatGPT to convert boring image prompts into exciting ones.
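A dataset like this is essentially a list of (simple prompt, expanded prompt) pairs. The sketch below shows one plausible way to turn such pairs into training text for a causal language model. The separator, the `text` field name, the helper names and the use of GPT-2's `<|endoftext|>` marker are all assumptions for illustration, not the format actually used to train the Fungineer.

```python
import json

# Hypothetical separator between the short prompt and its expansion;
# the real training format is not documented.
SEP = "\n###\n"
# End-of-text marker used by GPT-2-style tokenizers (assumed base model).
EOS = "<|endoftext|>"


def to_training_text(simple: str, expanded: str) -> str:
    """Join a short prompt and its ChatGPT-expanded rewrite into one training sample."""
    return f"{simple.strip()}{SEP}{expanded.strip()}{EOS}"


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (simple, expanded) pairs as JSON Lines with a single "text" field."""
    return "\n".join(json.dumps({"text": to_training_text(s, e)}) for s, e in pairs)


def build_inference_prompt(simple: str) -> str:
    """At generation time the model sees only the short prompt plus the separator
    and is expected to continue with an expanded prompt."""
    return f"{simple.strip()}{SEP}"


# Excerpts of the examples above (truncated with "...").
pairs = [
    ("Marble Statue", "A mesmerizing image depicting a majestic marble statue suspended in the air, ..."),
    ("basketball", "An imaginative image of a basketball player leaping into the air as they strike the ball. ..."),
]

print(to_jsonl(pairs))
print(repr(build_inference_prompt("basketball")))
```

Because `json.dumps` escapes the newlines inside each sample, every pair stays on its own line of the JSONL output.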

I found that the 355M parameter model tended to introduce some incoherence into the prompts, though they remained coherent enough to produce beautiful images. That incoherence often led to far more imaginative and interesting results.

The 7B parameter model consistently produced well-worded prompts, but these resulted in less interesting images from the image generators. Even though the 7B model is much larger, I actually preferred the final images from the smaller model.

I tested many different pre-trained base models along with LoRA and Prefix Tuning. With an increasing number of parameters I was able to reduce the loss, but inference cost and time were much higher for only slightly better results. Here are the test results:
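For a rough sense of scale of the two fine-tuning methods (illustrative arithmetic under assumed model shapes, not measurements from these tests): LoRA adds two low-rank matrices, B of shape d_out × r and A of shape r × d_in, to each adapted weight, so it trains r·(d_in + d_out) parameters per matrix. Prefix Tuning (without its optional reparameterization MLP) trains one key and one value vector per virtual token per layer. The layer counts, hidden sizes, rank 8, 20 virtual tokens and choice of adapted projections below are assumptions; the 355M shape mirrors GPT-2 medium.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: B (d_out x r) plus A (r x d_in)."""
    return r * (d_in + d_out)


def prefix_params(num_virtual_tokens: int, num_layers: int, hidden: int) -> int:
    """Prefix-tuning parameters without a reparameterization MLP:
    a key and a value vector per virtual token per layer."""
    return num_virtual_tokens * num_layers * 2 * hidden


# Assumed 7B-class decoder: 32 layers, hidden size 4096,
# LoRA rank 8 on the query and value projections (two square matrices per layer).
lora_7b = 32 * 2 * lora_params(4096, 4096, r=8)

# Assumed 355M-class decoder (GPT-2 medium shape): 24 layers, hidden size 1024,
# LoRA rank 8 on the fused query/key/value projection (1024 -> 3072).
lora_355m = 24 * lora_params(1024, 3072, r=8)

# Assumed Prefix Tuning with 20 virtual tokens on the same 355M-class shape.
prefix_355m = prefix_params(20, 24, 1024)

for name, n, total in [
    ("LoRA, 7B", lora_7b, 7e9),
    ("LoRA, 355M", lora_355m, 355e6),
    ("Prefix, 355M", prefix_355m, 355e6),
]:
    print(f"{name}: {n:,} trainable ({100 * n / total:.3f}% of base)")
```

In every case the trainable adapter is well under 1% of the base model, which suggests that most of the inference cost comes from the size of the base model rather than from the adapter itself.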
