Nowadays, text-based image generators seem to dominate the landscape of generative deep learning models. Embedding textual concepts into the space of a generator's controllable parameters gives the impression that machines are starting to understand human visual needs. It seems humans only need to learn how to explain things to machines by engineering text prompts.
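
To make the idea of "textual concepts steering a generator" concrete, here is a minimal toy sketch in PyTorch. Everything in it is hypothetical and untrained (the class names, dimensions, and random token ids are placeholders, not any real model's API); it only illustrates the plumbing: a prompt is turned into an embedding vector, and that vector joins the noise as one of the controllable inputs of the generator.

```python
import torch
import torch.nn as nn

class ToyTextEncoder(nn.Module):
    """Maps token ids to a single prompt embedding (mean of word embeddings)."""
    def __init__(self, vocab_size=1000, embed_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)

    def forward(self, token_ids):                      # (batch, seq_len)
        return self.embedding(token_ids).mean(dim=1)   # (batch, embed_dim)

class ToyGenerator(nn.Module):
    """Turns noise + prompt embedding into a tiny 'image'."""
    def __init__(self, noise_dim=32, embed_dim=64, image_size=16):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(noise_dim + embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, image_size * image_size),
            nn.Sigmoid(),
        )

    def forward(self, noise, prompt_embedding):
        # The prompt embedding is concatenated with the noise,
        # so the text becomes one of the "controllable parameters".
        x = torch.cat([noise, prompt_embedding], dim=1)
        return self.net(x).view(-1, 1, self.image_size, self.image_size)

# Usage: the same noise vector, "steered" by two different (fake) prompts.
encoder, generator = ToyTextEncoder(), ToyGenerator()
tokens_a = torch.randint(0, 1000, (1, 5))   # stand-in for a tokenized prompt, e.g. "a red circle"
tokens_b = torch.randint(0, 1000, (1, 5))   # stand-in for another prompt, e.g. "a blue square"
noise = torch.randn(1, 32)
image_a = generator(noise, encoder(tokens_a))
image_b = generator(noise, encoder(tokens_b))
print(image_a.shape, image_b.shape)          # torch.Size([1, 1, 16, 16]) each
```

Real text-to-image systems replace the toy encoder with a large pretrained language or CLIP-style text model and the toy generator with a diffusion or GAN backbone, but the conditioning idea is the same.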

In this course we will not use DALL-E, Midjourney, Imagen, or other diffusion models (though we can if you want). It is much more fun to build your own algorithm from scratch, one that uses a similar approach, and to see what possibilities it hides beyond what is already available in publicly released models. Our aim is to "open the hood" and look inside the process of the machine's "comprehension", or perhaps simply to jabber at it and see whether the machine can understand us. We will use Python to help the machine understand us better.