Nowadays, text-based image generators make up the majority of approaches in generative deep learning. Embedding textual concepts into the space of a generator's controllable parameters gives us the impression that machines are starting to understand human visual needs. It seems that humans only need to learn how to explain things to machines by engineering text prompts.
<br><br>
In this course we will not use DALL-E, Midjourney, Imagen, or other diffusion models (though we can if you want). It is much more fun to try to build your own algorithm from scratch, one that uses a similar approach, and to see what other possibilities it hides beyond what is already available in publicly released models.
Our aim is to open the hood and look inside the process of the machine's comprehension, or perhaps just its jabber, and to see what it can really understand. We will use Python to help the machine understand us better.
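
To make the idea of "embedding a textual concept into a generator's controllable parameters" concrete before we build anything real, here is a deliberately naive Python toy. Everything in it is an illustrative assumption rather than the course's actual method: the prompt is simply hashed into a small parameter vector (no learning involved), and those parameters steer a trivial procedural "generator" made of sine gratings.

```python
import hashlib
import numpy as np

def prompt_to_parameters(prompt: str, n_params: int = 8) -> np.ndarray:
    """Map a text prompt to a deterministic vector of 'generator' parameters.

    This is NOT a learned embedding: we hash the prompt and use the digest
    to seed a pseudo-random parameter vector with values in [0, 1).
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.random(n_params)

def generate_image(params: np.ndarray, size: int = 64) -> np.ndarray:
    """A trivial 'generator': the parameters control the frequency, orientation,
    and phase of three sine gratings mixed into an RGB image (size, size, 3)."""
    y, x = np.mgrid[0:size, 0:size] / size
    channels = []
    for c in range(3):
        freq = 1 + 10 * params[c]       # spatial frequency of this channel
        angle = np.pi * params[c + 3]   # orientation of this channel
        phase = 2 * np.pi * params[c]   # phase offset of this channel
        wave = np.sin(2 * np.pi * freq * (x * np.cos(angle) + y * np.sin(angle)) + phase)
        channels.append((wave + 1) / 2)  # rescale from [-1, 1] to [0, 1]
    return np.stack(channels, axis=-1)

# Different prompts give different (but repeatable) images.
img = generate_image(prompt_to_parameters("a calm blue sea at sunset"))
print(img.shape, float(img.min()), float(img.max()))  # (64, 64, 3), values in [0, 1]
```

A real text-to-image model replaces the hash with a learned text encoder and the sine gratings with a learned network, but the overall shape of the pipeline, text in, parameters in the middle, pixels out, is the same one we will explore in the course.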