Against Reality:
AI Co-Creation as a Powerful New Programming Tool
by
Olivia Peace
A Thesis Presented to the
FACULTY OF THE USC SCHOOL OF CINEMATIC ARTS
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF FINE ARTS
(CINEMA-TELEVISION)
May 2022
Copyright 2022 Olivia Peace
Dedication
This project is dedicated to my ancestors past and present who have guided my flight
both in the dreamspace and in this realm.
Fig. 1. Illustrated quote from Octavia Butler’s book Parable of the Sower
Acknowledgements
Thank you so much to Andreas, Peter, and Kevin, who signed on to advise me on
this project during an increasingly tumultuous time in this world. Thank you for the
patience, space, curiosity, incredible references, and excitement you’ve shown me
throughout each stage of this project’s evolution.
To my IMGD cohort: this has been the best educational experience I’ve ever had, and that
has everything to do with how brilliant, kind, and caring each person in our little group is. I’ve
learned so much from each and every person in my cohort, and it has made me a better designer,
artist, and human. It’s been real.
Thank you to my friends, my beloved community, for cheering me on in this process.
Jenny and Dr Chukwu for helping me hold it together every time I tried to quit. Emiliana and
Leah for embarking on the first iteration of this project with me. Shakinah for introducing me to
the hero Uncle Butch who helped me construct the physical installation for this piece. Justin for
the killer sound design on always very short notice. Ethan for looking at endless iterations of this
project. Lolia for strengthening my theories on everything. And of course, my neighbor Jenny
Funkmeyer whose work in lucid dreaming formed the foundation for this project.
Finally, I would also like to thank both of my seminal texts: my dad, Richard Peace, for
speaking life into me as a young artist, and my mom, Tamela Peace, for letting me read my
USC admissions essay to her over the phone at 11:30pm PST, 2:30am EST, so that I would have the
audacity to press submit.
Thanks for always taking my calls. Now we’re both going to have master’s degrees.
Table of Contents
Dedication
Acknowledgements
List of Figures
Abstract
Chapter 1: Introduction
1.1 So, what exactly is VQGAN anyway?
1.2 The Title Against Reality
1.3 Experience Overview
Chapter 2: Artistic Look & Feel
Chapter 3: Soundscape
Chapter 4: Technology
Chapter 5: Digital Bias
Chapter 6: Thinking like a Programmer
Chapter 7: Programming as a Precursor to Worldbuilding
Chapter 8: Moving Forward
Bibliography
List of Figures
Fig. 1. Illustrated quote from Octavia Butler’s book Parable of the Sower
Fig. 2. Preliminary Diagram of Gallery Space Layout
Fig. 3. Screenshot from nightmare scene in “Against Reality”
Fig. 4. Screenshot of “solarpunk futuristic society” from “Against Reality”
Fig. 5. Screencapture of my own early VQGAN generated image of “grocery store parking lot”
Fig. 6. Screencapture of a generated image of “grocery store aisle” created using a target image
Fig. 7. The original “grocery store aisle” target image found in a Google search
Fig. 8. Screencapture of the 50th frame of “black baptist church”
Fig. 9. Final completed 300th frame of “black baptist church”
Fig. 10. Final frame of “busy church on sunday morning | darkskinned churchgoers”
Fig. 11. Final frame of a failed render overrun with watermarks and logos
Fig. 12. Kehinde Wiley’s stained glass window entitled Mary, Comforter of the Afflicted II
Fig. 13. “Against Reality” poster generated using Kehinde Wiley’s Mary, Comforter of the Afflicted I as its target image along with keywords such as “grapes”
Abstract
“Against Reality” is a short surreal autobiographical documentary video built in the AI
art generation tool VQGAN + CLIP. The larger version of this project is an interactive art
installation intended to be experienced in a gallery space, where guests are invited to press their
bodies into a stretchy projector screen, thereby “entering” the dreamrealm. The documentary
video then plays on the other side of the screen, on top of the shifting imprints of each guest’s
body.
The story that forms the foundation of “Against Reality” is a peek into the
autobiographical account of how I first began to re-learn how to dream. If this story followed the
structure of Alice in Wonderland, this piece documents the part of the tale where she falls down
the rabbit hole.
It is the story of how beginning to pay attention to and learning to trust my own mind
changed my life.
There are many ways to make films that feel dreamlike aesthetically, but for this project,
I set my sights on using VQGAN + Clip to mimic the wavy imperfect imagery I’d witnessed in
the dreamspace. My VQGAN tutor, Michael Carychao, who I found on Tik Tok, was the first
person I heard use the term “AI co-creation.” Michael prefers the term “co-creation” to
“generation” so as to not minimize the essential partnerships between the artist, the machine, and
the millions of people who have both knowingly and unknowingly contributed to forming the
program’s training set.
VQGAN + CLIP currently sits right on the border between being considered a tool and
being a full-time intelligent designer. The word “collaborator” seems fitting at this time, as the
program has served me as a heavy lifter, generating and rendering out complex animations that
under normal circumstances would take me weeks to ideate, produce, and iterate on. With this
technology, I was able to complete full scenes in a matter of hours. And, after learning the basics
of VQGAN, the final piece was made in about a week, including multiple iterative design cycles.
This paper documents how AI Art co-creation tools like VQGAN + CLIP can be used as
powerful immersive worldbuilding tools even as they are marred by the same prejudices and
biases that plague us in our waking lives. It is in better understanding the systems affecting our
representational imagery, labels, and training sets that we might be able to steer these tools away
from being used for more nefarious purposes. For, similar to dreams, what you bring into these AI
tools deeply colors the whole experience of what you are able to create with them.
As time goes on and the technology improves and becomes more accessible to the
masses, honing one’s own vision as an artist alongside learning to think like a programmer will
be vital to advancing this new artistic medium to be more and more expressive of meaningful
aspects of the human experience.
Chapter 1: Introduction
AI Co-created Artwork has been quietly rising to online popularity in an extremely
notable way. Examples of AI creative writing, voice generation, music, and what’s most relevant
to this project, visual artwork, are already flooding social media timelines around the world. AI
Co-created NFTs are in high demand. Many of these pieces were made in Google Collab
notebooks. There are many notebooks I could have used for this project, but the one I settled on
ultimately for this iteration was Zooming VQGAN+CLIP Animations. I chose this notebook in
particular because it struck a great balance between being able to render out high fidelity images
without crashing my 2019 Macbook Pro. It also allows artists to zoom, pan, and angle towards
and away from images as though they were directing a camera in a traditionally animated scene.
As a filmmaker, my interest was piqued.
1.1 So, what exactly is VQGAN anyway?
VQGAN, or Vector Quantized Generative Adversarial Network, is an “architecture” or,
as I’ve come to think of it, a type of synthesizer which can be used both to learn and co-create
novel images based on previously documented data. VQGAN was first introduced in the paper
“Taming Transformers” (2021) by Esser, Rombach, and Ommer, although GANs themselves were
initially proposed as a concept by Ian Goodfellow in 2014.
The GAN part of VQGAN is a neural network system made up of two competing sides.
On one side of the system are the “Generators,” which, as the name implies, generate novel
images. On the other side are the “Discriminators,” which judge the images in order to determine
how accurate they are. Over time, with some practice and intentional training, the GAN’s
Generator will get better at generating images with less noticeable errors, and the Discriminator
will get better at spotting mistakes in the first place. This design dichotomy is intended to mirror
the ways in which our own brain is perceived to work.
One side of our brain is colloquially understood to be more analytical, the other more
creative, and together the fused unit helps us to both perceive and make decisions about both the
world we currently live in as well as the one we would like to create for ourselves. The
neurobiological basis of lucid dreaming is still unknown; however, it has been observed
that active collaborations between both sides of the brain are necessary in order for frequent
lucid dreamers to be able to control their dreams (Baird, Castelnovo, & Gosseries, 2018).
So how does the Discriminator manage to differentiate between accurate and inaccurate
images? The answer lies in the ways in which it has been trained.
Training sets for AI artwork are made from online collections of images paired with
descriptive words. These words either come from captions added to the images, or they are
gleaned from the text on web pages that the images happen to be posted on. These descriptions
are what allow people to write prompts in plain speech that the GANs then develop into more and
more crystallized imagery. The training set I used for my video came from the popular database
ImageNet.
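The caption-to-image pairing described above is, at its core, a matching problem: a piece of text and an image are each reduced to a list of numbers (an “embedding”), and the closer those lists point in the same direction, the better the image fits the words. The sketch below illustrates the idea with tiny made-up embeddings; the filenames, captions, and three-number vectors are all hypothetical, and real systems like CLIP learn embeddings with hundreds of dimensions.

```python
import math

# Hypothetical, hand-made "embeddings": in a real system these vectors are
# learned from millions of image-caption pairs scraped from the web.
image_embeddings = {
    "photo_of_parking_lot.jpg": [0.9, 0.1, 0.2],
    "photo_of_church_interior.jpg": [0.1, 0.8, 0.3],
}
prompt_embedding = [0.85, 0.15, 0.25]   # stands in for "grocery store parking lot"

def cosine_similarity(a, b):
    """Higher means the prompt and image point in a more similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Score each candidate image against the prompt; a generator can then be
# steered toward whichever imagery scores highest.
scores = {name: cosine_similarity(vec, prompt_embedding)
          for name, vec in image_embeddings.items()}
best = max(scores, key=scores.get)
print(best)  # the parking-lot image matches the parking-lot prompt best
```

Prompt-driven generation repeats this scoring over and over, adjusting the image so its embedding creeps closer to the prompt’s.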
ImageNet has some very interesting affordances and hindrances, but we’ll circle back to
that in the Digital Bias chapter.
A guiding principle in this project came from artist and educator Melanie Hoff whose
brainchild “Always Already Coding” reads as follows:
Everyone who interacts with computers has in important ways always already been
programming them.
Every time you make a folder or rename a file on your computer, the actions you take
through moving your mouse and clicking on buttons, translate into text-based commands
or scripts which eventually translate into binary.
Why are the common conceptions of what a programmer and user is, so divorced from
each other? The distinction between programmer and user is reinforced and maintained
by a tech industry that benefits from a population rendered computationally passive. If we
accept and adopt the role of less agency, we then make it harder for ourselves to come
into more agency.
We've unpacked the "user" a little, now let's look at the "programmer." When a
programmer is writing javascript, they are using prewritten, packaged functions and
variables in order to carry out the actions they want their code to do. In this way, the
programmer is also the user. Why is using pre-made scripts seen so differently than using
buttons that fire pre-made scripts?
When we all build up and cultivate one another’s agency to shape technology and online
spaces, we are contributing to creating a world that is more supportive, affirming, and
healing (Hoff, 2020).
It troubles me greatly that even as I have created this piece, I have struggled to identify as
a coder or a programmer. A chasm is opening in our society between those who are fluent in a
small handful of computing languages and those like me who, for whatever reason, feel
themselves too ill-equipped to learn. However, through collaborating with the various AI
generation tools, I have come to the conclusion that, just as Hoff says, we are always already
coding. This project turned me into a programmer, and it is my intention that the final resulting
piece encourages others to begin to see themselves, and the data that they share online, as
programming as well.
1.2 The Title Against Reality
The beginnings of this project found me very interested in the question of what it even
means for something to be labeled as reality. Why are dreams not considered real when
people frequently come away from them so greatly affected? Are things generated with AI
somehow less “real,” and therefore less authorful, than things created using more traditional
media tools?
Through the creation of this project, I have learned a lot about the serious implications
and consequences of allowing tools that are intended to replicate human creativity to be
inundated by human prejudices, or, more specifically, what scholar bell hooks would refer to as
Imperialist White Supremacist Heteropatriarchy.
The title of the piece “Against Reality” was born cheekily from my own desire to create
work that is intended to leave audiences broadening their understanding of “realness.” More
specifically, I’d like to use the workflow embedded in AI art generation tools to be in
conversation with the workflow embedded in lucid dreaming. In her powerhouse text, Glitch
Feminism, writer and curator Legacy Russell refuses to use the term “real life” to describe the
non-digital landscapes in which we spend much of our waking life. Russell instead prefers the
term Away From Keyboard or AFK for short.
AFK as a term works toward undermining the fetishization of “real life,” helping us to
see that because realities in the digital are echoed offline and vice versa, our gestures,
explorations, actions online can inform and even deepen our offline, or AFK, existence.
This is powerful (Russell 2020).
The concept of “reality” is occasionally used to compel people into upholding
anachronistic systems that oppress them. When these old systems are challenged, an advocate for
the system might invoke realism, or more colloquially, “being a realist” as a retort.
In dreams, we are given a nega-world that operates using many of the elements derived from our
“reality” but rearranged or remixed in ways that serve as a natural counterbalance for the stark
reality that the dreamer spends the majority of their waking life inside of. Dreams have been known
to provoke, reveal, and expand. So we have to wonder: could getting better in touch with one’s
own dreams help one to see reality for what it is, a thin film of previously agreed-upon rules
and ideas held together by us as a collective force? Before learning to get lucid in a dream, I
commonly found myself in dream scenarios that, though in hindsight nonsensical and
completely disjointed, at the time I fully accepted and went along with.
Our collective perceptions of reality have changed often throughout history. However, as
the sand shifts under our feet, we find ourselves diminishing the majesty of these occasions just
barely after they’ve passed, prematurely forgetting the dream shortly after we wake up.
1.3 Experience Overview
“Against Reality” is a two-part experience: it is both an interactive installation and
a flat 2D film made in collaboration with Artificial Intelligence tools. For the installation portion,
I wanted to find a way to allow the piece to be simply and cheaply changed by each guest who
visits it. In this way it would mirror the central mechanic present in AI co-creation tools wherein
it is through a combination of seen and unseen collaboration both on the part of the user and
medium itself that an artistic piece is created.
Guests begin the interactive portion of the experience backstage on the “A SIDE” where
they wait in what they believe to be a queue to enter the exhibition experience. As guests wait in
line, they are positioned behind a tall and wide wall made of stretchy spandex fabric held in
place by a heavy frame made of PVC piping.
Guests are instructed to take their time writing an intention in permanent black marker on
the back of the screen. A few intentions are written on the fabric beforehand, for example “I am
aware that I am dreaming” and “I will remember my dreams.”
Guests are then instructed to “press” the intention into the wall with whatever body part
they would like in order to send it into the dreamspace. From there, guests push themselves into
the back of the screen, briefly imprinting themselves into the soft stretchy fabric before they
move on to the next part of the exhibition.
As guests come around to the other side of the screen, to the “B SIDE,” they are met with
a single wooden church pew where they can choose to sit and watch the film projected on the
clean side of the projector screen.
Fig. 2. Preliminary Diagram of Gallery Space Layout
The film is designed to loop and my own voice narrates the piece. The AI co-created
imagery illustrates the story of how I went from not dreaming at all to becoming lucid.
Guests are encouraged to stay as long as they like, or even nod off on the B Side of the
experience.
Chapter 2: Artistic Look & Feel
Initially, “Against Reality” was conceptualized as a 360 video experience, but as the
project progressed, it became apparent that it would not be possible to make a high resolution
rendering in VQGAN + CLIP that would suit a headset experience. One of the affordances I
found in the Google Colab Pytti 5 notebook was that I could render out AI co-created imagery
in 4K quality without it crashing and going offline, whereas with VQGAN + CLIP Animations,
the most I could manage to render consistently without errors was a tepid 1080p. And so,
once I’d come to this conclusion, I scrapped the 360 video idea and settled on making a piece for
a flat screen (albeit, one that people can walk around and interact with). Artistically, I am known
for loving to experiment with alternate non-traditional screens. My latest feature film, TAHARA,
is filmed and presented almost entirely in a 1:1 aspect ratio. Creating an analog interactable
screen seemed like a great next step.
As far as the feel, the video portion of “Against Reality” exists as one smooth slow dolly
movement through surreal, abstracted, ever shifting landscapes. All of these stunning
atmospheric spaces were brought to life with VQGAN + CLIP.
By maintaining consistent parameters for my zoom and translation integers, I was able to
achieve a slow, steady “push forward” motion throughout the piece, as though audience members
were being led deeper and deeper into the dream realm.
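That “push forward” comes from reapplying the same small zoom and translation to every frame before it is fed back into the model. A sketch of the idea, with hypothetical parameter names (the notebook’s actual option names may differ):

```python
# Hypothetical per-frame camera parameters, in the spirit of the Zooming
# VQGAN+CLIP Animations notebook (actual option names may differ).
zoom_factor = 1.02    # >1.0 zooms in slightly every frame
translate_x = 0.0     # no sideways drift
translate_y = 0.0     # no vertical drift

def cumulative_zoom(n_frames, zoom_per_frame):
    """Total magnification after n frames of a constant per-frame zoom."""
    return zoom_per_frame ** n_frames

# Because the same factor is applied every frame, the motion compounds
# smoothly: roughly 1.8x after 30 frames, roughly 7.2x after 100.
print(round(cumulative_zoom(30, zoom_factor), 2))
print(round(cumulative_zoom(100, zoom_factor), 2))
```

Holding the values constant across every frame is what makes the dolly feel like one continuous movement rather than a series of cuts.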
Fig. 3. Screenshot from nightmare scene in “Against Reality”
Fig. 4. Screenshot of “solarpunk futuristic society” from “Against Reality”
None of these images are meant to be perfect representational renderings of anything
concrete. Similar to dream logic, I wanted this imagery to be abstracted, much more emotionally
driven, or, as the churched would say, “spirit led,” than fitted to stark realism. I took my cues
from the surrealists in this way.
Andreas Kratky wrote about the connection between surrealist automatisms and AI
co-creation in the 2022 paper “Poetic Automatisms” noting the similarities and differences
between the two approaches to art making. Of the surrealists he wrote:
Drawing from the Surrealist Manifestoes we get the sense that the intention of the
surrealists was to shift the human brain into a different state of mind that suspends considerations
of utility etc. which normally govern our decision-making processes. The use of automatisms
serves two main purposes: it is a way of producing perceivable traces, Breton tends to refer to
“images,” which, when externalized and perceived, put the human brain into a state of
imaginative activity that goes beyond the normal responses; the second purpose is, by way of
producing those traces that trigger mental images, they reshape the human perception of the
world and the value sets that are brought to it. (Kratky 2022).
And so it was by first deciding to leave realism at the door that I was able to get my
audience members closer to experiencing the visceral truth of what it feels like inside of my own
mind at night.
Chapter 3: Soundscape
Sound is an extremely crucial part of making this experience work. My voice rides at
different levels in the mix depending on where we are in the story. A combination of
environmental sounds, swelling Hammond B3 organ riffs, and heavy spatialized foley helped to
bring the dreamspaces I am describing in my voiceover narration to life.
When I told my sound designer, Justin Enoch, about the “crunchy, layered” audio quality
I hoped to elicit, I sent them a few tracks as samples: namely Ana Roxanne’s “Untitled” song off
of her 2020 Because of a Flower album as well as the iconic “Start” track from Frank Ocean’s
2012 hit record Channel Orange. We also talked a lot about composer Caroline Shaw and the
ways in which she uses echoing vocalizations to create rich soundscapes in her work as well as
Moses Sumney’s album Græ, in which I found a gorgeous tonal kindred spirit.
My voice serves as the sole narrator for the piece. This was my first time narrating, and
though I was quite nervous about the vulnerability of lending a tangible part of myself to the
experience, it felt imperative to do so in service of the overall theme of collaboration. My voice
was necessary to deliver the sermon of the piece. I recorded my voice myself, mostly in one take,
with my iPhone X’s voice note app, and then sent the .wav file over to Justin, whom I asked to
generate a dense soundscape with the audio, making it feel spatialized so that even without
imagery, guests would feel present in the spaces I was talking about.
Viewers feel enveloped, not just visually, but also by the soundscape. It is designed to
mimic the experience of being in a dream, leaving viewers free to decide exactly where onscreen
they’d like to look and rest their eyes at any given moment. The piece is designed to deliver
multiple points of interest throughout its duration, even in the sound mix. The sound is
furthermore designed to feel just as surreal as the visuals and imitates dream logic, in that it
prioritizes how something feels rather than aiming to concretely replicate how it sounds.
One of my favorite parts of our sound design is the backing track we created of organ
music that plays throughout the piece. For this portion, I was inspired by my late grandfather
Richard C. Peace I from Oxford, North Carolina. He was at once a small town farmer, a Detroit
factory worker, and a strikingly talented organ player who could often be found sneaking away in
quiet moments to noodle on the Hammond B3 he had stationed in his living room.
For our piece, we sampled the sound of musician Derrick Jackson improvising on the
Hammond B3 organ from a clip uploaded to Youtube. In the clip, Jackson absolutely shreds on
the organ during an entirely improvised solo known as a “praise break” or a pause between
preaching and the more traditional praise and worship featuring a choir. Simply put, it’s an
organist’s chance to show off. The Hammond B3 sound is especially notorious in the Black
Pentecostal church tradition. It is a comparatively smallish electric organ, known for generating
pitched sounds strikingly similar to the human voice. Hammond B3s often cry and wail
intertwined with a gospel singer’s vocals until the two become one force, pushing one another
along. Scholar Ashon Crawley considers the Hammond B3 organ to be integral to the Black
church experience, writing:
Described as sounding human, the Hammond organ offers a way to think about the
breakdown between human and machines. In a Testimony given at Rev. F.W. McGee’s
Blackpentecostal church, January 28, 1930, one brother asks the saints to pray, “that I
may be used as an instrument in his hand.” This desire for instrumentality, I argue,
structures the Blackpentecostal imagination such that any object can be sacralized, made
holy. People not only beat tambourines and stomp feet, but play washboards with spoons
and blow whistles. The Hammond organ is in this tradition, the utilization of any object
for sacred possibility. And in such making sacred of objects, the instrument is not the
Hammond on the one hand or the musician on the other: the instrument is the sociality of
the Spirit Filled musician with the musical object working together.
I found the Hammond B3 sound to be the perfect backing track for my own vocals seeing
as it was exactly this type of collaboration between artist and machine that I wished to examine
for this project.
Chapter 4: Technology
The piece itself is designed almost entirely in the Google Colab notebook Zooming
VQGAN + CLIP Animations. AI co-created artworks have a hazy surrealistic quality that lends
itself well to subject matter about dreams. The elements in AI co-created artworks often make
sense in isolation, though the ways in which they connect to the piece as a whole are filled with
what I will refer to as “realism errors.”
For example:
One of my earliest test pieces involved attempting to make a grocery store parking lot.
Fig. 5. Screencapture of my own early VQGAN generated image of “grocery store parking lot”
Though, at a glance, this image is overall successful in suggesting the idea of a parking
lot outside of a grocery store, a fair number of realism errors fill the piece. An
interesting one is the yellow lines on the ground that are meant to be parking spaces. After
examining the image, I realized that this was due to the GAN having a good grasp on how
something looks: it has been fed good data showing it that parking lots should have yellow lines
on them. However, it doesn’t have as clear a grasp on why something might look the way it
does. In this image, the lines are all over the place because they are divorced from their “real
life” purpose, which is to guide cars into neat, space-saving rows.
It is here that I, as the programmer, must step in and help to further train and sharpen the tool.
Fig. 6. Screencapture of a generated image of “grocery store aisle” created using a target image
I created the image above using the same tool but with a few adjustments. First, I
learned that less literal prompts, ones that read closer to Instagram captions than stark
commands, worked best. So instead of “long grocery store aisle with cereal boxes on shelves and
narrowing in the distance,” I would have better luck trying out “empty grocery store aisle.” I also
learned that feeding the AI target images profoundly helped to guide it towards specific
aesthetics. For each run, I could temporarily add more and more of my own imagery, and
therefore direction, to the dataset.
The target image I used in order to create that green grocery store aisle image was this one:
Fig. 7. The original “grocery store aisle” target image found in a Google search
You’ll notice that there are still reality errors when comparing the two images, but also
some interesting additions within the AI’s rendering. Legendary filmmaker Haile Gerima once
said during a talk on USC’s campus that, “in my imperfections, I have found my accent.”
Meaning that quite often it is what at first appears aberrant in a new medium that is later
recollected as its signature style. And VQGAN has a very distinct accent, given to it by the
imagery pulled from online.
“The clearest commonality between AI-generated artworks is this hazy surrealism you’ll
pick up on. Elements in isolation make sense, but the way they connect feels off. As a
result, images that aim for realism might look convincing from a distance, but when you
get up close, things fall apart… A GAN might have a solid grasp on how something
looks, but it misses the why. The image sets that train AI help them recognize objects, but
not truly understand them… So while they’re pretty good with form, they get the logic of
their scenes all wrong” (Alexandre 2021).
Furthermore, I found that the AI used in this project displayed a notable ability to distill
an image down into qualities that might not be readily apparent to the human artist.
For example, apparently a good majority of the grocery store aisle images pulled from the
ImageNet database show that grocery stores tend to be green and use fluorescent lighting. There
is a lot we can learn from the data inside of the training sets that these AI art generators use.
Another thing I learned, unfortunately, is that racism, sexism, and consumerism have
also carried over from our “real world” into the AI training sets.
Chapter 5: Digital Bias
My first run-in with this happened when I was attempting to switch from my preferred
VQGAN + CLIP notebook to the alternate Google Colab notebook Pytti 5. Pytti 5 boasts 3D
movements, video style transfer, and far more settings to play with, and so I was excited to try my
hand at using it to create some of the imagery for my film. I immediately ran into an issue when I
used the prompt “black baptist church.”
Fig. 8. Screencapture of the 50th frame of “black baptist church”
The above initial created image looked promising.
Fig. 9. Final completed 300th frame of “black baptist church”
However, it soon deteriorated into this ghoulish ape-like figure.
I assumed it must be a problem with my prompt and went on to try to be more specific
with my language. I decided to try simplifying to “african american church,” assuming that
perhaps the horrifying image had to do with the word “black” being associated with evil.
However, much to my concern, I found that the program kept returning images of all
kinds of apes whenever my prompt attempted to create imagery of distinctly African people.
The workaround I eventually came up with was to stack up my direct image prompts, and
instead of saying black, african american, or african, the best I could do was to vaguely type “dark
skin” and hope for the best, though many of these results came back vague and shapeless.
Fig. 10. Final frame of “busy church on sunday morning | darkskinned churchgoers”
This was the best rendition of a “busy church on sunday morning | darkskinned churchgoers” that
I could manage after first feeding the AI a handful of imagery of black, non-ape, churches.
The other workaround I found was to simply eliminate all training set results that were
labeled anything even close to the word monkey. This proved particularly tedious, however, as
each scene took anywhere from 30 minutes to 4 hours to render, and many final results were ruined by
what can only be described as old-timey racism. In the end, I found myself having to program
into each and every render a command fully eliminating the words ape, gorilla, monkey,
orangutan, chimp, chimpanzee, and baboon from the training set results. Figuring this out took
hours, as each time a monkey began appearing onscreen, I’d have to abort the render and start all
over again with even stricter stipulations.
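In VQGAN + CLIP-style notebooks, this kind of exclusion is commonly expressed as negatively weighted prompts, phrases the optimizer is steered away from rather than toward. The helper below sketches how such a banned-word list can be folded into every render; the pipe-and-weight syntax mirrors what many of these notebooks accept, but the exact format should be treated as an assumption:

```python
# Terms every render had to be steered away from. Many VQGAN+CLIP notebooks
# accept "phrase:weight" entries separated by "|", where a negative weight
# pushes the optimizer away from that concept; the exact syntax here is an
# assumption, not a guaranteed match for any one notebook.
BANNED_TERMS = ["ape", "gorilla", "monkey", "orangutan",
                "chimp", "chimpanzee", "baboon"]

def build_prompt(positive_phrases, banned_terms, penalty=-1.0):
    """Combine wanted phrases with negatively weighted exclusions."""
    wanted = [f"{p}:1" for p in positive_phrases]
    unwanted = [f"{t}:{penalty}" for t in banned_terms]
    return " | ".join(wanted + unwanted)

prompt = build_prompt(
    ["busy church on sunday morning", "darkskinned churchgoers"],
    BANNED_TERMS,
)
print(prompt)
# busy church on sunday morning:1 | darkskinned churchgoers:1 | ape:-1.0 | ...
```

Prepending the same exclusion list to every run automates what I otherwise had to rediscover, render by aborted render.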
I didn’t realize it at the time, but this troubleshooting methodology was also mirrored in a
decision the tech behemoth Google made in 2015. Journalist James Vincent wrote about the
incident for The Verge:
Back in 2015, software engineer Jacky Alciné pointed out that the image recognition
algorithms in Google Photos were classifying his black friends as “gorillas.” Google said
it was “appalled” at the mistake, apologized to Alciné, and promised to fix the problem.
But, as a new report from Wired shows, nearly three years on and Google hasn’t really
fixed anything. The company has simply blocked its image recognition algorithms from
identifying gorillas altogether — preferring, presumably, to limit the service rather than
risk another miscategorization (Vincent 2018).

Google Photos was still censoring the image categories
“gorilla,” “chimp,” “chimpanzee,” and “monkey” as late as 2018, without a clear solution
or a more nuanced path forward. ImageNet, the preferred training set feeding VQGAN,
announced in 2019 that it would remove some 600,000 poorly or offensively labeled
images from its online database.
In a later statement in 2019, ImageNet acknowledged “Issues of Fairness and Representation” within its training sets, citing three key issues that led to poor labels being applied to various images.
The first issue is that WordNet, the English lexical database that groups congruent words into sets of synonyms, contains offensive terms that were then used to label and tag images. The second is that ImageNet was often used to label people as concepts, or even as potential careers, based solely on their looks. Finally, and perhaps most obviously, there is insufficient representation overall of people who are not white, cisgender men.
These problems become even more apparent and complex if, for example, you ask the AI to generate imagery of mugshots, which, along with featuring ceramic cups, exclusively features Black people. Asking the neural networks to generate imagery of women returns results run rampant with oddly graphic pornography.
And so I began to realize I was working with a medium that was not only never intended to create edifying imagery of historically marginalized communities but, even worse, was actually hostile to my attempts to do so. This tedium was further complicated by the heavy prevalence of advertisements in the Pytti Limited Palette training set, which ruined images in a different way by slapping trademarks, watermarks, and logos over otherwise perfectly nice images.
Fig 11. Final frame of a failed render overrun with watermarks and logos
I learned to navigate around this by eliminating all images that featured “words, logos, brands, text or watermarks,” but only after waiting hours for this particular image to crystallize before realizing it was being overrun by AI-generated gibberish attempting, in its own way, to reproduce an image similar to the high-quality ones it was pulling down from websites like ArtStation. In the essay “Excavating AI: The Politics of Images in Machine Learning Training Sets,” artist-researchers Kate Crawford and Trevor Paglen write:
Images do not describe themselves. This is a feature that artists have explored for
centuries. Agnes Martin creates a grid-like painting and dubs it “White Flower,” Magritte
paints a picture of an apple with the words “This is not an apple.” We see those images
differently when we see how they’re labeled. The circuit between image, label, and
referent is flexible and can be reconstructed in any number of ways to do different kinds
of work. (Crawford and Paglen 2019).
And so it was after a frustrating couple of weeks that I learned how to more effectively program
according to the needs of my piece.
Chapter 6: Thinking Like A Programmer
Art without an author has precedent. Roland Barthes, in “The Death of the Author,” writes about a more figurative death wherein audiences should effectively ignore an artist’s background or perspective when examining a piece of work. In working with GANs, though, I found this perspective nearly impossible to maintain. I went from perceiving this machine as a blank tool that merely spits out imagery to becoming quite aware that it held clear markers from the millions of hands, themselves unwitting artists, who had played a role in programming it. The lines between artist, user, and programmer have become impressively blurry in this process.
So I had a problem: it would appear that in an attempt to make the training sets my GAN used less racist, Black people as a visual concept had been rendered basically invisible. And even then I was still getting images of primates.
It was here that I decided to get more specific with the imagery I would use for my target and initial imagery. And just like that, I went from claiming the roles of artist/user/programmer to also becoming a photo compiler and even a photographer. I began uploading my own photos, sampling images from my own personal archives. I also used images from some of my favorite artists (Richard Mayhew, Carrie Mae Weems, Kahlil Joseph, Jamel Shabazz, and Kwame Brathwaite, among others) to provide scaffolding for what Blackness could look like in my piece, and allowed the machine to remix them within reason, to make something totally new out of them. Legacy Russell writes,
To remix is to rearrange, to add to, an original recording. The spirit of remixing is about
finding ways to innovate with what’s been given, creating something new from
something already there… remixing is an act of self-determination; it is a technology of
survival. (Russell 133)
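In Pytti-style Colab notebooks, seeding a render with personal-archive images usually comes down to a handful of settings. The sketch below is purely illustrative: the parameter names (`init_image`, `direct_image_prompts`, `image_prompt_weight`) and file paths are my own hypothetical stand-ins for this family of notebooks, not any one tool’s exact interface:

```python
# Hypothetical configuration sketch for seeding a VQGAN + CLIP render
# with personal images. Parameter names and paths are illustrative;
# real notebooks (Pytti and its relatives) expose similar fields under
# their own names.

render_config = {
    # Starting canvas: a photo from my own archive rather than noise.
    "init_image": "archives/family_reunion_1997.jpg",
    # Direct image prompts: artworks the render should steer toward.
    "direct_image_prompts": [
        "references/street_portrait.jpg",
        "references/church_interior.jpg",
    ],
    # How strongly the target images pull the render (0.0 to 1.0).
    "image_prompt_weight": 0.6,
    # Text prompts still shape the scene alongside the images.
    "text_prompts": "busy church on sunday morning | darkskinned churchgoers",
}
```

Raising the image-prompt weight lets the personal archive, rather than the biased defaults of the training set, do the scaffolding.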
Remixing and sampling are powerful tools, familiar and well loved within the Black art world. Sampling plays an unmistakably essential role in hip hop, as well as in contemporary Black film, such as Arthur Jafa’s groundbreaking work “Love is the Message, The Message is Death,” and in the work of contemporary painters like Kehinde Wiley.
Fig 12. Kehinde Wiley’s stained glass window entitled Mary, Comforter of the Afflicted II
As a Black person, the idea of being able to digitally remix imagery piqued my interest in further
exploring this medium to tell this story.
Fig 13. “Against Reality” poster generated using Kehinde Wiley’s Mary, Comforter of the Afflicted II as its target
image along with keywords such as “grapes”
Chapter 7: Programming as a Precursor to Worldbuilding
The central theme of this project draws direct parallels between lucid dreaming and working with AI collaborative tools. Just as your own mind samples from your subconscious internal dataset to create dreams, GANs pull from a mostly hidden shared repository of imagery in order to create surrealistic images. Both practices are distinctly marked by the residue of people’s beliefs, time periods, and circumstances. If no artists were uploading images to sites like Flickr and ArtStation, nothing could have been generated for this project. The AI would not be able to synthesize an image without our collective input. In this way, we are all functioning as programmers, uploading bits and pieces of our perspectives online that can then be used to create something else.
It is imperative to maintain a level of curiosity about where our collectively created
images are coming from. It is also imperative to be critical of what the “something else” that gets
created is. What will all of our information be used for? In the dreamspace there are many
theories about what dreams are “supposed” to be for. Dreams have been theorized to do
everything from helping people prepare for imminent threats (Valli et al., 2005) to revealing important personal messages from our subconscious (Jung, 1968).
Lucid dreaming is a step removed from the typical everyday dreaming experience in that
instead of reacting unconsciously, “unable to reflect on your current situation, you now hold the
reins– your mind is awake enough to call the shots… No longer confined to a physical body, you
have the freedom to travel over large distances, move at incredible speeds, or even transcend
time as you know it,” (Zeizel, Tuccillo, Peisel, 2013). Lucid dreaming also has the affordance of
allowing dreamers to build entirely new worlds, unbound by the laws of society or even physics. It is a powerful, and still free, prototyping tool.
I would like to posit here that the everyman learning to assume responsibility as a programmer is an essential first step to being able to use AI co-creation tools to imagine, prototype, and maybe even, later on, build new worlds. Programming cannot solely be left up to those employed at tech giants. It is a collaborative effort made by everyone, though the degrees of creative freedom decrease as decisions are made along a program’s design pipeline. For example, the original programmers working in the Python programming language had far more freedom in deciding that this iteration of VQGAN + CLIP would not be compatible with creating 360 videos than I did as an ancillary programmer using prompts and translation integers within the bounds of their Google Colab notebook. As André Breton wrote in his Manifesto of Surrealism:
Surrealism:
This imagination which knows no bounds is henceforth allowed to be exercised only in
strict accordance with the laws of an arbitrary utility (Breton, 1924).
Understanding ourselves as active participants, as programmers en route to becoming worldbuilders, is essential to the ethos of “Against Reality.” This project itself is an imperfect first attempt at training audiences, in simple ways, to think of themselves as active participants in the art-making process. We are not just users. We are always, consciously or not, co-creators.
Chapter 8: Moving Forward
We all play an invaluable role in programming even the most advanced-seeming machines. Your daily online habits, the things you buy, and the information and imagery you share are actively shaping both the digital and offline worlds we live in each day.
I see this iteration of “Against Reality” as a highly informative first step in taking up
more space as a programmer and worldbuilder. I intend to continue iterating on the VQGAN
co-created imagery as I continue to document and then render out these digitally synthesized
versions of my own nighttime adventures.
I cannot wait to continue evangelizing about the joys of lucid dreaming while, hopefully, helping to democratize tools like VQGAN in the process. The two have a lot in common, including inspiring a certain level of fear in the everyday person, who might assume what we’ve all been incorrectly taught about things that take a lot of time and effort: either you’ve got it or you don’t. I can assure you that in both cases I most certainly did not have even an ounce of it. But I have plenty now.
Bibliography
Alexandre, L. (2021). Why AI-generated art is so beautiful [Video]. YouTube. Retrieved March 17, 2022, from https://www.youtube.com/watch?v=Bi4sJEE8wCs&list=PLyqruvCZrFY8HL36DI6tD4lkPt8tNPEg5&index=5&t=164s
Baird, B., Castelnovo, A., Gosseries, O., et al. (2018). Frequent lucid dreaming associated with increased functional connectivity between frontopolar cortex and temporoparietal association areas. Scientific Reports, 8, 17798. https://doi.org/10.1038/s41598-018-36190-w
Barthes, R. (1968). The death of the author [University handout].
Breton, A. (1972). Manifestoes of surrealism. Ann Arbor: University of Michigan Press.
Crawley, A. (2019, September 21). Nothing music: The Hammond B3 and the centrifugitivity of Blackpentecostal sound. AshonCrawley.com. https://ashoncrawley.com/blog/2014/04/30/nothing-music-the-hammond-b3-and-the-centrifugitivity-of-blackpentecostal-sound
Hoff, M. (n.d.). Always already programming. GitHub. Retrieved March 18, 2022, from https://gist.github.com/melaniehoff/95ca90df7ca47761dc3d3d58fead22d4
Jung, C. G. (Ed.). (1968). Man and his symbols. Dell Publishing.
Crawford, K., & Paglen, T. (2019, September 19). Excavating AI: The politics of images in machine learning training sets. https://excavating.ai
Kratky, A. (2022). Poetic automatisms: A comparison of surrealist automatisms and artificial intelligence for creative expression. https://doi.org/10.1007/978-3-030-95531-1_25
Russell, L. (2020). Glitch feminism: A manifesto. Verso Books.
Valli, K., Revonsuo, A., Pälkäs, O., Ismail, K. H., Ali, K. J., & Punamäki, R. L. (2005).
The threat simulation theory of the evolutionary function of dreaming: Evidence from
dreams of traumatized children. Consciousness and cognition, 14(1), 188–218.
https://doi.org/10.1016/S1053-8100(03)00019-9
Vincent, J. (2018, January 12). Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech. The Verge. Retrieved March 18, 2022, from https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai
Yang, K., Qinami, K., Fei-Fei, L., Deng, J., & Russakovsky, O. (2019, September 17). Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. ImageNet. Retrieved March 17, 2022, from https://image-net.org/update-sep-17-2019
Zeizel, J., Tuccillo, D., & Peisel, T. (2013). A field guide to lucid dreaming: Mastering the art of oneironautics. Workman Publishing Company.
Abstract
“Against Reality” is a short surreal autobiographical documentary video built in the AI art generation tool VQGAN + CLIP. The larger version of this project is an interactive art installation intended to be experienced in a gallery space, where guests are invited to press their bodies into a stretchy projector screen, thereby “entering” the dreamrealm. The documentary video then plays on the other side of the screen, on top of the shifting imprints of each guest’s body.
The story that forms the foundation of “Against Reality” is a peek into the autobiographical account of how I first began to re-learn how to dream. If this story followed the formatting of Alice in Wonderland, this piece documents the part of the tale where she falls down the rabbit hole.
It is the story of how beginning to pay attention to and learning to trust my own mind changed my life.
There are many ways to make films that feel dreamlike aesthetically, but for this project, I set my sights on using VQGAN + CLIP to mimic the wavy, imperfect imagery I’d witnessed in the dreamspace. My VQGAN tutor, Michael Carychao, whom I found on TikTok, was the first person I heard use the term “AI co-creation.” Michael prefers the term “co-creation” to “generation” so as not to minimize the essential partnerships between the artist, the machine, and the millions of people who have both knowingly and unknowingly contributed to forming the program’s training set.
VQGAN + CLIP currently sits right on the border between being considered a tool and being a full-time intelligent designer. The word “collaborator” seems fitting at this time, as it has served me as a heavy lifter, generating and rendering out complex animations that under normal circumstances would take me weeks to ideate, produce, and iterate on. With this technology, I was able to complete full scenes in a matter of hours. And, after learning the basics of VQGAN, the final piece was made in about a week, including multiple iterative design cycles.
This paper documents how AI art co-creation tools like VQGAN + CLIP can be used as powerful immersive worldbuilding tools even as they are marred by the same prejudices and biases that plague us in our waking lives. It is in better understanding the systems affecting our representational imagery, labels, and training sets that we might be able to steer these tools away from more nefarious uses. For, similar to dreams, what you bring into these AI tools deeply colors the whole experience of what you are able to create with them.
As time goes on and the technology improves and becomes more accessible to the masses, honing one’s own vision as an artist alongside learning to think like a programmer will be vital to advancing this new artistic medium to be more and more expressive of meaningful aspects of the human experience.