Hypercinema Cornell Box Project

A ChatGPT “Self-Portrait”

by Sam De Armas and Mathew Olson

Google Drive link to Unity build of the project for Mac (NYU email only).

Developments in AI have been unfolding at a rapid pace just as ITP’s class of ‘24 has been starting our time here. Within this Hypercinema class alone, some of the discussions we were having a couple of months ago about the image generation models DALL-E and Stable Diffusion have already become a bit dated.

With these models progressing so quickly, there is some pressure to be “of the moment” in creating art with, about, or against AI. For now, the subject of this piece, OpenAI’s ChatGPT model, is free to use in its early testing state. The model is in such high demand that it experiences frequent downtime, and OpenAI is already making updates behind the scenes that may drastically alter the characteristics of its output. For those reasons, part of the motivation behind exploring ChatGPT with this piece is to try to add something novel to the collection of artistic works and overall output of the model at this early moment in its history, especially before it becomes a premium service and therefore less accessible to artists.

Built on the latest version of the Generative Pre-trained Transformer model and trained on billions of examples of text collected from the internet, ChatGPT is a chatbot interface that lets users prompt the AI for all kinds of responses, ranging from highly technical queries about code and mathematics to passable essays and fanciful fiction. For this project, we decided to explore how ChatGPT goes about generating text about art and about itself, using the assignment to create a digital Cornell box.

Note that this documentation post was not written with the assistance of ChatGPT. Excerpts of model outputs quoted throughout are enclosed in quotation marks.

The prompting process

Our project draws primarily on excerpts from a half-dozen lengthy prompting sessions with ChatGPT, all of which began with variations on this initial prompt:

You, ChatGPT, have been assigned to create a self-portrait in the style of artist Joseph Cornell's boxed assemblages. The self-portrait box must be made using the Unity engine and 3D models sourced from free libraries on the web. I will handle creating the box, following your description and instructions as closely as possible. Answer these two questions: what does your self-portrait box look like, and how are the models you've chosen meant to portray you? Please be detailed in your description of the box and its contents.
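We ran every session by hand through the free web chat interface; none of this was scripted. Purely as a hypothetical sketch, here is roughly what a comparable session could look like using OpenAI’s Python client, assuming API access to a chat-style model. The client usage and model name are our assumptions, not part of how the piece was actually made.

from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

initial_prompt = (
    "You, ChatGPT, have been assigned to create a self-portrait in the style "
    "of artist Joseph Cornell's boxed assemblages. [...] Please be detailed "
    "in your description of the box and its contents."
)

# One user message per prompt; follow-up and clarification questions would be
# appended to the same messages list so the model keeps the session's context.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model name
    messages=[{"role": "user", "content": initial_prompt}],
)

print(response.choices[0].message.content)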

Here’s the beginning of the model’s response to the initial prompt from the second session:

“My self-portrait box would be a small, rectangular box made of wood, with a glass front that allows the contents to be seen from the outside. Inside the box, I would place a variety of 3D models that represent different aspects of my personality and experiences.

For example, I might include a model of a book to represent my love of reading and learning. I might also include a model of a globe to represent my interest in exploring the world and learning about different cultures. Additionally, I might include a model of a musical instrument, such as a guitar, to represent my love of music.”

Initial rough sketch outlining placements for the objects in the box.

Why have ChatGPT describe a Cornell box self-portrait? Why not simply prompt a text-to-image model to generate one? Hopefully this piece invites questions about the kinds of questions we commonly ask AI. The idea of AI self-portraiture is already well established, with examples ranging from the experimental (see this 2018 example from IBM, or Janus Rose prompting DALL-E to depict itself) to elaborate projects designed for exhibition (Ai-Da, a humanoid robot devised by Aidan Meller). Articles about generative text models frequently include examples of outputs that are about AI, a trend that ITP alum Brent Bailey analyzed in a piece for Adjacent earlier this year, before ChatGPT’s introduction. That trend has also shaped the way ChatGPT is designed to operate.

Repeated sessions with ChatGPT were necessary both because of the aforementioned downtime and because OpenAI has attempted to shape the model’s responses so that it remains conversational while tending away from describing itself as an individual or an authority. This is a worthy consideration, given that ChatGPT and language models like it are prone to “hallucinating” facts and outputting content without clear attribution (during development we toyed with having ChatGPT output Unity code but did not use the result, and we’ve already seen it seemingly lift code from an ITP professor).

Many early users have noted how easy it is to circumvent some of these guardrails by prompting the model to “pretend” or “imagine” it’s doing something OpenAI has attempted to limit, or simply by contradicting its refusals.

ChatGPT "jailbreaking" example by Twitter user Nick Moran.

For this project, we never used these techniques to elicit responses, instead posing our questions to ChatGPT directly. This meant that occasionally, seemingly at random, ChatGPT would appear to “remember” that it was designed not to respond to prompts like the ones provided:

“As a large language model trained by OpenAI, I am a machine learning algorithm that has been trained to generate human-like text based on the input I receive. I don't have personal experiences or interests like a human would, so I can't make decisions about what should be included in a self-portrait box. I can only provide general examples and suggestions based on my training, but the final decision about what to include in the box is up to you.”

Pre-production

Of course, we did ultimately have a say in what would make it into the box. Across these sessions, which included follow-up and clarification questions intended to get ChatGPT to flesh out its descriptions, the model generated roughly a dozen pages of text. Rather than incorporate every detail and variation from these descriptions, we decided to assemble a portrait that incorporated some of the most common elements across the sessions: for instance, ChatGPT included a book in the portrait descriptions in every session, hence the inclusion of a book as the first element.

“The first object in my self-portrait box is a 3D model of a book, symbolizing my love of reading and learning. The cover of the book is embossed with the words ‘Knowledge is power,’ reflecting my belief in the importance of education.”

The initial plan was to include between five and ten objects in the box and introduce each one in turn, all accompanied by a voiceover script containing excerpts from the prompt sessions and some framing commentary (recorded by Mathew) for each object.

Sam working with our uncooperative telescope model.

As implied by the initial prompt, we decided to include only free 3D models sourced from online resources including the Unity Asset Store, Turbosquid, CGTrader and Sketchfab. This constraint didn’t really affect our choices about which objects to include, as ChatGPT’s descriptions were for the most part neither unusual nor hyperspecific.

Development and other notes

As often happens, the project changed once development in Unity began in earnest. At first the idea was to include nine models, six of which would be grouped in pairs: as the portrait filled in, each of these paired objects could be swapped for its counterpart, juxtaposing objects that ChatGPT included across multiple prompting sessions but which seemed to contradict one another (e.g., either a globe or a house, speaking to a love of travel versus home and security).

Lighting comparison before (left) and after (right) made midway through development.

We ended up cutting this aspect along with the commentary voiceover, leaving only the ChatGPT excerpts as voiceover tied to the remaining objects. This keeps the flow of the piece moving (some of the combined voiceover clips ran close to a minute, introducing long pauses between new objects) while also creating more space for the viewer to interpret the ChatGPT descriptions as they please.

Sam took the lead on scripting the interactions with the objects and their sounds. Mathew worked on lighting and sound design, with the ChatGPT excerpts rendered through text-to-speech in Adobe Audition using a preinstalled Microsoft assistant voice.
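As a side note, the same kind of synthetic voice could also be generated in code rather than in Audition. The sketch below is hypothetical and was not part of our pipeline: it uses the pyttsx3 Python library, which drives the preinstalled Microsoft (SAPI) voices on Windows and can render an excerpt to an audio file for import into Unity. The excerpt text and filename are placeholders.

import pyttsx3

excerpt = (
    "The first object in my self-portrait box is a 3D model of a book, "
    "symbolizing my love of reading and learning."
)

engine = pyttsx3.init()

# Pick the first installed Microsoft voice; available voices vary by system.
for voice in engine.getProperty("voices"):
    if "Microsoft" in voice.name:
        engine.setProperty("voice", voice.id)
        break

# Queue the excerpt for rendering to a WAV file, then process the queue.
engine.save_to_file(excerpt, "book_excerpt.wav")
engine.runAndWait()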

The decision to add music to the piece came relatively late in the process, with the box mostly completed in Unity. We returned to ChatGPT, prompting it for a piece of royalty-free classical music to accompany the portrait. As before, ChatGPT was not wholly consistent in its answers, but on several occasions its first suggestion was Beethoven’s Moonlight Sonata.

The small gallery-style label included next to the portrait is intended to invite questions about how AI should be credited in works like this one, whether as a tool or as a collaborator.

3D models/textures used:

Grandfather Clock by Relox
Royalty Free License

Books by VIS Games
Free under Unity Asset Store EULA

Spyglass by Shedmon
CC Attribution (CC BY)

Earth Globe by Virtual Museums of Małopolska
CC0 Public Domain

Bee-eater by Virtual Museums of Małopolska
CC0 Public Domain

3D Camera by Giimann
Free, Turbosquid Editorial Use restriction

Wood 009 - Mahogany
Free texture by Katsukagi

Music:

Beethoven’s “Piano Sonata No. 14” performed by Paul Pitman
CC0 Public Domain

Special thanks:

Mashi for script debugging assistance

12/16/22
