Degrees of Comfort, Levels of Control
How to piece together the puzzle of procedural generation, hand-crafted content and the products of machine learning models.
I recently saw a screenshot of Elite (Braben & Bell, 1984), one of the first games that gave me a sense of freedom. I didn’t know it back then, but the reason the universe felt so expansive was procedural generation. Born out of necessity rather than deliberate artistic choice, it allowed for an experience that wouldn’t have been possible if the game had been hand-crafted.
The Elite universe contains eight galaxies, each with 256 planets to explore. Due to the limited capabilities of 8-bit computers, these worlds are procedurally generated. A single seed number is run through a fixed algorithm the appropriate number of times and creates a sequence of numbers determining each planet's complete composition (position in the galaxy, prices of commodities, and name and local details; text strings are chosen numerically from a lookup table and assembled to produce unique descriptions, such as a planet with "carnivorous arts graduates"). This means that no extra memory is needed to store the characteristics of each planet, yet each is unique and has fixed properties. Each galaxy is also procedurally generated from the first. (Wikipedia based on The Guardian)
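To make the principle concrete, here is a minimal sketch in Python. It is not Elite’s actual 8-bit algorithm, and the syllable table and value ranges are made up, but it shows the same idea: a position in the universe acts as the seed, and the same seed always expands into the same planet, so nothing has to be stored.

```python
import random

# Hypothetical syllable table; Elite's real text lookup table differs.
SYLLABLES = ["la", "ve", "ti", "qu", "or", "za", "in", "di", "so", "ge", "bi", "xe"]

def generate_planet(galaxy: int, index: int) -> dict:
    """Deterministically derive a planet from its position in the universe.

    The same (galaxy, index) pair always produces the same planet, so the
    whole universe can be recomputed on demand instead of being stored.
    """
    rng = random.Random(galaxy * 256 + index)  # the seed is just the planet's address
    name = "".join(rng.choice(SYLLABLES) for _ in range(rng.randint(2, 4))).title()
    return {
        "name": name,
        "position": (rng.randint(0, 255), rng.randint(0, 255)),
        "tech_level": rng.randint(1, 15),
        "fuel_price": round(rng.uniform(0.5, 5.0), 1),
    }

# Planet 7 of galaxy 1 looks identical on every run of the "game":
print(generate_planet(1, 7))
print(generate_planet(1, 7) == generate_planet(1, 7))  # True
```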
Out of these technical restrictions, whole genres were born. The modern rogue-like is maybe the most obvious example. The original Rogue (1980) featured permadeath[1] and procedural dungeons. Modern rogue-likes are often characterised by carrying something over from one run to the next, unlocking more content as the player beats the game again and again. Rogue-likes excel at replayability thanks to this formula: procedural content generation becomes a tool for keeping repeated playthroughs of the same game exciting.
Other genres have used procedural generation differently. Bethesda uses a host of procedural generation techniques in all of their recent games. While Oblivion only had some procedurally generated dungeons, trees, and randomised Oblivion gate placement, Elder Scrolls 6 will supposedly feature a procedurally generated map. Many of these procedural generation tools are used at design time rather than at play time, meaning that the assets they generate are curated by a designer.
Quite often these assets get mixed with hand-crafted ones or manually altered before making it into the game. Just as using the paint bucket tool in Photoshop can be regarded as procedural generation, we rely on a lot of small design automations of varying sophistication when creating any digital artefact. Content-aware fill is a step towards a cleverer fill operation. Generative fill (also called “deep fill”) uses machine learning for an even more “intelligent” fill. What is the difference between these three methods? Paint bucket fill uses only data specified explicitly by the user: they select a colour and then fill a region. Content-aware fill uses data from the image itself – indirectly provided by the user – to fill the area in question. Generative fill uses data from other images to guess what it should put into the empty space. External data is suddenly added to the mix.
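To make the lowest rung of that ladder concrete, here is a minimal paint-bucket-style flood fill in Python (a sketch, not Photoshop’s implementation). The image is consulted only to find the boundaries of the connected region; the content that actually gets filled in is nothing but the colour the user picked.

```python
from collections import deque

def paint_bucket(image, x, y, new_colour):
    """Classic flood fill: recolour the connected region of pixels that share
    the colour found at (x, y). `image` is a mutable list of rows."""
    old_colour = image[y][x]
    if old_colour == new_colour:
        return image
    height, width = len(image), len(image[0])
    queue = deque([(x, y)])
    while queue:
        cx, cy = queue.popleft()
        if 0 <= cx < width and 0 <= cy < height and image[cy][cx] == old_colour:
            image[cy][cx] = new_colour  # the fill content is only the user's chosen colour
            queue.extend([(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)])
    return image

canvas = [[0, 0, 1],
          [0, 1, 1],
          [1, 1, 1]]
paint_bucket(canvas, 0, 0, 9)  # fill the top-left region with "colour" 9
print(canvas)                  # [[9, 9, 1], [9, 1, 1], [1, 1, 1]]
```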
Because the empty area is filled with the statistically most probable content, the result inevitably tends towards a kind of mainstreaming: the method aims to surprise as little as possible. It works best if the image is predictable and similar to other images the machine learning model has seen. Yet the user has less control than when working manually, and arguably also less control than when working with techniques based on their own content.
This loss of control is the most frightening but also the most exciting element of machine learning, and it requires a lot of learning before we can harness it in different contexts. What we get in exchange for the control is a peek into the latent space: a snapshot from a statistical analysis of the training data, shaped by the context we use to query that data (in the case above, the area surrounding the hole in the image). For generative fill, the training data is not known and cannot be altered. In many other generative systems we have access to the training data and to a host of techniques for selecting which aspects of it to emphasise. Fine-tuning allows us to radically focus a model on a specific subset of its data, plus the data provided during the fine-tuning process. Suddenly we find ourselves with much more control over the expressive range of the machine learning model we’re working with[2]. If we then curate the results instead of using the model at runtime, we are back in a position of full control over our creative output. More than that, we might not even use the generated content directly, but instead only use the bouncing with a model to develop our thinking, our voice, our style, or a tiny corner of our narrative.
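For concreteness, here is a deliberately minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries as the stack; the model choice, the two placeholder texts, and the hyperparameters are illustrative, not a recipe. A small general-purpose model is nudged towards a tiny curated corpus, narrowing its expressive range towards that material.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder corpus: in practice, the curated texts (your own writing, a
# narrow domain, a particular voice) whose aspects you want to emphasise.
texts = ["An example sentence in the voice we want the model to adopt.",
         "Another example from the same narrow corpus."]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a padding token
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=5e-5),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the result is a model biased towards the curated corpus
```

The point is less the training loop than what happens afterwards: because the fine-tuned model’s outputs can be generated and curated at design time, the designer decides what actually makes it into the work.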
Elite could not have been created without procedural generation – it contains far too much content to author by hand. Manually editing a photo takes much longer than using an advanced fill method. Creativity profits a lot from bouncing with other minds, natural or artificial. So, in a way, there is also a tradeoff between necessary effort and loss of control that has to be navigated when choosing which generative tools to employ during an artistic process.
Control is something we don’t necessarily want in every moment of every creative process[3]. Video games, as an art form, generally require the designer to let go of control, handing it over to the player. Maybe that’s another reason why procedural generation is so prevalent in the medium: games are, in this regard, more closely related to live music than to fixed media like film, books, or studio albums. There are, of course, techniques for maintaining more control even when a machine learning model is used in real time. Filtering results is a simple one; alignment, more broadly, refers to methods of keeping the output within certain parameters. These techniques reduce the expressive range of the system and thereby make the output more predictable[4].
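As an illustration of the simplest of those techniques, here is a sketch of result filtering at runtime. The `generate` function stands in for whatever model call a game might make, and the blocklist stands in for whatever policy has to be enforced; a real product would use trained classifiers rather than keyword lists, but the control structure is the same: generate, check, retry, and fall back to hand-written content.

```python
import random

BLOCKLIST = {"forbidden", "inappropriate"}  # placeholder policy, not a real rating standard

def is_acceptable(text: str) -> bool:
    """Naive filter: reject any output containing a blocked word."""
    return not any(word in text.lower() for word in BLOCKLIST)

def generate(prompt: str) -> str:
    """Stand-in for a call to a language model."""
    return random.choice([f"{prompt}... and the tavern keeper smiles.",
                          f"{prompt}... something forbidden happens."])

def filtered_generate(prompt: str, fallback: str, max_tries: int = 3) -> str:
    """Sample until the output passes the filter, otherwise fall back to
    hand-written content: expressive range traded for predictability."""
    for _ in range(max_tries):
        candidate = generate(prompt)
        if is_acceptable(candidate):
            return candidate
    return fallback

print(filtered_generate("You enter the inn", fallback="The innkeeper greets you."))
```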
It is fair to say that, at this point, there is no way to safely launch a video game that uses large language models on console platforms. Apart from unresolved legal issues, age-rating systems demand guarantees that a language model cannot fulfil: there is always a danger of inappropriate content slipping through. While that’s less of an issue in other segments of the games market – AI Dungeon launched very successfully on iOS, after all, and is also available on Steam – it prevents traditional game studios from seriously engaging with these new technologies, at least at runtime. But that is going to change as regulatory and legal frameworks slide into place and as people get more used to free-form interaction with computers.
But what are ethically and legally sound ways of using these sophisticated technologies that go beyond procedural generation? Here are four things one can do right now:
1. Walk through the latent space like a tourist, taking snapshots whenever one feels like it.
2. Explore the past by reassembling it in the now.
3. Build your own system based on models you have full control over.
4. Bounce with models that don’t work on the content directly but offer guidance and feedback.[5]
It’s a fascinating thing to compress culture into a messy interactive database and bring it alive again. There are a lot of ethical issues left: how we will use these powerful new extensions of procedural generation matters, and what it means for existing professions has to be taken into account. But while we iron out how to ethically mine the data necessary to build machine learning models, we have to work on processes, aesthetics, and the formation of taste. The current results are predictably unexciting aesthetically[6], a fact that stems from how they are used as much as from the technologies and their makers’ choices. The processes they enable are all the more fascinating. The next generation of creators will inevitably grow up with these tools at their fingertips – it’s time to put them to proper work and pick the right level of comfort versus control for every situation.
[1] Intriguingly, a concept more closely related to real-world death than traditional video game death, in that dying means losing all progress rather than simply respawning.
[2] Yes, this is what we’re doing at Write with LAIKA.
[3] Think of guitar distortion, blues singing, expressionism, improv theatre, and synthesiser music – the interesting stuff often happens at the brink of losing control.
[4] Needless to say, the more freedom the user has in providing the querying information (a.k.a. “prompting” in Generative AI), the easier it is for them to circumvent those guardrails.
[5] Soon you’ll hear more from us about that!
[6] The first cave paintings are cute, but artists were not yet on the level of the Sistine Chapel.