There is a war going on on Twitter, Mastodon, and even in classical media. Apart from alignment-induced fear-mongering, valid criticism of data collection practices, and other similarly grave challenges for artificial intelligence, two camps of people are increasingly forming out there: a camp that expects a new age of enlightenment, brought about by the radical democratisation of access to AI-based creative tools, and a camp that looks at all that is lost once a computer can create media seemingly out of nothing (but practically out of everything). Fear is an appropriate reaction to what a capitalism-fuelled technological invention can do to culture. And yet the other side is correct, too, in that technological breakthroughs that change the status quo as radically as Generative AI promises to can lead to entirely new forms of living – new forms of interacting with the world and our cultural heritage.

As a maker of a tool in this space, I spend a lot of time thinking about the implications this new wave of AI will have. Here’s an attempt at structuring those thoughts, written for you, the reader, for me, the creator, and for everyone in my position. It’s a discussion starter, and like any manifesto it is formulated sharply to have sufficient impact. It’s still just a standpoint. Hopefully one that makes you think.
Ethics in Generative AI is challenged in four areas that are all connected:
Data collection, ownership of training materials
Ownership of results
Bias in results
Sustainability
Disclaimer: Everything here concerns generative AI and does not simply apply to other applications of machine learning. We’re talking about entertainment, creating cultural artefacts, and art – don’t apply the same thinking to predictive policing. Also, in case that’s not clear – I’m neither a lawyer nor an ethicist.
Data Collection
The data that goes into a machine learning model, the training material, defines the potential output. Digital technology has created immense data collections, from museums digitising their entire holdings to social media aggregating voices from around the globe. Data has also become one of the key assets of the digital economy, with company valuations tied to the size of their social graph. Machine learning started out in academic contexts, and science has a long history of privileged access to data – and also of having the infrastructure and processes to handle sensitive data. Science also has a history of being very transparent about data. Even today, the data collection part of training large models is often outsourced to universities, yet the trained model is commercialised. The fact that this is legal says something about how far legislation is lagging behind technological innovation. But even if countries are willing to legislate in this area, there is an ongoing arms race, especially between the US, Europe, and China, over technological leadership. Lawmakers will continue to have a hard time finding the sweet spot between harnessing the power that was unleashed and protecting the rights and privacy of their citizens. Given that technological innovation will stay ahead of legal frameworks for the foreseeable future, ethical frameworks that go beyond legal requirements are needed.
For training data to be collected ethically, the following criteria have to be met:
Legal requirements have to be fulfilled.
Implications of licenses that were created before the advent of modern machine learning have to be taken seriously.
Training corpora have to be fully published.
Decisions about what to include in a training corpus have to be made and documented.
Mechanisms that let rights holders prevent the use of their copyrighted materials for training have to be established and implemented. One possible shape of such an opt-out check is sketched below.
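What such an opt-out mechanism could look like is still being negotiated; robots.txt rules and the emerging “noai” meta directive are early candidates. Here is a minimal sketch for a crawler-based pipeline, assuming those two conventions – the function name is made up, and a real implementation would parse HTML properly rather than string-match:

```python
# Hypothetical opt-out check before adding a page to a training corpus.
# Assumes two conventions: robots.txt disallow rules and the emerging
# "noai" robots meta directive. Neither is a settled legal standard.
import urllib.robotparser
from urllib.parse import urljoin
from urllib.request import urlopen

def may_use_for_training(url: str, user_agent: str = "corpus-builder") -> bool:
    # 1. Respect robots.txt, the web's oldest opt-out mechanism.
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(urljoin(url, "/robots.txt"))
    try:
        parser.read()
        if not parser.can_fetch(user_agent, url):
            return False
    except OSError:
        pass  # no robots.txt reachable; fall through to the meta check

    # 2. Respect a "noai" robots meta tag, e.g.
    #    <meta name="robots" content="noai, noimageai">.
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    except OSError:
        return False  # if in doubt, leave it out
    page = html.lower()
    if 'name="robots"' in page and "noai" in page:
        return False
    return True
```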
Ownership of Results
A number of court cases are in preparation, and some jurisdictions have already handed down rulings that regulate who owns the copyright to the output of Generative AI systems. While there will always be corner cases, it is already clear that some principles will define how things shake out. The intellectual property created usually belongs to whoever contributed the creativity, and most jurisdictions seem to have a hard time assigning a creative contribution to a machine. It might still depend on how exactly the human user has interacted with the system, but it seems hard to universally establish that the maker of a machine learning system owns its output. A weird case is, of course, when the system reproduces copyrighted material it was trained on – maybe even copyrighted material it was not trained on. In that case there is a clear copyright violation, and at the same time the implications are the same as when an author accidentally comes up with a sentence that already exists: it is the author’s responsibility to produce and not to reproduce.
An ethical ownership policy thus needs to assign the intellectual property created to the human interacting with the generative AI, offering the same protection they would have had they created the work with more established tools.
Addendum: This becomes much more complicated in the case of fine-tuning by a user, because there is mixed authorship of the model. Writing, but also curating, texts for fine-tuning can be a creative act. The result is not a work of art, though, so it is still not straightforward to apply the above to a fine-tuned model. It all depends on the specific case.
Bias in Results
All machine learning models are biased. All fine-tuning leads to even more bias. Bias is a gigantic problem in some areas of machine learning – in hiring systems, in predictive policing, even in face recognition – but it can be mitigated in creative applications. To put it simply: if I fine-tune a language model on Jane Austen – like we have done in LAIKA – I of course get output that uses a 200-year-old value system. I might even train that model deliberately to replicate that value system, e.g. to write a character that lives in that world. This is a positive case of bias, because the system in question deliberately does not aim to provide a universal experience. The case is of course different for an image generation algorithm that is presented as a generic imagination machine.
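To make the Austen example concrete, this is roughly what such a deliberate fine-tune looks like with today’s tooling. A minimal sketch assuming a Hugging Face causal language model – model choice, file name, and hyperparameters are illustrative, and this is not a description of how LAIKA is built:

```python
# Minimal fine-tuning sketch: deliberately biasing a small causal LM
# towards Jane Austen's voice. All names and hyperparameters are
# placeholders for illustration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # any causal LM works; gpt2 keeps the example small
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# austen.txt: a curated plain-text corpus, e.g. from Project Gutenberg.
dataset = load_dataset("text", data_files={"train": "austen.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="austen-gpt2", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the resulting model carries Austen's bias by design
```

The curation step – deciding which texts go into the corpus – is exactly where the creative act from the ownership addendum above happens.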
To ethically handle bias, systems must:
Be transparent about their biases
Stop pursuing universality and instead target specificity
Rely on their responsible use by humans
Sustainability
Training a model for 20,000 hours on a modern GPU consumes a lot of energy – in the case of Stable Diffusion v1, the equivalent of 15 people flying from London to New York. We live in an age of mass extinction of animals. As the planet is heating up, it feels more and more like an anachronism to use intentionally inefficient methods like machine learning to achieve superficial goals like entertainment. And yet, what higher purpose is there than producing culture? Still, being mindful of the resources we are using will have to become more and more the norm in how we make technological, and all other, decisions. What does that imply for generative AI?
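To see where such numbers come from, here is a back-of-envelope calculation. Every constant in it is an assumption picked for illustration – published figures for specific models rest on different measurements, so treat this as a sketch of the method, not of the result:

```python
# Back-of-envelope estimate: GPU hours -> energy -> CO2. All constants
# are assumptions for illustration, not measured values.
GPU_HOURS = 20_000      # training time, figure from the text above
GPU_POWER_KW = 0.4      # e.g. an NVIDIA A100 draws roughly 400 W
PUE = 1.5               # assumed datacentre overhead (cooling etc.)
KG_CO2_PER_KWH = 0.4    # assumed grid carbon intensity

energy_kwh = GPU_HOURS * GPU_POWER_KW * PUE
emissions_kg = energy_kwh * KG_CO2_PER_KWH
print(f"{energy_kwh:,.0f} kWh = {emissions_kg:,.0f} kg CO2eq")
# -> 12,000 kWh = 4,800 kg CO2eq. Depending on the per-passenger
#    estimate you use for a London-New York flight (several hundred
#    kg CO2eq each), this lands in the same order of magnitude as
#    the flight comparison above.
```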
Ethical generative AI systems must:
Document the CO2 emissions during training and in production. Here’s a tool for this, and a minimal sketch of such logging follows after this list
Be optimised to consume as little energy as possible
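One way to do the documentation part in code is an emissions-tracking library such as CodeCarbon. A minimal sketch, assuming `pip install codecarbon` – the project name and the training function are placeholders for your own pipeline:

```python
# Minimal emissions logging with CodeCarbon (pip install codecarbon).
from codecarbon import EmissionsTracker

def train():
    # Placeholder for your actual training loop.
    for _ in range(10**7):
        pass

tracker = EmissionsTracker(project_name="my-generative-model")
tracker.start()
try:
    train()
finally:
    # stop() returns the estimated emissions in kg CO2eq and also
    # writes them to an emissions.csv file for documentation.
    emissions_kg = tracker.stop()

print(f"This run emitted roughly {emissions_kg:.4f} kg CO2eq")
```

The same tracker can be run in production around inference calls, which covers the second half of the documentation demand above.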

Disclaimer 2: As with everything I ever wrote, this little manifesto is – while I’m being totally serious – a provocation for you to think for yourself. I would never tell people what to do in such a direct way were it not the goal of a manifesto. Please look past the simplicity of my arguments and become conscious of the complexity of the subject.