Nobody’s good at everything.
That’s the attitude with which I’m wading into the discussion surrounding AI art. It’s a hotly debated topic, and for good reason. Concerns about intellectual and artistic theft, about undercutting already-under-compensated artists, and about how the models source their training material are all valid points of contention in a field that’s outpacing ethics, legislation, and philosophy.
It’s also liberating.
As someone who is not a trained visual artist, I’ve found the technology to be extremely helpful in game development and tabletop game homebrew. A few months of experience with AI tools has shaped my opinion outside of the general discourse into one of careful optimism, with some caveats.
When I started tinkering with RenPy — the Python-based visual novel creation tool behind titles like Doki Doki Literature Club and Roadwarden, one of our Side Quests picks from late last year (along with, yes, many, many anime girl dating sims) — I started exploring ways to create or solicit game assets. I’m a decent programmer and I enjoy writing, but I am not a visual artist.
I had some previous experience with Midjourney generating homebrew NPC, landscape, and magic-item art for both in-person and Discord-hosted Dungeons & Dragons games, so I decided to fire up the model once again for concept art and asset generation.
Midjourney, created by San Francisco-based Midjourney, Inc., is a Discord-based AI model and service with a focus on generating “artistic” images. Users submit English-language prompts to a bot or channel using the /imagine Discord command, with some options for tweaking the size and style of the generated output. From there, the model generates a series of images for users to mutate or upscale.
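For illustration, a full /imagine invocation might look like the following — the prompt text is my own invention, and --ar (aspect ratio) and --stylize (stylization strength) are two of the parameters Midjourney documents for tweaking output:

```text
/imagine prompt: a raccoon fighting an iguana in an arcade-style fighting game, cel-shaded --ar 16:9 --stylize 250
```

Everything after `prompt:` up to the flags is free-form English; the flags are where the “tweaking the size and style” happens.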
Midjourney is also the model behind the highly contentious piece “Théâtre d'Opéra Spatial”, credited to Jason M. Allen, which drew some ire for taking first place in the 2022 Colorado State Fair’s Digital Art competition.
Yep. That AI art model.
It’s worth a detour to tackle some of the arguments surrounding the technology, before I go into how — and in what ways — it seems to actually be useful.
This is reductive, but the arguments against the technology fall broadly under one of two banners:
1. Generating images via AI models does not require sufficient effort on the part of prompt-writers to consider them ‘artists’. Ergo, creators like Jason M. Allen would not be eligible for prizes like the one he claimed.
2. The models are trained on existing art, which can be predatory, especially set against the backdrop of a capitalist economy. Artists are potentially training their (cheaper, faster) competition by creating their own works.
1. The Question of Effort
In my experience, the first argument gets murky when you start to understand the process behind using these AI models, and it’s something I can at least partially address with an example.
Let’s say, for whatever reason, I want a frame of an arcade-style fighting game, where the two combatants are a raccoon and an iguana. I also want it to be cel-shaded, because I think that’d be cute.
My initial prompt is: a raccoon fighting an iguana in an arcade-style fighting game, cel-shaded.
Here’s what I got back:
Some elements are there, but this initial set of results is unusable. So now I enter what I call the “tweaking” phase. This involves the subtle art — yes, I do consider it that — of learning to speak to the model. It’s an iterative process, feeding strange poetry to an opaque and sometimes very obtuse printer.
In this next attempt, I tried centering what I really want first: frame of an arcade fighting game where the two fighters are a raccoon and an iguana, in a cel-shaded art style
Now we’ve got the proper perspective, but the iguana just isn’t making an appearance, and many of the raccoons have… let’s call them “anatomical irregularities”.
You’ll also notice that the model makes some assumptions about the elements in the generated images; the “raccoons” are anthropomorphized into buff, bipedal creatures, due to the association with fighting games. AI-whispering involves understanding the associations the model makes, and then attempting to circumvent or upend them.
I iterated on the prompt for about 30 minutes to see how close I could get.
Ultimately, the closest I got was this:
Closer, but still hardly a usable result.
For another great example of this process, check out Karen X. Cheng’s excellent compilation video illustrating the process behind her AI-generated cover for Cosmopolitan magazine. Ultimately, she and her team chewed through thousands of options during the creative process before settling on a winner.
Generating images using any of the existing AI image models hardly resembles the simple, text-to-art laziness that critics unfamiliar with the process imagine, but the tools still represent a compression of time and effort the magnitude of which is difficult to grasp — and easy to take for granted — for those who aren’t trained artists.
It’s a shortcut. There’s no way around that. Any single one of the images from my raccoon fighter experiment would take hours to create, and those hours would rest upon the foundation of thousands spent practicing and honing the crafts of digital art, 3D modeling, or drawing.
So from here the first question fractures into more complex ones: how much effort should it take to create art? Does it make sense to measure art in effort at all? And couldn’t trained artists leverage these tools too, to compress their time and effort? Aren’t all advances in artistic tooling, from improved paints and brushes to digital art tools, about compressing or eliminating effort?
To look at those questions from a different angle, I can draw from my other experience with AI assistance, in programming.
Several weeks ago, I began using GitHub’s Copilot tool in my day job; for those unfamiliar, the tool draws on training data to offer code suggestions.
It’s somewhat useful to lower the barrier to entry in programming, but it has a ceiling, just as the image generation models do, when it comes to specific solutions.
That ceiling is still only broken by real expertise, and there’s no shortcut for that. In a similar way, I wonder whether Midjourney or DALL-E (another image generation model) could become tools that expert artists reach for to get, say, 80% of the way to a concept, and then set aside in favor of their expertise to finish the last 20%.
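To make the Copilot workflow concrete, here’s a sketch: the programmer writes a signature and a descriptive comment, and the tool proposes a body. The function name and game logic below are entirely my own invented example — the body mimics the kind of plausible-but-must-be-reviewed suggestion the tool produces, not any real suggestion:

```python
# The programmer writes the signature and docstring; a Copilot-style
# assistant proposes the body. Everything here is an invented example.

def damage_after_armor(base_damage: int, armor: int) -> int:
    """Return damage remaining after a flat armor reduction, floored at zero."""
    # A plausible suggested completion -- but still needing human review:
    # should armor be a flat reduction, or a percentage? The tool can't know.
    return max(base_damage - armor, 0)
```

The suggestion compresses typing time, but deciding whether a flat reduction is even the right design is exactly the judgment the tool can’t supply — that’s the ceiling.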
Ultimately, if time and effort are measures of artistic merit, then using an AI model to generate a specific vision certainly qualifies to a point. There’s also nothing stopping digital artists from combining these new tools with their hard-earned expertise.
2. The Question of Economics
I want to preface this by saying that artists should hold ownership over their art, and that means the use of their works in training models should be contingent upon permission and/or compensation, as they dictate. That much is not in question.
To address this, Midjourney has made avenues available for artists to request that their works be removed from the model’s training data, but this “ask forgiveness, not permission” approach is not one I’ll defend beyond saying it’s the bare minimum that Midjourney, Inc. should be doing. As much as I think art is an iterative process, and that putting something on the internet should come with the assumption that it could be circulated, questions about sourcing of training data have unsatisfactory answers at the moment.
The idea that artists are training their competition — especially if their works are used in training data without their express approval — is concerning. Models like Midjourney do make visual art more accessible, but they stand on the shoulders of every artist on which they’re trained, almost always without citation.
The root of this problem is buried in the demands of capitalism. I doubt anyone would argue that making tools of art creation available to layfolk is a bad thing — but the reality of a competitive economy where art is commercialized and commodified sours that principle. In this context, the criticism surrounding AI art is just another head on the hydra of anti-automation that emerges in response to technological leaps.
The central problem isn’t AI art — it’s that our economy pits labor against progress. In the inherently competitive space of the market, no work of art is entitled to success, and keeping up with advancement is part of the competition — as it is in any field. Commercial artists may have to adopt AI-based tools to remain competitive, and I think combining these tools with their own expertise will still be lucrative when compared to AI users who can’t tweak the model’s output because they lack the requisite skillset.
None of this is to say art created without cutting-edge tools isn’t valuable, or that it isn’t exquisite work; just that it may not be commercially viable.
So What Is It Good For Right Now?
Speed and flexibility.
Getting back to my original story about tinkering with RenPy: I wanted assets that I could drop into place quickly and cheaply, and I’m not a visual artist.
Enter Midjourney. Within a few hours, I’d abandoned the idea of using it for any actual production assets, but the value in rapidly prototyping things became apparent. Could I produce the exact scenes and character art I’d envisioned? No. Somewhere along the line, I’d have to enlist the help of a talented human artist to get exactly what I wanted in terms of quality and vision.
But in the meantime, Midjourney provided excellent placeholders and even iterated some new directions for the ideas I fed into it. It’s a prototyping tool, and in that niche its strengths are highlighted without encroaching on the domain of human artwork. In the context of a large project, it’s a temporary tradeoff between speed and precision.
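For a sense of how low-friction the placeholder workflow is in practice, here’s a minimal RenPy sketch — the filenames, image tags, and label are all hypothetical:

```renpy
# Register AI-generated placeholder art (hypothetical filenames).
image bg arcade = "images/bg_arcade_placeholder.png"
image fighter raccoon = "images/raccoon_placeholder.png"

label arcade_scene:
    scene bg arcade
    show fighter raccoon at left
    "The raccoon squares up, ready for round one."
```

Because RenPy resolves images by tag, swapping in a human artist’s final work later means replacing the image files, not rewriting the script.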
And even in cases where AI art might suffice completely for an indie game or tabletop gaming module, I don’t see that as a bad thing.
Nobody’s good at everything — but everyone deserves to make art. AI tools make the creative process easier and more accessible, and that’s a good thing.
The other context that highlights Midjourney’s capabilities is flexibility. When a player in my D&D game asks what a random shopkeeper or tavern server looks like, I can feed Midjourney a general description and get back a result quickly. I don’t particularly care what the end result looks like specifically, just that some basic criteria are met.
Moreover, the low stakes make it easy to accept some of Midjourney’s quirks — feet that blend into the floor, hands with questionably-numbered digits, and so on.
It’s also cool to just type in a title or phrase that popped into your head when ideating. It’s neat to see how, even with completely made-up concepts, names, or phrases, the model works to stitch together some understanding of the words. This is maybe my favorite way to use the service; it’s like playing a game with the model, testing how it responds to the motifs and memetics embedded in its collective psyche through training data and learned response.
AI in its current iteration is a great rapid prototyping mechanism, provided the stakes are low or you have the expertise to tweak its output. It’s a tool, like the syntax highlighting of an IDE or the innumerable capabilities present in modern digital art software. These tools lower the skill floor, but their full potential is still only unlocked in the hands of an expert.
And when it comes to the sphere of art for art’s sake, there’s an open argument here over whether hours iterating strange poetry to a capricious, opaque model meets expectations of effort some have for the artistic process.
But then, arguing over what qualifies as art is hardly anything new.