Generative AI in explainer video production - a model for the future?
10-08-23 | 6 minutes reading time 

Jacques Alomo
AI Innovation

What you will take away from this article

How and whether AI makes the production of explainer videos more efficient
Where things are going well in the production process and where there are still problems
What role humans play in this
How we realize an AI film project together with you

For more than 15 years, we have been inspiring customers from all industries with our explainer videos. As a premium provider, we place the highest demands on transparent project management and the quality of the final film: the focus is on linguistic elegance, visual aesthetics and didactic standards.

Our efficiently designed processes enable us to implement a large number of projects in perfect quality. But in view of the continuing enthusiasm for generative AI, we asked ourselves: Can AI make us even more efficient? Faster? Maybe even both? We present the answer to these questions in this blog article.

Directly to the film produced with the help of AI

The explainer video "Phishing" gets to the heart of a complex topic in two and a half minutes. We also made full use of the new possibilities of generative AI in the conception and production of the film. (Status: August 2023)

Let's first break down the abstract production process into its essential phases:

Creative text draft: Based on the client documents or briefing, we draft the speech text.
Visual storyboard: We create the storyboard with the appropriate images for each scene of the speech text - if desired, even with a visual style specially designed for the customer.
Lively animation: Finally, we breathe life into the images and their elements, add sound and voiceover - and the explainer video is ready.

An AI-generated illustration showing a young man sitting in front of a laptop, with various objects resembling notebooks and electronic devices floating in the background.

Let's be clear: it took us longer to produce our film with the help of AI than with conventional production (as of August 2023)! Most of the time was spent on the completely new illustration workflow, the design of the graphics. The other work steps were completed just as quickly or faster.

Let's take a closer look at the differences in the result:

Standard film production

A classic youknow film production with great attention to detail.

Speech text written by our experienced concept team
Drawing planning generated by concentrated motion design power
Hand-designed, minimalist graphics ensure an elegant, understated look with manageable effort
Real voice enables more voice variation
Manual animation by our motion design experts

Film production with AI support

Here, too, a lot of love went into the production, supported by smart and fast AI tools.

Speech text by GPT-4
Drawing planning by GPT-4
Elaborate AI graphics create a "premium look"
Human-like voice generated with AI
Manual animation by our motion design experts

Our conclusion

In terms of time, the overall duration was only slightly longer than that of a normal film production, even though this was a first-time experiment. The quality varied greatly across the different project phases and disciplines.

Text creation

Here we achieved a result that was acceptable to us in a much shorter time. The AI not only delivered text fragments, but also a solid text as a starting point. The "human fine-tuning" took only a fraction of the time normally required for text creation. However, this is only possible if the briefing for the language model is extensively prepared. 👍

CAUTION: Before uploading sensitive or confidential data, check where the generative AI tools you use store their data: on American servers, or on European servers in compliance with GDPR.

Image generation

Figuratively speaking, the process was always one step forward and two steps back. The AI is good at developing image ideas and offering alternatives in terms of style, scene and impression. This allows scenic depth to be generated for more expression in significantly less time. It also brings fresh impetus and can be an inspiring sparring partner. 👍

But the devil is in the detail:

Fine details and small sections caused difficulties for the AI. The smaller the object in the generated image, the lower the quality or probability that the object will be displayed correctly.
The initial effort required to create a style prompt can take a long time.
Many iterations are required to achieve a consistent scene representation.
Graphical or abstract representations, such as icons or interfaces, are difficult or even impossible to create, as the models cannot yet draw perfect lines and interfaces depend on the correct representation of many small elements.
The generated images must be edited afterwards to prepare the elements for animation. This is much more time-consuming because you are working with raster graphics rather than vector graphics. There is also inconsistency in the play of light and shadow.
In addition, the AI "thinks" in images (although of course it doesn't really think), whereas experienced (motion) designers think in cinematic sequences right from the start. We are sure that it is possible to get the AI there, but this requires a (still) high investment in training the AI. The total effort therefore adds up to at least twice the original time.
New models and their combination will make more control possible in the future. Nevertheless, a new way of working is needed first. 👎
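The points above about style prompts and scene consistency can be illustrated with a small sketch. It is only an illustration, not the workflow used for the film: the style text, the scene descriptions and the helper `build_scene_prompts` are assumptions made up for this example. The idea is to write the style prompt once and reuse it unchanged for every scene, so that only the scene description varies between iterations.

```python
# Hypothetical style prompt; in practice, finding this text can take many iterations.
STYLE_PROMPT = "flat digital illustration, blue and orange palette, soft evening light"


def build_scene_prompts(scenes, style=STYLE_PROMPT):
    """Combine each scene description with the same, unchanged style prompt."""
    prompts = []
    for number, scene in enumerate(scenes, start=1):
        # Normalize the scene text so the style part can be appended cleanly.
        scene = scene.strip().rstrip(".")
        prompts.append({"scene": number, "prompt": f"{scene}, {style}"})
    return prompts


# Hypothetical scene descriptions for two storyboard frames.
scenes = [
    "A young man sitting in front of a laptop.",
    "A laptop with envelope symbols floating in the background.",
]

for item in build_scene_prompts(scenes):
    print(item["scene"], item["prompt"])
```

Keeping the style text in one place means a change to the look only has to be made once, and every scene can then be regenerated with the same wording.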

Errors during image generation

This AI-generated illustration shows a person wearing headphones who is sitting at a desk and holding a phone. The desk holds a laptop, a cup and a pen holder. A cityscape can be seen through large windows in the background. The error: the left arm is missing.

This AI-generated illustration shows a person sitting at a desk with a laptop, with their back to the viewer. The room is furnished in a modern style, with a bookshelf on the left and a floor lamp and a cabinet on the right. Large windows showing abstract shapes can be seen in the background. The error: the person's body, specifically the unnatural position of the legs.

This AI-generated illustration shows a female person working on a laptop, seen from the side. The laptop is far too large in proportion, and the hands look misshapen.

An AI-generated illustration showing a laptop, with envelope and text symbols floating in the background. The laptop is out of proportion.

This AI-generated illustration shows a man sitting in front of a laptop, with plants and a cup next to him. The background is reminiscent of a starry sky. The error: the man has his back to the viewer.

This AI-generated illustration shows a person sitting in front of a computer screen and pointing a finger at it. The desk holds a keyboard, a mouse and a mug. Numerous floating envelope symbols representing emails can be seen in the background. The error: the person is positioned between the keyboard and the screen rather than at the desk in the foreground.

This AI-generated illustration shows one, or rather two, people in an office working at a desk. The person is holding a phone and a pen while looking at a document. The scene is rendered in shades of blue and orange, which suggests an evening mood. Windows with a cityscape can be seen in the background. The error: the figures, which were supposed to be a single person, have merged into one another.

Animation

The figures had to be animated using the so-called "Puppet Tool". This involves adding dots to the figure's image, which can then be moved, distorting the image accordingly. The movements are very limited and are restricted to slight head and arm movements, for example. Figures in our films are usually divided into individual components (e.g. eyes, head, arms, legs, etc.) during the illustration process. AI-supported image creation makes such a division even more complex, as the figures would have to be divided up afterwards. The Puppet Tool is therefore more time-saving and can depict simple movements well.
Effect stock material was used to breathe additional life into the images. Sun reflections, dust particles and digital noise were used. The challenge here was to match the material to the AI-generated images.
The majority of the scenes were realized with the help of so-called "2.5D tracking shots". This involves fanning out the flat 2D elements in a three-dimensional space and filming them with an artificial camera. This creates a spatial feeling that allows us to generate dynamism and tension despite the static image elements. We already use this technique from time to time in our current films. However, the individual elements there are less static than in our AI film. The camera movements are ultimately intended to conceal the fact that not too much is moving in the picture.
The images were created in 4K resolution in order to ensure that the "2.5D tracking shots" described above always have a high level of sharpness. The resulting larger data volumes slowed down the work in the animation program somewhat. The final playout of the film also took about 5 times longer than usual. Our current films are created in Full HD, as are the graphics used for them.
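To make the 2.5D effect and the cost of 4K more concrete, here is a small sketch. It assumes a simple pinhole camera model, in which a flat layer's on-screen shift is the camera shift times the focal length divided by the layer's distance from the camera. The layer names, depths and focal length are invented for illustration and are not taken from the film. The pixel comparison assumes that 4K means 3840 x 2160 and Full HD means 1920 x 1080.

```python
def parallax_shift(camera_shift, depth, focal_length=1.0):
    """On-screen shift of a flat layer under a pinhole camera model.

    Layers close to the camera (small depth) shift more than distant ones,
    which is what creates the impression of space in a 2.5D shot.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    # Moving the camera to the right moves the layer to the left on screen.
    return -focal_length * camera_shift / depth


def pixel_count(width, height):
    """Number of pixels in one frame."""
    return width * height


# Hypothetical layers of one scene with their distance from the virtual camera.
layers = {"desk and figure": 2.0, "window": 5.0, "city skyline": 20.0}

camera_shift = 1.0  # the camera moves one unit to the right
for name, depth in layers.items():
    print(f"{name}: {parallax_shift(camera_shift, depth):+.2f}")

ratio = pixel_count(3840, 2160) / pixel_count(1920, 1080)
print(f"4K has {ratio:.0f}x the pixels of Full HD")
```

Under these assumptions, each 4K frame contains four times as many pixels as a Full HD frame. That is in the same range as the roughly fivefold playout time mentioned above, although render time depends on more than pixel count alone.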

The overall animation effort was roughly the same as that of an animade explainer video. The stock footage and the many tracking shots make the film surprisingly lively and dynamic. The animation focused more on enhancing the AI images than on animating the individual elements of the picture in detail. As long as the characters don't have to perform complex movements and the images are always kept scenic, this type of production offers a good alternative to our previous workflow.

Overall result (as of August 2023)

It is obvious that our proven processes cannot simply be replaced by AI in order to make production more efficient and simpler. The use of AI can offer many advantages, but it requires a different approach and adjustments to the overall workflow. Text-based elements can already be produced more efficiently, but for image generation and animation in particular, there are currently no good solutions on the market that enable real economies of scale. It is important to remain vigilant here, to keep testing new models and technologies, and to integrate them into workflows. We assume that it will only be a few months before AI can also deliver first-class results here. However, there are also limits to the flexibility of AI. Creative freedom and control over the results are lower than with manual "creation", but the results can be unique!

Please note: It is clear to us that humans will continue to play the central role when it comes to evaluating the results of AI, selecting the best results, linking them together and processing them. This requires someone with the necessary vision and experience to carefully curate the results.

AI vs. standard - the results in comparison

Standard film production

Film production with AI support

Good to know

This experiment was conducted by a professional explainer video production team. I, Jacques Alomo, Head of AI Innovation at youknow & Founder of creamlabs AI, have been working in film production myself for a long time and have been involved with generative AI from a very early stage. This combination made it possible for us to quickly gather insights through experimentation and make meaningful improvements in each iteration. Whether for text, images or moving images: The right prompt is ultimately the key to success. Quick adaptations require a deep understanding of common AI models, how they work and how to use them.

We are keeping the ball rolling and continuing our experiments. Would you like to join us? You are welcome to contact us by e-mail. We are planning to delve deeper into this topic with three interested parties as part of further AI-supported productions and will reward you with a generous discount!

A final treat 🎧😂🙀: We used ElevenLabs (text-to-speech software) for the voice-over (as of August 2023). We had different voices "repeat" a sentence from the spoken text - a kind of digital casting - and then decided on a voice avatar. Once we had fed in the entire text, we came up with this bizarre result:

AI tools used for production:

ChatGPT via www.creamai.de
Stable Diffusion XL via www.creamai.de
Image upscaling via www.creamai.de with model "realesrgan-x4plus-anime"
ElevenLabs www.elevenlabs.io
