Make-A-Video is a new artificial intelligence model created by Meta researchers. This AI is capable of creating five-second videos from a simple text description written by the user. Find out everything you need to know about this revolutionary tool, and how to use it.
Throughout 2022, “Text-to-Art” artificial intelligences have generated a great deal of interest. By enabling Internet users to create images simply by typing text, these AIs have become a viral phenomenon on social networks.
Beyond entertainment, tools such as DALL-E, MidJourney or Stable Diffusion are used by artists, designers and even architects to boost their productivity. To find out all you need to know about Text-to-Art AI image generators or synthesizers, read our full report at this address.
Now, Meta, formerly Facebook, has unveiled the next logical step in the Text-to-Art movement: an artificial intelligence capable of generating short videos from text entered by the user.
Simply write a short text description, for example “a dog wearing a superhero costume with a red cape flying in the sky”, and the AI creates a five-second video illustrating the words.
So far, the results match the description, but the image quality is not yet up to scratch. Nevertheless, this new system offers a glimpse into the future of generative artificial intelligence. It’s the next step in a technology whose evolution will be unstoppable…
AI trained on three datasets
The training process behind Make-A-Video is more complicated than that of Text-to-Image AIs, since large datasets of high-quality videos paired with text do not exist. To overcome this obstacle, Meta combined data from three image and video datasets to train its model.
The AI learned the names and appearance of objects from labeled image datasets, while the video data allowed it to understand how these objects are supposed to move. The combination of these different sources enabled Make-A-Video to learn how to generate videos from text.
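The two-source training idea described above can be sketched in a few lines of Python. This is a purely illustrative toy, not Meta's actual pipeline: the dataset entries, field names, and `build_training_examples` function are all invented for the example.

```python
# Toy sketch of combining two data sources, as the article describes:
# captioned images teach object appearance, caption-free video clips
# teach motion. All names and file paths here are illustrative.

image_text_pairs = [
    {"image": "dog.jpg", "caption": "a dog wearing a red cape"},
    {"image": "brush.jpg", "caption": "a brush painting on a canvas"},
]
unlabeled_videos = [
    {"frames": ["clip1_f0.jpg", "clip1_f1.jpg", "clip1_f2.jpg"]},  # no caption
]

def build_training_examples(pairs, videos):
    """Yield two kinds of training examples: text-image pairs
    (what objects look like) and frame sequences (how objects move)."""
    for p in pairs:
        yield {"type": "appearance", "text": p["caption"], "target": p["image"]}
    for v in videos:
        yield {"type": "motion", "target": v["frames"]}

examples = list(build_training_examples(image_text_pairs, unlabeled_videos))
print(len(examples))  # 3
```

A real system would feed both kinds of examples to one model so that text conditioning and temporal dynamics are learned jointly, even though no single example contains both.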
How does it work?
Make-A-Video works in a similar way to Text-to-Image models such as Stable Diffusion. As the Meta researchers write, “a model that has seen only text describing images is surprisingly effective at generating short videos”.
This tool uses the diffusion technique to create realistic still images, but it also learned what sequences of images in a video look like through training on datasets of video content.
Without ever being trained on how these notions should be combined, the AI succeeded in merging the two techniques to generate videos.
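The core diffusion idea mentioned above can be illustrated with a minimal toy in Python: start from pure noise and refine it step by step toward coherent content. This is a conceptual sketch only; the `toy_denoiser` function is a stand-in for the learned neural denoiser, and the fixed target replaces what a real model would infer from the text prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(sample, target, strength=0.1):
    """Stand-in for a learned denoising network: nudges the noisy
    sample a small step toward the target at each iteration."""
    return sample + strength * (target - sample)

# A 5-frame, 8x8 "video". Real video diffusion models denoise all
# frames jointly so that motion stays consistent along the time axis.
target_video = rng.random((5, 8, 8))
sample = rng.standard_normal((5, 8, 8))  # start from pure noise

for _ in range(200):  # iterative refinement, coarse to fine
    sample = toy_denoiser(sample, target_video)

# After many steps the sample has converged very close to the target.
print(np.abs(sample - target_video).max())
```

In an actual system, the denoiser is a network conditioned on the text embedding, and extending it from single images to frame stacks is what lets image-trained knowledge transfer to video.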
In addition to generating videos from text, this AI can be used to transform still images into videos or create variants and extensions of existing videos.
Interaction between objects leaves much to be desired
According to Tanmay Gupta, a computer vision researcher at the Allen Institute for Artificial Intelligence, the model presented by Meta is very promising. The videos shared show that the AI is capable of capturing 3D shapes as the camera moves.
This system also has notions of depth and lighting, and the movements are convincing. Nevertheless, this expert feels that there is still “a lot of room for improvement, especially if these tools are to be used for video editing and professional content creation”.
For the moment, one of Make-A-Video’s main weaknesses remains modeling complex interactions between objects. For example, in a video generated from the text “an artist’s brush painting on a canvas”, the brush’s contact with the canvas is not realistic.
The movements can also seem strange, like a stop-motion film animated frame by frame. Corruption and artifacts give each video a surreal appearance, as if objects were fleeing. People also seem to blend together, as the AI doesn’t yet understand boundaries or contact effects.
In any case, this is just the beginning. Like other generative AIs, Make-A-Video will improve massively over time…
A technology reserved for GAFAM?
Even more than Text-to-Image AIs, this new kind of tool raises important ethical questions. Indeed, this model of artificial intelligence requires titanic computing power.
Text-to-Image AIs already required millions of images for training, and video is even more demanding: a single video contains hundreds of frames. Consequently, only the largest technology companies, such as the GAFAM, will be able to build such systems in the short term…
A powerful but dangerous technology
Meta promises that this technology can “open up new opportunities for creators and artists”. However, it could also be used to create and propagate false information and deepfakes. In the near future, differentiating between real and fake content on the internet could become extremely difficult…
According to synthetic media expert Henry Ajder, this new Meta model increases the potential of generative AI on a technical and creative level, but also increases the risks.
At present, “creating factually inaccurate content that people would believe requires effort”. In contrast, “in the future, it may be possible to create deceptive content by typing on a keyboard”.
To limit the risk of bias and discrimination, the Meta researchers filtered offensive words and images out of the training datasets. However, it is almost impossible to fully filter a dataset of this size…
This is why the model is not yet available to the public. According to a Meta spokesperson, “we will continue to explore ways to reduce the potential risks associated with this research”.
How to use Make-a-Video
For the time being, Make-A-Video is not available to the general public. To limit the risk of abuse, Meta prefers to reserve its tool for researchers.
This strategy was also adopted by OpenAI for its DALL-E image generator, which has since been opened up to everyone. We can therefore expect Meta to open up its model within a few months…
In the meantime, you can read the paper published by the researchers at this address to learn all about Make-A-Video and view the various demonstration videos. A button also allows you to sign up for future releases of the tool.