At a time when every tech company is racing to advance artificial intelligence, Meta has announced its newest generative AI project: an AI art generation model named CM3Leon. Meta claims the model delivers state-of-the-art performance that surpasses others in its class.
Meta Addresses Copyright Concerns
While AI tools have been praised for their ability to accelerate work and generate content, they have also drawn heavy criticism over the copyrighted work used to train them. Indeed, several lawsuits have been filed by content owners claiming that their work was used without consent.
For instance, Meta and OpenAI are currently facing a lawsuit from US comedian and author Sarah Silverman, who alleges that the two companies used her work and that of two other authors to train AI models without consent.
Sarah Silverman joined a class-action lawsuit against OpenAI and another against Meta accusing the companies of copyright infringement, saying they “copied and ingested” her protected work in order to train their artificial intelligence programs. https://t.co/FrE1PO5JjZ
— The New York Times (@nytimes) July 11, 2023
As such, it is important that these companies avoid infringing copyrights with their models moving forward. Acknowledging this in its research paper, Meta said:
“The ethical implications of image data sourcing in the domain of text-to-image generation have been a topic of considerable debate.”
Meta’s new creation, named CM3Leon (pronounced “chameleon”), has been trained exclusively on licensed Shutterstock images in order to sidestep copyright concerns.
Introducing Meta’s CM3Leon
The model is multimodal with text-to-image and image-to-text capabilities. This makes it one of the first generative AI models with the ability to generate captions for images which, according to Meta, establishes the foundation for future image-understanding models.
“With CM3Leon’s capabilities, image generation tools can produce more coherent imagery that better follows the input prompts. We believe CM3Leon’s strong performance across a variety of tasks is a step toward higher-fidelity image generation and understanding.”
With the rise of generative AI art tools like Stable Diffusion, DALL-E, and Midjourney, AI-generated graphics are no longer a new concept. However, the techniques Meta is using to develop CM3Leon are fresh, and they account for the impressive performance Meta claims the foundation model achieves.
Today’s text-to-image systems rely heavily on diffusion models, which is where Stable Diffusion gets its name. CM3Leon takes an alternative approach: a token-based transformer model.
As Meta highlights in its research, diffusion is computationally intensive, making it costly to operate and too slow for most real-time applications.
Meta’s researchers have been able to leverage token-based transformer models in the creation of CM3Leon but in a way that results in a more efficient model than the diffusion model-based approach.
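To illustrate the distinction, a token-based approach treats an image as a sequence of discrete tokens and generates it one token at a time, the same way a language model predicts the next word, rather than iteratively denoising a full image as diffusion does. The sketch below is purely illustrative and is not Meta’s code: real systems use a learned image tokenizer and a large transformer, while here a toy bigram lookup table and the `<boi>`/`<img_*>` token names are invented stand-ins.

```python
# Illustrative sketch (not Meta's code): image generation framed as
# next-token prediction. A real model would use a learned image tokenizer
# (mapping pixels to/from discrete codes) and a transformer; a toy bigram
# table stands in for the model here.

def generate_image_tokens(model, prompt_tokens, n_image_tokens):
    """Autoregressively extend the prompt with discrete image tokens."""
    seq = list(prompt_tokens)
    for _ in range(n_image_tokens):
        # The toy "model" maps the last token to its most likely successor;
        # a transformer would instead attend over the whole sequence.
        next_tok = model.get(seq[-1], "<img_0>")
        seq.append(next_tok)
    return seq[len(prompt_tokens):]  # only the newly generated image tokens

# Toy bigram lookup standing in for transformer logits.
toy_model = {
    "<boi>": "<img_3>",   # "<boi>" = hypothetical begin-of-image marker
    "<img_3>": "<img_7>",
    "<img_7>": "<img_1>",
    "<img_1>": "<img_3>",
}

tokens = generate_image_tokens(toy_model, ["a", "cat", "<boi>"], 4)
print(tokens)  # ['<img_3>', '<img_7>', '<img_1>', '<img_3>']
```

Because generation is a single left-to-right pass over tokens rather than many full-image denoising steps, this framing is one way to see where the efficiency gains Meta describes could come from.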
“CM3leon achieves state-of-the-art performance for text-to-image generation, despite being trained with five times less compute than previous transformer-based methods,” Meta’s researchers wrote in the blog post.
Along with using transformers, the development team applied supervised fine-tuning (SFT) to improve the quality of the images the model generates. This training method is similar to the one OpenAI used for ChatGPT, and Meta likewise used it to help the model grasp complex prompts, which benefits generative tasks.
“We have found that instruction tuning notably amplifies multi-modal model performance across various tasks such as image caption generation, visual question answering, text-based editing, and conditional image generation,” the paper states.
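The mechanics of supervised instruction tuning can be sketched in a few lines. This is a generic illustration of the technique, not code from Meta’s paper: training examples pair an instruction with a target response, and the loss is computed only over the target tokens so the model learns to follow the prompt rather than reproduce it. The `-100` ignore label is a common convention in training frameworks, assumed here for illustration.

```python
# Illustrative sketch (not from Meta's paper): building a supervised
# fine-tuning (SFT) example. The instruction and target are concatenated
# into one input sequence, but the loss labels mask out the instruction
# so only the target tokens are penalized during training.

def build_sft_example(instruction_tokens, target_tokens):
    """Concatenate instruction and target; mask the loss over the instruction."""
    input_ids = instruction_tokens + target_tokens
    # -100 is a conventional "ignore" label in many training frameworks;
    # instruction positions contribute nothing to the loss.
    labels = [-100] * len(instruction_tokens) + list(target_tokens)
    return input_ids, labels

ids, labels = build_sft_example([101, 102, 103], [201, 202])
print(ids)     # [101, 102, 103, 201, 202]
print(labels)  # [-100, -100, -100, 201, 202]
```

For a multimodal model like CM3Leon, the same recipe applies with image tokens mixed into the sequence, which is how one model can be tuned across captioning, question answering, and editing tasks.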
Meta just introduced CM3leon! a single foundation model that does both text-to-image and image-to-text generation.
The new multimodal model achieves SOTA performance for text-to-image generation with 5x the compute efficiency.
CM3leon learns from text and image data through a… pic.twitter.com/EBIvMuani9
— Lior⚡ (@AlphaSignalAI) July 14, 2023
Aside from generating images from text prompts, CM3Leon can also generate captions from image inputs, edit an image based on text instructions, and answer questions about a given image.
The model can also generate an image with bounding box segmentations as defined in the text prompt as well as generate an image given only a segmented mask image without text descriptions.
Notably, CM3Leon still suffers from some of the common issues plaguing AI models, including bias. In its announcement, Meta cautioned that the model “can also reflect any biases present in the training data.”
It is unknown whether Meta will release CM3Leon for public use or integrate it into one of its products. But given the quality of its generations and its efficiency, the model will likely advance beyond research papers and into production.