Tech giant Meta has open-sourced its AI tool ImageBind, a multimodal model that understands six different data types.

In a Tuesday blog post, Meta announced that ImageBind will join the company’s other open-source products, including computer vision models like DINOv2 and the Segment Anything Model (SAM).

ImageBind stands out from image-generating tools like Midjourney, Stable Diffusion, and DALL-E 2 due to its ability to learn simultaneously from text, audio, visual, motion sensor, thermal, and depth data.

This allows the creation of more complex environments from a single simple input, such as an image, an audio recording, or a text prompt.

The technology is a step closer to machine learning that mimics human learning, where the brain unconsciously absorbs sensory experiences to infer information about an environment.

Machines can use the links between these data types to generate fully realized scenes based on limited chunks of data. The technology opens doors in the accessibility space, generating real-time multimedia descriptions to help people with vision or hearing disabilities better perceive their immediate environments.

ImageBind to Add More Senses in the Future

Meta believes the technology will expand beyond its current six senses, eventually introducing new modalities like touch, speech, smell, and brain fMRI signals to enable richer human-centric AI models.

The technology points towards Meta’s primary goals in virtual reality (VR), mixed reality, and the metaverse.

For example, future headsets could construct fully realized 3D scenes with sound and movement on the fly, while video game developers could use the technology to take on much of the legwork in the design process.

Content creators could make immersive videos with realistic soundscapes and movement based on simple input prompts. Researchers aim to create a single joint embedding space across multiple modalities without needing training datasets that contain every combination of those modalities.
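To make the idea of a joint embedding space more concrete, here is a minimal sketch in Python. It is not ImageBind's code or API. The file names, text prompts, and vector values are all made up for illustration, and the sketch assumes that each modality's encoder has already mapped its input into the same three-dimensional space. Once everything lives in one space, a text prompt can be matched to an audio clip simply by comparing vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written embeddings. A real model would produce these
# with a trained encoder per modality; the numbers here are invented.
audio_clips = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "rain.wav": [0.0, 0.2, 0.9],
    "rooster.wav": [0.1, 0.9, 0.1],
}
text_queries = {
    "a dog": [0.8, 0.2, 0.1],
    "a rooster crowing": [0.2, 0.8, 0.0],
}


def nearest(query_vec, candidates):
    """Return the candidate name whose vector is most similar to query_vec."""
    return max(candidates, key=lambda name: cosine(query_vec, candidates[name]))


# Cross-modal retrieval: each text prompt is matched to its closest audio clip.
for prompt, vec in text_queries.items():
    print(prompt, "->", nearest(vec, audio_clips))
```

With these toy vectors, "a dog" lands closest to dog_bark.wav and "a rooster crowing" lands closest to rooster.wav. The point of the sketch is only that the comparison needs no special text-to-audio logic once both kinds of data share one space.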

The technology also opens up opportunities to produce animations from static images by combining them with audio prompts, such as coupling an image with an alarm clock and a rooster crowing and animating both into a video sequence.

The tech giant said that “a multimodal AI tool like ImageBind may eventually create a video of the dog with corresponding sounds, including a detailed suburban living room, the room’s temperature and the precise locations of the dog, and anyone else in the scene.”

Meta Accelerates Push into AI Products

Meta has recently accelerated its push into AI amid a wave of breakthroughs in the sector from big tech companies.

In early April, the company released the Segment Anything Model (SAM), which is capable of identifying objects within images and videos.

The tech giant has even claimed that it is a leader in AI development. “We feel very confident that … we are at the very forefront,” the company’s chief technology officer Andrew Bosworth said in a recent interview.

“We’ve been investing in artificial intelligence for over a decade, and have one of the leading research institutes in the world. We certainly have a large research organization, hundreds of people.”

Furthermore, Meta aims to commercialize its generative AI technology to improve ad effectiveness by telling advertisers what tools to use to create better ads for different audiences.
