A look back at recent AI trends — and what 2022 might hold
2021 was an eventful year for AI. New techniques made possible robust systems that can understand the relationships not only between words but also between words and photos, videos, and audio. At the same time, policymakers, growing increasingly wary of AI’s potential harm, proposed rules aimed at mitigating the worst of AI’s effects, including discrimination.
Meanwhile, AI research labs, while signaling their adherence to “responsible AI,” rushed to commercialize their work under pressure from either corporate parents or investors. But in a bright spot, organizations ranging from the U.S. National Institute of Standards and Technology (NIST) to the United Nations released guidelines laying the groundwork for more explainable AI, emphasizing the need to move away from “black-box” systems in favor of those whose reasoning is transparent.
As for what 2022 might hold, the renewed focus on data engineering — designing the datasets used to train, test, and benchmark AI systems — that emerged in 2021 seems poised to remain strong. Innovations in AI accelerator hardware are another shoo-in for the year to come, as is a climb in the uptake of AI in the enterprise.
Looking back at 2021
In January, OpenAI released DALL-E and CLIP, two multimodal models that the research lab claims are “a step toward systems with [a] deeper understanding of the world.” DALL-E, whose name was inspired by Salvador Dalí, was trained to generate images from simple text descriptions, while CLIP (for “Contrastive Language-Image Pre-training”) was taught to associate visual concepts with language.
DALL-E and CLIP turned out to be the first in a series of increasingly capable multimodal models in 2021. Beyond reach a few years ago, multimodal models are now being deployed in production environments, improving everything from hate speech detection to search relevancy.
Google in May introduced MUM, a multimodal model trained on a dataset of documents from the web that can transfer knowledge between different languages. MUM, which doesn’t need to be explicitly taught how to complete a task, can answer questions in 75 languages. Given a question like “I want to hike to Mount Fuji next fall, what should I do to prepare?” it can recognize that “prepare” could encompass things like fitness as well as weather.
Not to be outdone, Nvidia in November released GauGAN2, the successor to its GauGAN model, which lets users create lifelike images of landscapes that don’t actually exist. Combining techniques like segmentation mapping, inpainting, and text-to-image generation, GauGAN2 can create photorealistic art from a mix of words and drawings.