Many commonly used discriminative model classes produce efficient, fast classifiers on fixed-size inputs. These include logistic regression, k-nearest neighbors, support vector machines, and gradient-boosted decision trees. Neural architectures such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks are often used to build reasonably sized discriminative models for very long or variable-length inputs. For very large models, transformers, the neural component underlying the most recent advancements in AI, continue to gain popularity.
Generative adversarial networks (GANs) are machine-learning models that consist of two neural networks, a generator and a discriminator. The generator produces data by shaping random noise fed to it into a target format (typically images). On its own, it cannot assess the quality of its output. This is where a separate model, termed a discriminator, comes in.
The discriminator aims to distinguish real data from fake data produced by the generator. The two are trained simultaneously: the discriminator is trained to tell real data from generated data, and the generator is trained to fool the discriminator by producing increasingly realistic data. As training progresses, each model becomes better at its task, and the generator eventually learns to create realistic-looking content.
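The opposing training goals described above can be made concrete with a small sketch of the two loss functions. This is a minimal illustration that assumes the discriminator outputs a probability that a sample is real and uses the commonly cited binary cross-entropy formulation (with the non-saturating generator loss); the function names and the `eps` guard are illustrative choices, not part of any particular library.

```python
import math

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Loss the discriminator minimizes.

    d_real: discriminator outputs (probability of "real") on real samples.
    d_fake: discriminator outputs on generated samples.
    The loss is low when real samples score near 1 and fakes near 0.
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss (an assumed, common variant).

    The loss is low when the discriminator scores generated samples near 1,
    that is, when the generator succeeds in fooling it.
    """
    return -sum(math.log(p + eps) for p in d_fake) / len(d_fake)

# A confident, correct discriminator has a low loss ...
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
# ... while the generator's loss is high in that same situation.
print(generator_loss([0.1, 0.2]))
```

In a real training loop, these two losses would be minimized in alternating steps, each updating only its own network's parameters.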
Despite their success, GANs remain persistently difficult to train. For instance, GANs can undergo mode collapse during training, in which the generator learns to produce only a small variety of samples, enough to fool the discriminator but not enough to be useful. A more recent successor to GANs, with a much-improved training regime, is the diffusion model. In essence, diffusion models are trained to recover training data from noise-corrupted versions of it. After training, a diffusion model can generate entirely new images starting from pure noise. Many popular image-generation services are built on diffusion models.
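To illustrate what a noise-corrupted version of training data looks like, the sketch below implements one common way of producing it: mixing the clean sample with Gaussian noise according to a fixed schedule. The linear schedule, the step count, and the names `make_alpha_bars` and `add_noise` are assumptions made for illustration; real systems vary in all of these choices.

```python
import math
import random

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule.

    alpha_bars[t] is the fraction of the original signal's variance that
    survives after t + 1 noising steps; it shrinks toward 0 as t grows.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng=random):
    """Return a noised copy of x0 at step t, plus the noise that was added.

    During training, a model would be shown the noised sample (and t) and
    asked to predict the noise, which is equivalent to recovering x0.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise

alpha_bars = make_alpha_bars()
clean = [0.5, -0.25, 1.0]                 # stand-in for pixel values
slightly_noisy, _ = add_noise(clean, 10, alpha_bars)
almost_pure_noise, _ = add_noise(clean, 999, alpha_bars)
```

At the final step almost none of the original signal remains, which is why generation can start from pure noise and run the learned denoising process in reverse.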
Autoregressive models are the oldest of the three generative approaches described in this section, having their roots in the field of statistics rather than machine learning. Autoregressive models generate sequences of data by modeling the probability of the next element in a sequence conditioned on the prior elements. The next element is then randomly selected from this distribution, and the process is repeated (much like writing). A "temperature" parameter can nudge the results to be more deterministic or more random. Popular neural network components for autoregressive models once again include LSTMs and transformers, the latter of which underlie the most impressive generative AI to date.
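The sampling step and the effect of temperature can be sketched in a few lines. This example assumes the model emits one unnormalized score (a logit) per candidate element and that temperature divides those scores before a softmax, which is the usual convention; `sample_next` is an illustrative name.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample the index of the next element from softmax(logits / temperature).

    Lower temperatures sharpen the distribution (more deterministic);
    higher temperatures flatten it (more random).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs

logits = [2.0, 1.0, 0.0]                    # hypothetical scores for 3 tokens
_, sharp = sample_next(logits, temperature=0.1)
_, flat = sample_next(logits, temperature=100.0)
print(sharp)   # nearly all probability on the first token
print(flat)    # close to uniform
```

Generating a full sequence amounts to calling such a function repeatedly, appending each sampled element to the context before computing the next set of logits.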
Until recently, autoregressive text models were challenging to use because they could only complete a sequence fed to them. To improve their utility, an additional alignment stage is performed. In alignment, the autoregressive model is further trained to prefer certain input-output pairs over others based on human feedback. In the case of generative large language models, alignment has successfully taught models how to respond to questions and commands. Alignment is typically performed using techniques from reinforcement learning.
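One way to make "preferring certain input-output pairs over others" concrete is a pairwise preference loss. The sketch below assumes each candidate response has already been assigned a scalar score by some scoring model, and it uses a logistic (Bradley-Terry style) loss on the score difference. This is one widely used ingredient in feedback-based training pipelines, not a description of any specific system; `preference_loss` is an illustrative name.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Pairwise loss: -log(sigmoid(score_chosen - score_rejected)).

    The loss is small when the response humans preferred is scored well
    above the rejected one, and large when the ordering is reversed.
    """
    margin = score_chosen - score_rejected
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical scores for two responses to the same prompt.
print(preference_loss(2.0, 0.0))   # preferred response scored higher: low loss
print(preference_loss(0.0, 2.0))   # preferred response scored lower: high loss
```

Minimizing this loss over many human-labeled pairs yields a scoring signal that a reinforcement learning step can then optimize against.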
Recent gains in generative AI have resulted from training very large generative models (100B+ parameters) on substantial amounts of data (10TB+) and aligning them using human feedback. Generative LLMs, in particular, now have sufficient knowledge and response accuracy to deliver unprecedented zero-shot and few-shot performance, enough to supplant dedicated discriminative classifiers for specific tasks.
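As a rough illustration of how a generative LLM can stand in for a dedicated classifier, the sketch below assembles a few-shot classification prompt as plain text. The prompt layout, the function name `build_few_shot_prompt`, and the example labels are all assumptions for illustration; the call that would send the prompt to a model is deliberately left out because it depends on the service used.

```python
def build_few_shot_prompt(examples, query, labels):
    """Assemble a plain-text classification prompt.

    examples: list of (text, label) pairs shown to the model as demonstrations.
              An empty list gives a zero-shot prompt.
    query:    the text to classify.
    labels:   the allowed label names.
    """
    lines = ["Classify each text as one of: " + ", ".join(labels) + ".", ""]
    for text, label in examples:
        lines.append("Text: " + text)
        lines.append("Label: " + label)
        lines.append("")
    lines.append("Text: " + query)
    lines.append("Label:")          # the model is expected to complete this line
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    examples=[("The invoice is attached.", "business"),
              ("See you at the game tonight!", "personal")],
    query="Please review the quarterly report.",
    labels=["business", "personal"],
)
print(prompt)
```

The model's completion of the final "Label:" line serves as the predicted class, with no task-specific training required.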