- Nov 10, 2017
Last Sunday, we looked at OpenAI's latest work, in which the firm trained diffusion models to generate deepfakes and achieved a new state of the art on multiple image generation tasks. Today, we shift gears and focus on another big, recent development in artificial intelligence: transformer models.
Transformer models came to the forefront with Google's open-source implementation of BERT. By addressing the shortcomings of RNNs and LSTMs, this deep learning architecture revolutionized natural language processing and generation. We first saw the potency of such language models in OpenAI's GPT-2, a 1.5-billion-parameter model that produced news, stories, lyrics, and other pieces of text that could easily be mistaken for the work of a human. Soon after, GPT-3, the successor to GPT-2, borrowed the best bits from its predecessor and, with 175 billion parameters to back it up, produced work that sounded shockingly cohesive, sophisticated, and factually correct. Since its training data covered a large portion of the public internet, we could ask it to produce pretty much any kind of text that is publicly available online. Stories, lyrics, news pieces, and conversations aside, GPT-3 even wrote valid CSS and HTML code. The last of these, a language model's ability to write code, is what we shall focus on today.
In a paper, researchers test how well the best language models (GPT-2, GPT-3, and GPT-Neo) solve programming questions from coding interviews. The results aren't particularly groundbreaking, but they show potential.