This glossary of AI terms was drafted by ChatGPT (GPT-4), with prompts, edits, and more recent terms added by me. I asked Claude 2 to check and correct the definitions.
I’ve divided them into beginner and advanced terms, so if you are well-versed in the topic, skip down to the advanced section. Are there any terms you find helpful that are missing? Let me know!

Beginner Terms
| Term | Definition |
|---|---|
| Advanced Data Analysis | A mode integrated into ChatGPT Plus (GPT-4) that lets users upload files and have the model analyze the data and produce visualizations. This feature was previously a plugin called Code Interpreter. |
| Algorithm | In machine learning, refers to a set of rules or instructions given to an AI, neural network, or other machine to help it learn on its own. |
| Architecture | The structure of a machine learning model, including the number and arrangement of layers and nodes. |
| Artificial Intelligence (AI) | The simulation of human intelligence processes by machines, especially computer systems. |
| Bard (obsolete) | An AI chatbot based on Google’s PaLM 2 LLM. Replaced by Gemini. |
| Bias | When a machine learning model produces results that are systematically prejudiced due to inherent flaws in the training data or the model design. |
| Bing Chat (obsolete) | Microsoft’s free chatbot that uses OpenAI’s GPT-3.5 and GPT-4 models. Replaced by Copilot. |
| ChatGPT | An AI chatbot developed by OpenAI that uses large language models to generate human-like text in response to prompts. It currently runs on the GPT-4o family of models (see below). |
| Claude | An LLM AI assistant created by Anthropic. |
| Code Interpreter (obsolete) | Previous name of a plug-in available to paid ChatGPT users that was renamed Advanced Data Analysis (see above). |
| Copilot | Microsoft’s AI service, including a web AI chatbot, an operating system chatbot, and integrated LLM capability inside Microsoft products. |
| Dataset Shift | When the data the model is working with changes or drifts over time, leading to a decrease in the model’s performance. |
| Deep Learning | A subset of Machine Learning based on multi-layered neural networks, loosely modeled on the human brain, that learn from large amounts of data to support decision making. |
| Expert systems | Traditional AI systems that work based on rule-based knowledge and logic. |
| Explainability | Methods for understanding and articulating the reasons behind model behavior and predictions. |
| Few-Shot Prompting | Including two or more worked examples in your prompt so the model can infer the pattern you want it to follow (see the worked example after this table). |
| Fine-Tuning | The process of training a pre-existing model on a new, often smaller, dataset to improve its performance on specific tasks. |
| Foundation Models | Models like GPT-4 that are trained on a broad data corpus and can be fine-tuned for specific tasks. |
| Gemini | A series of foundation LLMs from Google, also the name of their chatbot and paid service (Gemini Advanced). |
| Generative AI | A type of AI that can create new content, ranging from text to images, music, or even video. |
| GPT-3.5 | OpenAI’s LLM that previously powered the free version of ChatGPT. |
| GPT-4o, GPT-4o mini | OpenAI’s current language models available in ChatGPT; GPT-4o is the more capable model, and GPT-4o mini is the smaller, faster one. |
| Hallucination | A term used in AI to describe when the model generates incorrect or imaginary content not based on evidence. |
| Hidden Layers | The layers in a neural network between the input and output layers that perform computations and transformations on the input data. |
| Inference | The process where a machine learning model makes predictions or generates outputs based on new data. |
| Input Layer | The first layer of a neural network that receives the initial data the network will learn from. |
| Knowledge Graph | A network of real-world entities (like people, places, or concepts) and their interrelations, used by AI to provide context-based answers. |
| Large Language Models (LLMs) | These are language models that have been trained on vast amounts of text data and can generate human-like text based on the input they’re given. They can answer questions, write essays, summarize texts, translate languages, and even generate poetry. |
| Llama | A series of openly released LLMs from Meta. |
| Machine Learning (ML) | A subset of AI, Machine Learning involves the practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something. |
| Mixture of Experts | An AI model architecture that uses multiple specialized sub-models, each an “expert” in a specific area, and dynamically selects which ones to use for a given input, making the processing of complex problems more efficient. |
| Natural Language Generation (NLG) | The use of artificial intelligence programming to produce written or spoken narrative from a dataset. |
| Natural Language Processing (NLP) | The branch of AI concerned with enabling computers to understand, interpret, and generate human language. |
| Neural Network | A series of algorithms, loosely inspired by the human brain, that attempts to recognize underlying relationships in a set of data. |
| Nodes | The points of connection and computation in a neural network, similar to neurons in a human brain. |
| Output Layer | The final layer in a neural network that produces the results of the computation. |
| PaLM 2 | The second version of Google’s Pathways Language Model (PaLM), the LLM that powered Bard. |
| Parameters | The parts of a machine learning model that are learned from the training data, such as the weights and biases in a neural network. |
| Plug-ins (obsolete) | Programs that ChatGPT Plus users could add to ChatGPT to add functionality or access to third-party services. |
| Pretraining | The initial phase of training a machine learning model, usually done on a large, general dataset before being fine-tuned for a specific task. |
| Prompt | The initial input given to an AI model, to which it responds by generating output. |
| Prompt Engineering | The practice of designing prompts effectively to get better and more useful outputs from AI models. |
| Reasoning Engine | An artificial intelligence component that simulates the human ability to reason and make decisions. |
| Reinforcement Learning | A type of Machine Learning where an agent learns how to behave in an environment by performing actions and seeing the results. |
| Reinforcement Learning from Human Feedback (RLHF) | A method used to fine-tune foundation models (like GPT-4) in which human ratings of the model’s outputs serve as the reward signal. |
| Response | The output generated by an AI model in response to a prompt. |
| Retrieval Augmented Generation (RAG) | A technique that retrieves relevant fragments of existing content and combines them with the user prompt to produce a more informed and accurate response (a sketch follows this table). |
| Strong AI | This kind of AI can understand, learn, adapt, and implement knowledge from one domain into another, much like a human. |
| Supervised Learning | A type of Machine Learning where the AI is trained using labeled data, i.e., data paired with the correct answer or outcome. |
| Symbolic AI | This is the traditional kind of AI, which is based on explicit symbolic representations of problems, logic, and search. |
| Temperature | A parameter in language models that controls the randomness of the output. Higher temperatures result in more diverse outputs, while lower values make the output more predictable (a numeric sketch follows this table). |
| Theory of Mind | The ability to understand and attribute mental states to oneself and others, an ability generally considered lacking in current AI models. |
| The Pile | A diverse, 825 GB set of English-language text for training large language models (LLMs). It combines many smaller datasets, including books, websites, and other texts, providing a broad base of knowledge for models trained on it. |
| Token | A single unit of input to a language model, typically a word or a piece of a word; a token can be as short as one character. |
| Tokenization | The process of breaking text down into smaller pieces (tokens) that a language model can process (a short code example follows this table). |
| Transformers | A type of model architecture used in machine learning. They handle variable-sized input using the mechanism of attention, selectively focusing on parts of the input data. |
| Unsupervised Learning | A type of Machine Learning where AI learns from unlabeled data and finds patterns and relationships therein. |
| Weak AI | Also known as Narrow AI, this kind of AI is designed to perform a narrow task, like voice recognition, and lacks general intelligence. |
| Weights | Values in a neural network that transform input data within the network’s hidden layers. |
| Zero-Shot Learning | The ability of a machine learning model to perform tasks or solve problems it has not been trained on. |
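A few of the terms above are easier to grasp with a concrete example. To start, here is what few-shot prompting looks like in practice. This is a minimal Python sketch; the sentiment-classification task and the example reviews are just illustrations, and the assembled string could be sent to any chat model.

```python
# A few-shot prompt: two worked examples, then the real question.
# The model infers the pattern (classify sentiment) from the examples.
examples = [
    ("The battery died after a week.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]
query = "The screen is gorgeous but the speakers are tinny."

prompt = "Classify the sentiment of each review as positive or negative.\n\n"
for review, label in examples:
    prompt += f"Review: {review}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # send this string to the model of your choice
```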
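Retrieval Augmented Generation is more a recipe than a single algorithm: retrieve relevant text, then prepend it to the prompt. Here is a minimal sketch, with a toy word-overlap retriever standing in for the embedding search a real system would use (the documents and the question are made up):

```python
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping is free on orders over $50.",
    "Support is available by email, 9am-5pm Eastern, Monday-Friday.",
]

def retrieve(query, docs, k=1):
    # Toy retriever: rank documents by how many words they share
    # with the query. Real systems use embeddings and a vector index.
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

question = "How many days do I have to return an item?"
context = "\n".join(retrieve(question, documents))

prompt = ("Answer the question using only the context below.\n\n"
          f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
print(prompt)  # the augmented prompt that goes to the language model
```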
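Temperature is also easy to see numerically. A model assigns raw scores (logits) to candidate tokens; dividing those scores by the temperature before converting them to probabilities sharpens or flattens the distribution. A small sketch with made-up logit values:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale the logits by 1/temperature, then normalize with softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# At 0.2 nearly all probability lands on the top token (predictable);
# at 2.0 the probabilities even out (more diverse sampling).
```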
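Finally, tokenization can be demonstrated with OpenAI's open-source tiktoken library (assuming it is installed; other model families ship their own tokenizers):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

text = "Tokenization breaks text into pieces."
tokens = enc.encode(text)

print(tokens)                             # integer IDs, one per token
print([enc.decode([t]) for t in tokens])  # the text piece behind each ID
# Common words usually map to a single token; rare words split into several.
```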
Advanced Terms
| Term | Definition |
|---|---|
| Attention | A technique used in neural networks allowing focus on the specific parts of the input most relevant to the desired output (a minimal sketch follows this table). |
| Backpropagation | An algorithm used during the training of neural networks, which adjusts the weights of the neurons to improve the accuracy of predictions. |
| Bias-Variance Tradeoff | A fundamental tension in machine learning between error from overly simple model assumptions (bias) and error from over-sensitivity to fluctuations in the training data (variance); reducing one tends to increase the other. |
| BookCorpus | A dataset consisting of 11,038 books in 16 different genres. The dataset, used often in language model training, provides diverse long-form text data. |
| Classification | A type of machine learning task in which the model predicts discrete categories (classes), such as spam vs. not spam. |
| Common Crawl | Common Crawl is an open repository of web crawl data that can be accessed and analyzed by anyone. The dataset includes raw web page data, metadata, and text. It’s frequently used for training language models due to its size and diversity. |
| Convolutional Neural Network (CNN) | A class of deep learning neural networks, most commonly applied to analyzing visual imagery. |
| Generative Adversarial Network (GAN) | A class of machine learning frameworks in which two neural networks contest with each other in a game: the generative network produces candidate outputs (such as synthetic images) while the discriminative network tries to distinguish them from real data. |
| Gradient Descent | An optimization algorithm that minimizes a loss function by iteratively moving in the direction of steepest descent, i.e., the negative gradient. It is the standard method for optimizing the performance of a neural network (see the sketch after this table). |
| Hyperparameters | These are the parameters of the learning algorithm itself, which influence the speed and quality of the learning process. They are set before training starts. |
| K-Means Clustering | A type of unsupervised machine learning algorithm used to group data into different clusters based on their similarities (a toy example follows this table). |
| K-Nearest Neighbors (K-NN) | A simple, flexible machine learning algorithm that uses a group of data points in close proximity (neighbors) to predict the value or class of a given data point. |
| Naive Bayes | A group of simple, fast, and efficient classification algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features. |
| Overfitting and Underfitting | Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. Underfitting occurs when a model is too simple, unable to capture the structure in the data. |
| Pruning | Removing redundant or less important parts of a neural network to increase efficiency without losing accuracy. |
| Recurrent Neural Network (RNN) | A type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or the spoken word. |
| Regression | A statistical method used in machine learning and data analysis that attempts to predict a continuous outcome variable (Y) based on the value of one or multiple predictor variables (X). |
| Sentiment Analysis | The process of computationally determining whether a piece of writing is positive, negative, or neutral. |
| Support Vector Machine (SVM) | A supervised machine learning model that classifies data by finding the boundary (hyperplane) that separates two classes with the widest possible margin. |
| Wikipedia Dump | This is a dataset that consists of a downloadable version of all the text in Wikipedia. Despite being narrower in scope than web crawl datasets or The Pile, it’s widely used in natural language processing and provides a useful base of factual knowledge. |
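As with the beginner terms, a few of these are clearer in code. Scaled dot-product attention, the mechanism behind transformers, is only a few matrix operations. Here is a minimal numpy sketch; the query, key, and value matrices are random stand-ins for what a trained model would compute:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # Scores: how relevant each key is to each query.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output: a weighted mix of the values for each query position.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # 4 tokens, 8-dimensional vectors
Q = rng.standard_normal((seq_len, d_k))  # queries
K = rng.standard_normal((seq_len, d_k))  # keys
V = rng.standard_normal((seq_len, d_k))  # values

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one context-mixed vector per token
```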
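Gradient descent is simple enough to show in full. This sketch fits a line y = w·x + b to noisy synthetic data by repeatedly stepping against the gradient of the mean squared error (the data, learning rate, and step count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=100)  # true line: w=3, b=2

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    error = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step in the direction of steepest descent (the negative gradient).
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # should land close to w=3, b=2
```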
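K-means clustering also fits in a few lines: assign each point to its nearest center, move each center to the mean of its points, repeat. A toy example with two synthetic blobs (k is chosen by eye here; picking k is a real problem in practice):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two synthetic blobs of 2-D points, around (0, 0) and (5, 5).
data = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
                  rng.normal(5, 0.5, size=(50, 2))])

k = 2
centers = data[rng.choice(len(data), size=k, replace=False)]  # random start

for _ in range(10):
    # Assignment step: each point joins its nearest center.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points.
    centers = np.array([data[labels == i].mean(axis=0) for i in range(k)])

print(centers.round(1))  # should land near (0, 0) and (5, 5)
```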