AI Terms Glossary

This glossary of AI terms was drafted by ChatGPT (GPT-4), with prompts, edits, and the addition of more recent terms from me. I asked Claude 2 to check and correct the definitions.

I’ve divided them into beginner and advanced terms, so if you are well-versed in the topic, skip down to the advanced section. Are there any terms you find helpful that are missing? Let me know!

Image prompt designed by ChatGPT (GPT-4, Sept 25 version): “Create an image of a digital library. Visualize a sleek, futuristic tablet or digital screen floating against a soft gradient background. On the screen, display a grid of glowing, holographic icons representing AI concepts. Include icons such as a brain (for AI), a gear (for algorithms), a speech bubble (for NLP), a book (for datasets), and a magnifying glass (for analysis). The overall feel should be modern, with a touch of sci-fi, emphasizing the digital and innovative nature of AI.” Image created by Ideogram.ai on Sept 27, 2023.

Beginner Terms

Term: Definition
Advanced Data Analysis: A mode integrated into ChatGPT Plus (GPT-4) that can analyze data and produce visualizations. This feature was previously a plugin called Code Interpreter. It allows the user to upload files, which it analyzes by writing and running Python code.
Algorithm: In machine learning, a set of rules or instructions given to an AI, neural network, or other machine to help it learn on its own.
Architecture: The structure of a machine learning model, including the number and arrangement of layers and nodes.
Artificial Intelligence (AI): The simulation of human intelligence processes by machines, especially computer systems.
Bard (obsolete): An AI chatbot based on Google’s PaLM 2 LLM. Replaced by Gemini.
Bias: When a machine learning model produces results that are systematically prejudiced due to inherent flaws in the training data or the model design.
Bing Chat (obsolete): Microsoft’s free chatbot that uses OpenAI’s GPT-3.5 and GPT-4 models. Replaced by Copilot.
ChatGPT: An AI chatbot developed by OpenAI that uses large language models to write human-like text based on prompts. Earlier versions offered GPT-3.5 (free) or GPT-4 (paid); see GPT-4o for the current models.
Claude: An LLM AI assistant created by Anthropic.
Code Interpreter (obsolete): Previous name of a plug-in available to paid ChatGPT users that was renamed Advanced Data Analysis (see above).
Copilot: Microsoft’s AI service, including a web AI chatbot, an operating system chatbot, and integrated LLM capability inside Microsoft products.
Dataset Shift: When the data the model is working with changes or drifts over time away from the data it was trained on, decreasing the model’s performance.
Deep Learning: A subset of machine learning that uses neural networks with many layers, loosely inspired by the human brain, to learn patterns in data for use in decision making.
Expert Systems: Traditional AI systems that work based on rule-based knowledge and logic.
Explainability: Methods for understanding and articulating the reasons behind model behavior and predictions.
Few-Shot Prompting: Showing the model two or more examples of the task in your prompt.
Fine-Tuning: The process of training a pre-existing model on a new, often smaller, dataset to improve its performance on specific tasks.
Foundation Models: Models like GPT-4 that are trained on a broad data corpus and can be fine-tuned for specific tasks.
Gemini: A series of foundation LLMs from Google; also the name of Google’s chatbot and paid service (Gemini Advanced).
Generative AI: A type of AI that can create new content, ranging from text to images, music, or even video.
GPT-3.5: OpenAI’s LLM that was the model used in the free version of ChatGPT.
GPT-4o, GPT-4o mini: The OpenAI language models available in ChatGPT.
Hallucination: A term used in AI to describe when the model generates incorrect or fabricated content that is not supported by evidence.
Hidden Layers: The layers in a neural network between the input and output layers that perform computations and transformations on the input data.
Inference: The process where a machine learning model makes predictions or generates outputs based on new data.
Input Layer: The first layer of a neural network, which receives the initial data the network will learn from.
Knowledge Graph: A network of real-world entities (like people, places, or concepts) and their interrelations, used by AI to provide context-based answers.
Large Language Models (LLMs): Language models that have been trained on vast amounts of text data and can generate human-like text based on the input they’re given. They can answer questions, write essays, summarize texts, translate languages, and even generate poetry.
Llama: A series of LLMs from Meta whose model weights are openly available.
Machine Learning (ML): A subset of AI that uses algorithms to parse data, learn from it, and then make a determination or prediction about something.
Mixture of Experts: A model architecture made up of multiple specialized sub-networks (“experts”) plus a gating mechanism that routes each input to only a few of them, so only part of the model does work for any given input, making processing of complex problems more efficient.
Natural Language Generation (NLG): The use of artificial intelligence programming to produce written or spoken narrative from a dataset.
Natural Language Processing (NLP): The field of AI that enables computers to understand, interpret, and generate human (natural) language.
Neural Network: A series of algorithms, loosely inspired by the human brain, that attempts to recognize relationships in a set of data using layers of interconnected nodes.
Nodes: The points of connection and computation in a neural network, similar to neurons in a human brain.
Output Layer: The final layer in a neural network, which produces the results of the computation.
PaLM 2: PaLM (Pathways Language Model) is an LLM from Google.
Parameters: The parts of a machine learning model that are learned from the training data, such as the weights and biases in a neural network.
Plug-ins: Programs that ChatGPT Plus users can add to ChatGPT to add functionality or access to third-party services.
Pretraining: The initial phase of training a machine learning model, usually done on a large, general dataset before the model is fine-tuned for a specific task.
Prompt: The initial input given to an AI model, to which it responds by generating output.
Prompt Engineering: The practice of designing prompts effectively to get better and more useful outputs from AI models.
Reasoning Engine: An artificial intelligence component that simulates the human ability to reason and make decisions.
Reinforcement Learning: A type of machine learning where an agent learns how to behave in an environment by performing actions and seeing the results.
Reinforcement Learning from Human Feedback (RLHF): A method used to fine-tune foundation models (like GPT-4) in which humans rate or rank the model’s outputs; those judgments are used to train a reward model, which then guides further training through reinforcement learning.
Response: The output generated by an AI model in response to a prompt.
Retrieval Augmented Generation (RAG): A technique that retrieves relevant fragments of existing content and combines them with the user prompt so the model can produce a more informed and accurate response.
Strong AI: AI that can understand, learn, adapt, and apply knowledge from one domain to another, much like a human.
Supervised Learning: A type of machine learning where the AI is trained using labeled data, i.e., data paired with the correct answer or outcome.
Symbolic AI: The traditional kind of AI, based on explicit symbolic representations of problems, logic, and search.
Temperature: A parameter in language models that controls the randomness of the output. Higher temperatures result in more diverse outputs, while lower values make the output more predictable.
Theory of Mind: The ability to understand and attribute mental states to oneself and others, an ability that current AI models are generally considered to lack.
The Pile: An 825 GB, diverse set of English-language text for training large language models (LLMs). It combines 22 smaller datasets, including books, websites, and other texts, providing a broad base of knowledge for models trained on it.
Token: A single unit of input to a language model, ranging from a single character to a whole word.
Tokenization: The process of breaking down text into smaller pieces (tokens), such as words or parts of words, that can be processed by a language model.
Transformers: A type of model architecture used in machine learning that handles variable-length input using attention, selectively focusing on the most relevant parts of the input data.
Unsupervised Learning: A type of machine learning where AI learns from unlabeled data and finds patterns and relationships in it.
Weak AI: Also known as Narrow AI, AI that is designed to perform a narrow task, like voice recognition, and lacks general intelligence.
Weights: Values in a neural network that determine how strongly each input influences the next layer, transforming data as it passes through the network’s layers.
Zero-Shot Learning: The ability of a machine learning model to perform tasks it was not explicitly trained on, without being shown any examples.
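The Temperature entry is easier to see with numbers. In the common formulation, a model’s raw scores for each candidate next token (logits) are divided by the temperature before being converted to probabilities with a softmax, and the next token is sampled from those probabilities. The minimal Python sketch below assumes that formulation; the three tokens and their scores are made up for illustration, whereas real models score tens of thousands of tokens at every step.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Pick one token at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token candidates and scores, for illustration only.
tokens = ["cat", "dog", "bird"]
logits = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

print("sampled:", sample_token(tokens, logits, 1.0, random.Random(0)))
```

In this toy example, the probability of “cat” is about 0.99 at temperature 0.2 but only about 0.50 at temperature 2.0, which is why low temperatures give predictable output and high temperatures give more varied output.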

Advanced Terms

Term: Definition
Attention: A technique used in neural networks that lets the model focus on the parts of the input most relevant to the desired output.
Backpropagation: An algorithm used during neural network training that propagates the prediction error backward through the network to compute how much each weight contributed to it; these gradients are then used (for example, by gradient descent) to adjust the weights and improve the accuracy of predictions.
Bias-Variance Tradeoff: A fundamental problem in machine learning. Bias is error from overly simple assumptions (leading to underfitting), while variance is error from excessive sensitivity to the particular training data (leading to overfitting). Reducing one typically increases the other, so a good model balances the two.
BookCorpus: A dataset of 11,038 books in 16 different genres. Often used in language model training, it provides diverse long-form text data.
Classification: A type of supervised machine learning task in which the model predicts a discrete category (class) for each input, such as “spam” or “not spam.”
Common Crawl: An open repository of web crawl data that anyone can access and analyze. The dataset includes raw web page data, metadata, and text. It’s frequently used for training language models due to its size and diversity.
Convolutional Neural Network (CNN): A class of deep learning neural networks, most commonly applied to analyzing visual imagery.
Generative Adversarial Network (GAN): A class of machine learning frameworks in which two neural networks compete: the generator creates candidate samples, and the discriminator tries to tell them apart from real data, pushing the generator to produce increasingly realistic output.
Gradient Descent: An optimization algorithm that minimizes a function by repeatedly taking small steps in the direction of steepest descent (the negative of the gradient). It is commonly used to train neural networks by minimizing their error (loss).
Hyperparameters: Settings of the learning algorithm itself, such as the learning rate or the number of layers, that influence the speed and quality of learning. Unlike parameters, they are set before training starts rather than learned from the data.
K-Means Clustering: An unsupervised machine learning algorithm that groups data into k clusters, assigning each point to the cluster with the nearest center (mean).
K-Nearest Neighbors (K-NN): A simple, flexible machine learning algorithm that uses a group of data points in close proximity (neighbors) to predict the value or class of a given data point.
Naive Bayes: A family of simple, fast classification algorithms based on Bayes’ theorem with the “naive” assumption that every pair of features is conditionally independent given the class.
Overfitting and Underfitting: Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations, so it fits noise in the training data and performs poorly on new data. Underfitting occurs when a model is too simple to capture the structure in the data.
Pruning: Removing redundant or less important parts of a neural network to increase efficiency with little or no loss of accuracy.
Recurrent Neural Network (RNN): A type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, or the spoken word.
Regression: A statistical method used in machine learning and data analysis that predicts a continuous outcome variable (Y) from the values of one or more predictor variables (X).
Sentiment Analysis: The process of computationally determining whether a piece of writing is positive, negative, or neutral.
Support Vector Machine (SVM): A supervised machine learning model, most often used for two-class classification, that finds the boundary (hyperplane) separating the classes with the widest possible margin.
Wikipedia Dump: A downloadable copy of all the text in Wikipedia. Although narrower in scope than web crawl datasets or The Pile, it is widely used in natural language processing and provides a useful base of factual knowledge.
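The Gradient Descent, Regression, and Hyperparameters entries fit together in one small example. The sketch below is a hypothetical toy, not how any particular library implements training: it fits a straight line to made-up data by gradient descent on mean squared error. Here `w` and `b` are parameters learned from the data, while `learning_rate` and `steps` are hyperparameters chosen before training; the function name and values are illustrative assumptions.

```python
def gradient_descent_linear_regression(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w * x + b by gradient descent on mean squared error (MSE)."""
    w, b = 0.0, 0.0  # parameters: start at zero, learned from the data
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step in the direction of steepest descent (the negative gradient).
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = gradient_descent_linear_regression(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

With these settings the fit recovers values very close to w = 2 and b = 1. A learning rate that is too large can make the steps overshoot and diverge, which is one reason hyperparameters matter. In a neural network, backpropagation computes the same kind of gradients for every weight instead of deriving them by hand as done here.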

Free and low-cost education for medical writers and pharma scientists


Pharmaceutical professionals keep one foot in science and another foot in health authority regulation, both rapidly evolving fields that require consistent training to keep pace. In addition, new opportunities often entice scientists to widen their expertise to contribute to new areas of drug development, which may require an entirely new set of skills. At some point, you are likely to reach the limit of what your corporate or personal budget for training allows. This list is just what you are looking for: quality training that is either free or subsidized!

These courses are ones I have taken and felt were worth the time investment. This list is current as of February 2022.

Online Courses

Introduction to the Principles and Practice of Clinical Research, NIH

This course (IPPCR) is offered by the NIH from September to July each year to train new clinical investigators. It covers all aspects of clinical trials, including design, analysis, reporting, budgeting, regulations, and ethics. The textbook is available for about $88 (with the promo code they provide), and the course is free, self-paced, and entirely online. If you pass the final exam, you will earn a certificate of completion. This course is a significant time commitment: there are about 40 lectures, most of them 60-90 minutes long, and you will need to allow additional time to read the assignments and study for the exam. I highly recommend going through Statistics on Khan Academy (see below) to prepare for the biostatistics in this course.

Regulatory Affairs Training Program, Duke University

This 6-week course offered by the Office of Regulatory Affairs and Quality (ORAQ) at Duke University is free and available online via WebEx. It is not self-paced: there are 1-hour webcasts from 12-1 pm EST on Fridays. Attendance is taken, and there is reading and homework.

ORAQ also offers free seminars on regulatory topics that you can join via WebEx.

The FDA and Prescription Drugs: Current Controversies in Context, HarvardX

This online course consists of 6 self-paced modules, which took me about 6 weeks (they estimate 8 weeks). It’s a great overview of the drug approval process and the history of the FDA. It’s presented by the Program on Regulation, Therapeutics, and Law at Harvard and Brigham and Women’s Hospital. It is free to audit, or $199 for a certificate. Because it is self-paced, you can start anytime enrollment is open.

FDA training

The FDA offers free online resources, including online courses.

Khan Academy

Khan Academy offers free courses on Statistics and Probability, AP Statistics, and Health and Medicine that can keep you busy for months.

Explore a Career in Medical Writing

If you are completely new to medical writing, this course provides a complete overview of the field. It’s available online through ed2go and also through partnerships with community colleges (I took it through Wake Community College for less than ed2go’s price).

Online Medical Terminology Course, Des Moines University

When writing up adverse events, you need to learn a whole other language. If you are new to terms like pyrexia, dyspnoea, and tachycardia, take this brief online self-paced course on the Latin and Greek behind medical terminology. It takes about 2 hours, and is free if you don’t require a certificate.

NIH Plain Language Online Training

Government communications are required to be written in plain language, but some writers are unclear about what plain language means. This free, self-paced online course from the National Institutes of Health (NIH) will help you understand federal requirements for plain language.

Trade organizations

The American Medical Writers Association (AMWA)

Membership in AMWA costs $199/year ($80 for students) but offers a lot of educational perks for medical, scientific, or regulatory writers. AMWA offers a variety of paid courses, but some are free for members (search for “complimentary”), and members also receive a monthly email offering a free webinar (one that is usually paid). Membership includes a subscription to the AMWA Journal, with all back issues available online, as well as free chapter events that are a great bargain if you live near an active chapter (if you live in NC or SC, check out @AMWACarolinas on Twitter!).

NC RAF

Membership in the North Carolina Regulatory Affairs Forum is only $40 and includes 6 seminars per year on regulatory topics that are available by WebEx. They also offer a summer workshop (at an additional very reasonable fee) to prepare for the Regulatory Affairs Certification (RAC) exam.

Podcasts

Medical Writers Speak Podcast

This podcast is provided by Emma Hitt Nichols of Nascent Medical to promote her business and her 6-week course. It features many interviews with medical writers; by listening, you can learn about the many career paths available to medical writers as well as many tricks of the trade.

The Effective Statistician

This podcast, sponsored by PSI, is excellent for statisticians and those who work with statisticians, and some episodes on leadership and influence are applicable to anyone in pharma.

Peter Attia Drive

Take a whirlwind tour of the past, present, and future of cancer in episode #62 of Peter Attia’s podcast. I’ve listened to this one a few times because so much interesting detail is packed into every minute of this episode.

Publications

Pharmaceutical Technology

This free magazine will keep you up to date on news, regulatory updates, and best practices for Chemistry, Manufacturing, and Controls (CMC).

Pencil Points Newsletter

This is a great monthly email newsletter for medical writers full of tips, tricks, and tools.

EMWA Regulatory Writing Basics

The European Medical Writers Association (EMWA) journal published an entire issue devoted to regulatory writing in 2014, and it’s free for non-members to read.

Whitsell Innovations

WI posts videos, infographics, and handouts on regulatory and medical writing topics.

Trilogy Medical Writing

Trilogy has several publications written by expert medical writers on a variety of topics.

Emma Hitt Nichols’ book on Freelance Medical Writing

Emma broadcasts a free webinar regularly to promote her 6-week course. The course is not free, but when I attended her webinar she provided her book for free to attendees.

Know of a great cheap or free course that I should list here? Contact me through the contact link above and let me know! Happy learning!

Cognitive Bias in Pharmaceutical Development

Image created by J. Pickett on canva.com

Scientists are well versed in experimental bias, which is why we address it by using experimental controls, masking our clinical trials, and using the scientific method to approach questions. However, how do we control for bias within our own minds? Cognitive bias refers to any of the many ways our brains prevent us from making entirely objective decisions. “Before You Make That Big Decision…”, an article published in Harvard Business Review in June 2011, defines several types of cognitive bias, discusses them with case studies, and presents a 12-step checklist to root out bias.

Cognitive bias affects major decisions in pharmaceutical development. When developing a product, there are a million decisions that can have a significant impact on the cost, timescale, clinical success, and eventual marketability of your product. Many of these decisions are originally made at the bench level and may be impossible to change without considerable additional time or expense as the project progresses through later stages of development.

For example, a formulator may demonstrate a bias for a particular type of formulation process because of previous experience and comfort, or the wish for high visibility through the use of trendy new technology, or convenience according to what equipment is on site and available. Decision makers should recognize the potential for this bias and make sure the best formulation is chosen regardless of the above factors. Once this formulation makes it into human studies, there is considerable inertia that makes change difficult, since the project team doesn’t want to delay timelines by having to repeat animal studies or bridge with additional human pharmacokinetic studies. 

Bias can be very costly to big pharma companies, but attempts to avoid bias are not without cost. Multiple layers of peer review, involving Marketing early in development (when most compounds will fail for other reasons anyway), and working through checklists all take time, but they could save billions for that one “blockbuster in the rough.”

According to the article, it is nearly impossible to detect your own bias, but through learning about bias, we can better detect it in our peers and use this knowledge to better challenge decisions. For example, when performing due diligence, you must be alert for bias from the company under scrutiny, the fellow members of your team, and in how your team prioritizes and reports the findings.

Here are some types of bias from the article and how they could come up in pharma:

Self-interested Bias

This type of bias is hard to avoid. Almost every person on a project team is heavily vested in the success of their project. Part of this is due to corporate culture, which tends to reward those people who happen to be on successful projects. This bias can be minimized by shifting the focus from project success, which can be largely due to the luck of being assigned to a safe and effective compound, to excellence in contributing to the project. Another similar bias is loss aversion, a fancy business term for “fear of failure.” Pharma is understandably already risk-averse, but it is also disadvantageous to have people avoiding difficult projects, or killing projects that are a deviation from the norm without sufficient basis. If people on failing projects are rewarded for swiftly contributing to clinical evaluation and cost-effectively killing their project, there is less motivation to “succeed at all costs” or “run for the hills.”

In a similar vein, even when project members’ fates are not tied to a project outcome, a project team can fall in love with a concept after expending a lot of hard effort, which also makes an objective analysis of the product’s value difficult. In this case, it is up to the peer reviewers or due diligence team to make sure that they are getting a clear picture and not an overly positive projection based on the best subset of data.

Groupthink

Groupthink is the result of insufficient diversity on the team or strong dominant members who quash all dissent before it can be fully explored. If you have a group of scientists from similar backgrounds who have been working together in the same field for a long time, groupthink can occur. Most Big Pharma companies indirectly counter groupthink by aggressively promoting diversity and reorganizing fairly often, so you aren’t working with the same people for more than a few years. Groupthink can be challenged head-on in peer review by considering the people making up the team: was there enough varied expertise? Were all voices heard?


“We find comfort among those who agree with us – growth among those who don’t.”

Frank A. Clark

Halo Effect

The halo effect is the tendency to let a positive impression in one area color our judgment in others, and Phil Rosenzweig devoted a whole book to it. Where does it come up for pharma? In audits of suppliers and due diligence for in-sourcing, this bias can be difficult to avoid. A related bias is saliency bias, where a previous success casts a rosy glow on a new, similar project. The halo effect can also come up in decisions regarding outsourcing. If you have a company that you love and frequently use for analytical capability, that positive association may bias you to choose them for formulation work, even though their formulation capabilities may turn out to be insufficient. As common as this bias is, at least it is easier to spot than some other types of bias. Auditing and due-diligence teams will benefit from reminding themselves of this potential bias before visiting a favorite supplier, as tempting as a shortened visit would be.

Confirmation Bias

This bias may be the most insidious for pharma. In confirmation bias, the team settles on one path forward and seeks only data that support it, disregarding everything else. In drug development, each decision builds on a thousand smaller previous decisions. A common pitfall in oral formulation development involves dose. Early in development, a high dose is required, so you develop a melt granulation. Later in development, when the dose has dropped to 10 mg, did the project team simply scale down the melt granulation, or did it evaluate a cheaper dry blend process?

Availability Bias

Availability bias is the tendency to base a decision on whatever information happens to be at hand. There is a great deal of scientific information to evaluate in the early stages of product development, yet you often have to move forward with less information than you would like. Analytical testing is a bit like exploring a cave with a flashlight, where the light cast by the flashlight is the capability of your test. Is there anything lurking in the shadows? It’s important to do a risk assessment based on what data is missing at the time of the decision and evaluate “what ifs.” What if the drug substance supply was not an issue? What if you had another month to develop? How would the decision change? Should a contingency plan be in place in case a critical factor does change?

For example, many times your first formulation is developed while your salt program is ongoing. For now, you are assuming your compound is insoluble, but what if a soluble salt is found? How will this change your approach? Do you have a workable backup plan?

Sunk Cost Fallacy

Pharma is very susceptible to the sunk cost fallacy because it is so expensive to develop a drug. The sunk cost fallacy is letting past costs or resources, which cannot be recovered, factor into a decision about the future, for better or worse.

Consider the simplistic hypothetical case where you have a drug that you have already spent $500 million developing. The Food and Drug Administration (FDA) then restricts your patient population, driving the market forecast from blockbuster level to only $5 million a year over a projected remaining patent life of 7 years. You have $5 million in expected future costs prior to launch. If you consider the sunk costs, this project is a loser, and you may be tempted to cut your losses and save $5 million. However, if you ignore the past money spent and focus only on the future, you spend $5 million to earn roughly $35 million, which is a pretty good return on investment.
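To make the arithmetic in this hypothetical case explicit, here is a minimal Python sketch. It assumes, as the simplified example does, that the $5 million a year is money the company actually receives, and it ignores discounting, manufacturing costs, and taxes; the function names are hypothetical. The decision-relevant number is the forward-looking one, because the $500 million is gone whichever way you decide.

```python
def forward_looking_value(annual_revenue, years, future_costs):
    """Net value of continuing, counting only money not yet spent."""
    return annual_revenue * years - future_costs


def sunk_cost_view(annual_revenue, years, future_costs, sunk_costs):
    """Misleading view that also subtracts money already spent."""
    return forward_looking_value(annual_revenue, years, future_costs) - sunk_costs


# Figures from the hypothetical case, in millions of dollars.
sunk = 500
annual_revenue = 5
years = 7
future_costs = 5

print("Ignoring sunk costs:", forward_looking_value(annual_revenue, years, future_costs))  # prints 30
print("Including sunk costs:", sunk_cost_view(annual_revenue, years, future_costs, sunk))  # prints -470
```

The sunk-cost view says “kill it” (a $470 million loss), but killing the project does not bring back any of the $500 million; it only gives up the $30 million net gain from continuing.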

The sunk cost fallacy can also work in the opposite way and be a powerful companion to the self-interest bias and related biases above, also known as the “We Have to Make This Work Because We Have Already Spent Ungodly Sums on It” bias.

Bias Assessment

Considering the impact and cost of bias to Big Pharma, an organizational assessment to determine how susceptible you are to bias may be in order:

– How aware are your project teams of cognitive bias and how to recognize it? Is this awareness only at the executive level, or does it reach to your bench-level decision makers?

– How are your decisions controlled? Is there peer review? Are the peer groups involved sufficiently diverse?

– Is your corporate or departmental culture breeding bias? Are people rewarded based on only project success? Have you ever rewarded a “positive failure”? Are dissenting opinions welcomed?

– Are there physical or process factors that could create bias in your decisions? For example, scientists may have a bias toward equipment housed in the same building as their office. If ordering a new excipient requires multiple forms and a six-month auditing process, there will be a strong preference for what’s already in the warehouse.

Pharmaceutical employees weather a perfect storm of conditions that promote bias: high financial stakes, a strong scientific drive to produce successful results, considerable time pressure, and a highly regulated environment resistant to change. A pharma company that promotes awareness of bias and implements effective counter-measures at all levels of the organization can sail through this storm toward better outcomes.

References

  1. Kahneman D, Lovallo D, Sibony O. The Big Idea: Before You Make That Big Decision…. Harvard Business Review. June 2011. https://hbr.org/2011/06/the-big-idea-before-you-make-that-big-decision. Accessed January 19, 2019.
  2. Rosenzweig P. The Halo Effect: . . . and the Eight Other Business Delusions That Deceive Managers. Free Press; 2014.