AI Glossary

The global Artificial Intelligence (AI) market is expected to reach USD 360.36 billion by 2028, exhibiting a CAGR of 33.6% between 2021 and 2028. Artificial intelligence is impacting the future of virtually every industry and every human being. This glossary collects AI terms, trending concepts, definitions and frameworks as a guide for beginners, AI aspirants and data experts. Any terms you think we should add? Please let us know.

“Artificial intelligence is one of the most profound things we're working on as humanity. It is more profound than fire or electricity.”
—Sundar Pichai, 2020


Algorithm: A set of rules that a machine can follow to learn how to do a task.

Artificial Intelligence – A computational system that simulates parts of human intelligence but focuses on one narrow task. Also called narrow AI, in contrast to AGI.

This refers to the general concept of machines acting in a way that simulates or mimics human intelligence. AI can have a variety of features, such as human-like communication or decision making.

Accuracy – Refers to the percentage of correct predictions the classifier made.

Autonomous: A machine is described as autonomous if it can perform its task or tasks without needing human intervention.

Adversarial Machine Learning – A research field that lies at the intersection of machine learning and computer security. It aims to enable the safe adoption of machine learning techniques in adversarial settings like spam filtering, malware detection, and biometric recognition.

Adversarial Example – A very specific transformation of an image, typically featuring very small, deliberate changes to an image that can completely disrupt a previously tuned classifier.

Application Programming Interface (API) – A set of commands, functions, protocols, and objects that programmers can use to create software or interact with an external system.

Artificial General Intelligence (AGI) – AGI is a computational system that can perform any intellectual task a human can. Also called “Strong AI.” At this point, AGI is fictional.

Artificial Neural Network – A model for AI and machine learning inspired by the neural network configurations of the human central nervous system, especially the brain.

A/B Testing – A controlled, real-life experiment designed to compare two variants of a system or a model, A and B.

Activation Function – In the context of Artificial Neural Networks, a function that takes in the weighted sum of all of the inputs from the previous layer and generates an output value that is passed on to the next layer.
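As a minimal sketch of this definition, the snippet below computes one neuron's weighted sum and passes it through a sigmoid activation. The function names and sample values are illustrative assumptions, not taken from any particular library.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values chosen only for illustration.
# The weighted sum is 1.0*0.5 + 2.0*(-0.25) + 0.0 = 0.0, and sigmoid(0.0) is 0.5.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))
```

Other activation functions, such as the rectifier, could be swapped in for `sigmoid` without changing the rest of the sketch.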

Active Learning (Active Learning Strategy) – A special case of Semi-Supervised Machine Learning in which a learning agent is able to interactively query an oracle (usually, a human annotator) to obtain labels at new data points.

Annotation – A metadatum attached to a piece of data, typically provided by a human annotator.

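Area Under the Curve (AUC) – A metric, usually the area under the ROC curve, that summarizes a classifier's performance across all thresholds. It is often used in Machine Learning to determine which one of several models has the highest performance.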

Association Rule Learning – A rule-based Machine Learning method for discovering interesting relations between variables in large data sets.

Autoencoder – A type of Artificial Neural Network used to produce efficient representations of data in an unsupervised and non-linear manner, typically to reduce dimensionality.

Automated Speech Recognition – A subfield of Computational Linguistics concerned with methods that enable computers to recognize spoken language and transcribe it into text.


Brute Force Search – A search that isn’t limited by clustering or approximations; it searches across all inputs. Often more time-consuming and expensive but more thorough.

Backward chaining: A method where the model starts with the desired output and works in reverse to find data that might support it.

Bias: Assumptions made by a model that simplify the process of learning to do its assigned task. Most supervised machine learning models perform better with low bias, as these assumptions can negatively affect results.

Big data: Datasets that are too large or complex to be handled by traditional data processing applications.

Bounding box: Commonly used in image or video tagging, this is an imaginary box drawn on visual information. The contents of the box are labeled to help a model recognize it as a distinct type of object.

Backpropagation (Backpropagation Through Time) – A method used in training Artificial Neural Networks that computes the gradient needed to update the network’s weights.

Batch – The set of examples used in one gradient update of model training.

Bayes’s Theorem – A famous theorem used by statisticians to describe the probability of an event based on prior knowledge of conditions that might be related to an occurrence.
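To make the theorem concrete, here is a small sketch that updates a prior probability after a positive test result. The function name, parameter names, and the numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(condition | positive result) computed with Bayes' theorem."""
    # P(positive) = P(pos | cond) * P(cond) + P(pos | no cond) * P(no cond)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prior, 90% chance of a positive result when the
# condition is present, 5% chance of a positive result when it is absent.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))  # about 0.154
```

Even with a fairly reliable test, the posterior stays low here because the prior is small.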

Boosting – A Machine Learning ensemble meta-algorithm for primarily reducing bias and variance in supervised learning, and a family of Machine Learning algorithms that convert weak learners to strong ones.

Bounding Box – The smallest (rectangular) box fully containing a set of points or an object.


Chatbot: A program that is designed to communicate with people through text or voice commands in a way that mimics human-to-human conversation.

Cognitive computing: This is effectively another way to say artificial intelligence. It’s used by marketing teams at some companies to avoid the science fiction aura that sometimes surrounds AI.

Computational learning theory: A field within artificial intelligence that is primarily concerned with creating and analyzing machine learning algorithms.

Corpus: A large dataset of written or spoken material that can be used to train a machine to perform linguistic tasks.

Content Moderation – The practice of monitoring content and applying a predetermined set of rules and guidelines, especially to user-generated submissions, to determine whether the content is permissible.

Convolutional Neural Network – Convolutional neural networks are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes.

CPU (Central Processing Unit) – The electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control, and input/output (I/O) operations specified by the instructions.

Custom Model – A small artificial neural network which takes inputs particular to a user, such as images or videos of their products, and returns predicted concepts, based on what the model is trained to see in the inputs.

Custom Training – The process of teaching a model to make certain predictions.

Classification – The task of approximating a mapping function from input variables to discrete output variables, or, by extension, a class of Machine Learning algorithms that determine the classes to which specific instances belong.

Clustering – In Machine Learning, the unsupervised task of grouping a set of objects so that objects within the same group (called a cluster) are more “similar” to each other than they are to those in other groups.

Cold-Start – A potential issue arising from the fact that a system cannot infer anything for users or items for which it has not gathered a sufficient amount of information yet.

Collaborative Filtering – A method used in the context of recommender systems to make predictions about the interests of a user by collecting preferences from a larger group of users.

Computer Vision – The field of Machine Learning that studies how to gain high-level understanding from images or videos.

Confidence Interval – A type of interval estimate that is likely to contain the true value of an unknown population parameter. The interval is associated with a confidence level that quantifies the level of confidence of this parameter being in the interval.

Contributor – A human worker providing annotations on the Appen data annotation platform.

Convolutional Neural Network (CNN) – A class of Deep, Feed-Forward Artificial Neural Networks, often used in Computer Vision.

Cross-Validation (k-fold Cross-Validation, Leave-p-out Cross-Validation) – A collection of processes designed to evaluate how the results of a predictive model will generalize to new data sets.

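– k-fold Cross-Validation: the data is split into k folds; each fold serves once as the validation set while the remaining k-1 folds are used for training.

– Leave-p-out Cross-Validation: every possible subset of p observations is held out once for validation, and the model is trained on the rest.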
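The k-fold variant listed above can be sketched in a few lines. This is a minimal illustration that splits sample indices in order without shuffling. The function name `k_fold_splits` is an assumption, not a library API.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

for train, validation in k_fold_splits(n_samples=6, k=3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

In practice the indices are usually shuffled first, and a model is trained and scored once per fold so the scores can be averaged.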


Data (Structured Data, Unstructured Data, Data augmentation) – Any collection of information converted into a digital form.

The most essential ingredient to all Machine Learning and Artificial Intelligence projects.

Unstructured Data: raw, unprocessed data. Textual data is a perfect example of unstructured data because it is not formatted into specific features.

Structured Data: data processed in a way that it becomes ingestible by a Machine Learning algorithm and, in the case of Supervised Machine Learning, labeled data; data after it has been processed on the data annotation platform.

Data Augmentation: the process of adding new information derived from both internal and external sources to a data set, typically through annotation.

Data science: Drawing from statistics, computer science and information science, this interdisciplinary field aims to use a variety of scientific methods, processes and systems to solve problems involving data.

Dataset: A collection of related data points, usually with a uniform order and tags.

Data Mining – The process by which patterns are discovered within large sets of data with the goal of extracting useful information from it.

Deep Learning – The general term for machine learning that uses layered (or deep) algorithms to learn patterns in data. It is most often used for supervised learning problems.

Deep Neural Network – An artificial neural network (ANN) with multiple layers between the input and output layers. It uses sophisticated mathematical modelling to process data in complex ways.

Detection – To discover an event or object.

Domain Adaptation – Learning a discriminative classifier or other predictor in the presence of a shift between training and test distributions.

Decision Tree – A category of Supervised Machine Learning algorithms where the data is iteratively split with respect to a given parameter or criterion.

Deep Blue – A chess-playing computer developed by IBM, best known for being the first computer chess-playing system to win both a chess game and a chess match against a reigning world champion under regular time controls.


Explorer – A web application that allows you to preview applications.

Entity annotation: The process of labeling unstructured sentences with information so that a machine can read them. This could involve labeling all people, organizations and locations in a document, for example.

Entity extraction: An umbrella term referring to the process of adding structure to data so that a machine can read it. Entity extraction may be done by humans or by a machine learning model.

Embedding (Word Embedding) – One instance of some mathematical structure contained within another instance, such as a group that is a subgroup.

Ensemble Methods – In Statistics and Machine Learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models but typically allows for a much more flexible structure to exist among those alternatives.

Entropy – The average amount of information conveyed by a stochastic source of data.
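For a discrete source, this average is the Shannon entropy, the sum of p * log2(1/p) over all outcomes. The sketch below is a minimal illustration. The function name is an assumption.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    # Outcomes with zero probability contribute nothing and are skipped.
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0: a fair coin carries one bit per toss
print(shannon_entropy([1.0]))       # 0.0: a certain outcome carries no information
```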

Epoch – In the context of training Deep Learning models, one pass of the full training data set.


Feature (Feature Selection, Feature Learning) – A variable that is used as an input to a model.

Feature Learning – An ensemble of techniques meant to automatically discover the representations needed for feature detection or classification from raw data.

False Positive – An error in which a result rejects the null hypothesis when it should not have.

False Negative – An error in which a result fails to reject the null hypothesis when it should have.

Feed-Forward (Neural) Networks – An Artificial Neural Network wherein connections between the neurons do not go backward or form a cycle.

F Score – A weighted harmonic mean of precision and recall (the true positive rate).
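The F score is commonly computed as a harmonic mean of precision and recall, with a weight beta that controls how much recall counts. The following is a minimal sketch. The function name `f_beta` and the sample values are assumptions for illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0 and recall == 0:
        return 0.0  # convention used here to avoid dividing by zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With beta = 1 this is the familiar F1 score: 2 * 0.5 * 1.0 / (0.5 + 1.0).
print(round(f_beta(0.5, 1.0), 3))  # 0.667
```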

Facial Recognition – A computer application capable of identifying or verifying a person from a digital image or a video frame from a video source. One of the ways to do this is by comparing selected facial features from the image and a face database.

False Negatives – An error where a model falsely predicts an input as not having a desired outcome, when one is actually present. (Actual Yes, Predicted No).

False Positives – An error where a model falsely predicts the presence of the desired outcome in an input, when in reality it is not present (Actual No, Predicted Yes).

Feature Extraction

1) When image features at various levels of complexity are extracted from the image data. Typical examples of such features are:

Lines, edges, and ridges.

Localized interest points such as corners, blobs, or points.

More complex features may be related to texture, shape, or motion.

2) The process by which data that is too large to be processed is transformed into a reduced representation set of features such as texture, shape, lines, and edges.


General AI: AI that could successfully do any intellectual task that can be done by any human being. This is sometimes referred to as strong AI, although they aren’t entirely equivalent terms.

Garbage In, Garbage Out – A principle stating that flawed input data leads to misleading results and nonsensical output, a.k.a. “garbage”.

General Data Protection Regulation (GDPR) – A regulation in EU law on data protection and privacy for all individuals within the European Union aiming to give control to citizens and residents over their personal data.

Genetic Algorithm – A search heuristic inspired by the Theory of Evolution that reflects the process of natural selection where the fittest individuals are selected to produce offspring of the following generation.

Generative Adversarial Networks (GANs) – A class of artificial intelligence algorithms used in unsupervised machine learning, implemented by a system of two neural networks contesting with each other in a zero-sum game framework. This technique can generate photographs that look at least superficially authentic to human observers, having many realistic characteristics (though in tests people can tell real from generated in many cases).

GPU (Graphics Processing Unit) – A specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.

Ground Truth – A piece of information obtained through direct observation as opposed to inference.


Hyperparameter: Occasionally used interchangeably with parameter, although the terms have some subtle differences. Hyperparameters are values that affect the way your model learns. They are usually set manually outside the model.

Human-in-the-Loop – Human-in-the-loop (HITL) is a branch of artificial intelligence that leverages both human and machine intelligence to create machine learning models. In a traditional human-in-the-loop approach, people are involved in a virtuous circle where they train, tune, and test a particular algorithm.

Human Workforce (“Labelers”) – Workers who can help to complete work on an as-needed basis, which usually means labelling data (such as images).


Intent: Commonly used in training data for chatbots and other natural language processing tasks, this is a type of label that defines the purpose or goal of what is said. For example, the intent for the phrase “turn the volume down” could be “decrease volume”.

ImageNet – A large visual dataset made of 14 million URLs of hand-annotated images organized in twenty thousand (20,000) different categories, designed for use in visual object recognition research.

Image Recognition – The ability of software to identify objects, places, people, writing, and actions in images.

Image Segmentation – The process of dividing a digital image into multiple segments/fragments, with the goal of simplifying or changing the representation of an image into something that is easier to analyze. Segmentation divides whole images into pixel groupings, which can then be labelled and classified. Put simply, segmentation goes beyond putting a bounding box around the desired object in an image: it produces a pixel-by-pixel outline of that object, separating it from the background.

ImageNet – A large visual database designed for use in visual object recognition software research. Over 14 million URLs of images have been hand-annotated by ImageNet to indicate what objects are pictured; in at least one million of the images, bounding boxes are also provided.

ImageNet Challenge – A competition where research teams evaluate their algorithms on the given data set and compete to achieve higher accuracy on several visual recognition tasks.

Input – Any form of data – text, audio, code, music notation, essentially anything that can be encoded digitally.

Inference – The process of making predictions by applying a trained model to new, unlabeled instances.

Information Retrieval – The area of Computer Science studying the process of searching for information in a document, searching for documents themselves, and also searching for metadata that describes data, and for databases of texts, images or sounds.


Kaggle – A data science platform that hosts data analysis competitions launched by companies and users.

Knowledge-Based Systems – A computer system that uses knowledge to solve a problem or support a decision. A knowledge-based system has three types of subsystems: a knowledge base, a user interface, and an inference engine.


Layer (Hidden Layer) – A series of neurons in an Artificial Neural Network that process a set of input features, or, by extension, the output of those neurons. Hidden Layer: a layer of neurons whose outputs are connected to the inputs of other neurons, therefore not directly visible as a network output.

Learning-to-Learn – A new direction within the field of Machine Learning investigating how algorithms can change the way they generalize by analyzing their own learning process and improving on it.

Learning-to-Rank – The application of Machine Learning to the construction of ranking models for Information Retrieval systems.

Learning Rate – A scalar value used by the gradient descent algorithm at each iteration of the training phase of an Artificial Neural Network to multiply with the gradient.
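To illustrate how the learning rate scales each update, here is a minimal gradient descent sketch on a one-dimensional function. The objective, function name, and parameter values are hypothetical and chosen only for illustration.

```python
def gradient_descent(gradient, start, learning_rate, steps):
    """Repeatedly move against the gradient, scaled by the learning rate."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0,
                           learning_rate=0.1, steps=100)
print(round(minimum, 4))  # 3.0
```

With this objective, a learning rate above 1.0 makes each step overshoot by more than it corrects, so the iterates move away from the minimum instead of towards it.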

Logit Function – The inverse of the sigmoidal “logistic” function used in mathematics, especially in statistics.
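A short sketch of the relationship: the logit maps a probability to its log-odds, and the logistic function maps it back. The function names are chosen for illustration.

```python
import math

def logistic(x):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

print(logit(0.5))                      # 0.0: even odds
print(round(logistic(logit(0.8)), 6))  # 0.8: the two functions undo each other
```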

Long Short-Term Memory Networks – A variation of Recurrent Neural Network proposed as a solution to the vanishing gradient problem.


Machine Learning (ML) – A general term for algorithms that can learn patterns from existing data and use these patterns to make predictions or decisions with new data.

Machine intelligence: An umbrella term for various types of learning algorithms, including machine learning and deep learning.

Misclassification Rate – Rate used to gauge how often a model’s predictions are wrong.

Machine translation: The translation of text by an algorithm, independent of any human involvement.

Model – A processing block that takes inputs, such as images or videos, and returns predicted concepts.

Machine Learning Lifecycle Management – DevOps for Machine Learning systems.

Monte Carlo – An approximate methodology that uses repeated random sampling in order to generate synthetic simulated data.
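A classic illustration of repeated random sampling is estimating pi from the fraction of random points in the unit square that fall inside a quarter circle. This is a minimal sketch. The function name, sample count, and seed are arbitrary assumptions.

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi from random points in the unit square."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point lies inside the quarter circle
            inside += 1
    # The quarter circle covers pi/4 of the square's area.
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))
```

The estimate gets closer to pi as the number of samples grows, at the cost of more computation.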

Multi-Modal Learning – A subfield of Machine Learning aiming to interpret multimodal signals together and build models that can process and relate information from multiple types of data.

Multi-Task Learning – A subfield of Machine Learning that exploits similarities and differences across tasks in order to solve multiple tasks at the same time.


Neural network: Also called a neural net, a neural network is a computer system designed to function like the human brain. Although researchers are still working on creating a machine model of the human brain, existing neural networks can perform many tasks involving speech, vision and board game strategy.

Natural Language Processing (NLP) – A branch of artificial intelligence that helps computers understand, interpret, and manipulate human language. This field of study focuses on helping machines to better understand human language in order to improve human-computer interfaces with use cases like moderation, information extraction, summarization, etc.

Natural language generation (NLG): This refers to the process by which a machine turns structured data into text or speech that humans can understand. Essentially, NLG is concerned with what a machine writes or says as the end part of the communication process.

Natural language understanding (NLU): As a subset of natural language processing, natural language understanding deals with helping machines to recognize the intended meaning of language — taking into account its subtle nuances and any grammatical errors.

Naive Bayes – A family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features.

Named Entity Recognition – A subtask of Information Extraction that seeks to identify and classify named entities in text into predetermined categories such as the names of people, organizations, locations, etc.

Noise – Signals with no causal relation to the target function.

Neuron – A unit in an Artificial Neural Network processing multiple input values to generate a single output value.


Not Suitable for Work (NSFW) – Shorthand tag used to mark certain content as being profane, offensive, and/or otherwise potentially disturbing, which a platform may not wish to have posted on their site or may want to mark as mature.

Null Error Rate – How often one would be wrong if one always predicted the majority prediction. (e.g. if you make 100 predictions, 60 “yes” and 40 “no”, the null error rate would be 40/100=0.40 because if you always predicted yes, you would only be wrong for the 40 “no” cases).
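The 60/40 example above can be reproduced with a short sketch, assuming the list holds the actual class of each case. The function name is an assumption.

```python
from collections import Counter

def null_error_rate(labels):
    """Error rate of a baseline that always predicts the most common class."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return (len(labels) - majority_count) / len(labels)

labels = ["yes"] * 60 + ["no"] * 40
print(null_error_rate(labels))  # 0.4
```

This rate is a useful baseline: a classifier whose error rate is no lower has learned little beyond the class frequencies.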


Object Detection – A computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. This technique also involves localizing the object in question, which differentiates it from classification, which only tells the type of object.

Optimization – The selection of the best element (with regard to some criterion) from some set of available alternatives.

Object Recognition (or Object Classification) – A computer vision technique for identifying objects in images or videos.

Object Tracking – The process of following a specific object of interest, or multiple objects, in a given scene. It traditionally has applications in video and real-world interactions where observations are made following an initial object detection.

One Shot Classification – A model that only requires that you have one training example of each class you want to predict on. The model is still trained on several instances, but they only have to be in a similar domain as your training example.

On-premises Software – Software that is installed and runs on computers located on the premises of the organization using that software versus at a remote facility such as a server farm or on the cloud.

Optical Character Recognition (OCR) – A computer system that takes images of typed, handwritten, or printed text and converts them into machine-readable text.

Output – Predictions made after the input uploaded to or fed into a model is processed by the model.

Overfitting – A machine learning problem where an algorithm is unable to discern information that is relevant to its assigned task from information which is irrelevant within training data. Overfitting inhibits the algorithm’s predictive performance when dealing with new data.

OpenAI – An artificial intelligence research company, founded as a nonprofit in December 2015 by partners including Elon Musk, that aims to promote and develop friendly AI in such a way as to benefit humanity as a whole. The organization aims to “freely collaborate” with other institutions and researchers by making its patents and research open to the public.


Parameter – Any characteristic that can be used to help define or classify a system. In AI, they are used to clarify exactly what an algorithm should be seeking to identify as important data when performing its target function.

Pattern recognition: The distinction between pattern recognition and machine learning is often blurry, but this field is basically concerned with finding trends and patterns in data.

Predictive analytics: By combining data mining and machine learning, this type of analytics is built to forecast what will happen within a given timeframe based on historical data and trends.

Precision (Recognition) – A rate that measures how often a model is correct when it predicts ‘yes.’

Predictive Model – A model that uses observations measured in a sample to gauge the probability that a different sample or remainder of the population will exhibit the same behaviour or have the same outcome.

Pooling (Max Pooling) – The process of reducing a matrix generated by a convolutional layer to a smaller matrix.
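As a minimal sketch of max pooling, the function below reduces a matrix by keeping the largest value in each non-overlapping 2x2 window. The function name and the sample feature map are illustrative assumptions, and the sketch assumes even height and width.

```python
def max_pool_2x2(matrix):
    """Non-overlapping 2x2 max pooling; assumes even height and width."""
    pooled = []
    for r in range(0, len(matrix), 2):
        row = []
        for c in range(0, len(matrix[0]), 2):
            window = (matrix[r][c], matrix[r][c + 1],
                      matrix[r + 1][c], matrix[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 output of a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [3, 7]]
```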

Personally Identifiable Information – Any piece of information that can be used on its own or in combination with some other information in order to identify a particular individual.

Positive Predictive Value (PPV) – Very similar to precision, except that it takes prevalence into account. In the case where the classes are perfectly balanced (meaning the prevalence is 50%), the positive predictive value is equivalent to precision.

Prevalence – The rate of how often the “yes” condition actually occurs in a sample.

Python – An interpreted high-level programming language for general-purpose programming.

Prediction – The inferred output of a trained model provided with an input instance.

Preprocessing – The process of transforming raw data into a more understandable format.

Pre-trained Model – A model, or a component of a model, that has been trained beforehand, generally using another data set. See also: Transfer Learning.

Principal Component Analysis – A process that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables called principal components.

Prior – The probability distribution that would represent the preexisting beliefs about a specific quantity before new evidence is considered.


Recall (Sensitivity) – The fraction of relevant instances that have been retrieved over the total amount of relevant instances.

Recurrent Neural Network – A type of artificial neural network with loops in it, allowing recorded information, like data and outcomes, to persist by being passed from one step of the network to the next. It can be thought of as multiple copies of the same network, each passing information to its successor.

Regression – A statistical measure used to determine the strength of the relationships between dependent and independent variables.

Reinforcement Learning – A type of machine learning in which machines are “taught” to achieve their target function through a process of experimentation and reward, receiving positive reinforcement when their processes produce the desired result and negative reinforcement when they do not. This is differentiated from supervised learning, which would require an annotation for every individual action the algorithm would take.

ROC (Receiver Operating Characteristic) Curve – This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class.
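The construction described above can be sketched directly: sweep the threshold over the classifier's scores, record the false positive rate and true positive rate at each step, then integrate with the trapezoidal rule to get the area under the curve. The function names and the four sample scores are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # a threshold above every score predicts no positives
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.6, 0.4]   # predicted probability of the positive class
labels = [1, 0, 1, 0]           # actual classes
points = roc_points(scores, labels)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

An area of 0.75 here matches the fraction of positive/negative pairs in which the positive example receives the higher score (3 of 4).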

Random Forest – An ensemble learning method that operates by constructing a multitude of decision trees at training time and outputting a combined version (such as the mean or the mode) of the results of the individual trees.

Rectified Linear Unit – A unit employing the rectifier function as an activation function.

Regressor – A feature, an explanatory variable used as an input to a model.

Regularization – The process of introducing additional information in order to prevent overfitting.

Reproducibility (crisis of) – A methodological crisis in science in which scholars have found that the results of many scientific studies are difficult or impossible to replicate or reproduce on subsequent investigation, either by independent researchers or by the original researchers themselves.

Restricted Boltzmann Machines – A restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs.


Semantic annotation: Tagging different search queries or products with the goal of improving the relevance of a search engine.

Sentiment analysis: The process of identifying and categorizing opinions in a piece of text, often with the goal of determining the writer’s attitude towards something.

Strong AI: This field of research is focused on developing AI that is equal to the human mind when it comes to ability. General AI is a similar term often used interchangeably.

Search Query – A query that a user feeds into a search engine to satisfy his or her information needs. If the query itself is a piece of visual content, then that is what is known as a “visual search query.”

Selective Filtering – When a model ignores “noise” to focus on valuable information.

Siamese Networks – A different way of classifying images: instead of training one model to classify image inputs, it trains two neural networks simultaneously to learn the similarity between images.

Signal – Inputs, information, data.

Software Development Kit (SDK) – A set of software development tools that allows for the creation of applications on a specific platform.

Specificity – The rate of how often a model predicts “no,” when it’s actually “no.”
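Specificity and the related rates defined in this glossary (accuracy, precision, recall) all derive from the four counts of a confusion matrix. The sketch below is a minimal illustration with made-up labels. The function name is an assumption.

```python
def confusion_counts(actual, predicted):
    """Count true positives, true negatives, false positives, false negatives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return tp, tn, fp, fn

# Hypothetical data: True means "yes", False means "no".
actual    = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
tp, tn, fp, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / len(actual)   # share of all predictions that are correct
precision = tp / (tp + fp)           # how often a predicted "yes" is right
recall = tp / (tp + fn)              # how many actual "yes" cases are found
specificity = tn / (tn + fp)         # how many actual "no" cases are found

print(tp, tn, fp, fn)         # 2 4 1 1
print(accuracy, specificity)  # 0.75 0.8
```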

Standard Classification – The process by which an input is assigned to one of a fixed set of categories. In machine learning, this is often achieved by learning a function that maps an input to a score for each potential category.

Supervised Learning

1) A type of machine learning in which human input and supervision are an integral part of the machine learning process on an ongoing basis. In supervised learning, there is a clear outcome to the machine’s data mining and its target function is to achieve this outcome, nothing more.

2) A class of machine learning algorithms that learn patterns from labelled data, in which each example is paired with a known outcome. Supervised learning algorithms make predictions based on a set of examples.

Support Vector Machines (SVM) – A class of discriminative classifiers formally defined by a separating hyperplane. Given labeled training data, the algorithm outputs an optimal hyperplane that categorizes new examples.
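
Once a hyperplane is known, classifying a new example in the linear case comes down to checking which side of the hyperplane it falls on. The sketch below assumes hand-picked weights and bias for illustration; it shows only the prediction step, not how an SVM learns the hyperplane.

```python
def predict(weights, bias, point):
    """Classify a point by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hand-picked values for illustration; a real SVM would learn them from labeled data.
weights, bias = [1.0, -1.0], 0.0
print(predict(weights, bias, [3.0, 1.0]))  # 1
print(predict(weights, bias, [1.0, 3.0]))  # -1
```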

Synthetic Data – Data generated artificially when real data cannot be collected in sufficient amounts, or when original data doesn’t meet certain requirements.

Statistical Distribution – A description of how the values of a variable are spread across their possible range. One common example is the empirical distribution function, which is the distribution function associated with the empirical measure of a sample. This cumulative distribution function is a step function that jumps up by 1/n at each of the n data points. Its value at any specified value of the measured variable is the fraction of observations of the measured variable that are less than or equal to the specified value.
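
The empirical distribution function described above can be computed directly from a sample. This is a minimal sketch with a made-up sample; `ecdf` is a hypothetical helper name.

```python
def ecdf(sample, x):
    """Fraction of observations in `sample` that are less than or equal to x."""
    return sum(1 for value in sample if value <= x) / len(sample)


sample = [2, 4, 4, 7]  # illustrative data, n = 4
print(ecdf(sample, 1))  # 0.0
print(ecdf(sample, 4))  # 0.75
print(ecdf(sample, 7))  # 1.0
```

Because 4 appears twice in this sample, the function jumps by 2/n at that value.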


Target Function – The end goal of an algorithm.

Taxonomy – The formal structure of all the types of objects within a particular domain. They can follow either a flat or hierarchical format and provide names for each object in relation to the other objects, often capturing the membership properties of each. There are usually specific, complete, consistent, and definitive rules for classifying all objects in the domain. This ensures any newly discovered object fits into one and only one category of the structure.

TensorFlow – An open-source software library used for machine learning applications such as neural networks. It is used for both research and production at Google and was released under the Apache 2.0 open-source license in 2015.

Time Series (Time Series Data) – A sequence of data points recorded at specific times and indexed according to their order of occurrence.

Topic Modeling – A category of Unsupervised Machine Learning algorithms that uses clustering to find hidden structures in textual data, and interpret them as topics.

Test Data Set – In machine learning, the test data set is the data given to the machine after the training and validation phases have been completed. This data set is used to check the performance characteristics of the algorithms produced after the completion of the first two phases when presented with unknown data. This will give a good indication of the accuracy, sensitivity, and specificity of the algorithm’s predictive powers.

Torch – A scientific computing framework with wide support for machine learning algorithms, written in C and Lua. The main author is Ronan Collobert, and it is now used at Facebook AI Research and Twitter.

Training Data Set – In machine learning, the training data set is the data given to the machine during the initial “learning” or “training” phase. From this data set the machine is meant to gain some insight into options for the efficient completion of its assigned task through identifying relationships between the data.

True Negatives – Actual negatives that are correctly identified as such (Actual No, Predicted No).

True Positives – Actual positives that are correctly identified as such (Actual Yes, Predicted Yes).
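
Both counts can be read off a list of actual labels and a list of predictions. The sketch below uses made-up boolean labels (True for “yes”, False for “no”) and a hypothetical helper name.

```python
def count_true_outcomes(actual, predicted):
    """Return (true_positives, true_negatives) for paired boolean labels."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a and p)
    true_negatives = sum(1 for a, p in pairs if not a and not p)
    return true_positives, true_negatives


# Illustrative labels only.
actual = [True, True, False, False, False]
predicted = [True, False, False, False, True]
print(count_true_outcomes(actual, predicted))  # (1, 2)
```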

Turing Test – A test developed by Alan Turing in 1950, used to identify true artificial intelligence. It tests a machine’s ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human.

Transfer learning: A method of learning in which a model trained on one task is reused as the starting point for a related task, so that it can reach good accuracy with less new data or training. One potential example of this is taking a model that analyzes sentiment in product reviews and adapting it to analyze tweets.


Uncertainty – A range of values likely to enclose the true value.

Underfitting – The failure of a Machine Learning algorithm to capture the underlying structure of the data properly, typically because the model is either not sophisticated enough or not appropriate for the task at hand; the opposite of Overfitting.

Unsupervised Learning – A class of machine learning algorithms that learns patterns in data without knowing outcomes. Here, the machine is presented with totally unlabelled data, then asked to find the intrinsic patterns in or draw its own conclusions from the data.


Validation Data Set – The sample of data used to provide an unbiased evaluation of a model fit on the training dataset while tuning model hyperparameters. The evaluation becomes more biased as skill on the validation dataset is incorporated into the model configuration.
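
The training, validation, and test data sets are usually carved out of one larger collection of examples. The sketch below is one simple way to do it, assuming a 60/20/20 split and a hypothetical helper name; the fractions are an illustrative choice, and real data would normally be shuffled before splitting.

```python
def split_dataset(examples, train_fraction=0.6, validation_fraction=0.2):
    """Split a list into training, validation, and test portions, in that order."""
    n = len(examples)
    train_end = round(n * train_fraction)
    validation_end = train_end + round(n * validation_fraction)
    # Whatever remains after the first two portions becomes the test set.
    return examples[:train_end], examples[train_end:validation_end], examples[validation_end:]


train, validation, test = split_dataset(list(range(10)))
print(len(train), len(validation), len(test))  # 6 2 2
```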

Variance: The amount by which the function learned by a machine learning model changes when it is trained on different training data. Despite being flexible, models with high variance are prone to overfitting and low predictive accuracy because they are overly reliant on their training data.

Variation: Also called queries or utterances, these work in tandem with intents for natural language processing. The variation is what a person might say to achieve a certain purpose or goal. For example, if the intent is “pay by credit card,” the variation might be “I’d like to pay by card, please.”

Vanishing/Exploding Gradients – A major obstacle to recurrent net performance that data scientists face when training Artificial Neural Networks with gradient-based learning methods and backpropagation. In each iteration of training, every weight receives an update proportional to the partial derivative of the error function with respect to that weight. When this derivative becomes vanishingly small, the weight effectively stops changing; when it becomes very large, the updates grow unstable.
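
A rough intuition for why this happens: backpropagation multiplies many derivative terms together, one per layer or time step. The sketch below is a simplification that assumes every layer contributes the same factor, which real networks do not; the function name is hypothetical.

```python
def gradient_scale(per_layer_factor: float, num_layers: int) -> float:
    """Product of identical per-layer derivative factors across a network."""
    return per_layer_factor ** num_layers


# Factors below 1 shrink the gradient towards zero (vanishing).
print(gradient_scale(0.5, 50))
# Factors above 1 make the gradient grow very large (exploding).
print(gradient_scale(1.5, 50))
```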

Vision Processing Unit (VPU) – An emerging class of microprocessor (as of 2016) and a specific type of AI accelerator, designed to accelerate machine vision tasks.

Visual Recognition – The ability of software to identify objects, places, people, writing, and actions in images and videos.

Visual Search – The ability of software to find visually similar content based on an image or video query.


Weak AI: Also called narrow AI, this is a model that has a set range of skills and focuses on one particular set of tasks. Most AI currently in use is weak AI, unable to learn or perform tasks outside of its specialist skill set.

Watson – Named after Thomas J. Watson, IBM’s founder and first CEO, Watson is a question-answering supercomputer that uses AI to perform cognitive computing and data analysis. In 2011, Watson competed on the television show Jeopardy! against human contestants and won first place. Since then, Watson has been used for utilization management in medical centers.

Weights – The connection strength (coefficients) between units or nodes in a neural network.
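
To show where weights enter the computation, here is a minimal sketch of a single unit that multiplies each input by its weight and sums the results. The values and the helper name are illustrative assumptions, and the activation function that usually follows is left out.

```python
def unit_output(inputs, weights, bias=0.0):
    """Weighted sum of inputs for a single unit, before any activation function."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Illustrative values: 0.5 * 1.0 + (-0.25) * 2.0 = 0.0
print(unit_output([1.0, 2.0], [0.5, -0.25]))  # 0.0
```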

Web Crawler (Spider) – An internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing, copying pages for processing by a search engine which indexes the downloaded pages, allowing users to search more efficiently.

Web Scraper – An automated process, implemented using a bot or web crawler, that gathers specific data from the web and copies it, typically into a central local database or spreadsheet, for later retrieval or analysis.
