AI — Glossary & Terminology

AI
Published

March 5, 2024

Modified

September 24, 2025

Prompt

Input text/query from a user…

  • …starting point for the model’s generation process
  • …sets the context for the response

Tokens

Individual units of text that the model processes

  • For example: words, characters, punctuation, spaces
  • Tokenization — Process of converting text into tokens
    • …tokenization is language-dependent and varies between models
    • …tokens are then mapped (embedded) to vectors (fed to the model)
  • Limits…
    • …models have different limits for input and output tokens
    • …prompt engineering may require condensing the input to fit these limits
    • …requests to different models are priced by counting tokens
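
As a rough illustration of the points above, the following sketch tokenizes a string, maps tokens to ids and vectors, checks a token limit, and estimates a price. It is a toy under stated assumptions: production models use subword tokenizers (for example byte-pair encoding) rather than this regular expression, and `MAX_INPUT_TOKENS`, `PRICE_PER_1K_TOKENS`, and the 4-dimensional random embeddings are hypothetical values chosen only for the example.

```python
import random
import re

MAX_INPUT_TOKENS = 8        # hypothetical input limit
PRICE_PER_1K_TOKENS = 0.5   # hypothetical price per 1000 tokens

def tokenize(text: str) -> list[str]:
    """Split text into word, punctuation, and whitespace tokens (toy scheme)."""
    return re.findall(r"\w+|[^\w\s]|\s+", text)

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer id in order of first appearance."""
    vocab: dict[str, int] = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab

def estimate_cost(n_tokens: int, price_per_1k: float) -> float:
    """Price a request by counting tokens."""
    return n_tokens / 1000 * price_per_1k

text = "Hello, world! Hello again."
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab[token] for token in tokens]

# Embedding table: one small random vector per token id (illustrative only).
rng = random.Random(0)
embeddings = {i: [rng.uniform(-1, 1) for _ in range(4)] for i in vocab.values()}
vectors = [embeddings[i] for i in ids]

# Condense the input if it exceeds the (hypothetical) limit.
truncated = tokens[:MAX_INPUT_TOKENS]
cost = estimate_cost(len(tokens), PRICE_PER_1K_TOKENS)
```

Repeated tokens share an id and therefore the same vector, which is why the vocabulary is smaller than the token sequence.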

Model

A model (hypothesis) is the output of a machine learning algorithm

  • Trained on vast amounts of text data…
    • …a specific representation learned from data…
    • …by undergoing a training process driven by (huge amounts of) input data
    • …contains learned patterns evaluated from the input data
    • …contains guidelines for making predictions
  • Structure of the model defines how it processes information

Parameters

Internal variables a model learns during training

  • Model size is often measured by the number of parameters
  • Larger models (more parameters) can capture more complex language patterns
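
To make "number of parameters" concrete, here is a minimal sketch that counts the learnable weights and biases of a fully connected network. The layer sizes are hypothetical, and real LLMs use other layer types (attention, embeddings, normalization), so this shows only the counting idea.

```python
def count_parameters(layer_sizes: list[int]) -> int:
    """Count weights and biases of a dense network with the given layer widths."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output unit
    return total

small = count_parameters([4, 8, 2])
large = count_parameters([4, 64, 64, 2])
```

Widening or adding layers increases the count quickly, since each dense layer contributes roughly the product of its input and output widths.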

Training

Based on a set of inputs (features) with expected outputs (labels)

  • …basically the concept of learning from examples to make future predictions
  • …process to algorithmically recognize patterns & relationships from datasets
  • Building block of a machine learning model…
  • …where the quality of the input dataset has significant impact on the model capabilities
  • Input features are individual measurable properties of data…
    • …represented as a set of numeric values in a feature vector
    • Feature vector used as input to the machine learning model during training
    • A feature extractor is a program that extracts relevant features from input data

Prediction: output created by a trained model based on features extracted from (unseen) input data
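
The training and prediction steps described above can be sketched with a tiny supervised example. Everything here is an assumption made for illustration: the feature extractor (counts of exclamation marks and digits), the four labeled messages, and the nearest-centroid classifier, which stands in for whatever learning algorithm is actually used.

```python
def extract_features(text: str) -> list[float]:
    """Hypothetical feature extractor: [exclamation marks, digits]."""
    return [float(text.count("!")), float(sum(c.isdigit() for c in text))]

def train(examples: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Learn one centroid (mean feature vector) per label."""
    grouped: dict[str, list[list[float]]] = {}
    for text, label in examples:
        grouped.setdefault(label, []).append(extract_features(text))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def predict(model: dict[str, list[float]], text: str) -> str:
    """Return the label whose centroid is closest to the input's features."""
    features = extract_features(text)

    def squared_distance(centroid: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))

# Inputs (features are extracted from the text) with expected outputs (labels).
training_data = [
    ("WIN 1000 NOW!!!", "spam"),
    ("Claim 50 prizes!!", "spam"),
    ("See you at lunch", "ham"),
    ("Meeting moved to 3", "ham"),
]

model = train(training_data)
prediction = predict(model, "FREE 99 coins!!!")  # unseen input
```

The learned centroids play the role of the model's parameters: they are the only thing kept from training and the only thing consulted at prediction time.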

Label

Data labels (data annotation) …values to be predicted by a model…

  • …essential to supervised training of a machine learning model
  • Labels describe attributes & characteristics of a data point…
  • …can be based on class, subject, theme, or other categories
  • Example…
    • …images with corresponding labels indicating visible objects in the picture
    • …image recognition then learns to recognize patterns for a labeled object

RAG

RAG (Retrieval-Augmented Generation) enhances LLMs

  • “If an LLM serves as the brain, RAG is the library for the brain”
  • Goal …improving factual accuracy …reducing hallucinations
  • Ground generated response in current data…
    • …retrieved in real-time in relation to the user prompt…
    • …rather than relying solely on the model’s pre-existing training data
  • Enables adding domain-specific knowledge tailored to a specific environment

Workflow (first-stage pipeline)…

  • Retrieval, incorporate…
    • …relevant, external, up-to-date information
    • …from trusted data sources (documents, databases)
  • Augmentation …combine selected data with user input …feed to LLM inference
  • Generation …generate response based on this enriched context
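
The three workflow steps can be sketched end to end as follows. This is a minimal sketch under several assumptions: retrieval uses bag-of-words cosine similarity instead of the learned embeddings and vector database a real system would typically use, the documents are invented, and `generate` is a placeholder because no actual LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical trusted data source.
DOCUMENTS = [
    "The cluster maintenance window is every Tuesday at 08:00.",
    "Batch jobs are submitted with the sbatch command.",
    "Home directories have a quota of 50 GB.",
]

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts, used as a crude text representation."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Retrieval: rank documents by similarity to the user prompt."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def augment(query: str, context: list[str]) -> str:
    """Augmentation: combine the selected data with the user input."""
    lines = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Generation: placeholder standing in for LLM inference."""
    return f"[LLM response based on a prompt of {len(prompt)} characters]"

query = "What is the quota for home directories?"
context = retrieve(query, DOCUMENTS)
prompt = augment(query, context)
answer = generate(prompt)
```

The model never sees the whole document collection, only the passages selected for this particular prompt, which is what keeps the response grounded in current data.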

MLLM

Multimodal Large Language Models (MLLMs)

  • Multiple modalities — vision, language, audio, etc
  • Process & reason across multiple modalities
  • Example: combine image + text prompts to guide responses
  • Combining multiple domain specific models…
    • …for example vision encoder (images) & language model (text)
    • …with a projection layer to map the models' outputs into a common token space
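
The role of the projection layer can be shown with a small numeric sketch. All numbers are made up: the "vision encoder" output is two 3-dimensional patch vectors, the language model is assumed to use 4-dimensional token embeddings, and the projection is a single linear map with hand-picked weights rather than trained ones.

```python
def project(vector: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Linear projection: one output value per weight row."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]

# Hypothetical vision encoder output: 2 image patches, 3 dimensions each.
image_features = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]

# Hypothetical text token embeddings: 3 tokens, 4 dimensions each.
text_embeddings = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]

# Projection from the 3-dimensional vision space to the 4-dimensional token space.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
b = [0.0, 0.0, 0.0, 0.0]

image_tokens = [project(patch, W, b) for patch in image_features]

# The language model then processes image and text tokens as one sequence.
sequence = image_tokens + text_embeddings
```

After projection the image patches have the same dimensionality as text tokens, so the language model can attend over both without changes to its own architecture.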

Examples of models for multimodal reasoning…

  • GPT-4V (2023), GPT-4 with vision, OpenAI
  • LLaVA (2023), Open Source
  • BakLLaVA (2024)

AI Agents

LLM agents …agentic technology

  • Autonomous agent that acts on behalf of users…
    • …to plan & orchestrate tasks
    • …using an LLM to communicate
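
A plan-and-execute loop of this kind can be sketched as below. This is a toy under stated assumptions: `plan` is a keyword matcher standing in for the LLM that would normally decide which steps to take, and the two tools (`count`, `shout`) are hypothetical.

```python
from typing import Callable

# Hypothetical tools the agent may use on the user's behalf.
TOOLS: dict[str, Callable[[str], str]] = {
    "count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}

def plan(task: str) -> list[str]:
    """Stand-in for the LLM planner: pick tools whose name appears in the task."""
    return [name for name in TOOLS if name in task.lower()]

def run_agent(task: str, payload: str) -> dict[str, str]:
    """Orchestrate the planned steps and collect each tool's result."""
    results: dict[str, str] = {}
    for tool_name in plan(task):
        results[tool_name] = TOOLS[tool_name](payload)
    return results

report = run_agent("Please count the words and shout the text", "hello agent world")
```

In a real agent the planner would be re-invoked with each tool result so it can adjust the remaining steps; the fixed plan here keeps the sketch short.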

MCP

MCP (Model Context Protocol)

Standardizes how AI agents access/manipulate external tools
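
As an illustration of what this standardization looks like on the wire, the sketch below builds two request messages. It assumes, based on the published MCP specification, that messages are JSON-RPC 2.0 and that tools are discovered and invoked through the `tools/list` and `tools/call` methods; the tool name `read_file` and its `path` argument are hypothetical, and no transport or server is involved.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id

def make_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request message."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# Ask the server which tools it offers.
list_tools = make_request("tools/list")

# Invoke one tool by name with structured arguments (hypothetical tool).
call_tool = make_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "notes.txt"}},
)
```

Because every tool is reached through the same two methods, an agent can work with a new server without tool-specific integration code.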