Artifical Inteligence & Machine Learning

AI
Published

March 5, 2024

Modified

March 14, 2024

Inspired by neurons in brains… neural networks …machine that thinks like a human:

Machine Learning

…part of AI (Artificial Intelligence)

Development of a learning system (…addresses a specific problem)

  • Learning problem characterized by observation…
  • …input- & output-data with unknown (coherent) relationship
  • Goal: Find generalize mapping between input and output

Statistical learning… learning of a mapping function

  • Given input data X & output data y
  • …find a function that approximates their relationship
  • …form of the function unknown …hence need of a learning system

ML can be thought as the problem of function approximation

  • …learned mapping function will be imperfect
  • …difficult to understand distance between the approximated function and the true (best) mapping
  • …search for a “good enough” mapping function

Example: Perceptron

Frank Rosenblatt (1958) builds perceptron …single-layer artificial neural network …trained with supervisor

  • …servers as a model for a single neuron (binary classification)
  • …determines if an input vector of numbers belongs to one of two classes
  • Original task was to distinguish between a triangle and a circle

Components or the perceptron:

  • Input …multiple features …represent characteristics or attributes
  • Weights …each input associated with a weight
    • …determines significant in influence
    • …weights adjusted during training (to learn optimal value)

Perceptron single artificial neuron
  • Summation function …weighted sum of the inputs
  • Activation function …Heaviside step function …compare sum to threshold
  • Bias …adjustments independent of the input …additional parameter learned during training
  • Weight update rule …perception learns by adjusting its weight and bias on a learning algorithm

Linear Regression

Used for a larger number of machine learning models

  • Linear means …linear model which is y=wX+b
    • …relationship between the (known) input data X aka feature
    • …and the target variable y aka predictor
    • b is the bias and w is the weight of the feature
  • Regression means …model predicts continuous (quantitative) values

Goal: Find (infer) the best-fitting response between feature a predictor

More sophisticated model rely on multiple features …

  • …each feature having a separate weight
  • …aka multiple linear regression

Supervised Learning

…supervisor corrects mistakes …requires training examples

  • Reinforcement Learning
    • …agent learns based on rewards & penalties (aka feedback)
    • …based on interactions with environment …autonomous decision-making
    • Used in… robotics, optimization problems, game play
  • Supervised Learning
    • …predict outcomes based known input & output data
    • …maps input to correct output based on provided labels
    • Used in…
      • …classification (e.g. SPAM detection, medical diagnosis)
      • …regression (e.g price predictions)
      • …recommendation systems (e.g. shopping, media streaming)
  • Unsupervised Learning
    • …discover patterns, structure & relationships in data
    • …operates on unlabelled datasets …understand data without guidance
    • Used in… clustering, anomaly detection, dimensionality reduction
  • Semi-supervised Learning
    • …bridges labeled & unlabeled data

Deep Learning

…based on (artificial) neural networks

  • Employ methods of supervised, semi-supervised or unsupervised learning
    • …complex data at large-scale …simulate (automate) human learning process
    • …applies convolutional neural networks and transformers
  • Used for…
    • …computer vision …image classification …image analysis
    • …natural language processing …translation
    • …speech recognition …voice assistants

“deep” …number of layers through which the data is transformed

  • …progressively extract higher-level features (from the input)
    • …each layer transforms input in a more abstract way
    • Example: image pixel …identify edges …arrangements …noes & eyes …recognize face

LLM

Large Language Model …belong to the field of natural language processing …process & understand human language (text)

What can LLMs do?

  • …text generation …coherent & contextually relevant sentences
  • …seamless multilingual translation (bridge language barriers)
  • …sentiment analysis …evaluate sentiment in social media
  • …summary of extensive documents …extract key information
  • …image generation from text description with text
  • …conversation (chat) with human-like behaviour
    • …automate customer service interactions
  • …code generation …assistance in software development

What technologies do LLMs use?

  • …deep learning …neural networks
  • multimodal …generate text, images, and code
  • …self-supervised learning from language data
  • Architecture influenced by…
    • …model size & parameter count (billions)
    • …input representation …decoding …output generation
    • …computational efficiency
    • …self-attention mechanism …training objectives

Consist of multiple layers…

  • Feed-forward layers …process input data
  • Embedding layers …represent words (tokens) as dense vectors
  • Attention layers
    • …weight importance of tokens in a sequence
    • …capture entity relationships in text

Transformers

References:

…backbone of LLMs …GPTs (Generative Pre-trained Transformers)

  • …pre-trained on large, general-purpose datasets
  • …fine-tuned for specific tasks after training
  • Advantage …parallelization reading input training data
  • Therefore …computationally efficient for large-scale language tasks

How does GPT processes words and sentences?

  • Word embeddings …represent words as word embeddings
    • …segment words (text) into discrete units
    • …the word embedding captures the semantic association of words
      • …encode meaning …relationships between words
      • Example: “apple” is close to “fruit” but far from “car”
    • Word embeddings represented by …numerical vectors
    • …aka numerical embeddings (tokens) …used as input to a LLM
    • Process often called tokenisation
  • Transformer layers …process sequences of word embeddings (tokens)
    • …multiple layers of transformer blocks …output a new sequence of vectors
    • …capture relationship with other words …context, dependencies, semantic nuances
    • Positional encoding …process order of tokens
  • Attention mechanism …multi-head attention
    • …weight importance of each word …captures contextual information
    • …tokens contextualized in context window (long-range dependencies)
    • …amplify important tokens & diminish less relevant tokens
  • Text generation …word by word related to a input prompt
    • …words chosen based on probabilities learned during training
    • …generates coherent sentences based on learned patterns

GAN

Generative Adversarial Networks …for images

Todo…

RNN

Recurrent Neural Networks …for music

Datasets

LIAON (Large-scale Artificial Intelligence Open Network) 1

  • …non-profit organization …based in Hamburg, Germany
  • …datasets = indexes to the Internet …image URL + text linked to image
  • …recommend img2dataset 2 to download image subsets

Hugging Face Hub 3 hosts datasets mainly in text, images, and audio