AI Without Python

An Intro to Machine Learning for C++ Programmers


Borislav Stanimirov / @stanimirovb

Hello, World



  #include <iostream>

  int main()
  {
      std::cout << "Hi, I'm Borislav!\n";
      std::cout << "These slides are here: https://is.gd/aicppintro\n";
      return 0;
  }
        

Borislav Stanimirov


  • I mostly write C++
  • Professionally since 2002
  • 2006—2018: game development
  • 2019—2023: medical software
  • 2023—now: machine learning
  • Open source: github.com/iboB

This talk


  • ⚠️ More inspirational than educational
  • ⚠️ Contains personal opinion on software
  • More technical than philosophical
  • The gist, rather than the detail
  • Mainly for programmers who are not in the ML field
  • ... and who have experience with or interest in low-level programming
  • Themes
    • Why should you consider this?
    • What can you do?

Background

Machine Learning in 2023

The Current Big Thing™ in software

Whisper, DALL·E, Craiyon 🖍, ChatGPT, GPT-J, LLaMa 🦙, LaMDA, Midjourney, Falcon LLM 🦅, Stable Diffusion, Unstable Diffusion 😉, GitHub Copilot, StarCoder, BERT 🐻, SAM, Chinchilla 🐭

A Cambrian explosion of AI tools

... and startups

... and software as a whole

There's something new every day.

(this talk will probably be outdated by tomorrow)

This software is no magic

Modern AI Software


  • In many regards software like any other
  • Written by teams (of humans)
  • ...with conventional software development tools
  • It has some unusual, but not unique, features
  • Many libraries and frameworks exist to help
  • It's most often done in Python

Python Stacks


  • The big fish: PyTorch and TensorFlow
  • Every ML framework has a Python front end
  • Why Python?

⚠️ Personal opinion time ⚠️

Borislav on Python


  • Is Python the best language for ML?
    • No.
  • Is it the worst language for ML?
    • No. But it's down there
  • Extreme care is needed for software in duck-typed languages
  • Python stacks are a mess

Python Stacks


  • Package managers: pip, pipenv, poetry, npm, conda
  • Env managers: conda, mamba, pyenv, containers
  • Notebooks and Scientific code

Modern ML software

Objective Truth

Python is slow

"No it's not slow!"

  • "This Python program is faster than its C++ equivalent!"
  • ".pyc should do it"
  • "Python is the most optimized interpreter there is!"
  • "Python JIT compilers work!"
  • "No matter. The low-level framework does the actual work."

Opaque Frameworks


  • Data flow suffers
  • Tweaks are hard to impossible
  • Many similarities with game engines
  • Bloat intensifies

Something Good About Python



  slice = a[5:10, :20:2] # slicing is pretty neat
        

* Similar syntax coming soon to C++
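
There is no direct equivalent of Python's slicing in C++ yet. As a minimal sketch of the related syntax that did land, assuming a C++23 toolchain with std::mdspan (strided slicing via std::submdspan is planned for a later standard):

  #include <cstdio>
  #include <mdspan>
  #include <vector>

  int main()
  {
      std::vector<float> data(6 * 4, 1.f);
      std::mdspan m(data.data(), 6, 4); // view the buffer as a 6x4 matrix
      m[2, 3] = 42.f;                   // C++23 multidimensional subscript
      std::printf("%f\n", m[2, 3]);
      return 0;
  }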

I'm not the only one with such problems

Alternatives


  • CUDA — so C++
  • OpenCL — so C++
  • Metal — so C++, but Objective, or Swift
  • Vulkan — so C++... OK! OK! Many more options
  • CPU/SIMD (like REAL men!) — Anything but Python
  • Mojo — ??? — Definitely not magic, though
  • ... — Alternatives pop up by the hour

A Crash Course in ML

I am not an ML engineer

Borislav Stanimirov


  • I mostly write C++
  • Professionally since 2002
  • 2006—2018: game development
  • 2019—2023: medical software
  • 2023—now: machine learning
  • Open source: github.com/iboB

Borislav Stanimirov


  • C++: yes
  • Low-level: yes
  • GPGPU: yes
  • Chasing microseconds: yes
  • Machine learning: well...

So, this is my perspective...

ML Techniques


  • Linear regression
  • Bayes classification
  • Support vector machine
  • Decision tree
  • Random forest

NOPE

Neural Networks

Neural networks

  • History

What is a neural network?

It's a function


    enum thing { ... };
    thing classifier(const image& input);
        

    enum thing { ... };
    struct result {
      thing t;
      float p;
    };
    std::vector<result> classifier(const image& input);
        

    std::string gpt(const std::string& input);
        

    using gpt_callback = std::function<void(const std::string&)>;
    void gpt(const std::string& input, gpt_callback cb);
        

What is a neural network?

It's a computation with parameters


    enum thing { ... };
    thing classifier(const image& input, const std::vector<float>& parameters);
        

Parameters

LLaMa-7B - the LLaMa model with 7 billion parameters

Training Neural Networks


  • Solve the function with respect to the parameters
  • Gradient descent and differentiability
  • Learning rate
  • Over/Underfitting
  • Stacking
  • Shearing
  • Transfer Learning
  • Fine tuning
  • ...

NOPE

Designing Neural Networks


  • It's magic
  • Mostly indistinguishable from fortune telling
  • Years of experience
  • Lots of untransferrable knowledge
  • Takes millions of hours
  • It seems that we do need Python here 😢

NOPE

Neural Network Applications


  • Design - not today
  • Training - not today
  • Inference - executing the computation - today
  • Inference on the edge - tomorrow

What is a neural network?

A network of neurons, duh

$y = g \left( \sum_{j=1}^{n} w_j x_j + b \right)$

Layers ("deep" means more than 2)

Wait! I know this

$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = g \left( \begin{pmatrix} w_{11} & w_{12} & w_{13} & w_{14} \\ w_{21} & w_{22} & w_{23} & w_{24} \\ w_{31} & w_{32} & w_{33} & w_{34} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} \right)$

Yes. This is mostly everything
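
To make the matrix form concrete, here is a minimal sketch in plain C++ (no libraries) of one fully connected layer computing y = g(Wx + b), with W stored row-major:

  #include <cmath>
  #include <cstddef>
  #include <vector>

  // one dense layer: m outputs from n inputs
  std::vector<float> linear_layer(
      const std::vector<float>& x, // input   (n)
      const std::vector<float>& w, // weights (m * n, row-major)
      const std::vector<float>& b, // biases  (m)
      float (*g)(float))           // activation function
  {
      const size_t n = x.size();
      const size_t m = b.size();
      std::vector<float> y(m);
      for (size_t i = 0; i < m; ++i)
      {
          float sum = b[i];
          for (size_t j = 0; j < n; ++j)
          {
              sum += w[i * n + j] * x[j];
          }
          y[i] = g(sum);
      }
      return y;
  }

  // usage: auto y = linear_layer(x, w, b, [](float v) { return std::tanh(v); });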

Types of layers


  • This was the linear (fully connected, dense) layer
  • Almost all layer types can be represented as fully connected
  • It's a matter of efficiency
  • Convolution/Pooling layers
  • Normalization layers
  • Attention layers
  • "Layer" actually is pretty fuzzy
  • The Neural Network Zoo

Activation Functions


  • Without them every output would be a linear function of the input
  • Layer count wouldn't matter
  • Sigmoids
    • Logistic function
    • tanh
    • smht
  • ReLU
  • Leaky ReLU
  • GELU
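
As a sketch, each of these is a one-liner; the GELU below uses the widely used tanh approximation:

  #include <cmath>

  float logistic(float x)   { return 1.f / (1.f + std::exp(-x)); }
  float relu(float x)       { return x > 0 ? x : 0.f; }
  float leaky_relu(float x) { return x > 0 ? x : 0.01f * x; }
  float gelu(float x)
  {
      // tanh approximation; 0.7978845608f is sqrt(2/pi)
      return 0.5f * x * (1.f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
  }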

Convolution

Neurons don't depend on the entire input

Weights are shared

Feature maps
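
A deliberately naive sketch of a single-channel 2D convolution (stride 1, no padding) showing both points: each output value sees only a k-by-k window of the input, and every window reuses the same kernel weights:

  #include <cstddef>
  #include <vector>

  // input: h x w, row-major; kernel: k x k; output: (h-k+1) x (w-k+1)
  std::vector<float> conv2d(const std::vector<float>& in, size_t h, size_t w,
                            const std::vector<float>& kernel, size_t k)
  {
      const size_t oh = h - k + 1, ow = w - k + 1;
      std::vector<float> out(oh * ow, 0.f);
      for (size_t y = 0; y < oh; ++y)
          for (size_t x = 0; x < ow; ++x)
              for (size_t ky = 0; ky < k; ++ky)
                  for (size_t kx = 0; kx < k; ++kx)
                      out[y * ow + x] += in[(y + ky) * w + (x + kx)] * kernel[ky * k + kx];
      return out;
  }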

Pooling

(Subsampling)

Collecting "important" features
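
And a matching sketch of 2x2 average pooling (the variant the LeNet example below uses), which halves each spatial dimension:

  #include <cstddef>
  #include <vector>

  // input: h x w, row-major; output: (h/2) x (w/2)
  std::vector<float> avg_pool_2x2(const std::vector<float>& in, size_t h, size_t w)
  {
      const size_t oh = h / 2, ow = w / 2;
      std::vector<float> out(oh * ow);
      for (size_t y = 0; y < oh; ++y)
          for (size_t x = 0; x < ow; ++x)
              out[y * ow + x] = 0.25f * (in[(2 * y) * w + 2 * x] + in[(2 * y) * w + 2 * x + 1] +
                                         in[(2 * y + 1) * w + 2 * x] + in[(2 * y + 1) * w + 2 * x + 1]);
      return out;
  }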

What is a neural network?

A collection of layers which define a computation

Terminology time

Tensors

  • No, not physical ones.
  • Just N-d arrays
  • Think std::vector
  • Shape: [[1,2],[3,4],[5,6]] -> (3, 2)... or maybe (2, 3)
  • Broadcast (see the sketch after this list):
    • f([1,2,3]) = [f(1), f(2), f(3)]
    • [[1,2],[3,4]] + [10,20] = [[11,22],[13,24]]
  • Tensors for weight, bias, layer
    • l_5 = mul(w_5, l_4) + b_5
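
A minimal sketch of the second broadcast example, generalized to adding a tensor of shape (w) to every row of a tensor of shape (h, w):

  #include <cstddef>
  #include <vector>

  // a: h x w, row-major; bias: w; adds bias to every row of a
  void broadcast_add_rows(std::vector<float>& a, size_t h, size_t w,
                          const std::vector<float>& bias)
  {
      for (size_t y = 0; y < h; ++y)
          for (size_t x = 0; x < w; ++x)
              a[y * w + x] += bias[x];
  }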

Models


  • What is a model anyway?
  • Any of:
    • The layer/computation sequence
    • The parameter (weight) tensors
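
As a sketch only (the types below are made up for illustration and not taken from any framework), the two senses side by side:

  #include <cstdint>
  #include <map>
  #include <string>
  #include <vector>

  struct tensor {
      std::vector<int64_t> shape;
      std::vector<float> data;
  };

  struct model {
      // sense 1: the layer/computation sequence (here just op names)
      std::vector<std::string> ops;
      // sense 2: the parameter (weight) tensors, typically loaded from a file
      std::map<std::string, tensor> weights;
  };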

LeNet

AI like it's 1998

MNIST dataset

LeNet Model

Classify individual handwritten digits

  import torch
  from torch import nn

  class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()

        self.convs = nn.Sequential(
          nn.Conv2d(in_channels=1, out_channels=4, kernel_size=(5, 5)),
          nn.Tanh(),
          nn.AvgPool2d(2, 2),

          nn.Conv2d(in_channels=4, out_channels=12, kernel_size=(5, 5)),
          nn.Tanh(),
          nn.AvgPool2d(2, 2)
        )

        self.linear = nn.Sequential(
          nn.Linear(4*4*12,10)
        )

    def forward(self, x: torch.Tensor):
        x = self.convs(x)
        x = torch.flatten(x, 1)
        x = self.linear(x)
        return nn.functional.softmax(x, dim=1)
        
    // the same LeNet, built as a ggml compute graph in C++
    m_input = create_tensor("input", {28, 28, 1});
    ggml_tensor* next;
    // first conv block: 1 -> 4 channels, 5x5 kernel, tanh, 2x2 average pooling
    auto conv0_weight = create_weight_tensor("conv0_weight", {5, 5, 1, 4});
    auto conv0_bias = create_weight_tensor("conv0_bias", {1, 1, 4});
    next = ggml_conv_2d(m_ctx, conv0_weight, m_input, 1, 1, 0, 0, 1, 1);
    next = ggml_add(m_ctx, next, ggml_repeat(m_ctx, conv0_bias, next));
    next = ggml_tanh(m_ctx, next);
    next = ggml_pool_2d(m_ctx, next, GGML_OP_POOL_AVG, 2, 2, 2, 2, 0, 0);
    // second conv block: 4 -> 12 channels, 5x5 kernel, tanh, 2x2 average pooling
    auto conv1_weight = create_weight_tensor("conv1_weight", {5, 5, 4, 12});
    auto conv1_bias = create_weight_tensor("conv1_bias", {1, 1, 12});
    next = ggml_conv_2d(m_ctx, conv1_weight, next, 1, 1, 0, 0, 1, 1);
    next = ggml_add(m_ctx, next, ggml_repeat(m_ctx, conv1_bias, next));
    next = ggml_tanh(m_ctx, next);
    next = ggml_pool_2d(m_ctx, next, GGML_OP_POOL_AVG, 2, 2, 2, 2, 0, 0);
    // flatten, fully connected layer to 10 classes, softmax
    next = ggml_reshape_1d(m_ctx, next, 12 * 4 * 4);
    auto linear_weight = create_weight_tensor("linear_weight", {12 * 4 * 4, 10});
    auto linear_bias = create_weight_tensor("linear_bias", {10});
    next = ggml_mul_mat(m_ctx, linear_weight, next);
    next = ggml_add(m_ctx, next, linear_bias);
    m_output = ggml_soft_max(m_ctx, next);
        

Practical Challenges With Inference

Number crunching


  • GPGPU is the way to go
  • SIMD
  • gemm, BLAS and custom gemm
  • Cache-locality (see the gemm sketch after this list)
  • Memory bandwidth bottlenecks - M2 Ultra's time to shine
  • Quantizations. Yes, Q2 is a thing
  • Fusing kernels - hey, remember expression templates?
  • Streaming - finally a use for coroutines
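
To illustrate the cache-locality point, a naive sketch of the classic gemm loop reordering: the same C = A*B (all n x n, row-major, C zero-initialized), with the k loop hoisted so the innermost loop walks both B and C contiguously:

  #include <cstddef>
  #include <vector>

  void gemm_ikj(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c, size_t n)
  {
      for (size_t i = 0; i < n; ++i)
          for (size_t k = 0; k < n; ++k)
          {
              const float aik = a[i * n + k];
              for (size_t j = 0; j < n; ++j)
                  c[i * n + j] += aik * b[k * n + j]; // contiguous reads and writes
          }
  }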

Tweaks


  • They come more often than you would think
  • Quantization (see the sketch after this list)
  • Reshapes
  • Custom kernels
  • Sampling and resampling
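
As a sketch of what a quantization tweak boils down to: symmetric 8-bit quantization of a block of weights, one float scale per block plus one int8 per value (loosely in the spirit of the quantized formats mentioned earlier):

  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  struct q8_block {
      float scale;
      std::vector<int8_t> q;
  };

  q8_block quantize_q8(const std::vector<float>& w)
  {
      float amax = 0.f;
      for (float v : w) amax = std::max(amax, std::fabs(v));
      q8_block b;
      b.scale = amax / 127.f;
      const float inv = b.scale != 0.f ? 1.f / b.scale : 0.f;
      b.q.reserve(w.size());
      for (float v : w) b.q.push_back(int8_t(std::lround(v * inv)));
      return b;
  }

  float dequantize(const q8_block& b, size_t i) { return b.scale * float(b.q[i]); }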

How to Start?

First, forget about training!

Implement a simple model in the most naive way!

Yes, play with Python, too

Libs and Frameworks


Examples and Sources


Practical Steps


  1. Find a model (for example on Hugging Face)
  2. Look at the model description if available
  3. Look at the Python implementation
  4. Yes, there will be one
  5. Implement tensor ops
  6. Compare intermediate steps with the Python implementation (sketch below)
  7. ...
  8. Profit
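
For step 6, a minimal sketch of one way to do the comparison: dump an intermediate tensor from the Python run as raw float32 (numpy's tofile can do that) and check it element-wise against your own result. The file name and tolerance are placeholders:

  #include <cmath>
  #include <cstddef>
  #include <cstdio>
  #include <fstream>
  #include <vector>

  bool close_to_reference(const std::vector<float>& ours,
                          const char* reference_file, float eps = 1e-4f)
  {
      std::ifstream f(reference_file, std::ios::binary);
      std::vector<float> ref(ours.size());
      f.read(reinterpret_cast<char*>(ref.data()),
             std::streamsize(ref.size() * sizeof(float)));
      for (size_t i = 0; i < ours.size(); ++i)
      {
          if (std::fabs(ours[i] - ref[i]) > eps)
          {
              std::printf("mismatch at %zu: %f vs %f\n", i, ours[i], ref[i]);
              return false;
          }
      }
      return true;
  }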

How to continue?


  • Try being faster than the Python implementation
    • Really, not such a tall order
    • on CPU
    • on GPU
  • Do more models

The Real World


  • Plugins
  • Profiling can be a challenge
  • The periphery
    • Tokenizers, streaming, decoders, encoders, guidance
    • Horizontal scaling, MPI
  • It's software like any other

But Why?


  • If you like number crunching
  • If you like chasing microseconds
  • If you like doing magic
  • If you don't like "scientific" code

You are needed!

Let's ride the hype wave!

End

Questions?



Slides license: CC-BY 4.0