Using a bit of probability, a smart bulb, and a Raspberry Pi to build a simple Facebook Bot

The Raspberry Pi Foundation brought out a new product the other week, Raspberry Pi zero w which seemed pretty cool so I bought three. After deciding I should *actually* make something with them as opposed to, say, making a LED flash then exile them for all eternity to the electronics box (Guilty glance at my original Rpi and Arduino).  So I decided to try and make a Facebook messenger bot I could use to interact with my LIFX smart bulb.

As it turns out both Facebook and LIFX have pretty well documented HTTP APIs so getting started wasn’t particularly difficult. Pretty quickly, just from following the documentation on the LIFX site I was able to get a simple python interface up and running which could turn my bulb on/off, adjust the brightness, change the colours and query state information.

Creating a Facebook bot was also pretty simple there are a few tutorials online but essentially it just boils down to forwarding messages (done through the Facebook developers portal) to your server then dealing with the messages with your HTTP request system of choice.  Again this was just another python script. After putting those two things together it was pretty simple to make an integrated Facebook messenger processor come light controller.

Demo! Slightly sped up to make it a bit less dull, from sending a message to receiving the response is about a second but the changes take effect in slightly under a second, just before the response is sent back to the user.

Ok so the probability bit which I mentioned in the title. Basically I’m using a fairly simple machine learning algorithm called Naive Bayes. This is based on Bayes rules which is:

Screen Shot 2017-03-12 at 19.23.54

I’ll try and keep this bit brief, just remember that basically A is some label and B is a word in the Facebook message.

The above formula means: the probability of event A given that condition B is met is equal to the probability of B given that condition A is met multiplied by the prior probability of A (I.E our initial guess, irrespective of B). All of this is then divided by the prior probability of B, again this is irrespective of A. I actually skip the last step as it’s irrespective of A. So what I’m actually calculating is the likelihood.

Screen Shot 2017-03-12 at 19.32.22.png
The Actual function being used, P’ used to denote likelihood


Basically, I have a set of these classifiers trained to work out different things from the sentence. The basic walkthrough is: is this operation a query or trying to change the state, which object does this refer too, and what operation does it refer too? Each point is a pre-trained Naive Bayes classifier which represents a node in a big decision tree.

Screen Shot 2017-03-12 at 20.00.51.png
Blue = Naive Bayes classifier, Red = Intermediate Label, green = Final label

This means to add another operation I just have to add additional labels to the training set. Since these are just text files I could theoretically actually add a command in the Facebook messenger interface to retrain the classifiers dynamically to make it easier to add more labels to each node (for instance additional colours).

So for example if I said ‘dim the lights please’ it will go through the first classifier which will distinguish sentences that are querying the state and changing the state. It knows that the word ‘dim’ appears in the change state training examples so it will end up with a higher likelihood and the others appear in either both or neither. It will then see what object it is referring to, again, ‘dim’ appears in the brightness set but not the others. Finally, once more to check what is the intention of the statement.

As it turns out this all pretty fast to compute, all the HTTP requests are a pretty big bottleneck in comparison. The examples I gave could be much more efficiently implemented by just searching for certain strings in the messages but my hope is to build the dataset to process more complex sets like “if the light is on, turn it off” and sets with more ambiguity.
And all this is running on this tiny computer.


Screen Shot 2017-03-05 at 23.34.46
The ‘s’ in IoT stands for security, as the mini light show courtesy of my friend showed






Compile-time Raytracer and C++ Metaprogramming

As you’ve probably derived from the title the two raytraced images above are computed during compilation, I thought it was quite cool so I thought I’d write a short blogpost on C++ template metaprogramming. To give you a brief overview here, the idea is when the compiler is invoked on the code is to have it evaluate as many expressions as possible. So during compilation all of the ray bounces ect are calculated and then the resulting code is just a massive array of floats with the value at each pixel represented by a constant value. This array is saved to a file when you run the executable.

It is quite tricky to debug however, so you can see some artifacts in the images above. They cropped up just before started writing this post and I’ve already spent waaaay more time than I intended on this mini-project so I’m just cutting my losses with them. I have another picture below from an earlier build which doesn’t have the fuzzy effect (but does have that *one dot*, which is annoying).

Early version, ground reflection, no weird fuzzy effect.

Take this example:

template<int base,int power>
struct Pow
const static int result = Pow<base,power-1>::result*base;

template<int base>
struct Pow<base,0>
const static int result = 1;

int main(int argc,char *argv[])
return Pow<5,2>::result;

If we look into the assembly file produced we just see the constant 25 being written to a register.

 mov eax, 25 

The total assembly file is only 22 lines (function version is >200 on -O2)! Here’s what happened,  Pow was created with the template parameters of 5 and 2, this doesn’t call a function with those values it creates a type using them. This type has a single constant value. When Pow<base,power-1> is evaluated it creates a whole new type with its own constant static field which creates another, and so on until power is zero.

template<int base>

struct Pow<base,0> {};

This is called a template specialisation, it allows for the initial template to be overridden when the conditions are met. As the ‘base’ value is itself templated it can be any value, but the second value, power, is specified to be zero. This means when an instance of ‘Pow’ is created using any value of base and a value of zero for power, it compiles this template instead of the base one. In our case it just returns the value one and as that is a constant value its multiplication with the constant base value in Pow<base,1> is immediately evaluated and so on until it reaches the power which the first struct was created with, and with that you will be left with one single constant value. How do you make a Raytracer out of this? With modest difficulty, to start with anyway.  Basically, you just stack all of these constant evaluations on top of each other. This results in some pretty messy code, here is the mess for Vector Dot product (Scalar is a namespace, fyi): Screen Shot 2016-02-17 at 03.09.53

“Why do you need Scalar::Multiply if you just multiplied ints before?”

Well. Raytracers don’t work so well with ints so you need to use floats, but as it turns out this is quite tricky when it comes to templates, recall how we used ‘int’ and now we use ‘class’ (which just means generic type name, in this case), well that only works for constant integral types. So we need to do some trickery to wrap float values in integral types. To get around this I scale up the floats by some huge number (65536)  and store them in a int64_t (I didn’t invent this! I just saw it in another compile time tracer,  you can see his tracer here). Great thing about this is you only have to do some really basic stuff for division and multiply, and subtraction and addition work “out of the box”.

static const int64_t scale = (1<<16);

template<int64_t v>
struct Number
static const int64_t val = v;
constexpr static const float actual_value = static_cast<float>(v) / static_cast<float>(scale);

Once we’ve written the basic types, (Scalar and Vector) plus the normal operations and some other utility functions, like “IF” (see below), then we just stack normal raytracing maths on top of each other. This post is about the compile-time part, so I won’t go into the raytracing maths.Screen Shot 2016-02-21 at 17.22.42

This was the first time I had tinkered with metaprogramming so it was pretty hard at first, basically once one template evaluation fails you get a huge chain of thousands of failures.  At the start I probably had a failure about 80% of the time I changed something non-trivial, so it took a long time. But, once you get used to this way of programming it starts to flow quite well and it becomes much more intuitive.

This has been a pretty cool project, learnt loads as usual. When I started out I was going to try to recreate a scene from an earlier OpenCL Raytracer I had written, which can be seen in action here. That ran in realtime, this one takes probably around 15-20 minutes to calculate for a 300*300 image and I have quite a few projects on concurrently so I can’t really afford to run it on larger scenes. Having said that, I still think it’s pretty cool having a Raytraced image produced during compilation! So I’m still pleased with it. I’ll end this with a shot from my earlier Raytracer for comparison, cya next time!

Shot from my old OpenCL Raytracer

Convolutional Neural Networks from scratch

Over the past week and a bit I’ve been reading up on Deep Learning and Convolutional Neural Networks. I mean a lot of reading! I was dismayed to see that so many of the lectures and tutorials rely on machine learning libraries (like Caffe, Torch,the Python Docker book thing,  ect) which is fine, but I felt that I was kinda missing out on the intuition behind them. So I decided to write the whole thing from scratch in C++, and I finally got it to work so I thought I’d make a blog post about it!

So what is a Convolutional Neural Network (CNN)? It is based around the convolution operation, which you perform by multiplying two matrices element wise. One matrix is your weights and the other is a same size patch of the image. You keep moving the patch over the image until you’ve covered the whole thing. Here’s an example of a 3*3 Sobel filter from an earlier project.


The filter reduces the width*height*depth (depth = 3 due to red, green,and blue channels) to just width*height. Technically the width and height are reduced too but the edges are (usually) just padded with zeros to keep the same dimensions. Convolutional layers just use filters like these, the really cool part is that they *learn* what filters to use!

Screen Shot 2016-02-12 at 02.55.30

Notice we treat the 2D input image as a three dimensioned volume using the RGB channels as the depth components. What happens when you use more than one filter?

Screen Shot 2016-02-12 at 03.03.37

You produce a new volume with the z dimension matching the number of filters used. Where each filter will move over the image and produce its own 2D output image and these are stacked together to form the resulting volume. The result of the filter is also thresholded by some function (I used ReLU), but that’s not really important right now. Now lets look at another layer that we can use.

Where convolutional layers (usually) conserve the width and height of the data while altering the depth, pooling layers conserve the depth while reducing the the width and height dimensions of the input volume. The method I used is called Max Pooling, where you just pick the largest value from a block of the image. Using 2*2 image segments the input would be altered like so:

Screen Shot 2016-02-12 at 13.44.10

If we passed the input from the above example through a max pooling layer we would get:

Screen Shot 2016-02-12 at 13.41.08

Now this is where the “network” part comes in, we stack these layers to produce one big network, the object scales are a little off but it will look something like the below.

Screen Shot 2016-02-12 at 14.53.52

At the end of the network we flatten the result and feed it into a ‘fully connected’ layer which is just a normal neural network with each neuron attaching to each input pixel. But now, not only has the size been reduced but the convolutional layers will have trained themselves to look for features in side the picture. What you should find is that the first layer of the network will train itself to look for ‘low level’ features in the image like lines, basically acting like an edge detection filter. The higher level convolution layers will then pick out higher level features from that result, I saw an lecture where they used a technique called guided backpropagation to actually pull out features which the higher level convolutional layers had trained themselves to look for like nosies and eyes.

It took me quite some time to get everything working, especially in calculating and flowing the derivatives through the network for the backpropagation. But it works now! I made a dummy set of 64*64 images of crosses and circles. Starting with a two filter convolutional layer I feed the network this image of a small circle (notice mostly whitespace).


On the first iteration (no training), the result from the first layer is:

Which makes sense, as all of the filter values are going to be a positive floating point value the intensities will be scattered across the picture. After training for 100 iterations we get this volume (think of it as the two slices of a 2 deep volume):

What you can see is the filters have started to cut out the background value (the data set includes all sort of colours for background) and just focus on the circle. This is then passed up the network to eventually reach a logistic regression classifier which gets 100% on this pretty basic example set.

It was pretty frustrating at times even just working with the tiny data set that I had, but writing the whole thing from scratch has been really rewarding in that now I have a pretty solid understanding of the intuitions behind the backpropagation/derivative calculations, CNNs, and machine learning in general. The network is super modular so I can just drop layers in and out and modify their hyperparameters. The setup looks like so:

 // Width, height, and depth (3 due to RGB) of input image
 Network seqNet(width,height,depth);

 // Layer 1 - Convolution - ReLu - MaxPool
 seqNet.AddConvolution(5,3,1); // Filters - Filter size - stride
 seqNet.AddMaxPooling(2); // Size

 // Layer 2 - Convolution - ReLu - MaxPool

 // Layer 3 - Linear Classifier (LogRegression)

I have an idea for a pretty cool project which would use CNNs but I’ll probably need to write an GPGPU implementation for it. I only just touched on the stuff I’ve learnt about CNNs but the field is super cool and can do things like not only classify an image but if the class is part of a larger image identify the bounding box of that class in the larger image (I.E where it is). There is so much to more to go into in terms of the implementation and how the backpropagation makes sense, but I think this is a fairly decent introduction to what CNNs do.

Shout out to Nando de Freitas’s  superb deep learning course at the University of Oxford, the videos go pretty deep into the maths which was super useful, and Andrej Karpathy and Fei-Fei Li’s deep learning course at Stanford University which has probably the best videos and course notes on the topic.

Logistic Regression for image processing, on a PS3!

An attempt to classify the bruised segments of an apple. (It’s not trained to find the cuts)

I thought I’d write a short blogpost about a project which I’ve been working on recently: identification of bruises in an apple using logistic regression running on the Cell broadband architecture. The Cell Broadband engine is a processor from the olden days, used specifically in the PlayStation 3. It’s quite difficult to work with, but super fun! Really cool having to think about which the hardware is doing and tailoring solutions around it. The algorithm isn’t 100%, but we have an *extremely* small training set which isn’t particularly diverse either. This is due to the fact that I produced the sets by taking pictures of apples to get the “good” set, then throwing them at the floor to get the bruised set!

Anyway, I digress. To classify the apples we load in the normal apple image and break it into 3 colour channels (red, green, and blue) and create a new fourth channel through greyscaling the image using those values. For each channel we break the image down into 5*5 segments and for each segment calculate the histogram of intensities for that segment. I wrote some code to output histograms to an image for a rough visualisation of what is going on. As each pixel has an intensity (for each channel) of between 0-255 that is the range of the histogram. I show two arbitrary examples below.

The idea is that if all the bruise histograms are concentrated in one area and all the “good” bits in the remainder we can plot a decision boundary to discriminate between the two. Once we have these values we can feed each histogram to a trained classifier to get a prediction on whether or not that histogram is of a bruise, if it is we shade it in red. Each colour channel has its own classifier which means the training takes much longer, but it makes the classification easier to parallelise. The result is each colour channel comes up with it’s own ‘guess’ as to what pixels are bruised. Each colour channel on its own is *wildly* off course (mainly due to the small training set as stated above) but by combining them all and only keeping pixels *they all* classify as bruises we get a *fairly* accurate picture. These are the predictions for each colour channel:

As you can see they all pretty much pull in the same bruised pixels (algorithm isn’t trained on the cuts) but misclassify different pixels. This is what allows us to use them as masks to produce the final image. Once we have this we get a fairly clean image, but there are still some parts which have been misidentified so we reduce the noise by calculating the percentage of correctly identified pixels in a larger segment (about 10 times larger) and if it fails to meet a certain threshold, say 60%, we scrub the guesses for that segment. This cleans up the image quite a bit and gives the result shown at the top of the page.

Not bad for such a small training set! If we run it on some of the non bruised apples we get either a single segment misclassified or none at all (after noise cleaning), but we don’t have that many apples to verify the results.

To run this on the Cell broadband engine what we do is we break each channel into smaller chunks and run each classifier on its own SPE. Basically the PS3 has one processor called the PPU and 8 (6 accessible) coprocessors called SPEs. The PPU can send the addresses of the data to the SPE which can then read the data straight out of either L2 cache or main memory. The performance running on the single PPU was, of course, poor but splitting it up gave an enormous performance boost which was cool. There were *so many* challenges to get it to work, things like data alignment, running out of space, ect. But it was pretty fun in the end up.

To classify the bruises we use logistic regression which is really cool. It works by applying some weight to each of the inputs, summing the result, then using a function to squash the result between 0-1. Zero is a “good” segment, one being a bruise. That might seem counterintuitive, but my reasoning was we are *looking* for bruises so it makes sense for that to be one. The function we use to squash the result is the sigmoid function which is given by (1/(1+e^-z)). Combining that with the above means our prediction function looks like:

Screen Shot 2015-12-09 at 03.28.57
Theta= weights, x = inputs, C = a given class (ignore the transpose on the theta)

But how do we know what the optimal weights are? I’m glad you asked. We use gradient descent. First, we start with a way to measure how accurately do the weights map the input to the known output. We want a function which gives a low cost when the prediction is close to 0 and the actual result is 0 and the same for a prediction of 1 when the actual result is 1. Which this function does. I won’t go through it, but substituting in 1 and 0 you can see the desired results.

Screen Shot 2015-12-09 at 03.31.47
Cost function J of any given weights, theta, for training set of size m. (excluded regulariation term for brevity)

Now this bit is cool. We can create a learning rule to reduce the cost of the algorithm by taking the partial derivative of the cost function with respect to theta. The learning rule looks like:

Screen Shot 2015-12-09 at 03.44.08
Alpha is the learning rate, regulariation term omitted again.

To get that partial derivative I had to dust off some logarithm and derivative maths I haven’t used in a while. First, we use the logarithm rules to simplify the two logarithm terms in the cost function, like so:

Screen Shot 2015-12-09 at 03.32.17
The first line is the first term then the next two are working to simplify the second term. (I couldn’t fit it in one)

We then substitute them into the original equation and expand the brackets.

Screen Shot 2015-12-09 at 03.33.04

Once we have this we can then take the partial derivative with respect to theta:

Screen Shot 2015-12-09 at 03.33.10

Once you have that you can plug it into the learning rule, iterate over the learning rule for some number of iterations and eventually you’ll get a set of *kinda* trained weights. Like I said, not enough training examples + not a particularly diverse bunch means the results aren’t as good as they could be but it was still fun working with linear regression and the PS3. I also used a regularisation term which helps prevent overfitting, but there’s not much to it so I just left it out to focus on the core algorithm.

Aannddd that’s a wrap folks, see you next time.

Writing a toy compiler for mesh generation


After my last blog post of using L-Systems to make fractals, I made some changes to bring it into three dimensions. After doing this I found that I actually need to make quite a few extensions to get it work well, things like: symbols for rotation axis, the ability to set the direction to a value, math functions (like sqrt), and arguments. So what I ended up doing is cannibalising parts (and rewriting lots) of my old compiler project to build a toy scripting language to generate patterns for a modified version of the L-System graphical interpreter from last time.

The result is a compiler which is pretty brittle (all of it was written from scratch though, even things like the regular expression system) but hopefully I can use it to generate the base patterns in a larger procedural generation system. I’m hoping I can create a library of base shapes/meshes and have a system to build something like, say, a spaceship. Where it could build the final ship by combining the smaller components together in a way which makes sense.

So let’s do a walk through! Take the script:

 A 135
 A 135

The compiler will take this script and run it through the lexer and the parser to produce a parse tree like so:


That’s actually the less verbose version! Due to the recursive nature of the Context Free Grammars which are used in the parser (FunctionBody -> FunctionCall + FunctionBody, for instance) the tree is a convoluted mess. Some clean up is required to remove some of these redundant calls. It’s a bit hacky at the moment with thing being arbitrarily removed because I know that they are redundant and expressions always kept. So it isn’t quite an abstract syntax tree.

Anyway. This tree is then passed along so we can run through and resolve the arguments, the function calls, and evaluate all the expressions. Once it is finished you end up with a compiled string like so:


The L-System interpreter uses its own lexer to run over the string and break it into readable lexemes. The ‘D’s are ‘Draw’, ‘a’ rotation Axis, ‘A’ angle, ‘d’ direction (sets the direction rather than turning with the angles), and ‘x’ ‘y’ ‘z’ (upper or lower) for vector constants representing the three dimensions. As before it executes these commands to produce a set of vertices which it turns into OpenGL buffers.

Finally the interpreter will produce:

Screen Shot 2015-10-26 at 17.04.55

Now take the script (Yes, yes a square must satisfy WIDTH=HEIGHT so it’s pointless but it was like 23:45 when I wrote it so…):

 A 45 
 A 135

This time the compiler will resolve the “Triangle” function calls by looking up the previous definition and will produce:


This time we have the ‘[‘ and ‘]’ for branch and ‘M’ for move. The interpreter will produce:
Screen Shot 2015-10-26 at 17.37.22

And finally we can use the above script to create a new script which will produce:


 d z
 a y
 a x
 a -y
 a -x
 d y
 a -z
 a -x
 d -y
 a -z

Again, obviously, a cube has to satisfy width=height=depth but I can confirm that if you change the values which are passed to the “Square” functions each function will produce a different sized triangles. Not quite a square though if you don’t follow those constraints!

Next time hopefully I will have some cool procedurally generated content built with this. Until then, Wubalubadubdub!


Click to see the whole thing!

Over the past few days I’ve been reading up on various procedural generation algorithms and I came across a L-Systems. They are super simple and really cool. You start out with some starting string, or Axiom, and then you run a grammar over it a certain number of times to mutate the string in a deterministic way. Once you have that string you can then interpret the string using a graphics system to produce really nice graphical effects. I’ve used context free grammars before for compilers so it was cool to see them being used for something completely different.

The grammars behave slightly differently, while the last time I used them the whole purpose was to filter out all the non-terminals (or throw errors) but this time the point is to use the productions to indefinitely insert the same recurring pattern of non-terminals. The bounds of the loop determine when it is finished rather than the result. Take the following fractal.Screen Shot 2015-10-15 at 00.32.21

This fractal was produced using the following pattern:

// Not the actual code!
axiom = "F"; // Start state
angle  = 60.0f;
iterations = 5;
production['F'] = "F+G++G-F--FF-G+"; // Production for F
production['G'] ="-F+GG++G+F--F-G"; // Production for G

So basically what happens is it reads the start state (‘F’) and looks up the production for ‘F’, finds it and replaces ‘F’ with the production for ‘F’, a.k.a that big messy string on line 6. Iteration one done. Now it goes over it again this time replacing every ‘F’ for the production of ‘F’ and every ‘G’ with the production of ‘G’. And so on until it has iterated the required number of times.

Once you have the massive string produced you can then pass this to the graphical interpreter to render a nice fractal pattern. For L-Systems it is usually (always?) a simple turtle based solution. I use the turtle to build the mesh then just render it with normal OpenGL buffers. My interpreter draws a line every time it reads a capital letter, turns left and right on ‘+’ and ‘-‘ respectively, pops/pushes the stack on ‘[‘ and ‘]’ (needed for cool tree effects), and has special lower case characters for different things.

//  Tree below
axiom = "F"; // Start state
angle  = 25.7f;
iterations = 5;
production['F'] = "c2F[c1+F]F[c4-F]F"; 

Screen Shot 2015-10-15 at 02.19.29

For instance this uses the special character ‘c’, followed by an index, to change the colour of the turtle. It also uses the stack. The stack lets the turtle save it’s state so it can produce those nice branching effects you can see. Using ‘[‘ will save the position/direction/colour to the stack and ‘]’ will return it to that state.

//  Box thing below
axiom = "c3F+F+F+F"; // Start state
angle  = 90.0f;
iterations = 2; 
production['F'] = "F+f-FF+F+FF+Ff+FF-f+FF-F-FF-Ff-FF"; 
production['f']  = "ffffff";

Screen Shot 2015-10-15 at 01.13.37
You can also use the special character ‘f’ to move the turtle without drawing a line. Allowing for “floating” patterns like the above to be produced.

You could also add in a new special character ‘%’ which could be a probability of pushing/poping to the stack which would allow you to introduce some more stochastic tree structures, and you could add in characters to reduce/increase the step/angle change which would give more control over the resulting structure. Maybe you could add probability into the grammar production as well?

Here’s some of the fractal patterns I produced using L-Systems.

//  Sierpinski gasket
axiom = "F"; // Start state
angle  = 60.0f;
iterations = 8; 
production['F'] = "G-F-G"; 
production['G']  = "F+G+F";

Screen Shot 2015-10-15 at 01.09.01

// Tree thing.
axiom = "F"; // Start state
angle  = 22.5f;
iterations = 5; 
production['F'] = "c2FF-[c1-F+F+F]+[c4+F-F-F]"; 
This is for the trees at the top, they go from 1 iteration to 5. The below is iteration 5.

Screen Shot 2015-10-15 at 01.26.09

That's it folks! You've been a wonderful audience.

Clangin’ , tankin’, JITin’ and some other stuff I’ve been doing.

First blog post in a looong time, but I’ve got some cool things to show off. I have had a few hobby projects which I worked on since my last blog post, like building a compiler completely scratch (I was busy with work so only finished the frontend).  But today I thought I would show off something I’ve been working on over the past week for a university project.Before I came back to university I was brainstorming some project ideas with one of my colleagues (A.K.A the 8BitPimp) and we got on to the topic of building an AI testbed/learning to program framework. The idea was you could have a tournement style game where each player builds an AI to fight against the other AIs. I wrote a simple validator to see how difficult it would be to compile and run code on the fly in the context of a game.

The C++ engine  loads C scripts at runtime either from a file on disk or takes a string as argument and interprets that as C code. I used the Clang API to invoke clang on the C script to turn it into an LLVM Module which can then be passed to the LLVM API  and compiled into a native object. The C files include a header file which is also included by the C++ game engine and that is used to communicate between the two systems. The C++ game engine saves key parts of the game state to a C struct and the C script can send commands back to the game through a similar struct.

For instance if we take the script:

#include "Runtime.h"
void update()
    gCommands.turn = -1;
    if (gStatus.canFire == 1) = 1;

We will see a tank doing something like this:


And we can make another tank be controlled by the user with this script:
#include "Runtime.h"
void update()
    if (gKeys.A ==1) {gCommands.turn = 1; }
    else if (gKeys.D == 1) {gCommands.turn = -1; }

    if (gKeys.W == 1) { gCommands.thrust = 3.0f; }
    if (gKeys.S == 1) {gCommands.thrust =-3.0f; }
    if (gKeys.Z == 1) {gCommands.thrust = 0.0f; }

    if (gStatus.canFire == 1 && gKeys.SPACE ==1) { = 1; }

And one ‘bot’ to patrol about:

#include "Runtime.h"
#include <math.h>
void update()
    if (gStatus.position.x < 200 && (int)(gStatus.direction.x + 1.0f) != 2) { gCommands.turn = -1; } if (gStatus.position.x > 600 &&
            abs((int)(gStatus.direction.x - 1.0f)) != 2)
         gCommands.turn = 1;

    if (gCommands.turn == 0)
         gCommands.thrust = 3.0f;

    gCommands.turnTurret = 1;
    gCommands.turnTurretLocation.x = gStatus.position.x+(gStatus.direction.x*100);
    gCommands.turnTurretLocation.y = gStatus.position.y+(gStatus.direction.y*100);

And we get something like this:

Tanks are running on the C code (green is player) above and I wrote the game environment in C++ with SFML for the windowing/graphics. Art by Kenney.

Anyway it’s still at a pretty early stage, but I thought the Clang/LLVM JIT part was pretty cool. There is probably a little too much code to show how the whole thing works (linking LLVM/Clang into a standalone project is a post on its own IMO). Getting it all to work involved a *lot* of stepping through the LLVM/Clang sources but eventually I got it to work pretty well. TBH, for this, you probably could just use the interpreter. Here is the gist of it:

std::vector<const char *> args; // The command line clang will be invoked with
args.push_back("script.c"); // Our file

 // Create the Clang Diagnostic objects here

 // Ownership of this pointer is passed to the compiler invocation.
 clang::CompilerInvocation *CI {new clang::CompilerInvocation}; 
 clang::CompilerInvocation::CreateFromArgs(*CI, &args[0], &args[0] + args.size(), diagnostics);

 clang::CompilerInstance Clang;

// Error checking ommited

// We want LLVM IR.
 std::unique_ptr<clang::CodeGenAction> codeGenAction(new clang::EmitLLVMOnlyAction());
Clang.ExecuteAction(*codeGenAction); // Compile the code
// Error checking ommitted

This compiles the C script into LLVM IR, which we can then pass to LLVM to JIT it into native assembly.

std::unique_ptr<llvm::Module>module =codeGenAction->takeModule();

llvm::EngineBuilder engBuilder(std::move(module));

engBuilder.setEngineKind(llvm::EngineKind::JIT); // We want JIT, defaults to interpreter.

engBuilder.setOptLevel(llvm::CodeGenOpt::Aggressive); // -O3

llvm::ExecutionEngine *engine = engBuilder.create(); 

// Now the engine will compile the code either with a call to:
// Or when automatically when you query the engine for an address.

Boom. Now we have a native object, we can the get the address of functions and globals like:


Annnd thats it for now. It will be much, much, much more involved for the actual AI (with actual AI techniques instead of dumb bots) part but for now I think this a pretty good validation of the core concept.

Writing a screen recorder on MacOS

I took a dive into the OS X windowing/graphics framework to see if it is possible to capture the contents of the screen (without capturing OpenGL buffers). Turns out you can!  Here’s a simple screen recorder I wrote with it.

(ignore the debugger continue in lieu of a key press I was lazy)

And here’s how to implement it :

    CFArrayRef windows = CGWindowListCopyWindowInfo(kCGWindowListOptionAll,0);
    CFStringEncoding encoding = CFStringGetSystemEncoding();

The first line gives us a list of all the windows (plus a ton of junk values) which are open and the second gives us the string encoding in use on the system (we need it later). In order to maintain cross compatibility with iOS/other systems they have a lot of abstraction so basically everything is returned as a void star and needs to be passed into a function to query the result which can then be cast to something concrete.

for(unsigned int index = 0; index < CFArrayGetCount(windows); ++index){
    CFDictionaryRef dict = static_cast<CFDictionaryRef>(CFArrayGetValueAtIndex(windows, index));
    CFStringRef value = static_cast<CFStringRef>(CFDictionaryGetValue(dict,kCGWindowName));

    if( value ){
        char name[256];
        bool worked = CFStringGetCString(value,name,256,encoding);

        //Wordpress formatting screwed this bit up a little
        if( worked ){
	    if( strcmp("OpenRA", name) == 0){
	        CFDictionaryRef dims =static_cast<CFDictionaryRef>(CFDictionaryGetValue(dict,kCGWindowBounds));
	        m_Screen = CGRectMake(g_Rect.x,g_Rect.y,g_Rect.width,g_Rect.height);

As I mentioned basically everything is a void star so you can see a lot of casts. The first line in the loop returns a dictionary related containing a whole bunch of details to do with the window, such as the name, whether it is active, ext. The second line queries the dictionary for the name of the window (the kCG value is a CFStringRef, there is a few possible values this can be). If that worked we then convert the value to a CString, there is a function to return a CString pointer but unfortunately the documentation specifies that you should never rely on it being non-null (and it always returns null for me) so we have to use different function. We can the compare this value to the target window name. If this is the window we are looking for we then query the dictionary once again for the value specifying the bounds of the object, and convert those bounds into a usable CGRect.

This is where things get horrible. The dictionary stores the bounds in fields like [“Width”,800],[“X”,0] but the “Width” isn’t a CString it’s a CFStringRef and the 800 isn’t an int/double it’s a CFNumberRef. So we need to write another function to extract these values. Luckily, there is a function CFDictionaryApplyFunction which can apply a function pointer to every element in the dictionary so we can do something like this:

static void getValues (const void* key, const void* value, void* context) {
    CFStringEncoding encoding = CFStringGetSystemEncoding(); // As before
    char name[256];
    bool worked = CFStringGetCString(static_cast<CFStringRef>(key),name,256,encoding);

    if( worked )
        if( strcmp("Width", name) == 0 )
            else if( ... ) //And so on for "Height", "X", and "Y"
        { ....}

This function checks the key and converts the CFNumberRef into a usable type, then saves it into g_Rect.width.

        //In init
	m_DisplayID = CGMainDisplayID();

        //In capture screen
	CGImageRef image = CGDisplayCreateImageForRect (m_DisplayID,m_Screen);
	CGDataProviderRef dataProvider = CGImageGetDataProvider(image);

	CFDataRef imageData = CGDataProviderCopyData(dataProvider);

	size_t dataLength  = CFDataGetLength(imageData);
	unsigned char * rawData = new unsigned char[dataLength];
        //Gets the actual image data from the image
        CFDataGetBytes (imageData, CFRangeMake(0,dataLength), rawData);
        //We can also query the image for a whole bunch of other useful info
 	m_Depth = CGImageGetBitsPerPixel(image);
 	m_ComponentSize = CGImageGetBitsPerComponent(image);
 	m_Pitch= CGImageGetBytesPerRow(image);
 	m_Width  = static_cast<unsigned int>(CGRectGetWidth(m_Screen));
 	m_Height = static_cast<unsigned int>(CGRectGetHeight(m_Screen));


Using the m_Screen rect constructed before we can now capture an image with those bounds on our screen with the code above. Volia! We can now push those pointers (as unique_ptrs to avoid leaks) onto a stack and pop them off one by one and pass them to some windowing code to be Rendered.

If you wanted to capture the whole screen you can use this method:

m_Screen = CGDisplayBounds(m_DisplayID);

Also if you are using CMake, like me, to include Quartz you add the line:

set(CMAKE_EXE_LINKER_FLAGS "-framework CoreGraphics -framework QuartzCore -framework ApplicationServices -framework Quartz")

Some side projects: Machine Learning and Software Rasterisation

I’ve decided to start a little development blog to show some of the hobby projects that I have been working on. This inaugural post will look at the two main projects I’ve given myself over the past little while: learning about Artificial Intelligence, and a software rasteriser with a graphics pipeline similar to OpenGL (though without Geometry and Tessellation shaders) .

Software Rasterisation + Graphics pipeline

Interpolation still looks a little funny
Interpolation still looks a little funny

This project I have had kicking about in some form or another for months. It was so much fun to write I learned so much maths in the process which was cool. When I set out on this project I decided that I would try to write as much stuff from scratch as possible so all of the model loading and the maths things like matrices and vectors I wrote from scratch. SDL is used for writing a buffer to the window (and creation of said window) but thats about it. I also gained so much intuition about how the graphic pipeline works having now written out the whole thing from the vertex shader (can be switched at runtime) through the rasteriser and on to the fragment shader (like the vertex can switched). It was quite a challenge to get it all to work but one of the most rewarding programming exercises I’ve ever done.

I’m going to wrap this project up quite soon, the performance is poor at the moment and the interpolation seems a little off, but once i’ve got those hammered down I will probably move on. I might add in a few new features like bitmap support and multithreading but we will see, I have other projects I want to work on.


I’ve been watching a few different courses on machine learning and AI from Udacity and Coursera and have learned quite a bit in a few weeks. One of the things I wrote during this is a simple language classifier using a Naive Bayes Net in C++:

Screen Shot 2015-03-10 at 00.18.10

It’s pretty simple code, most of the work was in working out how the probability theory around Bayes rule worked but the end result ended up looking quite cool. The whole process is trained on huge chunks of text which are already classified so really all you need to do is change the training data to make something like a primitive spam detection filter, or you could identify who sent a message on the basis of previous messages. It was also cool to have a fresh project to start using with some C++11/14 features that I’ve not really used much of before.

Also I looked at a few other AI techniques like Logistic + Linear Regression, comparison of search space techniques like A* (which was revision really), and some more. Hopefully I should have something cool to show in a few weeks (whenever I get time) using other algorithms.