Friday, December 30, 2016

Lernmi: Building a Neural Network in Ruby

Neural networks

A neural network is based somewhat on a real brain - the idea being that data comes into a bunch of neurons at the front, gets propagated through multiple layers, and an answer comes out the end. We've covered this a little in another blog post, so here we'll just focus on what goes on under the covers.

Math time

A neural network is a type of non-linear function that can approximate any other function. How well it can approximate is based on the number of links it has between its layers of neurons. In other words, more neurons and more layers gives you a better (but slower) neural network. If you want to have a play with this idea, have a look at the Tensorflow Playground - if you can't get it to match, try a mix of more layers and more neurons per layer.

More math time

We have a non-linear function, and we need to approximate something based on stochastic gradient descent. In other words, we get a bunch of samples as input, and we need to adjust the network based on what the output *should* be, and then the network will change to give us answers that are slightly closer to what we expect. If our training data is a diverse sample of our real data, then we can train a neural network on our training data, and it'll also work as well with other data that it hasn't seen before.

This does seem kind of like magic though, right? What's going on under the covers?

The calculus bit

We generally use a method called Stochastic Gradient Descent, or a method related to it. The "stochastic" part means we randomly choose samples from our training data so that the model doesn't get biased, and all the little nudges we give it start to shape it in the way we want it to behave. The "gradient descent" part means for each sample, we look at how wrong the network currently is (the "error"), and calculate a gradient (or slope, if you like), and nudge the network in that direction.

This approach is robust and it works for a lot of things, but the hard part with a neural network is trying to find the gradient for every single weight in the network - if your network is 10 layers deep (a small network by modern standards), how does an error at the output adjust the weights 10 layers back at the start?


We use partial derivatives to find the gradient at each layer, and then use that gradient to adjust the network. At each neuron, we take the sum of all of the downstream gradients (they get "backpropagated" to us), multiplied by the derivative of the activation function applied to the forward-propagated data. We then take this number, multiplied by the weight of the previous link, and add a fraction of this (multiplied by the training rate) to the current weight of that link. We also take this number multiplied by the forward-propagated data and backpropagate that to the previous neuron.

Simple... right?

This is some fiddly stuff if you're not currently a calculus student, and it took me a couple weeks to get my head around. You can find it all on the wikipedia page for backpropagation, and there are a couple of papers floating around the internet that explain it in different ways.

It shouldn't be this hard

The idea of multiplying partial derivatives is a simple one, and I spent a long time thinking about how to make this accessible. The architecture that I've settled on for Lernmi has made a couple of trade-offs, but I feel it breaks out the backpropagation into bite-sized pieces, and I hope that it makes it easier to understand.


I've published a neural network application in ruby, hosted here. The logic is broken up between Neurons and Links, and instead of the word "gradient", I've used the word "sensitivity" as it better translates to what we're using it for - high sensitivity means we adjust the weight more, low sensitivity means we adjust it less.

When we propagate forwards, it's fairly simple. Each Link in a layer takes the output from it's input Neuron, multiplies it by its weight, and inputs it to its output Neuron. Each Neuron sums all of its inputs, applies the activation function (the sigmoid function), and waits for the next Link to take the resulting value.

When we backpropagate, the logic is spread across the Neurons and Links. For the output layer, the sensitivity is just the error (the actual output minus the expected output). Each Link in the last layer takes the sensitivity from its output Neuron, and multiplies this by the sensitivity of the activation function - we call this the "output sensitivity". The reason we do this is because our neurons use their activation function when they propagate, so we need to find the sensitivity of that, and we multiply it by the sensitivity of the output Neuron to get the sensitivity of the Link. Perhaps a better name could be the "link sensitivity"? We'll call it the "output sensitivity" for now though.

We use the output sensitivity in two ways: when we update the weight, we multiply it by its input value (the larger our input value, the more we probably contributed to the error), and adjust the weight of this Link by that amount. We also need to send a sensitivity back to the input Neuron - we multiply the output sensitivity by our weight, so that the earlier layer knows how much this Link contributed to the error.

Whose fault is this?

Don't be surprised if this is confusing the first time your read through. I would love to find a way to simplify this further - the principle is as simple as asking "what part of the network is to blame for the error". The math is a bit frustrating, as we're ultimately multiplying partial derivatives along the length of the network, but the goal of it is to do lots of tiny adjustments until the network is a good-enough approximation of the underlying data.

What are we approximating though?

This is where it gets a little freaky. It turns out that everything - handwriting, faces, pictures of various different objects, they all have a mathematical approximation. With a big enough neural network, you can recognise people, animals, objects, and various different styles of handwriting and signwriting. Even in a board game like chess or go, you can make a mathematical approximation of which moves are better, to the point where you can have a computer that can play at a world class level.

This isn't the same as intelligence; the computer isn't thinking, but at the same time, what is thinking anyway? Is what we call "thinking" and "judging" just a mathematical approximation in our heads?


  1. Attend The Machine Learning Course Bangalore From ExcelR. Practical Machine Learning course Bangalore Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Machine Learning course Bangalore.
    Machine Learning Course Bangalore

  2. ExcelR provides data scientist course in pune . It is a great platform for those who want to learn and become a data scientist. Students are tutored by professionals who have a degree in a particular topic. It is a great opportunity to learn and grow.

    data scientist course in pune

  3. Excellent post for the people who really need information for this science courses

  4. I’m happy I located this blog! From time to time, students want to cognitive the keys of productive literary essays composing. Your first-class knowledge about this good post can become a proper basis for such people. nice one
    best data science courses in hyderabad

  5. I recently came across your article and have been reading along. I want to express my admiration of your writing skill and ability to make readers read from the beginning to the end. I would like to read newer posts and to share my thoughts with you.
    data scientist courses

  6. Attend The Course in Data Analytics From ExcelR. Practical Course in Data Analytics Sessions With Assured Placement Support From Experienced Faculty. ExcelR Offers The Course in Data Analytics.
    Course in Data Analytics

  7. wow, great, I was wondering how to cure acne naturally. and found your site by google, learned a lot, now i’m a bit clear. I’ve bookmark your site and also add rss. keep us updated.
    best data science courses in hyderabad

  8. Very informative blog
    Join 360digiTMG for best courses
    data science training in Pune

  9. hello sir,
    thanks for giving that type of information. I am really happy to visit your blog.Leading Solar company in Andhra Pradesh

  10. Highly appreciable regarding the uniqueness of the content. This perhaps makes the readers feels excited to get stick to the subject. Certainly, the learners would thank the blogger to come up with the innovative content which keeps the readers to be up to date to stand by the competition. Once again nice blog keep it up and keep sharing the content as always.
    Data Science Training

  11. I want to leave a little comment to support and wish you the best of luck.we wish you the best of luck in all your blogging enedevors.
    Data Analytics Courses in Bangalore

  12. I would like to thank you for the efforts you have made in writing this article. I am hoping the same best work from you in the future as well..
    data scientist training and placement in hyderabad

  13. Truly mind blowing blog went amazed with the subject they have developed the content. These kind of posts really helpful to gain the knowledge of unknown things which surely triggers to motivate and learn the new innovative contents. Hope you deliver the similar successive contents forthcoming as well.

    data science in bangalore

  14. ExcelR Data Scientist Course In Pune provides certification with top-notch faculty and gives 100% placement assistance to get a job in top MNC's. Post completion of the training, one should take an online examination facilitated by the university and should attain 60% or more to complete the course and gain the certification. Data Scientist Course

  15. My friend mentioned to me your blog, so I thought I’d read it for myself. Very interesting insights, will be back for more!
    data scientist training and placement in hyderabad

  16. This post is very simple to read and appreciate without leaving any details out. Great work!
    best data science course in pune

  17. This is a great article thanks for sharing this informative information. I will visit your blog regularly for some latest posts. I will visit your blog regularly for Some latest posts.
    data scientist course in hyderabad

  18. Thanks for posting the best information 😊 thanks for posting the best marketing institute in hyderabad

  19. Thanks for posting the best information and the blog is very important.artificial intelligence course in hyderabad

  20. I recently came across your article and have been reading along. I want to express my admiration of your writing skill and ability to make readers read from the beginning to the end.
    Python Classes in Pune

  21. Thanks for posting the best information and the blog is very science institutes in hyderabad

  22. I was actually browsing the internet for certain information, accidentally came across your blog found it to be very impressive. I am elated to go with the information you have provided on this blog, eventually, it helps the readers whoever goes through this blog. Hoping you continue the spirit to inspire the readers and amaze them with your fabulous content.

    Data Science Course in Faridabad

  23. Extremely overall quite fascinating post. I was searching for this sort of data and delighted in perusing this one. Continue posting. A debt of gratitude is in order for sharing. data scientist course in delhi


  24. I was basically inspecting through the web filtering for certain data and ran over your blog. I am flabbergasted by the data that you have on this blog. It shows how well you welcome this subject. Bookmarked this page, will return for extra. data science course in jaipur

  25. Amazing Articles ! I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.If you are Searching for info click on given link
    Data science course in pune