
So the cool kids keep talking about machine learning and how it is changing everything and you want to be part of that conversation? No problem, here is a roadmap to get you started…
First of all, learn Python programming. In my opinion everyone should know a little programming, but you really can’t make progress in ML unless you can program. You don’t need to be at the level where you can make a living from it; the basics are enough to get started in ML. If you are already a programmer in a different language then great. You can do ML in pretty much any language now, but if you are starting out and want to build your own models then Python is the right place to begin. Most of the hard stuff is already done for you and it is simply a case of downloading the right libraries. Most children, at least in the UK, learn Python at school. It is probably harder to learn if you already program in other languages, but if you are new to development it is probably the easiest to pick up. Many of the libraries you need for Python have equivalents in R, so if you are a statistician looking to get into ML then R might be an easier jump for you. Most data scientists tend to work in Python, and then engineering teams come along and reimplement things in either C++ or a JVM language such as Scala, for both robustness and performance.
Before you rush off and start learning Python you should consider what environment you need to be able to enter and run your code. My personal favourite is PyCharm from JetBrains. They have both a paid and a free version. The free version gives you everything you need to get started and runs on all the usual platforms. Another option would be to use a web-based tool such as Jupyter and its variants. The real advantage here is that you can use hosted environments and everything tends to be set up for you. If you want to run your own then take a look at Anaconda.
Once you feel confident in Python then it is time to start getting familiar with some of the prerequisite libraries. Don’t worry, these are normally pretty easy to install: you can just use a Python package installer such as pip. Your IDE should also be able to download and install libraries for you. The first two libraries you need to get are pandas and NumPy. Pandas provides some great data frame structures which make it easy to load your data and manipulate it. The great thing about a data frame is that it is like a table: you can have a different data type in each column, and each row represents a single observation or entry. You can look at your data, take samples, sort it and summarise it in all sorts of useful ways. Even if you don’t want to go all the way to building machine learning models I’d say that pandas is a pretty useful tool to know.
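To make that a bit more concrete, here is a minimal sketch of loading a small table into a pandas data frame and poking at it. The data, the column names and the "my_data.csv" file name in the comment are all made up for illustration; the CSV text is inlined so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical data, inlined so no file is needed.
# With a real file you would call pd.read_csv("my_data.csv") instead.
csv_text = """city,temperature_c,rainy
London,14.5,True
Leeds,12.0,True
Bristol,16.2,False
Cardiff,13.1,True
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)                        # each column has its own data type
print(df.head(2))                       # look at the first two rows
print(df.sample(n=2, random_state=0))   # take a random sample of rows
print(df.sort_values("temperature_c"))  # sort by a column
print(df.describe())                    # summary statistics for numeric columns

# Pick out the city on the row with the highest temperature.
warmest = df.sort_values("temperature_c", ascending=False).iloc[0]["city"]
print(warmest)
```

Each row here is one observation (a city) and each column keeps its own type: text, decimal number and true/false.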
Whilst pandas gives you very friendly data frames, for calculations at speed you tend to want to work with just numbers, all of the same type and precision. This is a lot less friendly but much more efficient in a computationally intense environment such as machine learning. Here we need NumPy (or something based upon NumPy). Most of the data science libraries you will need require NumPy as a prerequisite. NumPy is great for matrix maths or for tensors, which you will start to hear a lot about from those cool ML kids… Don’t worry if NumPy seems pretty opaque to you; most of the libraries use it and mask the complexity, so knowing just a little bit of NumPy is often enough.
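As a rough sketch of the "little bit" that is usually enough, the snippet below turns a data frame into an array of plain numbers and does some maths on the whole array at once. The column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data held in a pandas data frame.
df = pd.DataFrame({"height_m": [1.6, 1.8, 1.7], "weight_kg": [60.0, 82.0, 71.0]})

# Convert to an array: every element now shares one numeric type.
X = df.to_numpy(dtype=np.float32)
print(X.shape, X.dtype)

# Whole-array arithmetic, no explicit loops required.
col_means = X.mean(axis=0)   # one mean per column
centred = X - col_means      # subtracts each column's mean from that column

# Matrix maths: X transposed times X gives a 2 x 2 matrix.
gram = X.T @ X

# A "tensor" is just an array with more dimensions,
# for example 10 greyscale images of 28 x 28 pixels.
batch = np.zeros((10, 28, 28), dtype=np.float32)
print(batch.ndim)
```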
Once you can load data into pandas from CSV files and then convert numeric data to NumPy arrays you are ready to start looking at ML libraries. The first one you should take a look at is scikit-learn. This comes with lots of useful components to help prepare your data and many standard ML algorithms to help you do things like categorisation or linear regression. At this point you are ready to start doing some conventional machine learning and can probably produce some pretty good results by following some of the many examples available on the Web. Start out with a regression problem, like predicting the next value in a trend, but beware that predicting the stock market will be unrewarding for you, if only because it works by lots of people making irrational decisions all day long. Not a good subject for ML!
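A first regression can be as small as the sketch below. It fits a straight line to a made-up trend and predicts the next value. The data is synthetic (a line of slope 2 and intercept 1 with a little random noise), chosen only so the example is self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic trend: y = 2x + 1 plus a little noise (illustrative data only).
rng = np.random.default_rng(0)
x = np.arange(20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)

# scikit-learn expects a 2D array: one row per observation, one column per feature.
X = x.reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

# Predict the value at the next step in the trend (x = 20).
next_value = model.predict(np.array([[20]]))[0]
print(model.coef_[0], model.intercept_, next_value)
```

The fitted slope and intercept should land close to 2 and 1, and the prediction close to 41. Real data will rarely be this tidy.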
For the bold or the foolish amongst you the next step is to take a look at deep learning, and here there are two libraries you should consider installing: PyTorch and TensorFlow. The two overlap, so you can probably just pick one based upon the examples you are learning from. Be warned, TensorFlow can be pretty unpleasant to install if you want to get it working with your GPU, and why wouldn’t you? Don’t confuse it with TensorFlow Lite, which is not for building models but for running them on the edge, in smaller environments. If you want to train your own deep learning models, perhaps to build an early warning system for people sending you cat pictures, then you need the full TensorFlow.
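If you go the PyTorch route, a quick way to find out whether the install can actually see your GPU is something like the sketch below. The function name is my own invention; the calls inside it are standard PyTorch ones. It deliberately copes with PyTorch not being installed at all, so it is safe to run on any machine.

```python
def describe_gpu_support():
    """Report whether PyTorch is installed and whether it can use a GPU."""
    try:
        import torch  # imported here so the function still works without PyTorch
    except ImportError:
        return "PyTorch is not installed"

    if torch.cuda.is_available():
        return f"PyTorch can see a GPU: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed but will run on the CPU"


print(describe_gpu_support())
```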
Start with a simple ANN (Artificial Neural Network) then take a look at CNNs (Convolutional Neural Networks). Both of these should be well within your grasp even without a GPU. Once you get into bigger, more complex models you are going to need some help. If you happen to be a gamer then you are probably all set up, though you will find an Nvidia graphics card in your rig easier to deal with and more flexible than some of the other options. If you don’t have anything like that, don’t worry: the major cloud infrastructure providers offer GPU-enabled environments, which are often free at the lower tiers. If you are just learning, then not having an ML workstation under your desk needn’t hold you back.
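To show how little code a simple ANN needs, here is a sketch that runs comfortably on a CPU. Note the swap: it uses the small multi-layer perceptron that ships with scikit-learn rather than PyTorch or TensorFlow, purely so that it runs without any extra installs. The dataset (iris), the single hidden layer of 16 units and the other settings are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A small, standard dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the inputs, then train a network with one hidden layer of 16 units.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
ann.fit(X_train, y_train)

# Fraction of held-out flowers classified correctly.
accuracy = ann.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The same shape of workflow (split the data, scale it, fit, score on held-out data) carries over when you move to a full deep learning library.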
There are plenty of tutorials to get you up to speed, and most either come with their own datasets or use standard, open source datasets for you to practise with. I’d estimate that if you spend an hour per day you could be building deep learning models by the end of the week, assuming you know a little Python to get you started.
So, what are you waiting for?
Here are some useful links to get you started:
Learn Python with W3Schools: https://www.w3schools.com/python/
scikit-learn library: https://scikit-learn.org/stable/
TensorFlow: https://www.tensorflow.org