You are a Software Engineer. You notice Artificial Intelligence, Machine Learning, Deep Learning, Data Science buzzwords all around. You wonder what these phrases mean, whether all this is for real and useful or is yet another hype and passing fad.
You want to figure out how it is changing or will change the computer/IT industry, and why you should care, if at all. You google about it, you read various articles, blogs, and tutorials. You get some idea but are also overwhelmed by the enormous wealth of math, tools, frameworks you discover.
You wish if someone could give an overview, say, a map and compass suitable for an engineer, to help you embark on the journey of mastering it all. This blog post is for you.
Like you, I am a software engineer and have been through that journey. My experience is relevant, and you can use it for charting out your own course. Apart from programming, I also love to trek in the Himalayas. I see interesting parallels in my ML journey and high altitude treks. Through those parallels, I will explain how you can plan and go on an ML expedition.
A Trekking Expedition
A trek has the following four stages.
Irresistible Allure: I am drawn to a trek by hearing or reading about it. The peaks, trails, valley, and terrain. People gushing that it is a beautiful and challenging trek. It is an allure difficult to resist. But before I invest time, energy and resources, I ask: is the trail worth it? What is the experience like? Why should I go on it? I read people’s blogs, see photos, spend time on Google Earth. In the end, I either say “Naah!” or I am even more hooked into doing it.
Study the Terrain: That is when the real work starts. Every trek has its challenges. So I study the terrain on Google Maps and Google Earth. I spend time looking at elevation maps, reading blogs, understanding points of no returns, and possible escape routes. I want to know key geographical features, difficulties, and dangers.
Train Hard: Once it is clear what I am up to, I plan and prepare. I start training hard, make a detailed plan, and regularly assess my progress.
Check your Gears: Finally, I stock the supplies and check my gears, especially the map and compass. I ensure that the map is ingrained in my mind, and I know enough about the vicinity to have an intuitive sense of direction.
And then the adventure begins!
Let’s examine these four stages of the expedition to learn Machine Learning.
Like a vast number of software engineers, I am not formally trained in Machine Learning. For most of my working life, I built compilers, program analysis and programming tools, and IDEs; something very different from Machine Learning. But while working at Microsoft Research, I saw my colleagues applying Machine Learning and statistics to solve hard program analysis and software engineering problems. The success of these techniques allured me. But let’s examine: is it really worth your while or is all this buzz just a passing fad?
The four most common buzzwords are Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Data Science (DS). Let’s check out Google Trends
for these phrases, as the saying goes: In God we trust; others must provide data.
Since AlexNet in September 2012, there has been a significant increase in these four phrases. AlexNet winning ImageNet competition by a margin of almost 11% was a watershed moment. In research, improving the state of art by a few percentage points is a big deal, but doing it by over 10% is very rare.
In 2015, Microsoft Research CNN surpassed human-level vision performance. Considering focal adjustment and color cones bless human eyes with one of the best depth and color perception in the animal kingdom, beating human-level in vision tasks was astonishing!
At present, it is not just the trends data. In fact, consumers use ML several times every day: web search result ranking, email spam detection, Estimated Arrival Time (ETA) in maps, news story clustering around topics, ads on Google and Facebook, recommendation systems on sites like Amazon and Netflix, and Alexa and Google Assistant.
Businesses are also increasingly relying on ML for fraud detection, pricing/financial modeling, customer churn prediction, equipment failure prediction, network intrusion detection, customer segmentation, sentiment classification, image/video analysis, and speech/audio processing.
It has exploded into our daily lives because of two reasons:
- a lot of data is being generated, and
- the availability of economical on-demand cloud computing.
It is clear that ML is not just hype or passing fad. It is real, it is here and now. So the journey is worth it.
Study the Terrain
Once you have decided to go on this journey, it is important to study the terrain. Let’s start with understanding the four buzz phrases.
Artificial intelligence (AI): intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans.
Machine learning (ML): use of statistical models to perform a specific task without using explicit instructions, relying on patterns and inference instead.
Deep learning (DL): machine learning methods using artificial neural networks.
Data science (DS): a multidisciplinary field of building systems to extract knowledge and insights from (big amount of) structured and unstructured data. In my opinion, it is a fancy name for statistics + programming.
AI is the broadest term including statistical as well as other techniques. ML is a subset of AI, and DL is a subset of ML. Data Science overlaps with AI, ML, and DL, but has more emphasis on programming compared to the other three. Though that is changing. Data management and continuous deployment of ML/DL models are becoming mainstream. Companies are making significant investments in programming and engineering discipline.
Traditional Programs vs. Machine Learning
All these definitions are fine, but what is the difference between Machine Learning and traditional programs?
In traditional programming, a programmer designs logic or algorithm to solve a problem. The program applies this algorithm to input and computes the result.
But in machine learning, the programmer does not write the logic to compute results. Instead, she builds a model from the data. The model is logic. As newer data (user feedback on the correctness of output) arrives, the model (which is logic) changes too. So program “learns” on its own.
Machine Learning programs have two distinct phases:
- Training: Input and the expected output are used to train and test various models, and select the most suitable model.
- Inference: The model is applied to the input to compute results. These results are wrong sometimes. A mechanism is built the gather user feedback on such occasions.
This feedback is added to the training data, which in turn leads to improved models. This loop is called data-pipeline or data-engineering.
All this is still abstract. Let’s take the problem of detecting email spam, and compare traditional and machine learning solutions.
In Traditional Programming Solution, a programmer will analyze how a human will determine whether an email is spam, and enumerate an exhaustive list of rules and patterns. For example:
- the word FREE occurs several times
- there are phrases like Weight Loss,
- messages claiming you have won a lottery
- messages originating from specific countries or IP addresses, and so on.
As spammers change tactics, the programmer needs to continuously update these rules to keep up with. This is how Knowledge or Expert Systems were built in the past.
In Machine Learning Solutions, a programmer will:
- Prepare data set: a large number of emails labeled by humans as spam or no-spam
- Train, test and tune models, and select the best.
- During inference, this model is applied to determine whether to keep in an email in the inbox or the spam folder.
- Statistical models are not 100% accurate. Spammers too keep coming up with new tactics. So sometimes spam classification is incorrect. Users will move such emails from the inbox to the spam folder (or vice versa).
- Such user actions are tracked and are treated as new human-labeled data.
- These examples are added to the data set, and a new model is trained to remain up-to-date with the spam trends.
Machine Learning has three kinds of techniques:
Supervised Learning: Train a function that maps an input to output. Training infers the relationship in given labeled input-output pair examples (called training dataset). Two common methods are regression and classification.
Unsupervised Learning: Train to find previously unknown patterns in data set without pre-existing labels. Two common methods are clustering and principal component analysis (also known as dimensionality reduction).
Reinforcement Learning: Train software agents to take actions in an environment to maximize some notion of cumulative reward. Its applications are in robotics, games, skill learning and adaptation.
Let’s learn a bit more about the three most common techniques: regression, classification, and clustering.
Regression is a supervised learning technique to estimate the value of a dependent variable from one or more independent variables. An example is to estimate the value of a house from its size, location, number of bedrooms, number of bathrooms, etc.
It is like fitting a curve over the given points such that the difference between the estimated value and the actual value is minimized over a large sample data set. In the estimator function, Y = f(X), Y is called outcome, and X is called feature vector.
Classification is a supervised learning technique to identify class/group from the characteristics of an object. An example is to identify whether the vehicle in a given photo is of a car, or a truck, or a motorcycle.
It like drawing lines to divide a region into a number of regions (3 in this case, representing car, truck, motorcycle) such that the number of objects not falling in their region is minimum. In the classification function, Y = f(X), Y is called labels (and is a finite set, like an enum in programs), and X is called feature vector.
Clustering is an unsupervised learning technique to group objects into clusters of similar objects. In other words, objects in a cluster are more similar to each other than those in other clusters, as per given similarity criteria. An example is to cluster similar news articles based on topics.
The basic difference between classification and clustering is that in classification, the label-set is finite and given; but in clustering, the number and definition of the clusters are not known in advance and is inferred from the data, so label-set is neither finite nor given.
Now let’s try to map some of the problems listed in the beginning to one of these techniques:
- Web search result ranking/scoring: Regression
- Email spam detection: Classification
- ETA in maps: Regression
- Showing the ad that will maximize earning: Regression
- Recommendation systems: Clustering
I want to pause for a moment and emphasize that machine learning is not a magic bullet. It all depends on the data you use for the training because, in ML, the data is logic. If you are not careful in collecting and curating the data, ML predictions can be severely faulty. It is known as Garbage In, Garbage Out.
For supervised learning, it matters what you have trained your system for. For example, if you trained a classification system to distinguish between car, truck, and motorcycle, you can not use it to, say, a red car from a blue car. If your problem changes, you have to change labels in training data, and retrain the model.
Deep Learning is a subset of machine learning using Deep Neural Network (DNN) models. These models have one input layer, one output layer, and several intermediate hidden layers.
There are various specialized network designs suitable for different problems. Some examples are Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) neural network. We will not get into details of various types of DNNs in this post.
DNNs solve the same problems: regression, classification, clustering, etc. But DNNs are computation-intensive (and thus expensive), these are used for the certain types of data.
Other machine learning techniques usually suffice for structured data. DNNs give better results on unstructured data. There are three kinds of unstructured data that DNNs are commonly used for:
Vision: To process images and video data. Common applications are object recognition in images, video summarization.
Natural Language: To process text in natural languages. Common applications are sentiment classification, intent recognition, entity recognition, machine translation.
Speech: To process voice audio data. Common applications are speech recognition (speech to text) and speech synthesis (text to speech).
Tools and Frameworks
Python is the most popular language among machine learning practitioners. There is a rich ecosystem of libraries and frameworks.
As you can see, machine learning is a vast terrain, a lot of ground to cover. And it is not easy. So you need to train hard.
The good news is that it is the best time in human history to learn even advance topics of computer science on your own. You just need motivation and a computer with a good internet connection.
There is no dearth of good articles, tutorials, and excellent and affordable online courses. I reiterate, if you are motivated to learn machine learning on your own, there wasn’t a better time ever.
Online Courses, Tutorials, Resources
I am listing a few courses that I am fond of, but there are so many other very good and useful online resources.
- Machine Learning Basics: Andrew Ng’s famous course
- Machine Learning for practitioners: DS/ML Python Bootcamp @ Udemy
- TensorFlow: Tutorials
- Deep Learning: Courses @ DeepLearning.ai
- Cloud: Google Cloud AutoML, Amazon SageMaker, Microsoft Azure ML
- Kaggle has a rich source of data sets and competitions.
Once you get started in ML, you might want to pick some of these books and strengthen your theory foundation. These are some of the best books from some of the most respected professors. These are available online, free of cost.
- The Elements of Statistical Learning, by Hastie, Tibshirani, Friedman
- An Introduction to Statistical Learning, by James, Witten, Hastie, Tibshirani
- Deep Learning, by Goodfellow, Bengio, Courville, Bach
- Pattern Recognition and Machine Learning, by Bishop
- Machine Learning Yearning, by Andrew Ng
- Speech and Language Processing, by Jurafsky, Martin
If you want to gather a quick glimpse for a specific problem, here is a list of articles to quickly start working on a specific technique or problem.
- An intro, tutorial, and complete guide to data manipulation with NumPy and Pandas
- Tips on graph plotting and data visualization with Matplotlib and Seaborn, and having fun doing Exploratory Data Analysis
- A hands-on intro to Scikit-Learn with examples
- A beginner’s guide to Linear Regression with Scikit-Learn
- Overview of Logistic Regression or Classification with Scikit-Learn
- Overview and explanation of Support Vector Machines (SVM) with Scikit-Learn and various kernels
- Introduction to Decision Tree and implementing and understanding Enchanted Random Forests with Scikit-Learn
- Intro, Tutorial, explanation, and implementation of Naive Bayes Classifier with Scikit-Learn
- An overview of Time Series data handling and forecasting with Scikit-Learn
- Understanding the Bias-Variance tradeoff
- Accuracy, Precision, Recall, and F1 score, which metrics to use for evaluating ML algorithms
- Ridge (L2) and Lasso (L1) Regularization with Scikit-Learn
- K Nearest Neighbors or k-NN classifier with Scikit-Learn, a tutorial
- K-Means Clustering explained with Scikit-Learn
- DBSCAN clustering with Scikit-Learn
- Principal Component Analysis (Dimensionality Reduction) with Scikit-Learn example
- A brief explanation of Building a Recommender Systems
Check Your Gears
As you begin your ML journey, it is time to check your gears and get a hang of the map and compass.
Beware of blind spots
Deterministic logic is ingrained in Software Engineers. But machine learning is statistical in nature. We need to learn to accept that the model will not work correctly on all inputs. Fixing it for a particular input will most likely degrade overall performance. Learning that was the biggest struggle I had.
You may be surprised that a unit test that has been passing, may start failing even without any code change. It can happen due to the random partitioning of training data into the train, validate, and test sets. Random partitioning can result in a slightly different model. And that model can coincidently fail on the input used in that unit test. You need to internalize the notion of statistical correctness.
You learned about the three most prominent machine learning techniques: linear regression, classification, and clustering. You know about neural networks and applications of deep learning. You also got a list of courses and books for in-depth knowledge. You also have a list of articles for most important topics and techniques. All of it together gives you an overview of the ML landscape. That is your map. You are ready to go on the journey of learning how ML models work.
Data scientists are strong in mathematics, they are expert in playing with data and designing efficient models. These might not be exactly your forte, but you need to slowly develop these capabilities. Without that, a model will remain a black box.
Do remember your strengths as a software engineer. You have strong programming skills. You are an expert in building highly scalable applications. You have mastered the continuous develop-test-deploy pipeline. You engineer systems that run 24x7 with automated monitoring and alerting.
These skills are not common in data scientists. They may not care about establishing correctness guarantees of an ML model in a 24x7 production system. They typically build batch programs for a given data set. It may require significant (re)work and engineering to take it to production. You have to adapt your engineer practices to the world of ML. You can bring that engineering discipline and rigor to ML.
Both data scientists and software engineers need to develop a better understanding of the counterpart. They must keep moving towards the top-right quadrant. That is your compass.
Key takeaways are:
- AI is real, the future is here. As Andrew Ng says: AI is the new electricity. Like electricity during the industrial revolution, AI will revolutionize one industry after another.
- It is only the beginning of AI adventure humanity is embarking upon.
- You can learn anything by training, start today!
Best of Luck, Bon Voyage !!!