From the course: Machine Learning with Python: Foundations

What is supervised learning? - Python Tutorial

What is supervised learning?

- [Instructor] Supervised machine learning is the process of training a predictive model. Predictive models are machine learning models that let us assign a label to unlabeled data based on patterns learned from previously labeled historical data. If we want to predict the outcome of a new event, we can use a predictive model that has been trained on similar or related events.

To illustrate how supervised learning works, let's assume that we work in the analytics department of a local credit union. Our task is to develop a machine learning model that predicts loan risk. Specifically, we would like to build a model that predicts whether a particular customer will or will not default on a loan. Let's also assume that we already have two kinds of information about the loans our credit union has previously issued. The first is descriptive data about each loan, such as the loan amount, the grade of the loan, the annual salary of the borrower, the purpose of the loan, and so forth. The second is the outcome of each previously issued loan. The outcome data is a label that tells us whether the borrower paid back the loan in full or defaulted on it.

Before we can use a supervised machine learning model to predict the outcome of a new loan, we first have to train the model using historical loan data. In machine learning, we call the inputs the independent variables and we call the output the dependent variable. Together, the independent variables and the dependent variable make up what is known as the training data. If our training data consists of 10 loans previously issued by our credit union, then the independent variables are the loan amount, the grade of the loan, and the stated purpose of the loan, while the dependent variable is the outcome variable, default. The default variable has two levels, or values.
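As a rough illustration of how such training data might be laid out, here is a minimal sketch in plain Python. Every loan value below is invented for illustration, and the names `loans`, `FEATURES`, `TARGET`, `X`, and `y` are assumptions, not part of the course material.

```python
# Hypothetical training data: ten previously issued loans.
# All values are made up purely to illustrate the layout.
loans = [
    {"amount": 5000,  "grade": "A", "purpose": "car",       "default": "no"},
    {"amount": 12000, "grade": "C", "purpose": "business",  "default": "yes"},
    {"amount": 3000,  "grade": "A", "purpose": "education", "default": "no"},
    {"amount": 20000, "grade": "D", "purpose": "business",  "default": "yes"},
    {"amount": 8000,  "grade": "B", "purpose": "car",       "default": "no"},
    {"amount": 15000, "grade": "D", "purpose": "home",      "default": "yes"},
    {"amount": 7000,  "grade": "B", "purpose": "education", "default": "no"},
    {"amount": 10000, "grade": "C", "purpose": "home",      "default": "no"},
    {"amount": 25000, "grade": "D", "purpose": "business",  "default": "yes"},
    {"amount": 4000,  "grade": "A", "purpose": "car",       "default": "no"},
]

FEATURES = ("amount", "grade", "purpose")  # independent variables (inputs)
TARGET = "default"                         # dependent variable (output label)

# Split each loan into its inputs and its outcome label.
X = [tuple(loan[name] for name in FEATURES) for loan in loans]
y = [loan[TARGET] for loan in loans]
```

Keeping the inputs (`X`) and the label (`y`) in separate, aligned lists mirrors the split between independent and dependent variables described above.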
The two values of the default variable are yes, which means the borrower failed to pay back the loan in full, and no, which means the borrower paid the loan back in full. To train a model, we provide it with the three independent variables along with the dependent variable, or outcome. With these two sets of values, the machine learns the patterns in the data and builds a set of instructions that connect the input to the output. This set of instructions represents the trained model.

After a model has been trained, we can evaluate how well its instructions explain the relationship between the independent variables and the dependent variable. One way to do this is to give the trained model just the input and see what output values it predicts. By comparing the predicted outcomes with the actual outcomes, we can score the performance of the model based on how many of them match. We call this the predictive accuracy of the model. The higher the score, the better the model; the lower the score, the worse the model.

One of the most popular definitions of supervised machine learning is the one provided by Tom Mitchell. According to Mitchell, "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." This definition presents three components of machine learning: experience E, class of tasks T, and performance measure P. In our loan outcomes example, the experience is the historical loan data that we use to train the model. The task is to predict who will or will not default. And the performance measure is predictive accuracy, which is measured by how well the predicted and actual outcomes match.
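Predictive accuracy, the performance measure just described, is simply the fraction of predictions that match the actual outcomes. The sketch below trains a deliberately simple stand-in model (the most common outcome per loan grade) and scores it. The learner, the function names `train`, `predict`, and `accuracy`, and all loan values are assumptions made for illustration; this is not the model used in the course. The sketch also scores the model on loans it did not see during training, which is common practice but an assumption beyond what is stated above.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Learn a toy rule set: the most common outcome for each loan grade."""
    by_grade = defaultdict(Counter)
    for (_amount, grade, _purpose), label in zip(rows, labels):
        by_grade[grade][label] += 1
    rules = {g: counts.most_common(1)[0][0] for g, counts in by_grade.items()}
    # Fallback for grades never seen in training: the overall majority label.
    fallback = Counter(labels).most_common(1)[0][0]
    return rules, fallback

def predict(model, rows):
    """Apply the learned rules to inputs only (no labels are provided)."""
    rules, fallback = model
    return [rules.get(grade, fallback) for (_amount, grade, _purpose) in rows]

def accuracy(predicted, actual):
    """Fraction of predicted outcomes that match the actual outcomes."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

# Hypothetical labeled loans: (amount, grade, purpose) with their outcomes.
X_train = [(5000, "A", "car"), (12000, "C", "business"), (3000, "A", "education"),
           (20000, "D", "business"), (8000, "B", "car"), (15000, "D", "home")]
y_train = ["no", "yes", "no", "yes", "no", "yes"]

# Hypothetical loans held back from training, used only for scoring.
X_test = [(7000, "A", "car"), (18000, "D", "business"),
          (9000, "B", "home"), (11000, "C", "car")]
y_test = ["no", "yes", "yes", "yes"]

model = train(X_train, y_train)
predicted = predict(model, X_test)   # the model sees inputs only
score = accuracy(predicted, y_test)  # 3 of 4 predictions match
```

Here three of the four predictions agree with the actual outcomes, so `score` is 0.75.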
We can reword the supervised machine learning definition as follows: a loan prediction model is said to learn if its ability to predict which borrowers will default on a loan, T, as measured by predictive accuracy, P, improves as it encounters more training data, E.
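To make the E, T, P framing concrete, the following sketch trains the same kind of toy learner on growing amounts of training data and records its accuracy on a fixed set of held-back loans. The data is invented and intentionally arranged so that accuracy rises with more examples; real data gives no such guarantee for every added example. The names `fit`, `score`, and `sizes` are hypothetical.

```python
from collections import Counter, defaultdict

def fit(examples):
    """Toy learner: most common outcome per loan grade, plus an overall fallback."""
    by_grade = defaultdict(Counter)
    for grade, label in examples:
        by_grade[grade][label] += 1
    rules = {g: c.most_common(1)[0][0] for g, c in by_grade.items()}
    fallback = Counter(label for _, label in examples).most_common(1)[0][0]
    return rules, fallback

def score(model, examples):
    """Performance measure P: fraction of correct default predictions."""
    rules, fallback = model
    hits = sum(rules.get(grade, fallback) == label for grade, label in examples)
    return hits / len(examples)

# Experience E: hypothetical historical loans as (grade, default) pairs.
history = [("A", "no"), ("D", "yes"), ("B", "no"),
           ("C", "yes"), ("A", "no"), ("D", "yes")]

# Task T: predict default for loans the model has not seen.
unseen = [("A", "no"), ("B", "no"), ("C", "yes"), ("D", "yes")]

# Train on the first 1, 3, and 6 loans and measure P each time.
sizes = (1, 3, 6)
scores = [score(fit(history[:n]), unseen) for n in sizes]
```

With this toy data, `scores` comes out as `[0.5, 0.75, 1.0]`: the model's performance at the task improves as its experience grows, which is what Mitchell's definition calls learning.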
