Week 4 Notes
Problem Set Review (Monday Sept 26th, 2022)
| Date | 09/26/2022 |
| Topic | Problem Set Review |
| Professor | Dr. John Bukowy |
| Week | 4 |
Evaluation Metrics for Classification
Confusion Matrix
A tabular format that shows a breakdown of a model's correct and incorrect classifications
- Represents counts for all combinations of true and predicted labels
- Each row corresponds to the true label
- Each column corresponds to a predicted label
from sklearn.metrics import confusion_matrix
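A minimal sketch of building a confusion matrix with scikit-learn; the label arrays below are made-up placeholders, not data from class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a binary problem
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = true labels, columns = predicted labels
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```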
Metrics
- Accuracy = correct predictions / total predictions
- What is a good accuracy?
- An accuracy of 1 / (number of classes) corresponds to random guessing
- Accuracy doesn't discriminate between types of errors (false positives vs. false negatives)
- Recall:
- The proportion of positive examples that are predicted as positive
- recall = TP / total positives = TP / (TP + FN)
- Precision:
- The proportion of predicted positive examples that are correct
- precision = TP / (TP + FP)
- F1-Score:
- It is possible to have high recall and low precision and vice versa
- F1 = 2 * (precision * recall) / (precision + recall)
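A quick sketch of these four metrics computed with scikit-learn on the same hypothetical labels as above; the comments restate the formulas from the notes.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # correct / total  = 6/8 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN)   = 3/4 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP)   = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # 2PR / (P + R)    = 0.75
```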
ROC (Receiver Operating Characteristic)
- ROC curves let us evaluate the trade-off between the true positive rate (recall, sensitivity) and the false positive rate (1 - specificity)
- Good classifiers have lines near the upper left
- A random classifier produces a diagonal line
- ROC curves can help us choose a threshold other than the default 0.5
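A sketch of computing an ROC curve and choosing a non-default threshold with scikit-learn; the scores below stand in for a model's predicted probabilities and are made up.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55])  # P(y = 1 | x)

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # area under the curve

# One way to pick a threshold: the point closest to the upper left (max TPR - FPR)
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(best_threshold)
```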
Decision Trees (Tuesday Sept 27th, 2022)
| Date | 09/27/2022 |
| Topic | Decision Trees |
| Professor | Dr. John Bukowy |
| Week | 4 |
- A decision tree consists of a series of decision rules (yes/no questions), represented in a hierarchical or flowchart structure.
- Unlike KNN, a decision tree does not store the entire training set
- Non-linear model that can be used for regression and classification
- Organizes classification/regression rules into a flowchart
- Once the structure is learned, we can predict the label of unlabeled data sample by:
- Asking the learned questions and getting the answers from the features of the given data sample
- True is to the left, False is to the right
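A tiny sketch of how prediction walks the learned questions; the tree below is hand-written for illustration (hypothetical), not learned from data.

```python
# Each internal node asks "is feature[i] <= threshold?"; leaves store a label.
tree = {
    "feature": 0, "threshold": 2.5,
    "left":  {"label": "A"},                      # answer True -> go left
    "right": {"feature": 1, "threshold": 1.0,     # answer False -> go right
              "left":  {"label": "B"},
              "right": {"label": "C"}},
}

def predict(node, x):
    if "label" in node:                            # reached a leaf
        return node["label"]
    if x[node["feature"]] <= node["threshold"]:    # True is to the left
        return predict(node["left"], x)
    return predict(node["right"], x)               # False is to the right

print(predict(tree, [3.0, 0.5]))  # 3.0 > 2.5 -> right; 0.5 <= 1.0 -> left -> "B"
```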
Terminology
- The yes/no question at each internal node represents a split on a single feature
- For numerical features, the splits are represented as inequalities
Prediction
- It partitions the feature space into non-overlapping regions (rectangles/cuboids).
- The leaves of the tree correspond to those regions
Optimal Tree
- Minimizes the expected number of tests required to identify the unknown object
- Has the smallest depth possible
- Deep = overfitting
- Shallow = underfitting
- Finding an optimal tree with the minimum number of feature splits is not always computationally tractable
Greedy Strategy - CART
- Consists of a top-down greedy approach known as recursive binary splitting:
- Begins at the root or top of the tree, finds the best split at the root
- Then successively repeats the same steps for each node
Choosing
- Choosing the feature and its corresponding threshold (the split) is done using:
- Classification - an impurity function (e.g., Gini impurity or entropy)
- Regression - an error metric (RMSE)
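As a concrete example of an impurity function, a sketch of Gini impurity and the weighted impurity of a candidate split (Gini is CART's usual default for classification; entropy is another common choice):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left_labels, right_labels):
    """Weighted average impurity of the two child nodes produced by a split."""
    n_l, n_r = len(left_labels), len(right_labels)
    n = n_l + n_r
    return (n_l / n) * gini(left_labels) + (n_r / n) * gini(right_labels)

# CART scores every feature/threshold pair and keeps the split
# with the lowest weighted child impurity.
print(gini([0, 0, 1, 1]))              # 0.5 (maximally mixed for two classes)
print(split_impurity([0, 0], [1, 1]))  # 0.0 (a perfect split)
```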
When to stop?
- Node contains only one class
- Node contains fewer than x data points
- Max depth is reached
- Node purity is sufficient
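These stopping criteria show up as hyperparameters in scikit-learn's CART implementation; a minimal sketch (the iris dataset is used only as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=3,                 # stop once the maximum depth is reached
    min_samples_split=10,        # don't split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # stop if a split barely improves purity
    random_state=0,
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```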
Computational Complexity - FYI
- n: number of training samples
- m: number of features
- Training phase: O(m * n log n)
Tree Structure
- We stop splitting a region once we reach a stopping criterion
- If not constrained, we continue splitting until we reach pure regions
Feature scaling?
- Decision trees are NOT sensitive to feature scaling; scaling the features makes no difference.
Advantages
- Interpretability
- Extends easily to multi-class problems
- Requires little pre-processing
- Robust to irrelevant features
- Handles interactions between features
Disadvantages
- Prone to overfitting -> solution: Random Forest
- Sensitive to small changes in the dataset
- Limited predictive power -> solution: boosted trees
Logistic Regression (Friday Sept 30th, 2022)
| Date | 09/30/2022 |
| Topic | Logistic Regression |
| Professor | Dr. John Bukowy |
| Week | 4 |
- For classification
- It's a linear model for binary classification that is easy to implement
- One of the most widely used machine learning models
- Logistic regression models the class probability: the probability that an instance belongs to class 1 or 0
- Logistic regression maps the weighted sum to a certainty measure:
- If x^T w + w0 is positive and very large, map it close to 1, and vice versa
- Sigmoid
- Compresses the range (-infinity, infinity) to the range (0, 1)
- The output of the sigmoid function is treated as a certainty measure or probability
def sigmoid(x):  # requires: import numpy as np
    return 1 / (1 + np.exp(-x))
- This output is interpreted as P(y = 1 | x)
- Assumes that there are two possible outputs (classes)
- Outputs a probability
- Threshold this probability (default 0.5) to get the predicted class
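Putting the pieces together, a sketch of logistic-regression prediction; the weights w and w0 below are hypothetical, assumed to have been learned already.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_proba(X, w, w0):
    # Weighted sum x^T w + w0, squashed into (0, 1) -> P(y = 1 | x)
    return sigmoid(X @ w + w0)

def predict(X, w, w0, threshold=0.5):
    # Threshold the probability to get a class label
    return (predict_proba(X, w, w0) >= threshold).astype(int)

w, w0 = np.array([1.5, -0.5]), -1.0     # hypothetical learned weights
X = np.array([[2.0, 1.0], [0.0, 3.0]])  # two example rows
print(predict_proba(X, w, w0))          # ~[0.82, 0.08]
print(predict(X, w, w0))                # [1, 0]
```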
Training Phase: How to find the weights?
- Likelihood of a single labeled example (Bernoulli): P(y | x) = [P(y = 1 | x)]^y * [1 - P(y = 1 | x)]^(1 - y)
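The standard approach is to pick the weights that maximize this likelihood over the whole training set, i.e. minimize the negative log-likelihood (cross-entropy). A minimal gradient-descent sketch, assuming an arbitrary learning rate and iteration count (not the solver scikit-learn actually uses):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Gradient descent on the average negative log-likelihood."""
    n, m = X.shape
    w, w0 = np.zeros(m), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + w0)       # P(y = 1 | x) for every training sample
        grad_w = X.T @ (p - y) / n    # gradient w.r.t. the weights
        grad_w0 = np.mean(p - y)      # gradient w.r.t. the bias
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0
```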