Week 4 Notes
Problem Set Review (Monday Sept 26th, 2022)
| Date | 09/26/2022 |
| Topic | Problem Set Review |
| Professor | Dr. John Bukowy |
| Week | 4 |
Evaluation Metrics for Classification
Confusion Matrix
A tabular format that shows a breakdown of a model's correct and incorrect classifications
- Represents counts for all combinations of true and predicted labels
- Each row corresponds to the true label
- Each column corresponds to a predicted label
from sklearn.metrics import confusion_matrix
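A minimal sketch of building a confusion matrix with scikit-learn; the label arrays below are made-up placeholders, not data from class.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for a binary problem
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = true labels, columns = predicted labels
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]
```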
Metrics
- Accuracy = correct predictions / total predictions
- What is a good accuracy?
- An accuracy of 1 / (number of classes) corresponds to random guessing
- Accuracy doesn't discriminate between types of errors (false positives vs. false negatives)
- Recall:
- The proportion of positive examples that are predicted as positive
- recall = TP / total positives = TP / (TP + FN)
- Precision:
- The proportion of predicted positive examples that are correct
- precision = TP / (TP + FP)
- F1-Score:
- It is possible to have high recall and low precision and vice versa
- F1 = 2 * (precision * recall) / (precision + recall)
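A quick sketch of these four metrics computed with scikit-learn on the same hypothetical labels as above; the comments restate the formulas from the notes.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # correct / total  = 6/8 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN)   = 3/4 = 0.75
print(precision_score(y_true, y_pred))  # TP / (TP + FP)   = 3/4 = 0.75
print(f1_score(y_true, y_pred))         # 2PR / (P + R)    = 0.75
```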
ROC (Receiver Operating Characteristic)
- ROC curves let us evaluate the trade-off between the true positive rate (recall, sensitivity) and the false positive rate (1 - specificity)
- Good classifiers have lines near the upper left
- A random classifier produces a diagonal line
- ROC curves can help us choose a threshold other than the default 0.5
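A sketch of computing an ROC curve and choosing a non-default threshold with scikit-learn; the scores below stand in for a model's predicted probabilities and are made up.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55])  # P(y = 1 | x)

# One (FPR, TPR) point per candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # area under the curve

# One way to pick a threshold: the point closest to the upper left (max TPR - FPR)
best_threshold = thresholds[np.argmax(tpr - fpr)]
print(best_threshold)
```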
Decision Trees (Tuesday Sept 27th, 2022)
| Date | 09/27/2022 |
| Topic | Decision Trees |
| Professor | Dr. John Bukowy |
| Week | 4 |
- A decision tree consists of a series of decision rules (yes/no questions), represented in a hierarchical or flowchart structure.
- Unlike KNN, a decision tree does not store the entire training set
- Non-linear model that can be used for regression and classification
- Organizes classification/regression rules into a flowchart
- Once the structure is learned, we can predict the label of unlabeled data sample by:
- Asking the learned questions and getting the answers from the features of the given data sample
- True is to the left, False is to the right
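A tiny sketch of how prediction walks the learned questions; the tree below is hand-written for illustration (hypothetical), not learned from data.

```python
# Each internal node asks "is feature[i] <= threshold?"; leaves store a label.
tree = {
    "feature": 0, "threshold": 2.5,
    "left":  {"label": "A"},                      # answer True -> go left
    "right": {"feature": 1, "threshold": 1.0,     # answer False -> go right
              "left":  {"label": "B"},
              "right": {"label": "C"}},
}

def predict(node, x):
    if "label" in node:                            # reached a leaf
        return node["label"]
    if x[node["feature"]] <= node["threshold"]:    # True is to the left
        return predict(node["left"], x)
    return predict(node["right"], x)               # False is to the right

print(predict(tree, [3.0, 0.5]))  # 3.0 > 2.5 -> right; 0.5 <= 1.0 -> left -> "B"
```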
Terminology
- The yes/no question at each internal node represents a split on a single feature
- For numerical features, the splits are represented as inequalities
Prediction
- It partitions the feature space into non-overlapping regions (rectangles/cuboids).
- The leaves of the tree correspond to those regions
Optimal Tree
- Minimizes the expected number of tests required to identify the unknown object
- Has the smallest depth possible
- Deep = overfitting
- Shallow = underfitting
- Finding an optimal tree with the minimum number of feature splits is not always computationally tractable
Greedy Strategy - CART
- Consists of a top-down greedy approach known as recursive binary splitting:
- Begins at the root or top of the tree, finds the best split at the root
- Then successively repeats the same steps for each node
Choosing
- Choosing the feature and its corresponding threshold (the split) is done using:
- Classification - an impurity function (e.g., Gini impurity or entropy)
- Regression - an error metric (RMSE)
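As a concrete example of an impurity function, a sketch of Gini impurity and the weighted impurity of a candidate split (Gini is CART's usual default for classification; entropy is another common choice):

```python
import numpy as np

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left_labels, right_labels):
    """Weighted average impurity of the two child nodes produced by a split."""
    n_l, n_r = len(left_labels), len(right_labels)
    n = n_l + n_r
    return (n_l / n) * gini(left_labels) + (n_r / n) * gini(right_labels)

# CART scores every feature/threshold pair and keeps the split
# with the lowest weighted child impurity.
print(gini([0, 0, 1, 1]))              # 0.5 (maximally mixed for two classes)
print(split_impurity([0, 0], [1, 1]))  # 0.0 (a perfect split)
```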
When to stop?
- Node contains only one class
- Node contains fewer than x data points
- Max depth is reached
- Node purity is sufficient
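These stopping criteria show up as hyperparameters in scikit-learn's CART implementation; a minimal sketch (the iris dataset is used only as a stand-in):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=3,                 # stop once the maximum depth is reached
    min_samples_split=10,        # don't split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # stop if a split barely improves purity
    random_state=0,
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```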
Computational Complexity - FYI
- n: number of training samples
- m: number of features
- Training phase: O(m * n log n)
Tree Structure
- We stop splitting a region once we reach a stopping criterion
- If not constrained, we continue splitting until we reach pure regions
Feature scaling?
- Decision trees are NOT sensitive to feature scaling; scaling the features makes no difference.
Advantages
- Interpretability
- Extends easily to multi-class problems
- Requires little pre-processing
- Robust to irrelevant features
- Handles interactions between features
Disadvantages
- Prone to overfitting -> solution: Random Forest
- Sensitive to small changes in the dataset
- Limited predictive power -> solution: boosted trees
Logistic Regression (Friday Sept 30th, 2022)
| Date | 09/30/2022 |
| Topic | Logistic Regression |
| Professor | Dr. John Bukowy |
| Week | 4 |
- For classification
- It's a linear model for binary classification that is easy to implement
- One of the most widely used machine learning models
- Logistic regression models the class probability: the probability that an instance belongs to class 1 or 0
- Logistic regression maps the weighted sum to a certainty measure:
- If x^T w + w0 is positive and very large, map it close to 1, and vice versa
- Sigmoid
- Compresses the range (-infinity, infinity) to the range (0, 1)
- The output of the sigmoid function is treated as a certainty measure or probability
def sigmoid(x):  # requires: import numpy as np
    return 1 / (1 + np.exp(-x))
- This output is interpreted as P(y = 1 | x)
- Assumes that there are two possible outputs (classes)
- Outputs a probability
- Threshold this probability (default 0.5) to get the predicted class
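Putting the pieces together, a sketch of logistic-regression prediction; the weights w and w0 below are hypothetical, assumed to have been learned already.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_proba(X, w, w0):
    # Weighted sum x^T w + w0, squashed into (0, 1) -> P(y = 1 | x)
    return sigmoid(X @ w + w0)

def predict(X, w, w0, threshold=0.5):
    # Threshold the probability to get a class label
    return (predict_proba(X, w, w0) >= threshold).astype(int)

w, w0 = np.array([1.5, -0.5]), -1.0     # hypothetical learned weights
X = np.array([[2.0, 1.0], [0.0, 3.0]])  # two example rows
print(predict_proba(X, w, w0))          # ~[0.82, 0.08]
print(predict(X, w, w0))                # [1, 0]
```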
Training Phase: How to find the weights?
- Likelihood of a single labeled example (Bernoulli): P(y | x) = [P(y = 1 | x)]^y * [1 - P(y = 1 | x)]^(1 - y)
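The standard approach is to pick the weights that maximize this likelihood over the whole training set, i.e. minimize the negative log-likelihood (cross-entropy). A minimal gradient-descent sketch, assuming an arbitrary learning rate and iteration count (not the solver scikit-learn actually uses):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Gradient descent on the average negative log-likelihood."""
    n, m = X.shape
    w, w0 = np.zeros(m), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + w0)       # P(y = 1 | x) for every training sample
        grad_w = X.T @ (p - y) / n    # gradient w.r.t. the weights
        grad_w0 = np.mean(p - y)      # gradient w.r.t. the bias
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0
```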