Week 3 Notes


Problem Set Review (Monday Sept 19th, 2022)

Date 09/19/2022
Topic Problem Set Review
Professor Dr. John Bukowy
Week 3

No content. Went over problem set.


Model Evaluation (Tuesday Sept 20th, 2022)

Date 09/20/2022
Topic Model Evaluation
Professor Dr. John Bukowy
Week 3

No-Free-Lunch theorem

The "No Free Lunch" theorem states that, without prior knowledge about the problem, no single model is guaranteed to do better than every other model.

Model Evaluation

Feature vector with known label (y_true) -> Learned Model -> Predicted Label (output) y_pred = f(x)
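The flow above can be sketched with scikit-learn. This is only a minimal sketch: the synthetic dataset from make_classification and the choice of LogisticRegression as the learned model are assumptions for illustration, not something from the lecture.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic data (assumption): X holds the feature vectors, y_true the known labels.
X, y_true = make_classification(n_samples=200, n_features=5, random_state=0)

# Learned model f (LogisticRegression is an illustrative choice).
model = LogisticRegression(max_iter=1000).fit(X, y_true)

# Predicted labels: y_pred = f(x)
y_pred = model.predict(X)

# Compare predictions against the known labels.
accuracy = accuracy_score(y_true, y_pred)
print(f"accuracy on the data the model was trained on: {accuracy:.2f}")
```

Scoring on the same data used for fitting gives an optimistic number, which is why the Train-Test Data Splitting section holds out a test set.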

Evaluation Metrics

Model Evaluation - Final Model

Train-Test Data Splitting

Random Sampling without Replacement

Stratification Followed by Random Sampling without Replacement

Code

No Stratification

from sklearn.model_selection import train_test_split

train_X, test_X, train_y, test_y = train_test_split(X, y)

With Stratification

from sklearn.model_selection import train_test_split

train_X, test_X, train_y, test_y = train_test_split(X, y, stratify=y)
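To see what stratification does, here is a small self-contained sketch. The imbalanced toy data (90 samples of class 0, 10 of class 1) and the test_size and random_state values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 90 samples of class 0, 10 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# stratify=y keeps the class proportions the same in both splits.
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print("class-1 fraction in train:", train_y.mean())
print("class-1 fraction in test: ", test_y.mean())
```

Both splits keep the original 10% share of class 1. Without stratify, a random split of such a small minority class could leave the test set with too few (or zero) class-1 samples.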

Validation Set

Cross-validation in SciKit-Learn

sklearn.model_selection.cross_val_score
sklearn.model_selection.cross_val_predict
sklearn.model_selection.cross_validate
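A minimal sketch of how the three functions differ. The iris dataset, the KNeighborsClassifier model, and cv=5 are assumptions chosen only to make the example runnable.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict, cross_val_score, cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5)

# cross_val_score: one score per fold.
scores = cross_val_score(model, X, y, cv=5)

# cross_val_predict: one out-of-fold prediction per sample.
preds = cross_val_predict(model, X, y, cv=5)

# cross_validate: dict with fit/score times and test (optionally train) scores.
results = cross_validate(model, X, y, cv=5, return_train_score=True)

print("fold scores:", scores)
print("mean score: ", scores.mean())
print("keys from cross_validate:", sorted(results))
```

cross_val_score is the quickest way to get a single number, cross_val_predict is handy for building a confusion matrix, and cross_validate is the most detailed of the three.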

Underfitting, Overfitting

Possible Scenarios

A model's generalization error can be expressed as the sum of three different errors:

Bias

Variance

Irreducible error

How do we know?
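One common diagnostic (an assumption here, since the notes do not record the lecture's answer) is to compare the score on the training set with the score on held-out data. The sketch below uses synthetic data and a KNN classifier with different k values as illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, slightly noisy data (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
train_X, test_X, train_y, test_y = train_test_split(X, y, stratify=y, random_state=0)

results = {}
for k in (1, 15, 150):  # small k = flexible model, large k = rigid model
    model = KNeighborsClassifier(n_neighbors=k).fit(train_X, train_y)
    results[k] = (model.score(train_X, train_y), model.score(test_X, test_y))
    print(f"k={k:>3}  train={results[k][0]:.2f}  test={results[k][1]:.2f}")
```

A large gap between train and test scores suggests overfitting (high variance). Low scores on both suggest underfitting (high bias).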

KNN (Friday Sept 23rd, 2022)

Date 09/23/2022
Topic KNN
Professor Dr. John Bukowy
Week 3

(Online Resource Notes)

KNN Algorithm Flow (Prediction Phase)

  1. Calculate Distances
  2. Sort
  3. Choose k closest neighbors
  4. Grab Labels for Neighbors
  5. Aggregate -> Predicted query label
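The five steps above can be sketched in NumPy. This is a minimal sketch, assuming Euclidean distance and a majority vote as the aggregation. The function name knn_predict and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of one query point with a plain k-nearest-neighbors vote."""
    distances = np.linalg.norm(train_X - query, axis=1)  # 1. calculate distances
    order = np.argsort(distances)                        # 2. sort (indices by distance)
    nearest = order[:k]                                  # 3. choose k closest neighbors
    labels = train_y[nearest]                            # 4. grab labels for neighbors
    return Counter(labels.tolist()).most_common(1)[0][0]  # 5. aggregate (majority vote)


# Toy training set: two well-separated clusters.
train_X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_X, train_y, np.array([0.5, 0.5])))  # prints 0
print(knn_predict(train_X, train_y, np.array([5.5, 5.5])))  # prints 1
```

There is no real training step: all of the work happens at prediction time, when distances to every stored point are computed.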

Time complexity

Complexity of the model

Choice of distance metric

Importance of feature scaling
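KNN relies on distances, so a feature with a large numeric range can dominate the others. The sketch below compares KNN with and without standardization. The wine dataset, k=5, and cv=5 are assumptions chosen for illustration.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The wine features live on very different scales (e.g. proline vs. hue).
X, y = load_wine(return_X_y=True)

unscaled = KNeighborsClassifier(n_neighbors=5)
# The pipeline refits the scaler inside each fold, so no test data leaks into it.
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

unscaled_mean = cross_val_score(unscaled, X, y, cv=5).mean()
scaled_mean = cross_val_score(scaled, X, y, cv=5).mean()

print(f"KNN without scaling: {unscaled_mean:.2f}")
print(f"KNN with scaling:    {scaled_mean:.2f}")
```

On this dataset the scaled pipeline tends to score noticeably higher, because no single feature dominates the distance calculation.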

Curse of dimensionality

In a high-dimensional feature space, even the nearest neighbors may be far from the query point, so they may not be similar enough to share its label.
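A small experiment illustrates the effect. This is a sketch under stated assumptions: points are drawn uniformly from the unit hypercube, and the helper nearest_neighbor_distance is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)


def nearest_neighbor_distance(n_points, n_dims):
    """Distance from one random query to its nearest neighbor among random points."""
    points = rng.random((n_points, n_dims))  # uniform in the unit hypercube
    query = rng.random(n_dims)
    return np.linalg.norm(points - query, axis=1).min()


# Same number of points, increasing number of features.
nn_dist = {d: nearest_neighbor_distance(500, d) for d in (2, 10, 100)}
for d, dist in nn_dist.items():
    print(f"{d:>3} dimensions: nearest neighbor at distance {dist:.2f}")
```

With the number of points held fixed, the nearest neighbor gets farther away as the number of dimensions grows, so "nearest" no longer means "similar".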

Summary

Advantages

Disadvantages

(Video Lecture)

Terminology

training set data

datum we want to classify

points near each other in some space

Aggregation