In machine learning, we use vectors and matrices to organize data: think of them as data containers/structures
Vectors
A vector is an array of numerical values. It collects an ordered list of numbers to represent one entity or one mathematical object.
x.shape
> (3,)
# NumPy distinguishes a 1-D array of shape (3,) from a 2-D column vector of shape (3, 1); operations such as the dot product behave differently on each
x.shape
> (3, 1)
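A minimal NumPy sketch of the shape distinction noted above (the array values are hypothetical):

```python
import numpy as np

# A 1-D array: shape (3,), no row/column orientation
x = np.array([1.0, 2.0, 3.0])
print(x.shape)          # (3,)

# A 2-D column vector: shape (3, 1)
x_col = x.reshape(3, 1)
print(x_col.shape)      # (3, 1)

# np.dot on two 1-D arrays returns a scalar...
print(np.dot(x, x))     # 14.0

# ...while multiplying column vectors yields a 1x1 matrix, not a scalar
print(x_col.T @ x_col)  # [[14.]]
```

This is why it matters which shape a "vector" has in NumPy before doing linear algebra with it.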
A vector can be interpreted geometrically as: a point in space or geometric vector (an object with direction and magnitude).
Examples of 2D points
Latitude and longitude form a 2D coordinate system or space
Examples of 3D points
Each point could represent a client of an advertising company
Geometric Vector
An m-vector can be interpreted as an object that has magnitude and direction
It can be depicted as a directed line segment
Vectors
In machine learning, when we want to interpret a model geometrically, we might treat a given vector as a point or as a geometric vector
Dot product
Maps two vectors to a scalar
Euclidean Distance
The distance between two vectors u and v is the norm of their difference: ||u - v||
Dot Products and Vector Norms
u . v = ||u|| ||v|| cos(a)
- if a > 90 then u . v < 0
- if a == 90 then u . v == 0
- if a < 90 then u . v > 0
- if a == 0 then u . v == ||u|| ||v|| and they are parallel
- if a == 180 then u . v == -||u|| ||v|| and they are anti-parallel
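A small NumPy sketch tying together the dot product, norms, Euclidean distance, and the angle formula above (the two example vectors are hypothetical):

```python
import numpy as np

u = np.array([3.0, 0.0])
v = np.array([3.0, 3.0])

dot = np.dot(u, v)              # u . v = 9.0
norm_u = np.linalg.norm(u)      # ||u|| = 3.0
norm_v = np.linalg.norm(v)      # ||v|| = sqrt(18)

# Euclidean distance: the norm of the difference
dist = np.linalg.norm(u - v)    # 3.0

# Recover the angle from u . v = ||u|| ||v|| cos(a)
cos_a = dot / (norm_u * norm_v)
angle = np.degrees(np.arccos(cos_a))  # 45 degrees, so u . v > 0 as expected
```

Since the angle is under 90 degrees, the dot product is positive, matching the cases listed above.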
Matrices
Ordered collection of vectors
Used to store and perform operations on groups of vectors in one operation
Matrix-Vector Multiplication
Ab = c
Matrix-Matrix Multiplication
AB = C
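Both products above can be sketched in NumPy; the matrices here are hypothetical examples:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
b = np.array([1, 1])

# Matrix-vector multiplication, Ab = c:
# each entry of c is the dot product of a row of A with b
c = A @ b   # [3, 7]

B = np.array([[1, 0],
              [0, 1]])

# Matrix-matrix multiplication, AB = C:
# multiplies A against every column of B in one operation
C = A @ B   # equals A, since B is the identity
```

This is the sense in which matrices let us operate on groups of vectors at once: each column of B is a vector, and AB transforms them all in a single call.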
Modelling (Tuesday Sept 13th, 2022)
Date
09/13/2022
Topic
Modelling
Professor
Dr. John Bukowy
Week
2
What is a model?
What is a linear model?
What is a linear model for regression and classification?
Advantages and disadvantages of a linear model?
(Mathematical) Models
A tool used to mathematically represent some observed phenomena
Uses relationships (math) to:
explain a system
encode system behavior
make predictions about behavior
A model consists of finding a relationship or mapping between an input variable and output variable.
"no model is correct, but some are useful" - Dr. Bukowy (paraphrasing George Box's "all models are wrong, but some are useful")
Models are abstractions and simplifications of the real world
Models Examples
Modeling the distribution of some variables/measurements (what values are possible and how frequent they occur):
One of the first applications of the normal distribution: analysis of errors of measurement made in astronomical observations.
"Galileo in the 17th century noted that these errors were symmetric and that small errors occurred more frequently than large errors."
Models: Input-Output Relationship
One way to mathematically represent a model is through an input-output relationship
Machine Learning Models
Supervised machine learning:
Input?
Feature vector
Output?
Label vector
The purpose of training the model is to find the relationship between the feature matrix and the known output labels.
Once a final machine learning model is learned, it is evaluated and then used for future predictions.
Machine learning models consist of linear and non-linear models
We now discuss linear models, and what a linear model means for regression, and what it means for classification.
What do linear models do?
They allow us to convert from a higher-dimensional space to a scalar value.
They compress the information contained in the features into one scalar.
How do we find the weights?
We find them by training/fitting the linear model on the training data set: the weights are the parameters of the linear models
The goal of the training phase is to learn the weights from the training data: pairs of data features and labels.
Linear Regression
y = w1x1 + w0
We're interested in predicting y from x1.
Geometric Interpretation: a simple linear regression model fits the observed training data with a line
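A minimal sketch of fitting the line y = w1 x1 + w0 to training data; the data points are hypothetical and lie exactly on y = 2 x1 + 1:

```python
import numpy as np

# Hypothetical training data: pairs of feature x1 and label y
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 2*x1 + 1

# Least-squares fit of a degree-1 polynomial: returns [w1, w0]
w1, w0 = np.polyfit(x1, y, deg=1)
print(w1, w0)   # approximately 2.0 and 1.0
```

The learned weights w1 and w0 are exactly the parameters mentioned above: training recovers them from the feature/label pairs.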
Multiple Linear Regression
More than one feature
We're interested in predicting y from x1 and x2
Linear Classification (Binary Classification)
Linear model with a threshold
This means that the linear model finds the hyperplane that divides the feature space (space of all possible feature vectors) into two half-spaces (two decision regions).
They assume a linear relationship between the features of a sample and the label.
Linear classification models (like linear SVM and logistic regression) divide the feature space into two half-spaces: each half-space is a decision region.
The decision regions are separated by a hyperplane.
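The "linear model with a threshold" idea can be sketched directly; the weights, bias, and inputs here are hypothetical, not learned from real data:

```python
import numpy as np

# Hypothetical weights and bias (in practice these come from training)
w = np.array([2.0, -1.0])
b = 0.5

def classify(x):
    # The linear model compresses the feature vector to one scalar score;
    # thresholding at 0 picks which side of the hyperplane x falls on
    score = np.dot(w, x) + b
    return 1 if score > 0 else 0

print(classify(np.array([1.0, 0.0])))   # score = 2.5  -> class 1
print(classify(np.array([0.0, 3.0])))   # score = -2.5 -> class 0
```

The set of points where the score equals exactly 0 is the separating hyperplane; everything on one side is one decision region, everything on the other side is the other.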
Advantages/Disadvantages of Linear Models
Advantages
Simple
When used for prediction, they are computationally fast. Since they summarize data with a finite set of weights, storing trained linear models only requires storing the weights.
Linear models are interpretable: the weights help us understand the contribution of each feature to the overall prediction.
Disadvantages
Linear models have limitations: they can be too simple for data that is not linearly separable.
Other models are said to be more complex or flexible
Extending linear models
Linear models can be extended to accommodate non-linear relationships.
The very simple way: extend the feature space by adding polynomial features to the existing features in the data.
instead of fitting the line: mpg = w0 + w1 x horsepower
fit: mpg = w0 + w1 x horsepower + w2 x horsepower^2
Engineer another feature to extend the linear model into a higher dimension
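A sketch of the polynomial-feature idea using the mpg/horsepower example; the four data points are hypothetical. The model stays linear in the weights even though it is non-linear in horsepower:

```python
import numpy as np

# Hypothetical horsepower/mpg observations
horsepower = np.array([50.0, 100.0, 150.0, 200.0])
mpg = np.array([35.0, 25.0, 20.0, 18.0])

# Engineer a squared feature and build the design matrix
# [1, hp, hp^2] so that mpg = w0 + w1*hp + w2*hp^2
X = np.column_stack([np.ones_like(horsepower),
                     horsepower,
                     horsepower ** 2])

# Ordinary least squares on the extended feature space
weights, *_ = np.linalg.lstsq(X, mpg, rcond=None)
w0, w1, w2 = weights
```

The fitting machinery is unchanged; only the feature space was extended into a higher dimension.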
There are other methods to accommodate for non-linear relationships:
Kernel trick with SVM
Spline regression
Non-Linear Models
K-nearest neighbor
Decision trees
Random forest
Boosted trees
KNN and Tree Models
KNN and Trees are more complex models
KNN and Trees have extra settings (hyperparameters) that can be changed to change the complexity of the model itself.
How does the complexity of KNN change with changing K?
Decision boundaries
Any classification model divides the feature space into separate decision regions that can be used to classify unlabeled data points
With linear models, a hyperplane is what divides these regions
With non-linear models, a more flexible mathematical object is what separates the classes
Feature Scaling
Certain features might 'dominate' other features by having a greater range, and therefore a greater influence on distance computations
KNN is sensitive to scaling, while trees are not sensitive to scaling
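A minimal standardization sketch (a common form of feature scaling; the feature matrix is hypothetical, with the first column having a much larger range than the second):

```python
import numpy as np

# Hypothetical feature matrix: column 0 ranges over thousands,
# column 1 over single digits, so column 0 would dominate distances
X = np.array([[1000.0, 1.0],
              [2000.0, 2.0],
              [3000.0, 3.0]])

# Standardize each feature to zero mean and unit variance so that
# distance-based models like KNN weigh the features comparably
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)
```

After scaling, both columns span the same range, so neither dominates the distance computation in KNN.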
Non-linear
more complex
higher chance of overfitting a model
might require more memory usage
Do we need non-linear models?
It depends on the data set.
For high-dimensional data (number of features much greater than the number of samples), we might need to stick with simpler models.