In machine learning, we use vectors and matrices to organize data: think of them as data containers/structures
Vectors
A vector is an array of numerical values. It collects an ordered list of numbers to represent one entity or one mathematical object.
x.shape
> (3,)
# NumPy distinguishes a 1-D array of shape (3,) from a 2-D column vector of shape (3, 1); operations such as the dot product behave differently on each
x.shape
> (3, 1)
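A minimal NumPy sketch of the shape distinction noted above (the array values are hypothetical):

```python
import numpy as np

# A 1-D array: shape (3,), no row/column orientation
x = np.array([1.0, 2.0, 3.0])
print(x.shape)          # (3,)

# A 2-D column vector: shape (3, 1)
x_col = x.reshape(3, 1)
print(x_col.shape)      # (3, 1)

# np.dot on two 1-D arrays returns a scalar...
print(np.dot(x, x))     # 14.0

# ...while multiplying column vectors yields a 1x1 matrix, not a scalar
print(x_col.T @ x_col)  # [[14.]]
```

This is why it matters which shape a "vector" has in NumPy before doing linear algebra with it.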
A vector can be interpreted geometrically as: a point in space or geometric vector (an object with direction and magnitude).
Examples of 2D points
Latitude and longitude form a 2D coordinate system or space
Examples of 3D points
Each point could represent a client of an advertising company
Geometric Vector
An m-vector can be interpreted as an object that has magnitude and direction
It can be depicted as a directed line segment
Vectors
In machine learning, when we want to interpret a model geometrically, we might treat a given vector as a point or as a geometric vector
Dot product
Maps two vectors to a scalar
Euclidean Distance
The distance between two vectors u and v is the norm of their difference: ||u - v||
Dot Products and Vector Norms
u . v = ||u|| ||v|| cos(a)
- if a > 90 then u . v < 0
- if a == 90 then u . v == 0
- if a < 90 then u . v > 0
- if a == 0 then u . v == ||u|| ||v|| and they are parallel
- if a == 180 then u . v == -||u|| ||v|| and they are anti-parallel
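A small NumPy sketch tying together the dot product, norms, Euclidean distance, and the angle formula above (the two example vectors are hypothetical):

```python
import numpy as np

u = np.array([3.0, 0.0])
v = np.array([3.0, 3.0])

dot = np.dot(u, v)              # u . v = 9.0
norm_u = np.linalg.norm(u)      # ||u|| = 3.0
norm_v = np.linalg.norm(v)      # ||v|| = sqrt(18)

# Euclidean distance: the norm of the difference
dist = np.linalg.norm(u - v)    # 3.0

# Recover the angle from u . v = ||u|| ||v|| cos(a)
cos_a = dot / (norm_u * norm_v)
angle = np.degrees(np.arccos(cos_a))  # 45 degrees, so u . v > 0 as expected
```

Since the angle is under 90 degrees, the dot product is positive, matching the cases listed above.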
Matrices
Ordered collection of vectors
Used to store and perform operations on groups of vectors in one operation
Matrix-Vector Multiplication
Ab = c
Matrix-Matrix Multiplication
AB = C
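Both products above can be sketched in NumPy; the matrices here are hypothetical examples:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
b = np.array([1, 1])

# Matrix-vector multiplication, Ab = c:
# each entry of c is the dot product of a row of A with b
c = A @ b   # [3, 7]

B = np.array([[1, 0],
              [0, 1]])

# Matrix-matrix multiplication, AB = C:
# multiplies A against every column of B in one operation
C = A @ B   # equals A, since B is the identity
```

This is the sense in which matrices let us operate on groups of vectors at once: each column of B is a vector, and AB transforms them all in a single call.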
Modelling (Tuesday Sept 13th, 2022)
Date
09/13/2022
Topic
Modelling
Professor
Dr. John Bukowy
Week
2
What is a model?
What is a linear model?
What is a linear model for regression and classification?
Advantages and disadvantages of a linear model?
(Mathematical) Models
A tool used to mathematically represent some observed phenomena
Uses relationships (math) to:
explain a system
encode system behavior
make predictions about behavior
A model consists of finding a relationship or mapping between an input variable and output variable.
"no model is correct, but some are useful" - Dr. Bukowy (paraphrasing George Box's "all models are wrong, but some are useful")
Models are abstractions and simplifications of the real world
Models Examples
Modeling the distribution of some variables/measurements (what values are possible and how frequent they occur):
One of the first applications of the normal distribution: analysis of errors of measurement made in astronomical observations.
"Galileo in the 17th century noted that these errors were symmetric and that small errors occurred more frequently than large errors."
Models: Input-Output Relationship
One way to mathematically represent a model is through an input-output relationship
Machine Learning Models
Supervised machine learning:
Input?
Feature vector
Output?
Label vector
The purpose of training the model is to find the relationship between the feature matrix and the known output labels.
Once a final machine learning model is learned, it is evaluated and then used for future predictions.
Machine learning models consist of linear and non-linear models
We now discuss linear models, and what a linear model means for regression, and what it means for classification.
What do linear models do?
They allow us to convert from a higher-dimensional space to a scalar value.
They compress the information contained in the features into one scalar.
How do we find the weights?
We find them by training/fitting the linear model on the training data set: the weights are the parameters of the linear models
The goal of the training phase is to learn the weights from the training data: pairs of data features and labels.
Linear Regression
y = w1x1 + w0
We're interested in predicting y from x1.
Geometric Interpretation: a simple linear regression model fits the observed training data with a line
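A minimal sketch of fitting the line y = w1 x1 + w0 to training data; the data points are hypothetical and lie exactly on y = 2 x1 + 1:

```python
import numpy as np

# Hypothetical training data: pairs of feature x1 and label y
x1 = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 2*x1 + 1

# Least-squares fit of a degree-1 polynomial: returns [w1, w0]
w1, w0 = np.polyfit(x1, y, deg=1)
print(w1, w0)   # approximately 2.0 and 1.0
```

The learned weights w1 and w0 are exactly the parameters mentioned above: training recovers them from the feature/label pairs.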
Multiple Linear Regression
More than one feature
We're interested in predicting y from x1 and x2
Linear Classification (Binary Classification)
Linear model with a threshold
This means that the linear model finds the hyperplane that divides the feature space (space of all possible feature vectors) into two half-spaces (two decision regions).
They assume a linear relationship between the features of a sample and the label.
Linear classification models (like linear SVM and logistic regression) divide the feature space into two half-spaces: each half-space is a decision region.
The decision regions are separated by a hyperplane.
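The "linear model with a threshold" idea can be sketched directly; the weights, bias, and inputs here are hypothetical, not learned from real data:

```python
import numpy as np

# Hypothetical weights and bias (in practice these come from training)
w = np.array([2.0, -1.0])
b = 0.5

def classify(x):
    # The linear model compresses the feature vector to one scalar score;
    # thresholding at 0 picks which side of the hyperplane x falls on
    score = np.dot(w, x) + b
    return 1 if score > 0 else 0

print(classify(np.array([1.0, 0.0])))   # score = 2.5  -> class 1
print(classify(np.array([0.0, 3.0])))   # score = -2.5 -> class 0
```

The set of points where the score equals exactly 0 is the separating hyperplane; everything on one side is one decision region, everything on the other side is the other.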
Advantages/Disadvantages of Linear Models
Advantages
Simple
When used for prediction, they are computationally fast. Since they summarize data with a finite set of weights, storing trained linear models only requires storing the weights.
Linear models are interpretable: the weights help us understand the contribution of each feature to the overall prediction.
Disadvantages
Linear models have limitations: they can be too simple for data that is not linearly separable.
Other models are said to be more complex or flexible
Extending linear models
Linear models can be extended to accommodate non-linear relationships.
The very simple way: extend the feature space by adding polynomial features to the existing features in the data.
instead of fitting the line: mpg = w0 + w1 x horsepower
fit: mpg = w0 + w1 x horsepower + w2 x horsepower^2
Engineer another feature to extend the linear model into a higher dimension
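A sketch of the polynomial-feature idea using the mpg/horsepower example; the four data points are hypothetical. The model stays linear in the weights even though it is non-linear in horsepower:

```python
import numpy as np

# Hypothetical horsepower/mpg observations
horsepower = np.array([50.0, 100.0, 150.0, 200.0])
mpg = np.array([35.0, 25.0, 20.0, 18.0])

# Engineer a squared feature and build the design matrix
# [1, hp, hp^2] so that mpg = w0 + w1*hp + w2*hp^2
X = np.column_stack([np.ones_like(horsepower),
                     horsepower,
                     horsepower ** 2])

# Ordinary least squares on the extended feature space
weights, *_ = np.linalg.lstsq(X, mpg, rcond=None)
w0, w1, w2 = weights
```

The fitting machinery is unchanged; only the feature space was extended into a higher dimension.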
There are other methods to accommodate for non-linear relationships:
Kernel trick with SVM
Spline regression
Non-Linear Models
K-nearest neighbor
Decision trees
Random forest
Boosted trees
KNN and Tree Models
KNN and Trees are more complex models
KNN and Trees have extra settings (hyperparameters) that can be changed to change the complexity of the model itself.
How does the complexity of KNN change with changing K?
Decision boundaries
Any classification model divides the feature space into separate decision regions that can be used to classify unlabeled data points
With linear models, a hyperplane is what divides these regions
With non-linear models, a more flexible mathematical object is what separates the classes
Feature Scaling
Certain features might 'dominate' other features by having a greater range, and therefore a greater influence on distance computations
KNN is sensitive to scaling, while trees are not sensitive to scaling
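A minimal standardization sketch (a common form of feature scaling; the feature matrix is hypothetical, with the first column having a much larger range than the second):

```python
import numpy as np

# Hypothetical feature matrix: column 0 ranges over thousands,
# column 1 over single digits, so column 0 would dominate distances
X = np.array([[1000.0, 1.0],
              [2000.0, 2.0],
              [3000.0, 3.0]])

# Standardize each feature to zero mean and unit variance so that
# distance-based models like KNN weigh the features comparably
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled)
```

After scaling, both columns span the same range, so neither dominates the distance computation in KNN.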
Non-linear
more complex
higher chance of overfitting a model
might require more memory usage
Do we need non-linear models?
It depends on the data set.
For high-dimensional data (number of features much greater than the number of samples), we might need to stick with simpler models.