Week 1 Notes


Intro to machine learning (Tuesday Sept 6th, 2022)

Date 09/06/2022
Topic Introduction
Professor Dr. John Bukowy
Week 1

What is machine learning? How is it different from traditional programming?

Types of machine learning problems

Goal is to make decisions

Terminology

Continued Lecture (Wednesday Sept 7th, 2022)

Date 09/07/2022
Topic Lab 1
Professor Dr. John Bukowy
Week 1

Supervised learning

Classification

Predict a category

Regression

Predict a continuous value (e.g. price of a house)
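
To make the difference between the two supervised tasks concrete, here is a minimal NumPy sketch. All numbers are invented, and the "nearest class mean" rule is only a stand-in for a real classifier, not a method from the lecture.

```python
import numpy as np

# Made-up data: house size in square meters (one feature per sample).
size = np.array([50.0, 80.0, 100.0, 120.0, 150.0])

# Regression target: a continuous value (price in thousands, invented).
price = np.array([100.0, 160.0, 200.0, 240.0, 300.0])

# Classification target: a category (0 = "small", 1 = "large", invented).
category = np.array([0, 0, 1, 1, 1])

new_size = 110.0

# Regression: fit a straight line price ~ slope * size + intercept
# and predict a continuous value for the new house.
slope, intercept = np.polyfit(size, price, deg=1)
predicted_price = slope * new_size + intercept

# Classification: assign the new house to the class whose mean size
# is closest (a deliberately simple classifier for illustration).
class_means = np.array([size[category == c].mean() for c in (0, 1)])
predicted_category = int(np.argmin(np.abs(class_means - new_size)))
```

The regression output is a number on a continuous scale, while the classification output is one of a fixed set of categories.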


Unsupervised Learning


Terms


Features

Independent variable, predictor, attribute

Label

Dependent variable, response, target

Sample

Instance, observation, record, example
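
As a rough illustration of these three terms, the sketch below builds a tiny dataset in NumPy. The values and the column meanings (size, number of rooms, price) are invented for the example.

```python
import numpy as np

# Invented toy dataset: each row is one sample (instance, observation).
# Columns are features: [size_m2, number_of_rooms].
X = np.array([
    [50.0, 2.0],
    [80.0, 3.0],
    [120.0, 4.0],
])

# One label (target) per sample, here a price in thousands.
y = np.array([100.0, 160.0, 240.0])

n_samples, n_features = X.shape

first_sample = X[0]      # all features of the first sample
first_label = y[0]       # the label that belongs to that sample
size_feature = X[:, 0]   # a single feature across all samples
```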


Steps of building machine learning systems

  1. Data Preprocessing
  2. Training
  3. Evaluation
  4. Prediction
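
The four steps above could look roughly like this in scikit-learn. This is a sketch under assumptions: the data is synthetic, and `StandardScaler` and `LogisticRegression` are just one possible choice of preprocessing and model, not necessarily the ones used in class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (invented for illustration): two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# Hold out part of the data so evaluation uses unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 1. Data preprocessing: scale features with training-set statistics only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 2. Training: fit the model on the training samples.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# 3. Evaluation: compare predictions with the held-out labels.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))

# 4. Prediction: apply the same preprocessing to a new sample first.
new_sample = np.array([[4.5, 5.5]])
prediction = model.predict(scaler.transform(new_sample))
```

Fitting the scaler on the training split only keeps information from the test samples out of the preprocessing step.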

Data Preprocessing

Training


Models


Regression

Classification

Evaluation Metrics

Tools we will use in this class

Python - NumPy and Scikit-Learn

Data Structuring & Notations

Date 09/09/2022
Topic Data Structuring
Professor Dr. John Bukowy
Week 1

Tabular Data

Semi-Structured Data

Unstructured Data

Data format for classic ML models


Nominal -> categories

Ordinal -> categories -> Rank order

Interval -> categories -> Rank order -> Equal spacing

Ratio -> categories -> Rank order -> Equal spacing -> Meaningful ratios (true zero)
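
Assuming the four lines above describe the usual levels of measurement (nominal, ordinal, interval, ratio), the level affects how a feature can be turned into numbers. The sketch below shows one common approach for the first two levels; the category names and the ranking are invented.

```python
import numpy as np

# Nominal feature: categories only, no order. Values are invented.
colors = np.array(["red", "green", "blue", "green"])
color_names, color_codes = np.unique(colors, return_inverse=True)

# One-hot encoding: one column per category, so no order is implied.
one_hot = np.eye(len(color_names))[color_codes]

# Ordinal feature: categories with a rank order.
sizes = np.array(["small", "large", "medium", "small"])
size_order = {"small": 0, "medium": 1, "large": 2}  # assumed ranking
size_codes = np.array([size_order[s] for s in sizes])
```

Interval and ratio features are already numeric, so they can usually be stored as floats directly.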


Feature vector, feature matrix, label vector

Feature vector

x = [[x1], [x2], [...], [xm]] (a column vector with one entry per feature, m features in total)

Feature matrix

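X = [[x(1)T], [x(2)T], [...], [x(n)T]] (one transposed feature vector per row, so X has n rows for the samples and m columns for the features)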

Label Vector

y holds the true label of each sample; y(hat) holds the predicted labels
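
A short NumPy sketch of these shapes, using invented numbers. The names `x_1`, `y_hat`, and `residuals` are illustrative only.

```python
import numpy as np

# Three invented feature vectors, each with m = 2 features.
x_1 = np.array([50.0, 2.0])
x_2 = np.array([80.0, 3.0])
x_3 = np.array([120.0, 4.0])

# Written as a column vector, a feature vector has shape (m, 1).
x_1_column = x_1.reshape(-1, 1)

# Feature matrix: one feature vector per row, shape (n, m).
X = np.vstack([x_1, x_2, x_3])

# Label vector: true labels y and made-up predictions y_hat, shape (n,).
y = np.array([100.0, 160.0, 240.0])
y_hat = np.array([110.0, 150.0, 245.0])

# Per-sample difference between prediction and truth.
residuals = y_hat - y
```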

Feature Space

The set of all possible feature vectors. Each feature vector can be depicted as a point in the feature space.
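
Treating feature vectors as points means geometric ideas such as distance apply to samples. A minimal sketch with two invented points in a 2-dimensional feature space:

```python
import numpy as np

# Two invented feature vectors with two features each,
# i.e. two points in a 2-dimensional feature space.
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# The dimension of the feature space equals the number of features.
dimension = a.shape[0]

# Euclidean distance between the two points.
distance = np.linalg.norm(a - b)
```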

What is array reshaping? When is it useful?
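
One way to answer this: reshaping changes how the same elements are arranged into dimensions without changing the elements themselves. It is useful, for example, when a library expects a 2D feature matrix but the data arrives as a 1D array or as a stack of images. The sketch below uses invented arrays.

```python
import numpy as np

# Twelve values in a 1D array, shape (12,).
flat = np.arange(12)

# The same data arranged as 3 rows and 4 columns.
grid = flat.reshape((3, 4))

# -1 lets NumPy infer that dimension from the total number of elements.
column = flat.reshape((-1, 1))  # shape (12, 1): one feature, 12 samples

# Flatten a stack of 2 "images" of 2 x 3 pixels into one row per image.
images = np.arange(12).reshape((2, 2, 3))
rows = images.reshape((images.shape[0], -1))  # shape (2, 6)
```

The total number of elements must stay the same, otherwise `reshape` raises a `ValueError`.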

Tooling (NumPy and SciPy)

NumPy Array Data structure

NumPy Basics

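import numpy as np

# empty 1D array of 32-bit floats
array = np.array([], dtype=np.float32)

# four zeros stored as 32-bit integers
array = np.zeros(4, dtype=np.int32)

# 4 x 5 array of ones (float64 by default)
array = np.ones((4, 5))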

# number of dimensions
array.ndim
# shape
array.shape
# data type
array.dtype

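# indexing (one index per axis)
array[0]              # first entry along the first axis
array[1, 2]           # row 1, column 2 of a 2D array
array[1, 2, 3, 4, 5]  # five indices, so this form needs a 5D array

# Selecting a single index along one dimension (this form needs a 3D array)
array[:, 5, :]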

# Reversing a 1D array
array[::-1]

# returns array with elements at indices in list
subset = array[[1, 8, 4, 5, 2]]
subset = array[another_array]

# boolean mask (one entry per element; True keeps the element)
subset = array[[True, False, False, True, True]]

# returns a boolean array
mask = array == 1
selected = array[mask]

# Sorting (returns a sorted copy; names avoid shadowing Python built-ins)
sorted_array = np.sort(array)

# Maximum
max_value = np.amax(array)

# Minimum
min_value = np.amin(array)

# indices that would sort a 1D array in ascending order
sorted_idx = np.argsort(array)
# smallest element of the array
array[sorted_idx[0]]
# largest element of array
array[sorted_idx[-1]]

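# reshape arrays (returns a new array; the element count must stay the same, here 12)
reshaped = images.reshape((3, 4))

# join arrays side by side (column-wise); for 2D arrays the row counts must match
X1 = np.hstack([X1, X2])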
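
The snippets above assume arrays that already exist. As a self-contained sketch of the same operations, with invented values and illustrative variable names:

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5])

# Boolean mask: keep the elements equal to 1.
mask = values == 1
ones_only = values[mask]

# Index with a list of positions, and reverse with a negative step.
picked = values[[0, 2, 4]]
reversed_values = values[::-1]

# argsort gives the indices that sort the array in ascending order.
order = np.argsort(values)
smallest = values[order[0]]
largest = values[order[-1]]

# hstack joins 2D arrays side by side; the row counts must match.
X1 = np.ones((2, 2))
X2 = np.zeros((2, 3))
combined = np.hstack([X1, X2])
```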