# Machine Learning

**John Samuel**

CPE Lyon

**Year**: 2018-2019

**Email**: john(dot)samuel(at)cpe(dot)fr

- Machine Learning
- Deep Learning
- Artificial Intelligence

- Let **X** be the input feature space
- Let **Y** be the output feature space (of labels)
- The goal of a classification algorithm (or classifier) is to find {*(x_1, y_1), ..., (x_l, y_k)*}, i.e., to assign a known label to every input feature vector, where *x_i* ∈ **X** and *y_i* ∈ **Y**
- |**X**| = *l*
- |**Y**| = *k*
- *l ≥ k*
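To make the setting concrete, here is a toy sketch in Python: a small set of feature vectors **X**, a label space **Y**, and a simple threshold rule standing in for the classifier. The data and the rule are invented purely for illustration.

```python
# Toy illustration of the classification setting: a classifier maps every
# feature vector x in X to a label y in Y.
# The data and the threshold rule are hypothetical.

X = [(1.0, 2.0), (3.5, 0.5), (0.2, 4.1), (2.8, 1.1)]  # input feature vectors, l = 4
Y = {"small", "large"}                                 # label space, k = 2

def classifier(x):
    """Assign a label to feature vector x using a simple threshold rule."""
    return "large" if sum(x) > 3.0 else "small"

labelled = [(x, classifier(x)) for x in X]             # {(x_1, y_1), ..., (x_l, y_l)}
```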

- Classifying algorithm
- Two types of classifiers:
  - **Binary classifiers**: assign an object to one of two classes
  - **Multiclass classifiers**: assign an object to one of several classes

- A linear function assigns a score to each possible category by combining the feature vector of an instance with a vector of weights, using a dot product.
- Formalization:
  - Let **X** be the input feature space and **x**_i ∈ **X**
  - Let **β**_k be the vector of weights for category *k*
  - *score*(**x**_i, *k*) = **x**_i · **β**_k is the score for assigning category *k* to instance **x**_i. The category that gives the highest score is assigned as the category of the instance.
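The scoring rule above fits in a few lines of Python: one weight vector per category, and the highest-scoring category wins. The categories and weight values below are hypothetical.

```python
# Sketch of a linear classifier: score(x, k) = x · beta_k for each
# category k, predict the category with the highest score.
# The categories and weight values are made up for illustration.

def dot(x, beta):
    """Dot product of a feature vector with a weight vector."""
    return sum(x_i * b_i for x_i, b_i in zip(x, beta))

def predict(x, betas):
    """Return the category whose weight vector gives the highest score."""
    return max(betas, key=lambda k: dot(x, betas[k]))

betas = {"cat": [0.5, -1.0], "dog": [-0.2, 0.8]}  # hypothetical weights
```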

- Let
  - *tp*: number of true positives
  - *fp*: number of false positives
  - *fn*: number of false negatives
- Then
  - Precision: *p = tp / (tp + fp)*
  - Recall: *r = tp / (tp + fn)*
  - F1-score: *f1 = 2 · (p · r) / (p + r)*
- F1-score: best value at 1 (perfect precision and recall) and worst at 0.
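The three metrics can be computed directly from the formulas above for binary labels (1 = positive, 0 = negative); this is a pure-Python sketch with no external libraries.

```python
# Precision, recall and F1 from counts of true positives, false positives
# and false negatives, following the formulas above.

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * (p * r) / (p + r)
    return p, r, f1

p, r, f1 = precision_recall_f1([1, 1, 0, 1, 0], [1, 0, 1, 1, 0])
```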

- Transformation to binary
  - One-vs.-rest (One-vs.-all)
  - One-vs.-one
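A minimal sketch of the one-vs.-rest strategy, assuming we already have one binary scoring function per class, each treating its own class as positive and all others as negative; the class with the highest score is predicted. The score functions below are made up.

```python
# One-vs.-rest: reduce a multiclass problem to one binary scorer per class.
# The scorers here are hypothetical stand-ins for trained binary classifiers.

def one_vs_rest_predict(x, scorers):
    """scorers maps each class to a 'this class vs. the rest' score function."""
    return max(scorers, key=lambda cls: scorers[cls](x))

scorers = {
    "a": lambda x: x[0] - x[1],   # hypothetical binary scorer for class "a"
    "b": lambda x: x[1] - x[0],   # hypothetical binary scorer for class "b"
}
```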

- Extension from binary
  - Neural networks
  - k-nearest neighbours
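k-nearest neighbours is one algorithm that extends naturally from binary to multiclass: predict the majority label among the k training points closest to the query. The training data below is invented for illustration.

```python
# k-NN multiclass prediction: majority vote over the k nearest neighbours.

from collections import Counter
import math

def knn_predict(x, training, k=3):
    """training: list of (feature_vector, label) pairs."""
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]
```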

- Algorithm for supervised learning of binary classifiers
- A binary classifier is a classifier that decides whether a given input belongs to a particular class or not
- Invented in 1958 by Frank Rosenblatt

- Let *y = f(z)* be the output of the perceptron for an input vector *z*
- Let *N* be the number of training examples
- Let **X** be the input feature space
- Let {*(x_1, d_1), ..., (x_N, d_N)*} be the *N* training examples, where
  - *x_j* is the feature vector of the *j*-th training example
  - *d_j* is the desired output value
  - *x_{j,i}* is the *i*-th feature of the *j*-th training example
  - *x_{j,0}* = 1

- Weights are represented in the following manner:
  - *w_i* is the *i*-th value of the weight vector
  - *w_i(t)* is the *i*-th value of the weight vector at a given time *t*

1. Initialize the weights and the threshold.
2. For each example *(x_j, d_j)* in the training set:
   - Calculate the output: *y_j(t) = f[w(t) · x_j]*
   - Update the weights: *w_i(t + 1) = w_i(t) + r · (d_j − y_j(t)) · x_{j,i}*
3. Repeat step 2 until the iteration error *(1/s) Σ |d_j − y_j(t)|* is less than a user-specified threshold, where *s* is the sample size and *r* is the learning rate.
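The learning rule above can be sketched directly in Python. A step function stands in for *f*, and the training data (logical AND) and learning rate are chosen only for illustration.

```python
# Perceptron learning rule: y_j(t) = f[w(t) · x_j] with a step function f,
# then w_i(t+1) = w_i(t) + r * (d_j - y_j(t)) * x_{j,i}.
# A bias input x_{j,0} = 1 is prepended to every example.

def step(z):
    return 1 if z >= 0 else 0

def train_perceptron(examples, r=1.0, epochs=20):
    """examples: list of (features, desired_output) pairs."""
    w = [0.0] * (len(examples[0][0]) + 1)          # +1 for the bias weight
    for _ in range(epochs):
        for x, d in examples:
            xb = [1.0] + list(x)                   # x_{j,0} = 1
            y = step(sum(wi * xi for wi, xi in zip(w, xb)))
            w = [wi + r * (d - y) * xi for wi, xi in zip(w, xb)]
    return w

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # logical AND
w = train_perceptron(data)
```

Since AND is linearly separable, the perceptron convergence theorem guarantees the weights stop changing after finitely many updates.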

- Connections between the nodes do not form a cycle
- Information moves from the input nodes, through the hidden nodes (if any) and to the output nodes.
- Information moves in only one direction, forward
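The one-directional flow can be seen in a forward pass through a tiny network with one hidden layer. Layer sizes and weight values below are arbitrary illustrative choices.

```python
# Forward pass through a small feedforward network: information flows
# strictly input -> hidden -> output, with no cycles.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Fully connected layer: sigmoid(w · inputs + b) for each node."""
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(x, hidden_w, hidden_b)   # input nodes -> hidden nodes
    return layer(hidden, out_w, out_b)      # hidden nodes -> output nodes

y = forward([1.0, 0.5],
            hidden_w=[[0.2, -0.1], [0.4, 0.3]], hidden_b=[0.0, 0.1],
            out_w=[[0.5, -0.5]], out_b=[0.0])
```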

- computes the gradient of the loss function with respect to the weights of the network for a single input-output example.
- works by computing the gradient of the loss function with respect to each weight by the chain rule
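The chain rule at work can be shown in miniature: for one input-output example and a single linear neuron, compute the gradient of a squared-error loss with respect to each weight.

```python
# Backpropagation in miniature: L = (w·x - target)^2, and by the chain rule
# dL/dw_i = 2 * (w·x - target) * x_i.

def loss_and_gradient(w, x, target):
    y = sum(wi * xi for wi, xi in zip(w, x))   # forward pass
    error = y - target
    grad = [2 * error * xi for xi in x]        # backward pass (chain rule)
    return error ** 2, grad

loss, grad = loss_and_gradient([0.5, -0.5], [2.0, 1.0], target=1.0)
```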

- uses multiple layers to progressively extract higher level features from the raw input.

- Analysis of images
- Makes use of convolution, a linear mathematical operation
- One input and one output layer
- Multiple hidden layers, consisting of convolutional layers
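The operation at the core of a convolutional layer, sketched in 1-D: slide a kernel across the input and take a dot product at each position (stride 1, no padding). Strictly speaking this is cross-correlation, which is what most deep-learning libraries implement under the name "convolution".

```python
# 1-D "valid" convolution (cross-correlation): at each position i, the
# output is the dot product of the kernel with the input window at i.

def convolve1d(signal, kernel):
    n = len(signal) - len(kernel) + 1
    return [sum(kernel[j] * signal[i + j] for j in range(len(kernel)))
            for i in range(n)]

out = convolve1d([1, 2, 3, 4, 5], [1, 0, -1])  # a simple edge-detecting kernel
```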

- https://en.wikipedia.org/wiki/Perceptron
- https://en.wikipedia.org/wiki/Multiclass_classification
- http://scikit-learn.org/stable/
- https://en.wikipedia.org/wiki/Multilayer_perceptron
- https://en.wikipedia.org/wiki/Feedforward_neural_network
- https://en.wikipedia.org/wiki/Recurrent_neural_network
- https://en.wikipedia.org/wiki/Long_short-term_memory
- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html
- https://en.wikipedia.org/wiki/Activation_function