Introduction to Data Mining: John Samuel

NOTE: Article in Progress

From data to insights

1. Lifecycle of data

Lifecycle of data: data, knowledge, insights, action

Data to knowledge

Data acquisition
Data Extraction
Data Cleaning
Data Transformation
ETL
Data analysis modeling
Data Storage
Analysis
Visualisation

Knowledge to Insights

Knowledge to Knowledge

Insights to Action

Insights to Knowledge

Action to Data

Data analysis

Data visualisation

finding known and unknown patterns from data

ETL (Extraction Transformation and Loading)

Data Extraction
Data Cleaning
Data Transformation
Loading data to information stores

2. Data Acquisition and Storage

2.1. Data acquisition

Surveys

Manual surveys
Online surveys

Sensors¹

Temperature, pressure, humidity, rainfall
Acoustic, navigation
Proximity, presence sensors

Social networks
Video surveillance cameras
Web

https://en.wikipedia.org/wiki/List_of_sensors

2.2. Data storage formats

Binary and Textual Files
CSV/TSV
XML
JSON
Media (Images/Audio/Video)

2.3 Types of data stores

Structured data stores
- Relational databases
- Object-oriented databases
Unstructured data stores
- Filesystems
- Content-management systems
- Document collections
Semi-structured data stores
- Filesystems
- NoSQL data stores

Unstructured vs. Structured vs. Semi-structured

NoSQL versus SQL⁵: NoSQL data stores typically have no strict schemas and support horizontal scaling.

2.3.1. ACID Transactions¹

Atomicity: Each transaction must be "all or nothing".
Consistency: Any transaction must bring database from one valid state to another.
Isolation: Concurrent execution of transactions must bring the database to the same state as their sequential execution.
Durability: Irrespective of power losses, crashes, a transaction once committed to the database must remain in that state.
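
Atomicity can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table, the account names and the helper functions are hypothetical and only serve the example: a transfer credits one account and debits another, and a CHECK constraint makes the debit fail on overdraft, so the earlier credit is rolled back as well.

```python
import sqlite3

def open_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Credit dst, then debit src: both updates are kept or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the overdraft violated the CHECK constraint

def balances(conn):
    """Return the current balances as a dict name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = open_bank()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer is expected to fail and to leave both balances exactly as they were after the first one, which is the "all or nothing" behaviour.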

https://en.wikipedia.org/wiki/ACID

Ensure validity of databases even in case of errors, power failures
Important in banking sector

2.3.2. Types of data stores

Relational databases
Object-oriented databases
NoSQL (Not only SQL) data stores
NewSQL

2.3.3. NoSQL

Compromises consistency
Focus on availability and speed

2.3.4. Types of NoSQL stores

Column-oriented database
Document-oriented database
Key-value database
Graph-oriented database

3. Data Extraction and Integration

3.1. Data extraction techniques

Data dumps
- Downloading complete data dumps
- Downloading selective data dumps
Periodical polling of data feeds (e.g., blogs, news feeds)
Data streams
- Subscribing to data streams (push notifications)

3.2. Query interfaces

Query endpoints supporting declarative languages
- SQL
- SPARQL
Automated or manual search (and filter) options

3.3. Crawlers for web pages

Web crawlers: navigating the entire web using hyperlinks

3.4. Application Programming Interface (API)

Web operations (CRUD) to manipulate externally managed resources
Requires programmers to develop wrappers for web service integration

4. Pre-treatment of Data

4.1 Data Cleaning: Types of Errors

Syntactical errors
Semantical errors
Data coverage errors

4.1.1. Syntactical errors

Lexical errors (e.g., user entered a string instead of a number)
Data format errors (e.g., order of last name, first name)
Irregular data errors (e.g., usage of different metrics)

4.1.2. Semantic errors

Violation of integrity constraints
Contradiction
Duplication
Invalid data (unable to detect despite presence of triggers and integrity constraints)

4.1.3. Coverage errors

Missing values
Missing data

4.2. Data Cleaning: Handling Errors

4.2.1. Handling Syntactical errors

Validation using schema (e.g., XSD, JSON Schema)
Data transformation

4.2.2. Handling Semantic errors

Duplicate elimination by specifying integrity constraints such as functional dependencies

4.2.3. Handling Coverage errors

Interpolation techniques
External data sources
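
One possible interpolation technique for missing values is linear interpolation between the nearest known neighbours. The sketch below assumes a regularly sampled series in which missing readings are marked as None; the function name and the sample values are illustrative only.

```python
def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation.

    Leading and trailing gaps are left as None because they have
    only one known neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment between two consecutive known readings.
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# Hypothetical temperature readings with two missing values in the middle.
print(interpolate_missing([10.0, None, None, 16.0, None]))
```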

4.2.4. Administrators and handling errors

User feedback
Alerts and triggers

5. Data Transformation

Languages

Template languages
XSLT
AWK
Sed
Programming languages like Perl

6. ETL

6.1. ETL (Extraction Transformation and Loading)

Data Extraction
Data Cleaning
Data Transformation
Loading data to information stores

6.2.1. Models for data analysis

Multidimensional data analysis
Dimensions
- Attributes
- Levels
- Hierarchies
Facts
- Measures

Multidimensional data analysis: Examples

Dimensions (e.g., spatio-temporal dimensions, Product)
- Attributes (e.g., Name, Manufacturer etc.)
- Levels (e.g., Day, Month, Quarter, Store, City, Country etc.)
- Hierarchies (e.g., Day-Month-Quarter-Year, Store-City-Country etc.)
Facts
- Measures (e.g., Number of products sold/unsold)

6.2.2. Star Schema

6.2.3. Data Cubes

Data cubes for online analytical processing (OLAP)
OLAP Cube operations
- Slice
- Dice
- Drill up/down
- Pivot
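
Two of these operations can be sketched on a tiny in-memory cube. The fact table below (month, city, product mapped to units sold) and the helper names are hypothetical; a real OLAP engine works on far larger data and on dimension hierarchies, so this is only a simplified illustration of slicing and drilling up.

```python
from collections import defaultdict

DIMS = ("month", "city", "product")

# Hypothetical fact table: one measure (units sold) per cell of the cube.
facts = {
    ("2024-01", "Lyon", "pen"): 10,
    ("2024-01", "Paris", "pen"): 7,
    ("2024-02", "Lyon", "pen"): 4,
    ("2024-02", "Lyon", "ink"): 3,
}

def slice_cube(cube, dim, value):
    """Slice: keep only the cells where one dimension has a given value."""
    k = DIMS.index(dim)
    return {key: m for key, m in cube.items() if key[k] == value}

def roll_up(cube, keep):
    """Drill up: sum the measure over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, measure in cube.items():
        totals[tuple(key[i] for i in idx)] += measure
    return dict(totals)

print(roll_up(facts, ["city"]))
print(roll_up(slice_cube(facts, "month", "2024-01"), ["product"]))
```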

6.2.4. Snowflake Schema

6.2. ETL: From one data store to another

From: Data sources
- Internal or external databases
- Web Services
To: Data warehouses
- Enterprise warehouses
- Web warehouses

7. Data Analysis

Activities of data analysis

Retrieving values
Filter
Compute derived values
Find extremum
Sort
Determine range
Characterize distribution
Find anomalies
Cluster
Correlate
Contextualization

https://en.wikipedia.org/wiki/Data_analysis

8. Data Visualization

8.1. Types of Data Visualization

Time-series
Ranking
Part-to-whole
Deviation
Sort
Frequency distribution
Correlation
Nominal comparison
Geographic or geospatial

https://en.wikipedia.org/wiki/Data_visualization

8.2. Data Visualization: Examples

Bar-chart (Nominal comparison)
Pie-chart (part-to-whole)
Histograms (frequency-distribution)
Scatter-plot (correlation)
Network
Line-chart (time-series)
Treemap
Gantt chart
Heatmap

Pie Chart

Programming Language Paradigms (Bubble Chart)

Timeline of Programming Languages (using Histropedia)

Influence Graph of Programming Languages

k Predominant colours

RGB Scatter plots (Comparison)

9. Patterns

9.1. Patterns in Nature

Symmetry
Trees, Fractals
Spirals
Chaos
Waves
Bubbles, Foam
Tessellations
Cracks
Spots, stripes

9.2. Patterns by Humans

Buildings (Symmetry)
Cities
Virtual environments (e.g., video games)
Human artifacts

Pattern creation

Repetition
Fractals
- Julia set: f(z) = z² + c
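
The repeated application of f(z) = z² + c can be sketched as an escape-time test: a starting point z is iterated until its magnitude exceeds a bound. The iteration limit and the escape radius of 2 are conventional choices assumed here, not values given in these notes.

```python
def julia_escape_time(z, c, max_iter=50, radius=2.0):
    """Count the iterations of z -> z*z + c performed before |z| exceeds radius.

    Returns max_iter when the orbit stays bounded for the whole run,
    which is the usual (approximate) test for belonging to the filled Julia set.
    """
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c
    return max_iter

# With c = 0 the orbit of 0 never moves, while 1.5 escapes after one step.
print(julia_escape_time(0j, 0j), julia_escape_time(1.5 + 0j, 0j))
```

Colouring every pixel of a grid of starting points by this count produces the familiar fractal images.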

Synonyms

Pattern recognition
Knowledge discovery in databases
Data mining²
Machine learning

Data mining: trends², future trends (2007)³, finding patterns in data⁴

Pattern Recognition

Goal is to detect patterns and regularities in data
Approaches
1. Supervised learning: Availability of labeled training data
2. Unsupervised learning: No labeled training data available
3. Semi-supervised learning: Small set of labeled training data and a large amount of unlabeled data
4. Self-supervised learning: automated generation of labels for training

Formalization

Euclidean vector: geometric object with magnitude and direction
Vector space: collection of vectors that can be added together and multiplied by numbers.
Feature vector: n-dimensional vector
Feature space: Vector space associated with the vectors

Examples: Features

Images: pixel values.
Texts: Frequency of occurrence of textual phrases.

Formalization

Feature construction¹: construction of new features from already available features
Feature construction operators
- Equality operators, arithmetic operators, array operators (min, max, average etc.)...

Example

Let Year of Birth and Year of Death be two existing features.
A new feature called Age = Year of Death - Year of Birth

https://en.wikipedia.org/wiki/Feature_vector

Formalization: Supervised learning

Let N be the number of training examples
Let X be the input feature space
Let Y be the output feature space (of labels)
Let {(x₁, y₁),...,(x_N, y_N)} be the N training examples, where
- x_i is the feature vector of i^th training example.
- y_i is its label.
The goal of supervised learning algorithm is to find g: X → Y, where
- g is one of the functions from the set of possible functions G (hypotheses space)
Let F denote the space of scoring functions, where
- f: X × Y → R such that g returns the label with the highest score, i.e., g(x) = arg max_y f(x, y).

Formalization: Unsupervised learning

Let X be the input feature space
Let Y be the output feature space (of labels)
The goal of unsupervised learning algorithm is to
- find mapping X → Y

Formalization: Semi-supervised learning

Let X be the input feature space
Let Y be the output feature space (of labels)
Let {(x₁, y₁),...,(x_l, y_l)} be the set of l labeled training examples
Let {x_{l+1},...,x_{l+u}} be the set of u unlabeled feature vectors of X.
The goal of semi-supervised learning algorithm is to do
- Transductive learning, i.e., find correct labels for {x_{l+1},...,x_{l+u}}, OR
- Inductive learning, i.e., find correct mapping X → Y

10. Data Mining

Tasks in Data Mining

Classification
Clustering
Regression
Sequence Labeling
Association Rules
Anomaly Detection
Summarization

10.1. Classification

Generalizing known structure to apply to new data
Identifying the set of categories to which an object belongs
Binary vs. Multiclass classification

Applications

Spam vs Non-spam
Document classification
Handwriting recognition
Speech Recognition
Internet Search Engines

Formal definition

Let X be the input feature space
Let Y be the output feature space (of labels)
The goal of classification algorithm (or classifier) is to find { (x₁, y₁),...,(x_l, y_k)}, i.e., assigning a known label to every input feature vector, where
- x_i ∈ X
- y_i ∈ Y
- |X | = l
- |Y | = k
- l >= k

Classifiers

Classifying Algorithm
Two types of classifiers:
- Binary classifiers assigning an object to one of two classes
- Multiclass classifiers assigning an object to one of several classes

Linear Classifiers

A linear function assigning a score to each possible category by combining the feature vector of an instance with a vector of weights, using a dot product.
Formalization:
- Let X be the input feature space and x_i ∈ X
- Let β_k be vector of weights for category k
- score(x_i, k) = x_i.β_k, score for assigning category k to instance x_i. The category that gives the highest score is assigned as the category of the instance.
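
The scoring rule above can be sketched as follows. The two categories and their weight vectors are made-up values chosen for the example; in practice the weights are learned from training data.

```python
def score(x, beta):
    """Dot product of a feature vector and a category's weight vector."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def classify(x, weights):
    """Return the category whose weight vector gives the highest score.

    weights maps each category k to its weight vector beta_k.
    """
    return max(weights, key=lambda k: score(x, weights[k]))

# Hypothetical weights for two categories over two features.
weights = {"spam": [2.0, -1.0], "ham": [-1.0, 1.5]}
print(classify([1.0, 0.0], weights), classify([0.0, 1.0], weights))
```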

Classifiers

Let

tp: number of true positives
fp: number of false positives
fn: number of false negatives

Then

Precision p = tp / (tp + fp)
Recall r = tp / (tp + fn)
F1-score f1 = 2 * ((p * r) / (p + r))
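
These three measures translate directly into code. The sketch below returns 0.0 when a denominator is zero, which is a convention assumed here rather than part of the definitions; the counts in the usage line are invented.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1-score from confusion counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * (p * r) / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical classifier: 8 true positives, 2 false positives, 8 false negatives.
print(precision_recall_f1(8, 2, 8))
```

In this example precision is high (most predicted positives are correct) while recall is only one half, and the F1-score lies between the two.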

Confusion Matrix for a SVM classifier of handwritten digits (MNIST)

One-vs.-rest strategy for Multiclass classification

One-vs.-one strategy for Multiclass classification

10.2. Clustering

Discovering groups and structures in the data without using known structures in the data
Objects in a cluster are more similar to each other than to objects in other clusters

Applications

Social network analysis
Image segmentation
Recommender systems
Grouping of shopping items

Formal definition

Let X be the input feature space
The goal of clustering is to find k subsets of X, in such a way that
- C₁ ∪ ... ∪ C_k ∪ C_outliers = X and
- C_i ∩ C_j = ∅, i ≠ j; 1 ≤ i, j ≤ k
- C_outliers may consist of outlier instances (data anomaly)

Cluster models

Centroid models: cluster represented by a single mean vector
Connectivity models: distance connectivity
Distribution models: clusters modeled using statistical distributions
Density models: clusters as connected dense regions in the data space
Subspace models
Group models
Graph-based models
Neural models
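
As an illustration of a centroid model, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-dimensional points. k-means is not named in these notes and is used only as a well-known example; the initial centroids are supplied by the caller so that the run is deterministic, and the sample points are invented.

```python
def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing mean vectors.

    Returns the final centroids and the clusters of the last assignment.
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid becomes the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
print(kmeans(points, [(0, 0), (10, 10)]))
```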

10.3. Regression

Finding a function which models the data
Assigns a real-valued output to each input
Estimating the relationships among variables
Relationship between a dependent variable ('criterion variable') and one or more independent variables ('predictors').

Applications

Prediction
Forecasting
Machine learning
Finance

Formal definition

A function that maps a data item to a prediction variable
Let X be the independent variables
Let Y be the dependent variables
Let β be the unknown parameters (scalar or vector)
The goal of regression model is to approximate Y with X,β, i.e.,
- Y ≅ f(X,β)

Linear regression

straight line: y_i = β₀ + β₁x_i + ε_i OR
parabola: y_i = β₀ + β₁x_i + β₂x_i² + ε_i

Linear regression

straight line: y_i = β₀ + β₁x_i + ε_i
Predicted value: ŷ_i = β₀ + β₁x_i
Residual: e_i = y_i - ŷ_i
Sum of squared residuals, SSE = Σ e_i², where 1 ≤ i ≤ n
The goal is to minimize SSE
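
For a straight line, the parameters that minimize the sum of squared residuals have a closed form (ordinary least squares). The sketch below uses that standard result; the function names and the sample data are illustrative.

```python
def fit_line(xs, ys):
    """Return (beta0, beta1) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def sse(xs, ys, beta0, beta1):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying exactly on y = 1 + 2x.
xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]
b0, b1 = fit_line(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```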

10.4. Sequence Labeling

Assigning a class to each member of a sequence of values

Applications

Part of speech tagging
Linguistic translation
Video analysis
Handwriting recognition
Information extraction

Formal definition

Let X be the input feature space
Let Y be the output feature space (of labels)
Let 〈x₁,...,x_T〉 be a sequence of length T.
The goal of sequence labeling is to generate a corresponding sequence
- 〈y₁,...,y_T〉 of labels
- x_i ∈ X
- y_j ∈ Y

10.5. Association Rules

Association Rules

Searches for relationships between variables

Applications

Web usage mining
Intrusion detection
Affinity analysis

Formal definition

Let I be a set of n binary attributes called items
Let T be a set of m transactions called database
Let I = {i₁,...,i_n} and T = {t₁,...,t_m}
The goal of association rule learning is to find
- X ⇒ Y, where X, Y ⊆ I
- X is the antecedent
- Y is the consequent

Formal definition

Support: how frequently an itemset appears in the database
- supp(X) = |{t ∈ T; X ⊆ t}| / |T|
Confidence: how frequently the rule has been found to be true.
- conf(X ⇒ Y) = supp(X ∪ Y)/supp(X)
Lift: the ratio of the observed support to that of the expected if X and Y were independent
- lift(X ⇒ Y) = supp(X ∪ Y)/(supp(X) ⨉ supp(Y))
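
The three measures can be computed directly from a list of transactions, each represented as a set of items. The five transactions below are invented for the example.

```python
def supp(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(x, y, transactions):
    """How often the rule x => y holds among transactions containing x."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)

def lift(x, y, transactions):
    """Observed support of x and y together relative to independence."""
    joint = supp(set(x) | set(y), transactions)
    return joint / (supp(x, transactions) * supp(y, transactions))

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
    {"bread", "butter", "milk"},
]
rule = ({"bread", "butter"}, {"milk"})
print(supp(rule[0], baskets), conf(*rule, baskets), lift(*rule, baskets))
```

A lift below 1, as in this invented data, suggests that the antecedent and the consequent occur together less often than they would if they were independent.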

Example

{bread, butter} ⇒ {milk}

10.6. Anomaly Detection

Identification of unusual data records
Approaches
1. Unsupervised anomaly detection
2. Supervised anomaly detection
3. Semi-supervised anomaly detection

Applications

Intrusion detection
Fraud detection
Remove anomalous data
System health monitoring
Event detection in sensor networks
Misuse detection

Characteristics

Unexpected bursts

Formalization

Let Y be a set of measurements
Let P_Y(y) be a statistical model for the distribution of Y under 'normal' conditions.
Let T be a user-defined threshold.
A measurement x is an outlier if P_Y(x) < T
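
A minimal sketch of this test, assuming that the 'normal' measurements follow a Gaussian distribution whose mean and standard deviation are estimated from the data. The Gaussian model, the threshold value and the readings are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def is_outlier(x, normal_data, threshold):
    """Flag x when its density under the fitted model is below threshold."""
    mu, sigma = mean(normal_data), pstdev(normal_data)
    return gaussian_pdf(x, mu, sigma) < threshold

# Hypothetical sensor readings recorded under normal conditions.
readings = [9.8, 10.1, 10.0, 9.9, 10.2]
print(is_outlier(10.05, readings, 0.01), is_outlier(12.0, readings, 0.01))
```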

10.7. Summarization

Providing a more compact representation of the data set
Report Generation

Applications

Keyphrase extraction
Document summarization
Search engines
Image summarization
Video summarization: Finding important events from videos

Formalization: Multidocument summarization

Let D = {D₁, ..., D_k} be a document collection of k documents
A document D_j = {t₁, ..., t_m} consists of m textual units (words, sentences, paragraphs etc.)
Let D = {t₁, ..., t_n} also denote the complete set of all textual units from all documents, where
- t_i ∈ D, if and only if ∃ D_j such that t_i ∈ D_j
S ⊆ D constitutes a summary
Two scoring functions
- Rel(i): relevance of textual unit i in the summary
- Red(i,j): Redundancy between two textual units t_i, t_j

Scoring for a summary S
- s(S) score of summary S
- l(i) is the length of the textual unit i
- K is the fixed maximum length of the summary

Finding a subset from the entire set
Approaches
1. Extraction: Selecting a subset of existing words, phrases, or sentences in the original text without any modification
2. Abstraction: Build an internal semantic representation and then use natural language generation techniques

Extractive summarization

Approaches
1. Generic summarization: Obtaining a generic summary
2. Query relevant summarization: Summary relevant to a query

11. Algorithms

Support Vector Machines (SVM)
Stochastic Gradient Descent (SGD)
Nearest-Neighbours
Naive Bayes
Decision Trees
Ensemble Methods (Random Forest)

11.1. Support Vector Machines (SVM)

Introduction

Supervised learning approach
Binary classification algorithm
Constructs a hyperplane ensuring the maximum separation between two classes

Hyperplane

Hyperplane of n-dimensional space is a subspace of dimension n-1
Examples
- Hyperplane of a 2-dimensional space is 1-dimensional line
- Hyperplane of a 3-dimensional space is 2-dimensional plane

Formal definition

The goal of an SVM is to estimate a function f: R^N → {+1,-1}, i.e.,
- If x₁,...,x_l ∈ R^N are the l input data points,
- the goal is to find (x₁,y₁),...,(x_l,y_l) ∈ R^N ⨉ {+1,-1}
Any hyperplane can be written by the equation using set of input points x
- w.x - b = 0, where
- w ∈ R^N, a normal vector to the plane
- b ∈ R
A decision function is given by f(x) = sign(w.x - b )

Formal definition

If the training data are linearly separable, two hyperplanes can be selected
They separate the two classes of data, so that distance between them is as large as possible.
The hyperplanes can be given by the equations
- w.x - b = 1
- w.x - b = -1
The distance between the two hyperplanes can be given by 2/||w||
Region between these two hyperplanes is called margin.
Maximum-margin hyperplane is the hyperplane that lies halfway between them.

Formal definition

In order to prevent data points from falling into the margin, following constraints are added
- w.x_i - b >= 1, if y_i = 1
- w.x_i - b <= -1, if y_i = -1
y_i(w.x_i - b) >= 1 for 1<= i <= n
The goal is to minimize ||w|| subject to y_i(w.x_i - b) >= 1 for 1<= i <= n
Solving for both w and b gives our classifier f(x) = sign(w.x - b)
Max-margin hyperplane is completely determined by the points that lie nearest to it, called the support vectors
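
The decision function, the margin constraints and the margin width can be checked numerically for a given hyperplane. In the sketch below w and b are chosen by hand for two invented points; no optimization is performed, so this is not an SVM training procedure.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def decision(w, b, x):
    """f(x) = sign(w.x - b), returning +1 or -1."""
    return 1 if dot(w, x) - b >= 0 else -1

def satisfies_margin(w, b, data):
    """Check y_i (w.x_i - b) >= 1 for every labeled point (x_i, y_i)."""
    return all(y * (dot(w, x) - b) >= 1 for x, y in data)

def margin_width(w):
    """Distance 2 / ||w|| between the hyperplanes w.x - b = 1 and w.x - b = -1."""
    return 2 / math.sqrt(dot(w, w))

# Hypothetical data: one point per class, separated along the first axis.
data = [((2.0, 0.0), 1), ((0.0, 0.0), -1)]
w, b = (1.0, 0.0), 1.0
print(satisfies_margin(w, b, data), margin_width(w), decision(w, b, (3.0, 5.0)))
```

Both points lie exactly on the two margin hyperplanes here, so they play the role of support vectors.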

Data mining tasks

Classification (Multi-class classification)
Regression
Anomaly detection

Applications

Text and hypertext categorization
Image classification
Handwriting recognition

11.2. Stochastic Gradient Descent (SGD)

A stochastic approximation of the gradient descent optimization
Iterative method for minimizing an objective function that is written as a sum of differentiable functions.
Finds minima or maxima by iteration

Gradient

Multi-variable generalization of the derivative.
Gives slope of the tangent of the graph of a function
Gradient points in the direction of the greatest rate of increase of a function
Magnitude of gradient is the slope of the graph in that direction

Gradient vs Derivative

Derivatives defined on functions of single variable
Gradient defined on functions of multiple variables
Gradient is a vector-valued function (range is a vector)
Derivative is a scalar-valued function

Gradient descent

First-order iterative optimization algorithm for finding the minimum of a function.
Finding a local minimum involves taking steps proportional to the negative of the gradient of the function at the current point.

Standard gradient descent method

Let's take the problem of minimizing an objective function
- Q(w) = 1/n (ΣQ_i(w)), 1<=i<=n
- Summand function Q_i associated with i^th observation in the data set.
w = w - η.∇ Q(w)

Iterative method

Choose an initial vector of parameters w and learning rate η.
Repeat until an approximate minimum is obtained:
- Randomly shuffle examples in the training set.
- w = w - η.∇ Q_i(w), for i=1...n
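
The iterative method can be sketched for a straight-line fit, assuming the per-example objective Q_i(w) = ½ (w₀ + w₁x_i − y_i)², whose gradient is (err, err·x_i) with err = w₀ + w₁x_i − y_i. The learning rate, the number of passes, the fixed random seed and the data are choices made for this example only.

```python
import random

def sgd_line(xs, ys, eta=0.05, epochs=500, seed=0):
    """Fit y = w0 + w1 * x by stochastic gradient descent."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    w0, w1 = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # randomly shuffle the training examples
        for i in order:
            err = (w0 + w1 * xs[i]) - ys[i]
            # One update per example: w = w - eta * gradient of Q_i(w)
            w0 -= eta * err
            w1 -= eta * err * xs[i]
    return w0, w1

# Hypothetical data lying exactly on y = 1 + 2x.
print(sgd_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

A fixed number of passes stands in for the "approximate minimum" stopping test; a practical implementation would usually monitor the objective instead.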

Applications

Classification
Regression

11.3. Nearest-Neighbours

k-nearest neighbors algorithm

k-NN classification: output is a class membership
(object is classified by a majority vote of its neighbors.)
k-NN regression: output is the property value for the object (average values of its k nearest neighbors)
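
Both variants can be sketched in a few lines, assuming squared Euclidean distance and a brute-force search over the training set. The training points, labels and values are invented.

```python
from collections import Counter

def nearest(train, query, k):
    """Return the k training pairs whose points are closest to query."""
    def dist2(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], query))
    return sorted(train, key=dist2)[:k]

def knn_classify(train, query, k=3):
    """Majority vote among the labels of the k nearest neighbours."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Average of the values of the k nearest neighbours."""
    return sum(value for _, value in nearest(train, query, k)) / k

# Hypothetical labeled points forming two well-separated groups.
labeled = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
           ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
valued = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 50.0)]
print(knn_classify(labeled, (0.5, 0.5)), knn_regress(valued, (1,)))
```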

Applications

Regression
Anomaly detection

11.4. Naive Bayes classifiers

Collection of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumption between the features.

Applications

Document classification (spam/non-spam)

Bayes' Theorem

Let A and B be events.
P(A), P(B) are probabilities of observing A and B independently of each other.
P(A|B) is conditional probability, the likelihood of event A occurring given that B is true
P(B|A) is conditional probability, the likelihood of event B occurring given that A is true
P(B) ≠ 0
P(A|B) = (P(B|A).P(A))/P(B)
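
A small worked example of the theorem, with invented probabilities for a spam filter: A is "the message is spam" and B is "the message contains a given word". P(B) is obtained with the law of total probability.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B); requires P(B) != 0."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b

# Hypothetical figures: 20% of messages are spam; the word appears
# in 50% of spam messages and in 5% of non-spam messages.
p_spam = 0.2
p_word_given_spam = 0.5
p_word_given_ham = 0.05
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

print(bayes(p_word_given_spam, p_spam, p_word))
```

With these figures, seeing the word raises the probability of spam from 0.2 to roughly 0.71.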

11.5. Decision Trees

Decision support tool
Tree-like model of decisions and their possible consequences

Applications

Classification
Regression
Decision Analysis: identifying strategies to reach a goal
Operations Research

11.6. Ensemble Methods (Random Forest)

Definition

Collection of multiple learning algorithms to obtain better predictive performance than could be obtained from one of the constituting algorithms alone.
Random forests are obtained by building multiple decision trees at training time

Multiclass classification
Multilabel classification (the problem of assigning one or more labels to each instance. There is no limit on the number of classes an instance can be assigned to.)
Regression
Anomaly detection

12. Feature Selection

Definition

Process of selecting a subset of relevant features
Used in domains with large number of features and comparatively few sample points

Applications

Analysis of written texts
Analysis of DNA microarray data

Formal definition[8]

Let X be the original set of n features, i.e., |X| = n
Let w_i be the weight assigned to feature x_i∈ X
Binary feature selection assigns binary weights whereas continuous feature selection assigns weights preserving the order of its relevance.
Let J(X') be an evaluation measure, defined as J: X' ⊆ X → R
Feature selection problem may be defined in three following ways
1. |X'| = m < n. Find X' ⊂ X such that J(X') is maximum
2. Choose J₀, Find X' ⊆ X, such that J(X') >= J₀
3. Find a compromise among minimizing |X'| and maximizing J(X')

Data Mining

Goals

Artificial Neural Networks
Deep Learning
Reinforcement Learning
Data Licences, Ethics and Privacy

13. Artificial Neural Networks

Inspired by biological neural networks
Collection of connected nodes called artificial neurons.
Artificial neurons can transmit signal from one to another (like in a synapse).
Signal between artificial neurons is a real number
The output of a neuron is a function of the weighted sum of its inputs.

Perceptron

Algorithm for supervised learning of binary classifiers
Binary classifier

Perceptron: Formal definition

Let y = f(z) be output of perceptron for an input vector z
Let N be the number of training examples
Let X be the input feature space
Let {(x₁, d₁),...,(x_N, d_N)} be the N training examples, where
- x_i is the feature vector of i^th training example.
- d_i is the desired output value.
- x_j,i be the i^th feature of j^th training example.
- x_j,0 = 1

Weights are represented in the following manner:
- w_i is the i^th value of weight vector.
- w_i(t) is the i^th value of weight vector at a given time t.

Perceptron: Steps

Initialize weights and threshold
For each example (x_j, d_j) in training set
- Calculate the output: y_j(t)=f[w(t).x_j]
- Update the weights: w_i(t + 1) = w_i(t) + (d_j-y_j(t))x_j,i
Repeat step 2 until the iteration error 1/N (Σ |d_j - y_j(t)|) is less than a user-specified threshold.
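
These steps can be sketched as follows, assuming a step activation f that outputs 1 when its argument is non-negative and 0 otherwise, zero initial weights, and the threshold folded into w_0 through the constant input x_j,0 = 1. The training set (logical AND) and the parameter names are chosen for the example.

```python
def step(z):
    """Assumed activation: 1 for a non-negative weighted sum, else 0."""
    return 1 if z >= 0 else 0

def predict(w, x):
    xj = [1.0] + list(x)  # prepend the constant input x_j,0 = 1
    return step(sum(wi * xi for wi, xi in zip(w, xj)))

def train_perceptron(examples, max_epochs=20, tolerance=0.0):
    """examples: list of (feature vector, desired output d_j in {0, 1})."""
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(max_epochs):
        total_error = 0
        for x, d in examples:
            y = predict(w, x)
            xj = [1.0] + list(x)
            # Update every weight: w_i(t + 1) = w_i(t) + (d_j - y_j(t)) x_j,i
            w = [wi + (d - y) * xi for wi, xi in zip(w, xj)]
            total_error += abs(d - y)
        if total_error / len(examples) <= tolerance:
            break  # mean error of this pass is small enough
    return w

# Logical AND is linearly separable, so training is expected to converge.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights_and = train_perceptron(and_gate)
print([predict(weights_and, x) for x, _ in and_gate])
```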

Backpropagation

Backward propagation of errors
Adjust the weight of neurons by calculating the gradient of the loss function
Error is calculated and propagated back to the network layers

14. Deep Learning

Deep neural networks

Multiple hidden layers between the input and output layers

Applications

Computer vision
Speech recognition
Drug design
Natural language processing
Machine translation

Convolutional deep neural networks

Analysis of images
Inspired by neurons in the visual cortex
Network learns the filters

15. Reinforcement Learning

Inspired by behaviourist psychology
Actions to be taken in order to maximize the cumulative reward.

16. Data Licences, Ethics and Privacy

Data usage licences
Confidentiality and Privacy
Ethics

Big Data

Volume
Variety
Velocity
Veracity
Value

References

Data Mining course by John Samuel (2017)
Piatetsky-Shapiro, Gregory. “Data Mining and Knowledge Discovery 1996 to 2005: Overcoming the Hype and Moving from ‘University’ to ‘Business’ and ‘Analytics.’” Data Mining and Knowledge Discovery, vol. 15, no. 1, July 2007, pp. 99–105. DOI.org (Crossref), doi:10.1007/s10618-006-0058-2.
Kriegel, Hans-Peter, et al. “Future Trends in Data Mining.” Data Mining and Knowledge Discovery, vol. 15, no. 1, July 2007, pp. 87–97. DOI.org (Crossref), doi:10.1007/s10618-007-0067-9.
Fayyad, Usama, et al. “Knowledge Discovery and Data Mining: Towards a Unifying Framework.” Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1996, pp. 82–88.
NoSQL vs. SQL
