Data Science for Chemists

IPL Summer School, CPE Lyon

3. Data Analysis and Visualization

John Samuel
CPE Lyon

Year: 2023-2024
Email: john.samuel@cpe.fr

Creative Commons License

3.1. Data Acquisition and Storage

Data acquisition

  1. Surveys
    • Manual surveys
    • Online surveys
  2. Sensors [1]
    • Temperature, pressure, humidity, rainfall
    • Acoustic, navigation
    • Proximity, presence sensors
  3. Social networks
  4. Video surveillance cameras
  5. Web

[1] https://en.wikipedia.org/wiki/List_of_sensors

3.2. Data Acquisition and Storage

Data storage formats

3.2. Data Acquisition and Storage

Types of data stores

  1. Structured data stores
    • Relational databases
    • Object-oriented databases
  2. Unstructured data stores
    • Filesystems
    • Content-management systems
    • Document collections
  3. Semi-structured data stores
    • Filesystems
    • NoSQL data stores
Unstructured vs. Structured vs. Semi-structured
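
As a minimal sketch of the difference between a structured and a semi-structured store, the Python example below keeps the same kind of measurements first in a relational table (fixed schema, queried with SQL) and then as JSON documents (fields may differ from record to record). The table name `samples`, the field names, and the values are hypothetical and only serve the illustration.

```python
import json
import sqlite3

# Structured: a relational table with a fixed schema
# (every row has exactly the same columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (name TEXT, boiling_point_c REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("water", 100.0), ("ethanol", 78.4)],
)
rows = conn.execute(
    "SELECT name FROM samples WHERE boiling_point_c < 90"
).fetchall()
conn.close()

# Semi-structured: JSON documents, where each record may carry
# additional fields that other records do not have.
documents = json.loads("""
[
  {"name": "water", "boiling_point_c": 100.0},
  {"name": "ethanol", "boiling_point_c": 78.4, "synonyms": ["ethyl alcohol"]}
]
""")
with_synonyms = [d["name"] for d in documents if "synonyms" in d]

print(rows)
print(with_synonyms)
```

In the relational case the schema is declared before any data is stored; in the JSON case the structure travels with each record and has to be checked when the data is read.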

3.2. Data Acquisition and Storage

ACID Transactions [1]

[1] https://en.wikipedia.org/wiki/ACID
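
The atomicity and consistency parts of ACID can be illustrated with a small sketch using Python's built-in `sqlite3` module. The `stock` table, the reagent names, and the `transfer` helper are hypothetical; the point is only that a transaction either applies all of its updates or none of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint expresses a consistency rule: no negative stock.
conn.execute(
    "CREATE TABLE stock (reagent TEXT PRIMARY KEY, grams REAL CHECK (grams >= 0))"
)
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("NaCl", 50.0), ("KCl", 10.0)])
conn.commit()


def transfer(conn, source, target, grams):
    """Move `grams` from one row to another inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE stock SET grams = grams + ? WHERE reagent = ?", (grams, target)
            )
            conn.execute(
                "UPDATE stock SET grams = grams - ? WHERE reagent = ?", (grams, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "NaCl", "KCl", 20.0)       # both updates are committed
failed = transfer(conn, "NaCl", "KCl", 100.0)  # second update violates the CHECK
balances = dict(conn.execute("SELECT reagent, grams FROM stock"))
print(ok, failed, balances)
```

The second call fails on its second update, so the first update of that call is rolled back as well and the table keeps the state left by the successful transfer.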

3.2. Data Acquisition and Storage

Types of data stores

3.2. Data Acquisition and Storage

NoSQL

3.2. Data Acquisition and Storage

Types of NoSQL stores

3.3. Data Extraction and Integration

Data extraction techniques

3.3. Data Extraction and Integration

Query interfaces

3.3. Data Extraction and Integration

Crawlers for web pages

Web crawlers: navigating the entire web using hyperlinks
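
The idea of following hyperlinks from page to page can be sketched as a breadth-first traversal. To keep the example self-contained and offline, the "web" here is a hypothetical in-memory dictionary `PAGES` on the reserved domain example.org; a real crawler would replace the `fetch` argument with an HTTP request and should respect robots.txt and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory pages standing in for a website (no network access).
PAGES = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.org/a.html": '<a href="/b.html">B</a> <a href="/">home</a>',
    "http://example.org/b.html": '<a href="/c.html">C</a>',
    "http://example.org/c.html": "no links here",
}


def crawl(start, fetch):
    """Visit every page reachable from `start`, each one only once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("http://example.org/", PAGES.get)
print(visited)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.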

3.3. Data Extraction and Integration

Application Programming Interface (API)

API (programming interface)

3.4. Pre-treatment of Data

Data Cleaning: Types of Errors

3.4. Pre-treatment of Data

Syntactical errors

3.4. Pre-treatment of Data

Semantic errors

3.4. Pre-treatment of Data

Coverage errors

3.4. Pre-treatment of Data

Handling Syntactical errors

3.4. Pre-treatment of Data

Handling Semantic errors

3.4. Pre-treatment of Data

Handling Coverage errors

3.4. Pre-treatment of Data

Administrators and handling errors

3.5. Data Transformation

Languages

3.6. ETL

ETL (Extraction, Transformation and Loading)

  1. Data Extraction
  2. Data Cleaning
  3. Data Transformation
  4. Loading data to information stores
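
The four steps above can be sketched as four small functions chained together. This is a minimal illustration under stated assumptions, not a production pipeline: the CSV content, the column names, and the `readings` table are hypothetical, and the target store is an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one value has stray whitespace, one is not a number.
RAW_CSV = """sample,temperature_f
A, 212
B,not measured
C,32
"""


def extract(text):
    """1. Data extraction: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """2. Data cleaning: strip whitespace and drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(
                {
                    "sample": row["sample"].strip(),
                    "temperature_f": float(row["temperature_f"]),
                }
            )
        except ValueError:
            continue  # skip rows with a non-numeric temperature
    return cleaned


def transform(rows):
    """3. Data transformation: convert Fahrenheit to Celsius."""
    return [
        (r["sample"], round((r["temperature_f"] - 32) * 5 / 9, 2)) for r in rows
    ]


def load(rows, conn):
    """4. Loading: write the transformed rows to the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sample TEXT, temperature_c REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(clean(extract(RAW_CSV))), conn)
loaded = conn.execute(
    "SELECT sample, temperature_c FROM readings ORDER BY sample"
).fetchall()
print(loaded)
```

Keeping each step in its own function makes it possible to test or replace one stage (for example, a different source or target store) without touching the others.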

3.6. ETL

Models for data analysis

3.6. ETL

Star Schema

3.6. ETL

Data Cubes

3.6. ETL

Snowflake Schema

3.6. ETL

ETL: From one data store to another

3.7. Data Analysis

Activities of data analysis

  1. Retrieving values
  2. Filter
  3. Compute derived values
  4. Find extremum
  5. Sort
  6. Determine range
  7. Characterize distribution
  8. Find anomalies
  9. Cluster
  10. Correlate
  11. Contextualization

Source: https://en.wikipedia.org/wiki/Data_analysis
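
Several of the activities listed above can be tried on a small dataset with the Python standard library alone. The measurements below (concentration, absorbance) are hypothetical, and the anomaly rule (a ratio more than 50% away from the median ratio) is an arbitrary choice made for this sketch.

```python
from statistics import mean, median

# Hypothetical (concentration in mol/L, absorbance) measurements.
data = [(0.1, 0.12), (0.2, 0.25), (0.3, 0.37), (0.4, 0.49), (0.5, 1.90)]

conc = [c for c, _ in data]                      # retrieve values
low = [(c, a) for c, a in data if c <= 0.3]      # filter
ratios = [a / c for c, a in data]                # compute derived values
highest = max(data, key=lambda p: p[1])          # find extremum
by_absorbance = sorted(data, key=lambda p: p[1], reverse=True)  # sort
value_range = (min(conc), max(conc))             # determine range

# Find anomalies: flag points whose ratio is far from the median ratio.
med = median(ratios)
anomalies = [p for p, r in zip(data, ratios) if abs(r - med) > 0.5 * med]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Correlate: computed on the points that were not flagged as anomalies.
clean = [p for p in data if p not in anomalies]
r = pearson([c for c, _ in clean], [a for _, a in clean])

print(highest, value_range, anomalies, round(r, 3))
```

In practice these operations are usually done with a dataframe library, but the underlying steps are the same.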

3.8. Data Visualization

Data Visualization

  1. Time-series
  2. Ranking
  3. Part-to-whole
  4. Deviation
  5. Sort
  6. Frequency distribution
  7. Correlation
  8. Nominal comparison
  9. Geographic or geospatial

Source: https://en.wikipedia.org/wiki/Data_visualization

3.8. Data Visualization

Data Visualization: Examples

  1. Bar chart (nominal comparison)
  2. Pie chart (part-to-whole)
  3. Histogram (frequency distribution)
  4. Scatter plot (correlation)
  5. Network
  6. Line chart (time-series)
  7. Treemap
  8. Gantt chart
  9. Heatmap
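
To make the link between a chart type and the message it carries concrete, here is a minimal text-only sketch of a histogram (frequency distribution). It groups values into bins of equal width and draws one `#` per value; the melting-point values and the helper name `text_histogram` are hypothetical. A plotting library would normally be used instead, but the binning step is the same.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one line per bin: the bin's lower edge and a bar of '#' marks."""
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        # Counter returns 0 for bins that contain no value.
        lines.append(f"{b * bin_width:>4} | {'#' * counts[b]}")
    return lines


# Hypothetical melting points in degrees Celsius.
melting_points = [112, 118, 121, 125, 127, 133, 119, 124]

for line in text_histogram(melting_points, 10):
    print(line)
```

Changing `bin_width` changes the shape of the histogram, which is why the choice of bins should be reported together with the chart.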

3.8. Data Visualization

Pie Chart

3.8. Data Visualization

Programming Language Paradigms (Bubble Chart)

3.8. Data Visualization

Timeline of Programming Languages (using Histropedia)

3.8. Data Visualization

Influence Graph of Programming Languages

3.8. Data Visualization

k Predominant colours
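
The slide title does not say which method is used to find the k predominant colours. One common approach, assumed here, is k-means clustering of the RGB values of the pixels. The sketch below is a plain-Python version with a deterministic initialisation (the first k distinct colours); the pixel values and the function name `k_predominant_colours` are hypothetical.

```python
def k_predominant_colours(pixels, k, iterations=10):
    """Cluster RGB tuples with k-means; return (cluster size, centroid) pairs."""
    # Deterministic initialisation: the first k distinct colours, in order.
    centroids = list(dict.fromkeys(pixels))[:k]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each pixel to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in pixels:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(ch) / len(c) for ch in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    sizes = [len(c) for c in clusters]
    # Largest cluster first: the most predominant colour.
    return sorted(zip(sizes, centroids), reverse=True)


# Hypothetical pixels: three reddish and two bluish.
pixels = [(250, 10, 10), (245, 5, 15), (255, 0, 5), (10, 10, 240), (0, 20, 250)]
result = k_predominant_colours(pixels, k=2)
print(result)
```

On a real image the pixel list would come from an image library, and a random or k-means++ initialisation would usually replace the simple one used here.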

3.8. Data Visualization

RGB Scatter plots (Comparison)

References

Websites

Colours

Images