Creative Commons License

Questions: Second session

Question 1

What is data storage? What are the different ways to store digital data? Briefly describe them. (1 point)

Question 2.a

Consider a sensor that can measure the following values: luminosity, pressure, UV rays, temperature and humidity. How will you represent the daily measurements in a data storage system of your choice? Write a program in Python using the pandas library to read these values from the storage system. (1 point)
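
Illustrative sketch (one possible approach, not a model answer): assuming the daily measurements are stored as CSV with one row per day, they could be read with pandas as shown below. The column names, the sample values and the helper read_measurements are hypothetical, and an in-memory string stands in for a real file.

```python
import io

import pandas as pd

# Made-up sample readings; in practice this text would live in a CSV file.
# Column names are assumptions, not part of the question.
CSV_TEXT = """date,luminosity,pressure,uv,temperature,humidity
2024-01-01,320.5,1013.2,2.1,4.5,81.0
2024-01-02,298.0,1009.8,1.8,5.1,85.5
2024-01-03,410.2,1015.6,2.6,3.9,78.2
"""


def read_measurements(source):
    """Read daily sensor measurements into a DataFrame indexed by date."""
    return pd.read_csv(source, parse_dates=["date"], index_col="date")


# A file path such as "sensor.csv" could be passed instead of the StringIO.
measurements = read_measurements(io.StringIO(CSV_TEXT))
print(measurements)
```

Other storage choices (a relational table, JSON documents, a NoSQL store) would need a different reader, for example pd.read_json for JSON files.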

Question 2.b

What is a NoSQL data store? What are the different types of NoSQL data stores? Briefly describe each one of them. (1 point)

Question 2.c

Consider the above sensor with five different types of measurement capabilities: luminosity, pressure, ultraviolet rays, temperature and humidity. How will you represent the daily data collected by this sensor in a column-based NoSQL data store? Explain with an example. (1 point)

Question 3

What are the different types of errors in the data? How do you deal with them? (1 point)

Question 4

What are the differences between classification and clustering algorithms? (1 point)

Question 5.a

Consider a CSV file containing the following columns: City, Year, and Population, i.e., it contains the population of each city as recorded every year since 1950. Your goal is to write a Python program to perform the following:

  1. Read the CSV file
  2. Show the temporal evolution of population of every city
  3. Find the city with the maximum population in the year 2000
  4. For every country, compute the average population of the cities in the year 2000

(1.5 points)

Question 5.b

Consider a CSV file containing the following columns: Country, City, Year, and Population, i.e., it contains the population of each city (of a country) as recorded every year since 1900. Your goal is to write a Python program using pandas that can read this CSV file and perform the following:

  1. Find the city with the minimum population in the year 2010
  2. For every country, compute the average population of the cities in the year 2010

(2 points)
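
Illustrative sketch (one possible approach, not a model answer): the two tasks can be expressed as a row filter on Year followed by idxmin and groupby. The sample rows, the placeholder country and city names, and both helper functions are hypothetical; the in-memory string stands in for the real CSV file.

```python
import io

import pandas as pd

# Made-up rows with placeholder names; the real file would be read from disk.
CSV_TEXT = """Country,City,Year,Population
CountryA,CityA1,2010,1000
CountryA,CityA2,2010,3000
CountryB,CityB1,2010,500
CountryB,CityB2,2010,1500
CountryA,CityA1,2009,100
"""


def city_with_min_population(df, year):
    """Return the name of the least populated city in the given year."""
    rows = df[df["Year"] == year]
    return rows.loc[rows["Population"].idxmin(), "City"]


def average_population_by_country(df, year):
    """Return the mean city population per country for the given year."""
    rows = df[df["Year"] == year]
    return rows.groupby("Country")["Population"].mean()


df = pd.read_csv(io.StringIO(CSV_TEXT))
smallest_city = city_with_min_population(df, 2010)
averages = average_population_by_country(df, 2010)
print(smallest_city)
print(averages)
```

Filtering by year before calling idxmin matters here: the 2009 row has the smallest population overall but must not be selected.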

Question 5.c

We assume that the CSV file of city population data does not contain any errors and has complete population data for Paris from the year 1900 to 2017. Your next goal is to predict the population of Paris in the year 2050. Write a Python program to achieve this prediction task. (1.5 points)
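
Illustrative sketch (one possible approach, not a model answer): assuming the population follows a roughly linear trend, a straight line can be fitted to the yearly values with numpy.polyfit and extrapolated to 2050. The linear model is a simplifying assumption, the data below is synthetic (not real Paris figures), and predict_population is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Paris rows of the CSV file (1900 to 2017).
years = np.arange(1900, 2018)
paris = pd.DataFrame(
    {"Year": years, "Population": 2_000_000 + 1_000 * (years - 1900)}
)


def predict_population(df, target_year):
    """Fit a degree-1 polynomial (a line) and evaluate it at target_year."""
    slope, intercept = np.polyfit(
        df["Year"].to_numpy(), df["Population"].to_numpy(), deg=1
    )
    return slope * target_year + intercept


prediction = predict_population(paris, 2050)
print(round(prediction))
```

With real data, a linear fit over more than a century may be a poor model; restricting the fit to recent years or trying a different regression model are alternatives worth comparing.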

Question 6.a

What is a perceptron? (1 point)

Question 6.b

What is an artificial neural network? (0.5 point)

Question 6.c

What is reinforcement learning? (1 point)

Question 7

What is the difference between supervised and unsupervised learning? (0.5 point)

Question 8

Before starting an analysis of data from external sources, what should you take into consideration? (1 point)