Creative Commons License

Goals

  1. Plotting graphs using matplotlib
  2. Reading and plotting image histograms.
  3. Working with clustering and classification algorithms
  4. Start building a recommender system

Scoring

Every exercise has an associated difficulty level. Easy and medium exercises help you understand the fundamentals and give you ideas for the difficult exercises. It is highly recommended that you finish the easy and medium exercises to get a good score. The difficulty scale given below is marked on every exercise:

  1. ★: Easy
  2. ★★: Medium
  3. ★★★: Difficult

Guidelines

  1. To get complete guidance from the mentors, it is highly recommended that you work on today's practical session and not on the preceding ones.
  2. Make sure that you rename your submissions properly and correctly. Double-check your submissions.
  3. Please check the references.
  4. There are several ways to achieve a task, so many solutions are possible. Still, try to make maximum use of the libraries that have been suggested to you for your exercises.

Installation

Please refer to the installation page.

Exercise 2.1

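matplotlib can be used to plot graphs. Given below is a very simple program that passes a single list of values. After importing the matplotlib library, we initialize the values and plot them. When only one list is given, matplotlib uses it as the y-values and the list indices (0, 1, 2, ...) as the x-values.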

import matplotlib.pyplot as plot

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
plot.plot(x)
plot.show()

Now let's change the color, style and width of the line.

import matplotlib.pyplot as plot

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
plot.plot(x, linewidth=3, drawstyle="steps", color="#00363a")
plot.show()

We will now initialize the y-values and plot the graph.

import matplotlib.pyplot as plot

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 1, 0, 0, 1, 0, 1, 1, 1, 0]
plot.plot(x, y, linewidth=3, drawstyle="steps", color="#00363a")
plot.show()

In the first practical session, we saw how to parse JSON files. Continuing with the same JSON file, we will now plot the number of programming languages released per year. Verify the output.

import json
import pandas as pd
import matplotlib.pyplot as plot

# pandas.io.json.json_normalize is deprecated; use pd.json_normalize instead
with open('pl.json') as jsonfile:
    data = json.load(jsonfile)
dataframe = pd.json_normalize(data)
grouped = dataframe.groupby('year').count()
plot.plot(grouped)
plot.show()
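To see what groupby('year').count() returns before plotting it, here is a minimal sketch on a small hand-written sample. The records are an assumption that stands in for pl.json, whose contents are not shown here. Only the field names year and languageLabel are taken from the programs in this exercise.

```python
import pandas as pd

# Hand-written sample records, used only for illustration in place of pl.json.
records = [
    {"year": 1972, "languageLabel": "C"},
    {"year": 1972, "languageLabel": "Prolog"},
    {"year": 1991, "languageLabel": "Python"},
    {"year": 1995, "languageLabel": "Java"},
    {"year": 1995, "languageLabel": "JavaScript"},
    {"year": 1995, "languageLabel": "Ruby"},
]

dataframe = pd.json_normalize(records)

# One row per year; each remaining column holds the number of non-null entries.
grouped = dataframe.groupby("year").count()
print(grouped)
```

The index of the result holds the years and the column holds the counts, which is why plotting it gives one point per year.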

The following program adds a title and labels to the x-axis and y-axis.

import json
import pandas as pd
import matplotlib.pyplot as plot

with open('pl.json') as jsonfile:
    data = json.load(jsonfile)
dataframe = pd.json_normalize(data)
grouped = dataframe.groupby('year').count()
plot.plot(grouped)
plot.title("Programming languages per year")
plot.xlabel('year', fontsize=16)
plot.ylabel('count', fontsize=16)
plot.show()

There is yet another way to plot the dataframes, by using pandas.DataFrame.plot.

import json
import pandas as pd
import matplotlib.pyplot as plot

with open('pl.json') as jsonfile:
    data = json.load(jsonfile)
dataframe = pd.json_normalize(data)

grouped = dataframe.groupby('year').count()
grouped = grouped.rename(columns={'languageLabel':'count'}).reset_index()

grouped.plot(x='year', kind='bar', title="Programming languages per year")
plot.show()

Now, we want to create multiple subplots. A simple way is given below. Recall that in the first practical session, we grouped by multiple columns. Subplots can be used to visualize such data.

from pandas.io.json import json_normalize
import pandas as pd
import json
import math
import matplotlib.pyplot as plot

jsondata = json.load(open('plparadigm.json'))
array = []

for data in jsondata:
    array.append([data['year'], data['languageLabel'], data['paradigmLabel']])

dataframe = pd.DataFrame(array, columns=['year', 'languageLabel', 'paradigmLabel'])
dataframe = dataframe.astype(dtype= {"year" : "int64", "languageLabel" : "<U200", "paradigmLabel" : "<U200"})

grouped = dataframe.groupby(['paradigmLabel', 'year']).count()
grouped = grouped.rename(columns={'languageLabel':'count'})
grouped = grouped.groupby('paradigmLabel')

# Initialization of subplots; squeeze=False keeps axes two-dimensional even with a single row
nr = math.ceil(grouped.ngroups / 2)
fig, axes = plot.subplots(nrows=nr, ncols=2, figsize=(20, 25), squeeze=False)

# Creation of subplots: subplot i goes to row i // 2, column i % 2
for i, group in enumerate(grouped.groups.keys()):
    g = grouped.get_group(group).reset_index()
    g.plot(x='year', y='count', kind='bar', title=group, ax=axes[i // 2, i % 2])

plot.show()

Modify the above code so that we get visual information on the count of different programming paradigms released in every available year.

Exercise 2.2

In this exercise, we will work on images. Download some images (e.g., picture.bmp and flower.jpg) to your current working folder and open them in the following manner. We will first try to get some metadata of an image.

import os,sys
from PIL import Image
imgfile = Image.open("picture.bmp")
print(imgfile.size, imgfile.format)

We use the Image module of the Python PIL library (Documentation). We will now read the values of 100 pixels from an image.

import os,sys
from PIL import Image
imgfile = Image.open("flower.jpg")

# getpixel takes an (x, y) position and returns the pixel value
for i in range(10):
    for j in range(10):
        print(i, j, imgfile.getpixel((i, j)))

You may notice the pixel position and pixel values (a tuple of 3 values). Let's try to get additional metadata of the images, i.e., mode of image (e.g., RGB), number of bands, number of bits for each band, width and height of image (in pixels).

import os,sys
from PIL import Image
imgfile = Image.open("flower.jpg")

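# getbands() names the bands; the bits attribute is only set for some formats such as JPEG
print(imgfile.mode, len(imgfile.getbands()), getattr(imgfile, 'bits', None), imgfile.width, imgfile.height)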
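If flower.jpg is not at hand, the same metadata and pixel access can be tried on an image created in memory. This is a minimal sketch: the 2x2 size and the pixel values are arbitrary assumptions chosen for illustration.

```python
from PIL import Image

# Synthetic 2x2 RGB image standing in for flower.jpg; every pixel starts as (10, 20, 30).
sample = Image.new("RGB", (2, 2), (10, 20, 30))

# Positions are given as (x, y), i.e. (column, row).
sample.putpixel((1, 0), (200, 100, 50))

print(sample.mode, sample.getbands(), sample.size)
print(sample.getpixel((0, 0)), sample.getpixel((1, 0)))
```

Each pixel of an RGB image is a tuple with one value per band, in the order reported by getbands().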

Let's now get a histogram of colors. When you execute the following code, you will get a single array of values: the frequencies of each band (R, G, B, etc.) concatenated together. In the following code, we assume that we are working with an image of 3 bands (RGB mode) and that each band is represented by 8 bits, so each band contributes 256 values. We will plot the histogram of the different colors.

from PIL import Image
import matplotlib.pyplot as plot

imgfile = Image.open("flower.jpg")

histogram = imgfile.histogram()
# Each band has 256 entries (intensities 0 to 255); the slice end is exclusive
red = histogram[0:256]
green = histogram[256:512]
blue = histogram[512:768]

fig, (axis1, axis2, axis3) = plot.subplots(nrows=3, ncols=1)
axis1.plot(red, color='red')
axis2.plot(green, color='green')
axis3.plot(blue, color='blue')
plot.show()
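The slicing above can be generalized to any number of bands. The following is a minimal sketch, assuming 8 bits per band so that every band contributes 256 counts. The helper name split_histogram is hypothetical, and a synthetic image is used in place of flower.jpg.

```python
from PIL import Image

def split_histogram(image):
    """Return a dict mapping each band name to its list of 256 counts.

    Assumes 8 bits per band, as in RGB mode.
    """
    histogram = image.histogram()
    bands = image.getbands()
    return {band: histogram[i * 256:(i + 1) * 256]
            for i, band in enumerate(bands)}

# A synthetic 4x4 pure-red image: all 16 pixels are (255, 0, 0).
sample = Image.new("RGB", (4, 4), (255, 0, 0))
per_band = split_histogram(sample)

# All 16 pixels have red intensity 255 and green/blue intensity 0.
print(per_band["R"][255], per_band["G"][0], per_band["B"][0])
```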

To see all three bands in one single plot, we collect the three values for every intensity:

from PIL import Image
import matplotlib.pyplot as plot

imgfile = Image.open("flower.jpg")

histogram = imgfile.histogram()
red = histogram[0:256]
green = histogram[256:512]
blue = histogram[512:768]

x = range(256)

y = []
for i in x:
    y.append((red[i],green[i],blue[i]))

plot.plot(x,y)
plot.show()

But we do not wish to lose the band colors.

from PIL import Image
import matplotlib.pyplot as plot

imgfile = Image.open("flower.jpg")

histogram = imgfile.histogram()
red = histogram[0:256]
green = histogram[256:512]
blue = histogram[512:768]

x = range(256)

y = []
for i in x:
    y.append((red[i],green[i],blue[i]))

figure, axes = plot.subplots()
axes.set_prop_cycle('color', ['red', 'green', 'blue'])
plot.plot(x,y)
plot.show()

Your next task is to get the top 20 colors in each band and create a single histogram plot of these top colors. Write a Python program that achieves this.

Exercise 2.3 ★★

In this exercise, we will take a look at the KMeans clustering algorithm. Continuing with images, we will now find the 4 predominant colors in an image.
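Before clustering a real image, it may help to see what KMeans returns on a tiny array of pixels. This is a minimal sketch on synthetic data. The pixel values, the choice of 2 clusters, and the n_init and random_state settings are assumptions made for a reproducible illustration.

```python
import numpy
from sklearn.cluster import KMeans

# Synthetic "image": 30 pure-red pixels and 10 pure-blue pixels.
pixels = numpy.array([[255, 0, 0]] * 30 + [[0, 0, 255]] * 10, numpy.uint8)

clusters = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters.fit(pixels)

# labels_ holds one cluster index per pixel; bincount counts pixels per cluster.
counts = numpy.bincount(clusters.labels_)

# cluster_centers_ holds one (R, G, B) centroid per cluster.
hex_colors = ['#%02x%02x%02x' % tuple(int(round(c)) for c in center)
              for center in clusters.cluster_centers_]

for color, count in zip(hex_colors, counts):
    print(color, count)
```

The program for flower.jpg uses the same two attributes: labels_ for the frequency of each cluster and cluster_centers_ for the color of each bar.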

from PIL import Image
import numpy
import math
import matplotlib.pyplot as plot
from sklearn.cluster import KMeans

imgfile = Image.open("flower.jpg")

numarray = numpy.array(imgfile.getdata(), numpy.uint8)

clusters = KMeans(n_clusters = 4)
clusters.fit(numarray)


npbins = numpy.arange(0, 5)
histogram = numpy.histogram(clusters.labels_, bins=npbins)
labels = numpy.unique(clusters.labels_)


barlist = plot.bar(labels, histogram[0])
for i in range(4):
    barlist[i].set_color('#%02x%02x%02x' % (math.ceil(clusters.cluster_centers_[i][0]),
        math.ceil(clusters.cluster_centers_[i][1]), math.ceil(clusters.cluster_centers_[i][2])))
plot.show()

For your next question, your goal is to understand the above code and achieve the following:

  1. Generalize the above code, assuming that the number of clusters is given by the user.
  2. For the bar chart, ensure that the bars are arranged in descending order of the frequency of colors.
  3. Add support for a pie chart in addition to the bar chart. Ensure that the image colors are used as the wedge colors.
  4. Do you have any interesting observations?

Exercise 2.4 ★★

In this exercise, we will explore support vector machines (SVM) for classification. We will again use colors. Given an RGB value, we want to classify the color as reddish or non-reddish.

from sklearn import svm
x = [[186, 0, 13], [255, 121, 97], [244, 67, 54],[69, 39, 160],[121, 83, 210],[0, 51, 0],[27, 94, 32]]
y = [1, 1, 1,0,0,0,0]
clf = svm.SVC()
clf.fit(x, y)

print(clf.predict([[186, 0, 13]]))
print(clf.predict([[30, 136, 229]]))

Look at the training data (x,y). We first use the data to train our classifier. Then we try to predict a new color. We first try with one element of our training data and later with a new color.
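The numeric labels are easier to read when mapped back to names. The following is a minimal sketch that reuses the training data above. The label_names mapping and the describe helper are hypothetical additions for illustration. The sketch prints whatever the classifier predicts, so run it to see the result for each color.

```python
from sklearn import svm

# Training data copied from the example above: RGB triples and their labels.
x = [[186, 0, 13], [255, 121, 97], [244, 67, 54], [69, 39, 160],
     [121, 83, 210], [0, 51, 0], [27, 94, 32]]
y = [1, 1, 1, 0, 0, 0, 0]

# Hypothetical mapping from numeric labels to readable names.
label_names = {1: "reddish", 0: "non-reddish"}

clf = svm.SVC()
clf.fit(x, y)

def describe(color):
    """Return the readable class name predicted for one RGB triple."""
    predicted = clf.predict([color])[0]
    return label_names[int(predicted)]

for color in ([186, 0, 13], [30, 136, 229]):
    print(color, "->", describe(color))
```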

Your next goal is to achieve multi-class classification.

  1. Train a classifier that can classify three colors: reddish, bluish and greenish colors.
  2. Test your classifier with new colors.
  3. What are your observations?

Exercise 2.5: Project ★★★

Project: Image recommender system: 3 practical sessions

The goal of this project is to recommend images based on the color preferences of the user. We will build this system in three practical sessions.

We have to collect the following data.
  1. A set of images and the predominant colors in each image.
  2. Ask the user to select some images. We assume that the chosen images contain the favourite colors of the user.
  3. We analyse user color-preferences and predominant colors of available images to propose new images to the user.

For this question, we start by analysing the predominant colors of images. You have the following tasks to program:

  1. Create a folder called testimages.
  2. Download open-licensed images to the folder testimages.
  3. Get the N (a configurable number) predominant colors of each test image and save this information in a JSON file.
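The exercise does not prescribe the layout of the JSON file. One possible layout, shown here as a sketch under that assumption, maps every file name to a list of hex color strings. The file names, the colors, the output path, and the helper names save_colors and load_colors are all placeholders, and nothing here is computed from real images.

```python
import json
import os
import tempfile

# Hypothetical layout: file name -> list of predominant colors as hex strings.
# The entries below are placeholders, not results computed from images.
predominant_colors = {
    "testimages/example1.jpg": ["#ba000d", "#ff7961", "#f44336"],
    "testimages/example2.jpg": ["#003300", "#1b5e20", "#4527a0"],
}

def save_colors(colors, path):
    """Write the color dictionary to a JSON file."""
    with open(path, "w") as outfile:
        json.dump(colors, outfile, indent=2)

def load_colors(path):
    """Read the color dictionary back from a JSON file."""
    with open(path) as infile:
        return json.load(infile)

# A temporary folder is used here so that the sketch does not touch your project files.
output_path = os.path.join(tempfile.gettempdir(), "colors.json")
save_colors(predominant_colors, output_path)
restored = load_colors(output_path)
print(restored["testimages/example1.jpg"])
```

Storing colors as hex strings keeps the file readable. Lists of three integers would work equally well.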

Submission

References

Link