Practical 4#

Goals#

Work on builtin Python functions#

  1. map()

  2. filter()

  3. reduce()

Multiprocessing#

  1. multiprocessing.cpu_count

  2. multiprocessing.Pool

  3. map

Exercise 1 [★]#

In this exercise, we will take a look at the Python builtin function called filter() which can be used to select items from a collection matching a particular condition.

# Initialization
num = [i for i in range(1, 20)]
print(num)

We will make use of the available documentation for the different functions. In IPython or Jupyter, typing a question mark (?) before or after the name of a function or a class displays its documentation, as shown below.

?filter
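The ? syntax is specific to IPython and Jupyter. In a plain Python interpreter or script, the same documentation can be reached through the builtin help() function or the __doc__ attribute, as in this small sketch:

```python
# Outside IPython/Jupyter, the docstring is available as an attribute.
doc = filter.__doc__
print(doc)

# Calling help(filter) in an interactive session prints a longer,
# formatted version of the same documentation.
```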

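The function filter(function, iterable) takes two parameters: a function and an iterable. The function is called on each element of the iterable, and filter() keeps the elements for which it returns a true value.

In the first example, we use None as the first parameter. In this case, filter() uses the identity function as the test: it keeps the elements that are truthy and drops the falsy ones (such as 0, None, or an empty string). Every number in num is non-zero, so the whole list is returned.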

# Use of filter function with None as the first parameter
num = [i for i in range(1, 20)]
filtered = list(filter(None, num))
print(filtered)
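The list num above contains no falsy value, so filter(None, num) returns it unchanged. A minimal sketch with a hypothetical list mixed (not part of the practical) shows what happens when falsy values are present:

```python
# Hypothetical list mixing truthy and falsy values
mixed = [0, 1, "", "a", None, [], [0], 2.5]

# With None as the function, only truthy elements are kept
kept = list(filter(None, mixed))
print(kept)  # [1, 'a', [0], 2.5]
```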

In the next example, we will select the even numbers from the input list. Note that we have written a function even() which returns True if the input number is even, else False.

filter() returns the items of the list for which even() returned True.

def even(item):
    if item % 2 == 0:
        return True
    return False


num = [i for i in range(1, 20)]
filtered = list(filter(even, num))
print(filtered)

In the following example, we have a new function odd() which returns True when the input number is odd.

We use this new function as input to the filter() function.

def odd(item):
    if item % 2 == 0:
        return False
    return True


num = [i for i in range(1, 20)]
filtered = list(filter(odd, num))
print(filtered)
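The examples above wrap filter() in list() because filter() returns a lazy iterator rather than a list. The following sketch, which reuses a shortened odd() function, illustrates that the elements are produced on demand and only once:

```python
def odd(item):
    return item % 2 != 0


num = list(range(1, 20))
lazy = filter(odd, num)   # nothing is computed yet

first = next(lazy)        # pulls the first odd number
remaining = list(lazy)    # consumes the rest of the iterator
again = list(lazy)        # the iterator is now exhausted
print(first, remaining, again)
```

If the filtered values are needed more than once, storing them in a list, as the practical does, avoids this exhaustion.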

Question: Write a program using filter() that takes a list of strings and keeps only the palindromes.

Filtering with Nested Structures#

The filter() function can also be applied to complex data structures like lists of dictionaries or tuples.
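As a small illustration with tuples, the sketch below filters a hypothetical list readings of (city, temperature) pairs; the data and the function name is_warm are assumptions made for this example only:

```python
# Hypothetical data: (city, temperature in degrees Celsius) tuples
readings = [("Lyon", 21.5), ("Paris", 18.0), ("Nice", 26.3), ("Lille", 15.2)]


def is_warm(reading):
    # Unpack the tuple and test one of its fields
    city, temperature = reading
    return temperature > 20


warm = list(filter(is_warm, readings))
print(warm)  # [('Lyon', 21.5), ('Nice', 26.3)]
```

The same pattern applies to dictionaries: the test function reads one or more keys and returns True or False.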

You are given a list of dictionaries representing employees, each with name, age, and department keys:

employees = [
    {"name": "Alice", "age": 28, "department": "HR"},
    {"name": "Bob", "age": 35, "department": "Engineering"},
    {"name": "Charlie", "age": 22, "department": "Marketing"},
    {"name": "David", "age": 45, "department": "Engineering"},
    {"name": "Pierre", "age": 29, "department": "HR"}
]

Questions

  1. Filter by Department: Write a program that uses filter() to create a list of employees who work in the “Engineering” department.

  2. Filter by Age Range: Write a program that uses filter() to find employees whose age is between 25 and 40 (inclusive).

  3. Filter by Name Length: Write a program that uses filter() to find employees whose name has more than 3 characters.

Advanced String Filtering#

You are given a list of sentences. Your task is to filter sentences based on various conditions.

sentences = [
    "Bienvenue dans le monde de la programmation!",
    "Le débogage fait partie du jeu.",
    "La pratique rend parfait, alors continue à coder.",
    "Les algorithmes d'apprentissage automatique sont de plus en plus puissants.",
    "La visualisation de données est un moyen de communiquer des informations complexes.",
    "Les données non structurées sont un défi pour les Data Scientists."
]

Questions

  1. Filter by Length: Write a program that uses filter() to select sentences that have fewer than 5 words.

  2. Filter by Keywords: Write a program that uses filter() to select sentences that contain the word “coder”.

  3. Filter by Palindromic Words: Write a program that uses filter() to select sentences that contain at least one palindrome.

Exercise 2 [★]#

What if we want to apply the same function to every element of a list?

For example, assume that we have a function that returns the square of a number and that we want to apply it to all the numbers in a list. We could write a loop to achieve this, but a shorter program is possible.

\(f(x) = x ^ 2\)

\(g([a,b,...]) = [f(a), f(b), ..]\)

\(g([a,b,...]) = [a^2, b^2, ..]\)

Python provides another builtin function called map(function, iterable, ...).

?map
def square(item):
    return item * item


num = [i for i in range(1, 20)]
squared = list(map(square, num))
print(squared)

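But what if our function takes multiple inputs?

The following example shows this case. The function product() takes two numbers as input and returns their product. When map() receives several iterables, it stops as soon as the shortest one is exhausted, so the result here has 10 elements.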

def product(item1, item2):
    return item1 * item2


num1 = [i for i in range(1, 20)]
print(num1)
num2 = [i for i in range(10, 20)]
print(num2)
product_value = list(map(product, num1, num2))
print(product_value)
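Note that num1 and num2 above do not have the same length. A minimal sketch with two short hypothetical lists makes the pairing behaviour of map() easier to check by hand:

```python
def product(item1, item2):
    return item1 * item2


short = [1, 2, 3]
longer = [10, 20, 30, 40, 50]

# Elements are paired position by position; the extra items
# of the longer list (40 and 50) are ignored.
result = list(map(product, short, longer))
print(result)  # [10, 40, 90]
```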

Finally, we look at another function called reduce() that applies a function of two arguments cumulatively on the members of the list, from left to right.

\(f(x) = x ^ 2\)

\(g([a,b,...]) = [f(a), f(b), ..]\)

\(g([a,b,...]) = [a^2, b^2, ..]\)

\(h(g([a,b,c,..])) = (((a^2 + b^2) + c^2) + ...) \)

from functools import reduce

?reduce
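The formula for h(g([a, b, c, ...])) above can be sketched as a map() followed by a reduce(). The helper names square() and add() and the list values are chosen for this illustration; the explicit loop shows the left-to-right accumulation that reduce() performs:

```python
from functools import reduce


def square(x):
    return x * x


def add(total, x):
    return total + x


values = [1, 2, 3, 4]

# h(g(values)): map() squares each element, reduce() adds them up
sum_of_squares = reduce(add, map(square, values))

# The same computation written as an explicit left-to-right loop
squares = list(map(square, values))
total = squares[0]
for s in squares[1:]:
    total = add(total, s)

print(sum_of_squares, total)  # 30 30
```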

In the following example, we calculate the sum of members of a list.

We pass the function sum_num() as the first argument to the reduce function. sum_num() takes two numbers as input and returns their sum.

from functools import reduce
import random


def sum_num(item1, item2):
    return item1 + item2


num = [i for i in range(1, 20)]
print(num)

sum_value = reduce(sum_num, num)
print(sum_value)

In the next example, we make use of the same function sum_num(), but on real numbers.

num = [random.uniform(0, i) for i in range(1, 20)]
print(num)
sum_value = reduce(sum_num, num)
print(sum_value)

In the next example, we use another function product().

from functools import reduce


def product(item1, item2):
    return item1 * item2


num = [i for i in range(1, 20)]
print(num)
product_value = reduce(product, num)
print(product_value)
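reduce() also accepts an optional third argument, an initial value, which matters when the list may be empty. The sketch below is an illustration on small hypothetical inputs:

```python
from functools import reduce


def product(item1, item2):
    return item1 * item2


# With an initial value, reduce() also works on an empty list
empty_product = reduce(product, [], 1)

# The initial value is combined first: ((1 * 2) * 3) * 4
with_initial = reduce(product, [2, 3, 4], 1)

# Without an initial value, an empty list raises TypeError
try:
    reduce(product, [])
    raised = False
except TypeError:
    raised = True

print(empty_product, with_initial, raised)  # 1 24 True
```

Choosing the neutral element of the operation (0 for a sum, 1 for a product) as the initial value leaves the result unchanged for non-empty lists.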

Question: Write a program that takes a list of matrices of size 2x2 and computes the sum of all matrices.

Matrix Operations with map() and reduce()#

You are given a list of 2x2 matrices (lists of lists). You need to apply various operations using map() and reduce().

from functools import reduce

matrices = [
    [[1, 2], [3, 4]],
    [[0, 1], [2, 3]],
    [[-1, -2], [-3, -4]],
    [[5, 6], [7, 8]]
]

Questions

  1. Sum of All Matrices: Write a program that uses reduce() to compute the sum of all matrices.

  2. Element-wise Multiplication: Write a program that uses map() to compute the element-wise multiplication of two matrices.

  3. Matrix Filtering: Write a program that uses filter() to select only matrices where all elements are positive.

Data Transformation and Aggregation#

You have a list of dictionaries representing products, with keys name, price, and quantity.

from functools import reduce

products = [
    {"name": "Laptop", "price": 1200, "quantity": 3},
    {"name": "Smartphone", "price": 800, "quantity": 5},
    {"name": "Tablet", "price": 300, "quantity": 10},
    {"name": "Smartwatch", "price": 200, "quantity": 15}
]

Questions

  1. Total Inventory Value: Write a program that uses map() and reduce() to calculate the total value of all products in stock.

  2. Price Filtering: Write a program that uses filter() to find products priced above a certain threshold (e.g., 500).

  3. Discount Application: Write a program that uses map() to apply a 10% discount to all products and returns the updated list.

Exercise 3 [★★]#

In the following examples, we use lambda expressions and pass them as arguments to the functions filter(), map(), and reduce().

In the following example, the lambda expression lambda x: x % 2 == 0 takes x as input and returns True when x is even. This is similar to the approach we saw above with the function even().

num = [i for i in range(1, 20)]
filtered = list(filter(lambda x: x % 2 == 0, num))
print(filtered)
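A lambda expression is an anonymous function limited to a single expression. The following sketch checks that the lambda gives the same result as the named function even(), and shows that the exact test matters: a lambda returning only the remainder x % 2 keeps the odd numbers, because a non-zero remainder is truthy.

```python
def even(item):
    return item % 2 == 0


num = list(range(1, 20))

with_def = list(filter(even, num))
with_lambda = list(filter(lambda x: x % 2 == 0, num))
print(with_def == with_lambda)  # True

# Returning the remainder itself selects the odd numbers instead
remainder_only = list(filter(lambda x: x % 2, num))
print(remainder_only)
```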

In the following example, we take the example with the function odd() and replace it by a lambda expression.

num = [i for i in range(1, 20)]
filtered = list(filter(lambda x: x % 2 != 0, num))
print(filtered)

In the following example, we take the example with the function square() and replace it by a lambda expression.

num = [i for i in range(1, 20)]
squared = list(map(lambda x: x * x, num))
print(squared)

What if we want to pass two arguments, like in the example product() above?

num1 = [i for i in range(1, 20)]
print(num1)
num2 = [i for i in range(10, 20)]
print(num2)
product = list(map(lambda x, y: x * y, num1, num2))
print(product)

In the following examples, we use lambda expression with the reduce() function.

from functools import reduce
import random

num = [i for i in range(1, 20)]
print(num)

sum_value = reduce(lambda x, y: x + y, num)
print(sum_value)

As in the example with sum_num(), we test the lambda expression on real numbers.

from functools import reduce
import random

num = [random.uniform(0, i) for i in range(1, 20)]
print(num)

sum_value = reduce(lambda x, y: x + y, num)
print(sum_value)

Now we replace the product() with a lambda expression.

from functools import reduce
import random

num = [i for i in range(1, 20)]
print(num)

product_value = reduce(lambda x, y: x * y, num)
print(product_value)

Question: Write a program using map(), reduce() and lambda expressions to count the total length of all strings in a list.

Text Analysis with Lambda Expressions#

You are given a list of sentences. Each sentence is a string containing multiple words.
Use map(), filter(), and reduce() with lambda expressions to analyze the text.

sentences = [
    "Bienvenue dans le monde de la programmation!",
    "Le débogage fait partie du jeu.",
    "La pratique rend parfait, alors continue à coder.",
    "Les algorithmes d'apprentissage automatique sont de plus en plus puissants.",
    "La visualisation de données est un moyen de communiquer des informations complexes.",
    "Les données non structurées sont un défi pour les Data Scientists."
]

Questions

  1. Word Count: Write a program that uses map() and reduce() to count the total number of words in all sentences.

  2. Longest Sentence: Write a program that uses reduce() to find the longest sentence by word count.

  3. Filtering Short Sentences: Write a program that uses filter() to keep only sentences with more than 6 words.

Financial Data Processing with Lambda Expressions#

You have a list of transactions represented as dictionaries.
Use map(), filter(), and reduce() with lambda expressions to process the data.

from functools import reduce

transactions = [
    {"date": "2025-03-10", "type": "income", "amount": 1200},
    {"date": "2025-03-11", "type": "expense", "amount": 400},
    {"date": "2025-03-12", "type": "income", "amount": 1500},
    {"date": "2025-03-13", "type": "expense", "amount": 800},
    {"date": "2025-03-14", "type": "income", "amount": 2000},
    {"date": "2025-03-15", "type": "expense", "amount": 500},
    {"date": "2025-03-16", "type": "income", "amount": 1800},
]

Questions

  1. Net Balance Calculation: Write a program that uses reduce() to calculate the net balance (sum of all income minus sum of all expenses).

  2. Filter Transactions: Write a program that uses filter() to retrieve only income transactions above a threshold (e.g., 1500).

  3. Transaction Amounts: Write a program that uses map() to extract only the amounts from the transactions and returns them as a list.

Exercise 4 [★★★]#

Next, we want to use multiprocessing to compute the values in parallel. For this purpose, we will use the multiprocessing package.

First we find the number of processors in our machine.

import multiprocessing as mp

?mp.cpu_count
import multiprocessing as mp

print(mp.cpu_count())

Next, we will create a pool of processes for the calculation using the Pool class.

import multiprocessing as mp

?mp.Pool

In the following example, we create a pool with the number of processes equal to the number of processors in our machine.

Take a look at how we transform our previous map-reduce example in the multiprocessing context.

from functools import reduce
import multiprocessing as mp

cpu_count = mp.cpu_count()


def squared(x):
    return x * x


num = [i for i in range(1, 20)]
with mp.Pool(processes=cpu_count) as pool:
    list_squared = pool.map(squared, num)
    print(list_squared)
    product_value = reduce(lambda x, y: x * y, list_squared)
    print(product_value)
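When the example above is saved as a script instead of being run in a notebook, platforms that start worker processes with the spawn method (Windows, and macOS by default) re-import the main module in every worker. Creating the pool under an if __name__ == "__main__": guard avoids starting pools recursively. The sketch below assumes it is run as a standalone script; the function name main() is chosen for this illustration:

```python
import multiprocessing as mp
from functools import reduce


def squared(x):
    return x * x


def main():
    num = list(range(1, 20))
    # The pool is created only when the file is run as a script,
    # not when worker processes re-import it.
    with mp.Pool(processes=mp.cpu_count()) as pool:
        list_squared = pool.map(squared, num)
    # The reduction is cheap, so it runs in the parent process
    total = reduce(lambda x, y: x + y, list_squared)
    print(list_squared)
    print(total)


if __name__ == "__main__":
    main()
```

The function passed to pool.map() is defined at module level so that it can be sent to the worker processes.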

In the following example, we want to download a number of pages in parallel. We pass download_page() as the first argument to pool.map(). The goal of the function is to download Wikidata pages. Check the output of the following code.

Change the number of processes and test the output.

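import json
import multiprocessing as mp

import requests


def download_page(item):
    # Fetch the JSON description of one Wikidata entity, e.g. "Q1"
    r = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/" + item + ".json"
    )
    # On success, save the response as a valid JSON file
    if r.status_code == 200:
        with open(item + ".json", "w") as w:
            json.dump(r.json(), w)
    return r.status_code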


process_count = 2
pages = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
with mp.Pool(processes=process_count) as pool:
    status = pool.map(download_page, pages)
    print(status)

Now, we want to analyse the downloaded pages. In the following example, we count the number of URLs containing “wikipedia.org”.

import multiprocessing as mp
import os
from functools import reduce


def analyse_file(filename):
    # Count the comma-separated tokens that mention "wikipedia.org"
    with open(filename, "r") as f:
        data = f.read()
    tokens = data.split(",")
    urls = list(filter(lambda token: "wikipedia.org" in token, tokens))
    return len(urls)


files = os.listdir(".")
json_files = list(filter(lambda name: name.endswith(".json"), files))

with mp.Pool(processes=mp.cpu_count()) as pool:
    counts = pool.map(analyse_file, json_files)
    print(counts)
    # The initial value 0 keeps reduce() from failing when no file is found
    total_count = reduce(lambda x, y: x + y, counts, 0)
    print(total_count)
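Splitting the file content on commas only approximates the number of URLs. As an alternative sketch, assuming the saved files contain valid JSON, the parsed structure can be walked recursively to count the string values that contain "wikipedia.org". The function name count_matching_strings() and the sample data are hypothetical and do not reproduce the real Wikidata layout:

```python
import json


def count_matching_strings(node, needle="wikipedia.org"):
    """Recursively count string values containing `needle` in parsed JSON."""
    if isinstance(node, str):
        return 1 if needle in node else 0
    if isinstance(node, dict):
        return sum(count_matching_strings(v, needle) for v in node.values())
    if isinstance(node, list):
        return sum(count_matching_strings(v, needle) for v in node)
    return 0  # numbers, booleans and null never match


# Hypothetical sample standing in for the content of one downloaded file
sample_text = json.dumps({
    "labels": {"en": "example"},
    "sitelinks": [
        {"url": "https://en.wikipedia.org/wiki/Example"},
        {"url": "https://fr.wikipedia.org/wiki/Exemple"},
        {"url": "https://commons.wikimedia.org/wiki/Example"},
    ],
})

total = count_matching_strings(json.loads(sample_text))
print(total)  # 2
```

A function of this kind takes one argument per file once wrapped with the file reading, so it could be passed to pool.map() in the same way as analyse_file().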

Question: Write a program that queries Wikidata to obtain 100 image URLs of cities and downloads the images to your machine using multiprocessing and map(). The program must then analyse every downloaded image and find two predominant colours of each image, again using multiprocessing and map().

Parallel Data Processing with Multiprocessing#

Question

Write a program that takes a list of URLs pointing to text files hosted online.

  1. Download all the text files concurrently using multiprocessing.Pool() and map().

  2. Once downloaded, perform the following tasks in parallel:

    • Count the total number of words in each file.

    • Count the frequency of each word across all files (case-insensitive).

  3. Aggregate the results and display the 10 most common words and their frequencies.

Note: Use multiprocessing.Pool() and map() for processing.

Image Analysis with Multiprocessing#

Question

Write a program that takes a list of image file paths and processes them concurrently to perform the following tasks:

  1. Resize all images to a fixed resolution (e.g., 128x128).

  2. Convert the images to grayscale and compute their average intensity.

  3. Generate thumbnails for all images and save them in a specified directory.

  4. Aggregate the results and display:

    • The image with the highest average intensity.

    • The image with the lowest average intensity.

Note: Use multiprocessing.Pool() and map() for processing.