Practical 6: Docker for Data Processing Pipelines

Goals

This practical session introduces Docker containerization for building scalable and reproducible data processing pipelines. You will learn how to package applications, manage multi-container environments, and deploy data processing workflows.

Learning Objectives

  • Understand Docker architecture and containerization concepts

  • Write Dockerfiles to package Python applications

  • Use Docker Compose for multi-container orchestration

  • Implement data pipelines with shared volumes

  • Build producer-consumer patterns with message queues

  • Connect applications to databases in containers

  • Implement frontend-backend architectures

  • Deploy and scale data processing applications

Prerequisites

  • Completion of Practical 5 (Apache Spark)

  • Docker Desktop installed (Installation Guide)

  • Basic understanding of Linux commands

  • Python programming fundamentals

Installation

Verify Docker is installed:

docker --version
docker-compose --version

Exercises Overview

| Exercise | Topic                                            | Difficulty |
|----------|--------------------------------------------------|------------|
| 1        | Docker Fundamentals and Basic Commands           | ★          |
| 2        | Writing Dockerfiles for Python Applications      | ★          |
| 3        | Docker Compose for Multi-Container Applications  | ★★         |
| 4        | Data Pipelines with Shared Volumes               | ★★         |
| 5        | Producer-Consumer with Message Queues            | ★★         |
| 6        | Application-Database Integration                 | ★★         |
| 7        | Frontend-Backend Architectures                   | ★★★        |
| 8        | Scaling and Monitoring Containers                | ★★★        |

Exercise 1: Docker Fundamentals and Basic Commands [★]

Docker Architecture

Docker uses a client-server architecture:

┌─────────────────────────────────────────────────────────────┐
│                     Docker Host                              │
│  ┌─────────────┐    ┌─────────────────────────────────────┐ │
│  │   Docker    │    │          Docker Daemon               │ │
│  │   Client    │◄──►│  ┌─────────┐  ┌─────────┐           │ │
│  │   (CLI)     │    │  │Container│  │Container│           │ │
│  └─────────────┘    │  │   1     │  │   2     │           │ │
│                     │  └─────────┘  └─────────┘           │ │
│                     │       │            │                 │ │
│                     │  ┌────┴────────────┴────┐           │ │
│                     │  │      Images          │           │ │
│                     │  └─────────────────────┘           │ │
│                     └─────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Key Concepts

  • Image: Read-only template with instructions for creating a container

  • Container: Runnable instance of an image

  • Dockerfile: Text file with instructions to build an image

  • Registry: Storage for Docker images (e.g., Docker Hub)
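
To see the image/container distinction in practice, you can pull an image from a registry and create several containers from it (illustrative commands; any public image works):

# Pull an image from Docker Hub (the default registry)
docker pull python:3.10-slim

# The image is now stored locally
docker images python

# Each "docker run" creates a new container from the same image
docker run --name c1 python:3.10-slim python -c "print('container 1')"
docker run --name c2 python:3.10-slim python -c "print('container 2')"
docker ps -a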

Basic Docker Commands

Run the following commands in your terminal to familiarize yourself with Docker:

# Check Docker version
docker --version

# View system-wide information
docker info

# List available images
docker images

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

Running Your First Container

# Run a simple hello-world container
docker run hello-world

# Run an interactive Python container
docker run -it python:3.10 python

# Run a container with a specific command
docker run python:3.10 python -c "print('Hello from Docker!')"

# Run a container in the background (detached mode)
docker run -d --name my_python python:3.10 sleep 60

# Stop a running container
docker stop my_python

# Remove a container
docker rm my_python

Container Lifecycle

┌─────────┐   docker run   ┌─────────┐   docker stop   ┌─────────┐
│ Created │───────────────►│ Running │────────────────►│ Stopped │
└─────────┘                └─────────┘                 └─────────┘
     │                          │                           │
     │                          │ docker pause              │
     │                          ▼                           │
     │                    ┌─────────┐                       │
     │                    │ Paused  │                       │
     │                    └─────────┘                       │
     │                                                      │
     └──────────────────────────────────────────────────────┘
                        docker rm

# View container logs
docker logs <container_id>

# Execute command in running container
docker exec -it <container_id> bash

# Copy files to/from container
docker cp local_file.txt <container_id>:/path/in/container/
docker cp <container_id>:/path/in/container/file.txt ./local_file.txt

# View container resource usage
docker stats
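
The paused and stopped states from the lifecycle diagram can be exercised with the following commands (the container name is illustrative):

# Pause and resume a running container
docker pause my_python
docker unpause my_python

# Start a stopped container again without recreating it
docker start my_python
docker restart my_python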

Questions - Exercise 1

Q1.1 Run a Python container that prints the system’s Python version, OS name, and current date/time. Capture the output.

Q1.2 Run an Ubuntu container interactively. Inside the container:

  • Update the package list

  • Install curl

  • Download a web page

  • Exit the container

Q1.3 Run three containers in detached mode with different names. Use docker ps to verify they’re running, then stop and remove all of them using a single command each.


Exercise 2: Writing Dockerfiles for Python Applications [★]

Dockerfile Basics

A Dockerfile is a script containing instructions to build a Docker image.

Common Dockerfile Instructions

| Instruction | Description                                    |
|-------------|------------------------------------------------|
| FROM        | Base image to start from                       |
| WORKDIR     | Set working directory                          |
| COPY        | Copy files from host to image                  |
| RUN         | Execute commands during build                  |
| ENV         | Set environment variables                      |
| EXPOSE      | Document which ports the container listens on  |
| CMD         | Default command when container starts          |
| ENTRYPOINT  | Configure container to run as executable       |

Example: Simple Python Application

Create a file app.py:

# app.py
import sys
import platform
from datetime import datetime

def main():
    print(f"Python version: {sys.version}")
    print(f"Platform: {platform.platform()}")
    print(f"Current time: {datetime.now()}")
    print("Hello from Docker!")

if __name__ == "__main__":
    main()

Create a Dockerfile:

# Use official Python image as base
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy application code
COPY app.py .

# Set the default command
CMD ["python", "app.py"]

Build and run:

# Build the image
docker build -t my-python-app .

# Run the container
docker run my-python-app

Example: Python Application with Dependencies

Create requirements.txt:

pandas==2.0.0
numpy==1.24.0
requests==2.28.0

Create data_processor.py:

import pandas as pd
import numpy as np

def process_data():
    # Create sample data
    data = {
        'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
        'value': np.random.randint(1, 100, 4)
    }
    df = pd.DataFrame(data)
    
    print("Data Processing Results:")
    print(df)
    print(f"\nSum: {df['value'].sum()}")
    print(f"Mean: {df['value'].mean():.2f}")

if __name__ == "__main__":
    process_data()

Optimized Dockerfile:

FROM python:3.10-slim

WORKDIR /app

# Copy requirements first (for better layer caching)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY data_processor.py .

CMD ["python", "data_processor.py"]

Multi-Stage Builds

Multi-stage builds help create smaller production images:

# Build stage
FROM python:3.10 AS builder

WORKDIR /app

COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.10-slim

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /root/.local /root/.local

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

COPY app.py .

CMD ["python", "app.py"]

Best Practices for Dockerfiles

  1. Use specific base image tags: python:3.10-slim instead of python:latest

  2. Order instructions by frequency of change: Copy requirements before code

  3. Use .dockerignore: Exclude unnecessary files

  4. Minimize layers: Combine related RUN commands

  5. Don’t run as root: Create a non-root user when possible

  6. Use multi-stage builds: For smaller production images

Example .dockerignore:

__pycache__
*.pyc
*.pyo
.git
.gitignore
*.md
.env
venv/
.pytest_cache/
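
A sketch of a Dockerfile that combines several of these practices (pinned slim base image, requirements copied before code, non-root user); adapt it to your own application:

FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached when only the code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create and switch to a non-root user
RUN useradd --create-home appuser
USER appuser

COPY app.py .

CMD ["python", "app.py"]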

Questions - Exercise 2

Q2.1 Create a Dockerfile for a PySpark application that:

  • Uses bitnami/spark as the base image

  • Installs additional Python packages (pandas, matplotlib)

  • Copies a Spark script that processes CSV data

  • Runs the script when the container starts

Q2.2 Create a Dockerfile that:

  • Uses a non-root user for security

  • Implements health checks

  • Uses environment variables for configuration

  • Includes proper labeling (maintainer, version, description)

Q2.3 Compare the image sizes of:

  • A simple Dockerfile using python:3.10

  • The same application using python:3.10-slim

  • A multi-stage build version

Document the size differences and explain when each approach is appropriate.


Exercise 3: Docker Compose for Multi-Container Applications [★★]

Docker Compose Overview

Docker Compose allows you to define and run multi-container applications using a YAML file.

Basic docker-compose.yml Structure

version: "3.8"

services:
  service_name:
    image: image_name:tag
    # OR build from Dockerfile
    build: ./path/to/dockerfile
    ports:
      - "host_port:container_port"
    volumes:
      - ./local/path:/container/path
    environment:
      - VAR_NAME=value
    depends_on:
      - other_service

volumes:
  named_volume:

networks:
  custom_network:

Docker Compose Commands

# Start all services
docker-compose up

# Start in detached mode
docker-compose up -d

# Build images before starting
docker-compose up --build

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

# View logs
docker-compose logs
docker-compose logs -f service_name

# Scale a service
docker-compose up --scale service_name=3

# Execute command in a service
docker-compose exec service_name command

Example: Web Application with Redis

Create app.py:

from flask import Flask
import redis

app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)

@app.route('/')
def hello():
    count = cache.incr('hits')
    return f'Hello! This page has been viewed {count} times.'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Create requirements.txt:

flask
redis

Create Dockerfile:

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

EXPOSE 5000

CMD ["python", "app.py"]

Create docker-compose.yml:

version: "3.8"

services:
  web:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - redis
    environment:
      - FLASK_ENV=development

  redis:
    image: redis:alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Run with:

docker-compose up --build

Service Dependencies and Health Checks

version: "3.8"

services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
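
The web healthcheck above assumes the application exposes a /health route; a minimal sketch of such an endpoint in Flask (add it to your own app):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    # Report service status; extend with real checks (DB connection, queue depth, etc.)
    return jsonify({"status": "ok"}), 200

Note that the healthcheck command runs inside the web container, so curl must be installed in that image (or the check replaced with a Python-based one).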

Questions - Exercise 3

Q3.1 Create a Docker Compose configuration for a data processing pipeline with:

  • A Python data generator service

  • A Redis service for caching

  • A data processor service that reads from Redis

  • Proper service dependencies

Q3.2 Modify the previous example to use:

  • Custom networks for service isolation

  • Environment files (.env)

  • Volume mounts for data persistence

Q3.3 Create a Docker Compose file that starts a Jupyter Notebook server with:

  • Pre-installed data science libraries (pandas, numpy, matplotlib, sklearn)

  • Persistent notebook storage

  • Access to a shared data volume


Exercise 4: Data Pipelines with Shared Volumes [★★]

Shared Volumes for Container Communication

Shared volumes allow containers to exchange data through the file system.

┌─────────────────┐       ┌─────────────────┐
│    Uploader     │       │    Processor    │
│    Container    │       │    Container    │
│                 │       │                 │
│   writes to     │       │   reads from    │
│   /shared       │       │   /shared       │
└────────┬────────┘       └────────┬────────┘
         │                         │
         └─────────┬───────────────┘
                   │
            ┌──────┴──────┐
            │   Shared    │
            │   Volume    │
            └─────────────┘

Example: File Processing Pipeline

Navigate to the SharedVolume folder in this practical:

cd SharedVolume

Examine the existing structure:

Uploader Service (Uploader/upload.py):

import time
from shutil import copyfile

def upload_file():
    while True:
        # Simulate uploading a new file every 5 seconds
        print("Uploading new file...")
        copyfile("sample.txt", "/shared/sample_uploaded.txt")
        time.sleep(5)

if __name__ == "__main__":
    upload_file()

Processor Service (Processor/process.py):

import time
import os

def process_files():
    while True:
        if os.path.exists("/shared/sample_uploaded.txt"):
            with open("/shared/sample_uploaded.txt", "r") as f:
                content = f.read()
            print(f"Processing: {content}")
            # Process the file...
            os.remove("/shared/sample_uploaded.txt")
        else:
            print("Waiting for files...")
        time.sleep(2)

if __name__ == "__main__":
    process_files()

docker-compose.yml:

version: "3.8"

services:
  uploader:
    build:
      context: ./uploader
    volumes:
      - ./shared:/shared
    depends_on:
      - processor

  processor:
    build:
      context: ./processor
    volumes:
      - ./shared:/shared

Run with:

docker-compose up --build

Enhanced Data Pipeline Example

Create a more sophisticated data pipeline:

data_generator.py:

import json
import time
import random
from datetime import datetime

def generate_data():
    counter = 0
    while True:
        data = {
            "id": counter,
            "timestamp": datetime.now().isoformat(),
            "sensor_id": f"sensor_{random.randint(1, 10)}",
            "temperature": round(random.uniform(20, 35), 2),
            "humidity": round(random.uniform(30, 80), 2)
        }
        
        filename = f"/shared/input/data_{counter}.json"
        with open(filename, 'w') as f:
            json.dump(data, f)
        
        print(f"Generated: {filename}")
        counter += 1
        time.sleep(2)

if __name__ == "__main__":
    import os
    os.makedirs("/shared/input", exist_ok=True)
    generate_data()

data_processor.py:

import json
import os
import time

def process_files():
    os.makedirs("/shared/output", exist_ok=True)
    
    while True:
        input_dir = "/shared/input"
        if os.path.exists(input_dir):
            files = [f for f in os.listdir(input_dir) if f.endswith('.json')]
            
            for filename in files:
                filepath = os.path.join(input_dir, filename)
                
                with open(filepath, 'r') as f:
                    data = json.load(f)
                
                # Process the data
                data['processed'] = True
                data['temp_fahrenheit'] = round(data['temperature'] * 9/5 + 32, 2)
                
                # Write to output
                output_path = f"/shared/output/processed_{filename}"
                with open(output_path, 'w') as f:
                    json.dump(data, f, indent=2)
                
                # Remove input file
                os.remove(filepath)
                print(f"Processed: {filename}")
        
        time.sleep(1)

if __name__ == "__main__":
    process_files()
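
A possible docker-compose.yml for this enhanced pipeline, assuming each script is packaged with its own Dockerfile in generator/ and processor/ directories (directory names are illustrative):

version: "3.8"

services:
  generator:
    build:
      context: ./generator
    volumes:
      - ./shared:/shared
    depends_on:
      - processor

  processor:
    build:
      context: ./processor
    volumes:
      - ./shared:/shared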

Questions - Exercise 4

Q4.1 Extend the SharedVolume example to:

  • Add a third service that aggregates processed files

  • Generate statistics (average temperature, humidity by sensor)

  • Output a summary report every minute

Q4.2 Implement error handling in the pipeline:

  • Move failed files to an “error” directory

  • Log errors with timestamps

  • Add a monitoring service that reports pipeline health

Q4.3 Create a parallel processing pipeline:

  • Multiple processor containers (use --scale)

  • Implement file locking to prevent duplicate processing

  • Measure throughput with different numbers of processors


Exercise 5: Producer-Consumer with Message Queues [★★]

Message Queue Pattern

Message queues decouple producers and consumers, enabling:

  • Asynchronous processing

  • Load balancing

  • Fault tolerance

┌──────────┐     ┌─────────────┐     ┌──────────┐
│ Producer │────►│   Message   │────►│ Consumer │
│    1     │     │    Queue    │     │    1     │
└──────────┘     │  (RabbitMQ) │     └──────────┘
┌──────────┐     │             │     ┌──────────┐
│ Producer │────►│             │────►│ Consumer │
│    2     │     └─────────────┘     │    2     │
└──────────┘                         └──────────┘

RabbitMQ Example

Navigate to the ProducerConsumerRabbitMQ folder:

cd ProducerConsumerRabbitMQ

producer/producer.py:

import pika
import time

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            print("Retrying connection to RabbitMQ...")
            time.sleep(2)
    raise Exception("Could not connect to RabbitMQ")

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)

for i in range(100):
    msg = f"Task #{i}"
    channel.basic_publish(
        exchange='',
        routing_key='task_queue',
        body=msg,
        properties=pika.BasicProperties(delivery_mode=2)  # Make message persistent
    )
    print(f"Sent: {msg}")
    time.sleep(1)

connection.close()

consumer/consumer.py:

import pika
import time

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            print("Retrying connection to RabbitMQ...")
            time.sleep(2)
    raise Exception("Could not connect to RabbitMQ")

def callback(ch, method, properties, body):
    print(f"Received: {body.decode()}")
    time.sleep(0.5)  # Simulate processing
    print(f"Processed: {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_qos(prefetch_count=1)  # Fair dispatch
channel.basic_consume(queue='task_queue', on_message_callback=callback)

print('Waiting for messages...')
channel.start_consuming()

docker-compose.yml:

services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"   # AMQP protocol
      - "15672:15672" # Management UI
    environment:
      RABBITMQ_DEFAULT_USER: guest
      RABBITMQ_DEFAULT_PASS: guest

  producer:
    build: ./producer
    depends_on:
      - rabbitmq

  consumer:
    build: ./consumer
    depends_on:
      - rabbitmq

Run with:

docker-compose up --build

# Scale consumers
docker-compose up --scale consumer=3

Access RabbitMQ management UI at: http://localhost:15672 (guest/guest)

Data Processing with Message Queues

Enhanced producer for data processing:

# data_producer.py
import pika
import json
import random
import time
from datetime import datetime

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            time.sleep(2)
    raise Exception("Could not connect")

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)

sensors = ['temperature', 'humidity', 'pressure']

while True:
    data = {
        'sensor_type': random.choice(sensors),
        'value': round(random.uniform(0, 100), 2),
        'timestamp': datetime.now().isoformat()
    }
    
    channel.basic_publish(
        exchange='',
        routing_key='data_queue',
        body=json.dumps(data),
        properties=pika.BasicProperties(delivery_mode=2)
    )
    
    print(f"Sent: {data}")
    time.sleep(0.5)
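
A matching consumer sketch (not part of the provided code) that parses the JSON payload before acknowledging; it follows the same structure as consumer.py above:

# data_consumer.py
import pika
import json
import time

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            time.sleep(2)
    raise Exception("Could not connect")

def callback(ch, method, properties, body):
    reading = json.loads(body)
    print(f"Received {reading['sensor_type']} = {reading['value']} at {reading['timestamp']}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='data_queue', on_message_callback=callback)
channel.start_consuming()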

Questions - Exercise 5

Q5.1 Extend the RabbitMQ example to:

  • Use topic-based routing (different queues for different data types)

  • Implement multiple consumer types (one for each sensor type)

  • Store processed data in a shared volume

Q5.2 Implement dead letter handling:

  • Configure a dead letter queue for failed messages

  • Add a retry mechanism (max 3 retries)

  • Create a monitoring consumer that alerts on DLQ messages

Q5.3 Compare RabbitMQ with Redis Pub/Sub:

  • Implement the same producer-consumer pattern with Redis

  • Measure message throughput

  • Document the trade-offs between the two approaches


Exercise 6: Application-Database Integration [★★]

Connecting Applications to Databases

Navigate to the AppDB folder:

cd AppDB

This example demonstrates a Flask application connected to PostgreSQL.

app/app.py:

from flask import Flask
import psycopg2

app = Flask(__name__)

@app.route("/")
def index():
    conn = psycopg2.connect(
        host="bd",  # Service name in Docker
        database="livres",
        user="postgres",
        password="postgres"
    )
    cur = conn.cursor()
    cur.execute("SELECT titre FROM livres")
    livres = cur.fetchall()
    cur.close()
    conn.close()
    return "<br>".join(title for (title,) in livres)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
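
The app image needs Flask and a PostgreSQL driver installed; a minimal requirements.txt for this container might look like the following (assumed here, check the repository's own file):

flask
psycopg2-binary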

init_bd/init.sql:

CREATE TABLE IF NOT EXISTS livres (
    id SERIAL PRIMARY KEY,
    titre VARCHAR(255) NOT NULL,
    auteur VARCHAR(255),
    annee INTEGER
);

INSERT INTO livres (titre, auteur, annee) VALUES
    ('Les Misérables', 'Victor Hugo', 1862),
    ('Le Petit Prince', 'Antoine de Saint-Exupéry', 1943),
    ('L''Étranger', 'Albert Camus', 1942);

docker-compose.yml:

services:
  app:
    build: ./app
    ports:
      - "5000:5000"
    depends_on:
      - bd

  bd:
    image: postgres:15
    environment:
      POSTGRES_DB: livres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./init_bd:/docker-entrypoint-initdb.d
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  postgres_data:

Run with:

docker-compose up --build

Access at: http://localhost:5000

Enhanced Example with SQLAlchemy

# app_enhanced.py
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy
import os

app = Flask(__name__)

# Database configuration from environment
db_host = os.environ.get('DB_HOST', 'bd')
db_name = os.environ.get('DB_NAME', 'livres')
db_user = os.environ.get('DB_USER', 'postgres')
db_pass = os.environ.get('DB_PASS', 'postgres')

app.config['SQLALCHEMY_DATABASE_URI'] = f'postgresql://{db_user}:{db_pass}@{db_host}/{db_name}'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app)

class Book(db.Model):
    __tablename__ = 'livres'
    id = db.Column(db.Integer, primary_key=True)
    titre = db.Column(db.String(255), nullable=False)
    auteur = db.Column(db.String(255))
    annee = db.Column(db.Integer)

@app.route('/books')
def get_books():
    books = Book.query.all()
    return jsonify([{
        'id': b.id,
        'title': b.titre,
        'author': b.auteur,
        'year': b.annee
    } for b in books])

@app.route('/books', methods=['POST'])
def add_book():
    data = request.json
    book = Book(titre=data['title'], auteur=data['author'], annee=data['year'])
    db.session.add(book)
    db.session.commit()
    return jsonify({'id': book.id}), 201

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
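
The enhanced version additionally depends on Flask-SQLAlchemy, so the (assumed) requirements.txt would grow to:

flask
flask-sqlalchemy
psycopg2-binary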

Questions - Exercise 6

Q6.1 Extend the AppDB example to include:

  • CRUD operations (Create, Read, Update, Delete)

  • Input validation

  • Error handling with appropriate HTTP status codes

Q6.2 Add data analytics capabilities:

  • Endpoint to get books by year range

  • Statistics endpoint (count by author, books per decade)

  • Full-text search capability

Q6.3 Implement a data import service:

  • Create a separate container that imports CSV data into the database

  • Watch a shared volume for new CSV files

  • Log import results and errors


Exercise 7: Frontend-Backend Architectures [★★★]

Microservices Architecture

Navigate to the WebAppFrontBack folder:

cd WebAppFrontBack

This example demonstrates a React frontend with a Flask backend.

┌─────────────────┐      ┌─────────────────┐
│    Frontend     │      │    Backend      │
│    (React)      │─────►│    (Flask)      │
│   Port: 3000    │      │   Port: 5000    │
└─────────────────┘      └─────────────────┘

Backend API (Flask)

backend/app.py:

from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # Enable Cross-Origin requests

# In-memory data store
tasks = [
    {"id": 1, "title": "Learn Docker", "completed": True},
    {"id": 2, "title": "Build a pipeline", "completed": False}
]

@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    return jsonify(tasks)

@app.route('/api/tasks', methods=['POST'])
def add_task():
    data = request.json
    new_task = {
        # Derive the next ID from existing tasks to avoid collisions after deletions
        "id": max((t['id'] for t in tasks), default=0) + 1,
        "title": data['title'],
        "completed": False
    }
    tasks.append(new_task)
    return jsonify(new_task), 201

@app.route('/api/tasks/<int:task_id>', methods=['PUT'])
def update_task(task_id):
    task = next((t for t in tasks if t['id'] == task_id), None)
    if task:
        data = request.json
        task['completed'] = data.get('completed', task['completed'])
        return jsonify(task)
    return jsonify({"error": "Task not found"}), 404

@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    global tasks
    tasks = [t for t in tasks if t['id'] != task_id]
    return '', 204

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
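
Once the backend container is running, the API can be exercised directly with curl (illustrative commands, assuming the default port mapping of 5000):

# List tasks
curl http://localhost:5000/api/tasks

# Create a task
curl -X POST -H "Content-Type: application/json" \
     -d '{"title": "Write report"}' \
     http://localhost:5000/api/tasks

# Mark task 1 as completed
curl -X PUT -H "Content-Type: application/json" \
     -d '{"completed": true}' \
     http://localhost:5000/api/tasks/1

# Delete task 2
curl -X DELETE http://localhost:5000/api/tasks/2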

Docker Compose for Full Stack

docker-compose.yml:

version: "3.8"

services:
  frontend:
    build:
      context: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend
    environment:
      - REACT_APP_API_URL=http://localhost:5000

  backend:
    build:
      context: ./backend
    ports:
      - "5000:5000"
    volumes:
      - ./backend:/app
    environment:
      - FLASK_ENV=development

Adding Nginx as Reverse Proxy

For production deployments, use Nginx as a reverse proxy:

nginx.conf:

upstream frontend {
    server frontend:3000;
}

upstream backend {
    server backend:5000;
}

server {
    listen 80;

    location / {
        proxy_pass http://frontend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /api {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

docker-compose.prod.yml:

version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - frontend
      - backend

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.prod

  backend:
    build:
      context: ./backend

Questions - Exercise 7

Q7.1 Extend the frontend-backend example to include:

  • User authentication (login/logout)

  • Protected routes

  • JWT token handling

Q7.2 Add a database to the stack:

  • Replace in-memory storage with PostgreSQL

  • Add database migrations

  • Implement data persistence across restarts

Q7.3 Create a data visualization dashboard:

  • Backend API that serves analytics data

  • Frontend with charts (using Chart.js or similar)

  • Real-time updates using WebSockets


Exercise 8: Scaling and Monitoring Containers [★★★]

Container Scaling

# Scale a specific service
docker-compose up --scale worker=5

# View running containers
docker-compose ps

# View resource usage
docker stats

Load Balancing with Nginx

docker-compose.yml:

version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api

  api:
    build: .
    # No ports exposed - accessed through nginx
    deploy:
      replicas: 3

nginx.conf for load balancing:

events {
    worker_connections 1024;
}

http {
    upstream api_servers {
        least_conn;  # Load balancing method
        server api:5000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://api_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
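
One way to check that requests are really being distributed: have the API include the container hostname in its response (a sketch below; in a container, the hostname is the container ID), then send repeated requests through Nginx and watch the hostname change:

# sketch of an API endpoint that reveals which replica answered
from flask import Flask
import socket

app = Flask(__name__)

@app.route('/')
def whoami():
    return f"Handled by {socket.gethostname()}\n"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

From the host, something like for i in $(seq 1 10); do curl -s http://localhost/; done should then show different replicas answering in turn.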

Monitoring with Prometheus and Grafana

docker-compose.monitoring.yml:

version: "3.8"

services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

volumes:
  prometheus_data:
  grafana_data:

prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Resource Limits

version: "3.8"

services:
  api:
    build: .
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M

Questions - Exercise 8

Q8.1 Create a scalable data processing pipeline:

  • Producer service generating data

  • Worker services that can be scaled (1-10 instances)

  • Load balancer distributing work

  • Measure throughput with different numbers of workers

Q8.2 Set up monitoring for your application:

  • Configure Prometheus to collect metrics

  • Create Grafana dashboards for:

    • CPU and memory usage

    • Request rates and latencies

    • Error rates

Q8.3 Implement auto-scaling simulation:

  • Monitor CPU usage of worker containers

  • Create a script that scales workers based on load

  • Test with varying load patterns


Summary

In this practical, you learned:

  1. Docker Fundamentals: Images, containers, and basic commands

  2. Dockerfiles: Writing efficient Dockerfiles for Python applications

  3. Docker Compose: Orchestrating multi-container applications

  4. Shared Volumes: Building data pipelines with file-based communication

  5. Message Queues: Producer-consumer patterns with RabbitMQ

  6. Database Integration: Connecting applications to PostgreSQL

  7. Frontend-Backend: Building full-stack applications

  8. Scaling and Monitoring: Load balancing and observability

Key Takeaways

  • Use Docker Compose for development and testing

  • Implement proper health checks for service dependencies

  • Use volumes for data persistence

  • Choose the right communication pattern (files, messages, API)

  • Monitor and scale based on metrics

Next Steps

In Practical 7, you will learn about Kubernetes for:

  • Production-grade container orchestration

  • Declarative configuration management

  • Automatic scaling and self-healing

  • Service discovery and load balancing

Further Reading