Practical 6: Docker for Data Processing Pipelines

Goals

This practical session introduces Docker containerization for building scalable and reproducible data processing pipelines. You will learn how to package applications, manage multi-container environments, and deploy data processing workflows.

Learning Objectives

  • Understand Docker architecture and containerization concepts

  • Write Dockerfiles to package Python applications

  • Use Docker Compose for multi-container orchestration

  • Implement data pipelines with shared volumes

  • Build producer-consumer patterns with message queues

  • Connect applications to databases in containers

  • Implement frontend-backend architectures

  • Deploy and scale data processing applications

Prerequisites

  • Completion of Practical 5 (Apache Spark)

  • Docker Desktop installed (Installation Guide)

  • Basic understanding of Linux commands

  • Python programming fundamentals

Installation

Verify Docker is installed:

docker --version
docker-compose --version

Exercises Overview

| Exercise | Topic                                            | Difficulty |
|----------|--------------------------------------------------|------------|
| 1        | Docker Fundamentals and Basic Commands           | ★          |
| 2        | Writing Dockerfiles for Python Applications      | ★          |
| 3        | Docker Compose for Multi-Container Applications  | ★★         |
| 4        | Data Pipelines with Shared Volumes               | ★★         |
| 5        | Producer-Consumer with Message Queues            | ★★         |
| 6        | Application-Database Integration                 | ★★         |
| 7        | Frontend-Backend Architectures                   | ★★★        |
| 8        | Scaling and Monitoring Containers                | ★★★        |

Exercise 1: Docker Fundamentals and Basic Commands [★]

Docker Architecture

Docker uses a client-server architecture:

┌─────────────────────────────────────────────────────────────┐
│                     Docker Host                              │
│  ┌─────────────┐    ┌─────────────────────────────────────┐ │
│  │   Docker    │    │          Docker Daemon               │ │
│  │   Client    │◄──►│  ┌─────────┐  ┌─────────┐           │ │
│  │   (CLI)     │    │  │Container│  │Container│           │ │
│  └─────────────┘    │  │   1     │  │   2     │           │ │
│                     │  └─────────┘  └─────────┘           │ │
│                     │       │            │                 │ │
│                     │  ┌────┴────────────┴────┐           │ │
│                     │  │      Images          │           │ │
│                     │  └─────────────────────┘           │ │
│                     └─────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Key Concepts

  • Image: Read-only template with instructions for creating a container

  • Container: Runnable instance of an image

  • Dockerfile: Text file with instructions to build an image

  • Registry: Storage for Docker images (e.g., Docker Hub)
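
To see the image/container distinction in practice, you can pull an image from a registry and create several containers from it (illustrative commands; any public image works):

# Pull an image from Docker Hub (the default registry)
docker pull python:3.10-slim

# The image is now stored locally
docker images python

# Each "docker run" creates a new container from the same image
docker run --name c1 python:3.10-slim python -c "print('container 1')"
docker run --name c2 python:3.10-slim python -c "print('container 2')"
docker ps -a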

Basic Docker Commands

Run the following commands in your terminal to familiarize yourself with Docker:

# Check Docker version
docker --version

# View system-wide information
docker info

# List available images
docker images

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

Running Your First Container

# Run a simple hello-world container
docker run hello-world

# Run an interactive Python container
docker run -it python:3.10 python

# Run a container with a specific command
docker run python:3.10 python -c "print('Hello from Docker!')"

# Run a container in the background (detached mode)
docker run -d --name my_python python:3.10 sleep 60

# Stop a running container
docker stop my_python

# Remove a container
docker rm my_python

Container Lifecycle

┌─────────┐   docker run   ┌─────────┐   docker stop   ┌─────────┐
│ Created │───────────────►│ Running │────────────────►│ Stopped │
└─────────┘                └─────────┘                 └─────────┘
     │                          │                           │
     │                          │ docker pause              │
     │                          ▼                           │
     │                    ┌─────────┐                       │
     │                    │ Paused  │                       │
     │                    └─────────┘                       │
     │                                                      │
     └──────────────────────────────────────────────────────┘
                        docker rm

# View container logs
docker logs <container_id>

# Execute command in running container
docker exec -it <container_id> bash

# Copy files to/from container
docker cp local_file.txt <container_id>:/path/in/container/
docker cp <container_id>:/path/in/container/file.txt ./local_file.txt

# View container resource usage
docker stats
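
The paused and stopped states from the lifecycle diagram can be exercised with the following commands (the container name is illustrative):

# Pause and resume a running container
docker pause my_python
docker unpause my_python

# Start a stopped container again without recreating it
docker start my_python
docker restart my_python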

Questions - Exercise 1

Q1.1 Run a Python container that prints the system’s Python version, OS name, and current date/time. Capture the output.

Q1.2 Run an Ubuntu container interactively. Inside the container:

  • Update the package list

  • Install curl

  • Download a web page

  • Exit the container

Q1.3 Run three containers in detached mode with different names. Use docker ps to verify they’re running, then stop and remove all of them using a single command each.


Exercise 2: Writing Dockerfiles for Python Applications [★]

Dockerfile Basics

A Dockerfile is a script containing instructions to build a Docker image.

Common Dockerfile Instructions

| Instruction | Description                                    |
|-------------|------------------------------------------------|
| FROM        | Base image to start from                       |
| WORKDIR     | Set working directory                          |
| COPY        | Copy files from host to image                  |
| RUN         | Execute commands during build                  |
| ENV         | Set environment variables                      |
| EXPOSE      | Document which ports the container listens on  |
| CMD         | Default command when container starts          |
| ENTRYPOINT  | Configure container to run as executable       |

Example: Simple Python Application

Create a file app.py:

# app.py
import sys
import platform
from datetime import datetime

def main():
    print(f"Python version: {sys.version}")
    print(f"Platform: {platform.platform()}")
    print(f"Current time: {datetime.now()}")
    print("Hello from Docker!")

if __name__ == "__main__":
    main()

Create a Dockerfile:

# Use official Python image as base
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Copy application code
COPY app.py .

# Set the default command
CMD ["python", "app.py"]

Build and run:

# Build the image
docker build -t my-python-app .

# Run the container
docker run my-python-app

Example: Python Application with Dependencies

Create requirements.txt:

pandas==2.0.0
numpy==1.24.0
requests==2.28.0

Create data_processor.py:

import pandas as pd
import numpy as np

def process_data():
    # Create sample data
    data = {
        'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
        'value': np.random.randint(1, 100, 4)
    }
    df = pd.DataFrame(data)
    
    print("Data Processing Results:")
    print(df)
    print(f"\nSum: {df['value'].sum()}")
    print(f"Mean: {df['value'].mean():.2f}")

if __name__ == "__main__":
    process_data()

Optimized Dockerfile:

FROM python:3.10-slim

WORKDIR /app

# Copy requirements first (for better layer caching)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY data_processor.py .

CMD ["python", "data_processor.py"]

Multi-Stage Builds

Multi-stage builds help create smaller production images:

# Build stage
FROM python:3.10 AS builder

WORKDIR /app

COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.10-slim

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /root/.local /root/.local

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

COPY app.py .

CMD ["python", "app.py"]

Best Practices for Dockerfiles

  1. Use specific base image tags: python:3.10-slim instead of python:latest

  2. Order instructions by frequency of change: Copy requirements before code

  3. Use .dockerignore: Exclude unnecessary files

  4. Minimize layers: Combine related RUN commands

  5. Don’t run as root: Create a non-root user when possible

  6. Use multi-stage builds: For smaller production images

Example .dockerignore:

__pycache__
*.pyc
*.pyo
.git
.gitignore
*.md
.env
venv/
.pytest_cache/
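
A sketch of a Dockerfile that combines several of these practices (pinned slim base image, requirements copied before code, non-root user); adapt it to your own application:

FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so this layer is cached when only the code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create and switch to a non-root user
RUN useradd --create-home appuser
USER appuser

COPY app.py .

CMD ["python", "app.py"]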

Questions - Exercise 2

Q2.1 Create a Dockerfile for a PySpark application that:

  • Uses bitnami/spark as the base image

  • Installs additional Python packages (pandas, matplotlib)

  • Copies a Spark script that processes CSV data

  • Runs the script when the container starts

Q2.2 Create a Dockerfile that:

  • Uses a non-root user for security

  • Implements health checks

  • Uses environment variables for configuration

  • Includes proper labeling (maintainer, version, description)

Q2.3 Compare the image sizes of:

  • A simple Dockerfile using python:3.10

  • The same application using python:3.10-slim

  • A multi-stage build version

Document the size differences and explain when each approach is appropriate.


Exercise 3: Docker Compose for Multi-Container Applications [★★]

Docker Compose Overview

Docker Compose allows you to define and run multi-container applications using a YAML file.

Basic docker-compose.yml Structure

version: "3.8"

services:
  service_name:
    image: image_name:tag
    # OR build from Dockerfile
    build: ./path/to/dockerfile
    ports:
      - "host_port:container_port"
    volumes:
      - ./local/path:/container/path
    environment:
      - VAR_NAME=value
    depends_on:
      - other_service

volumes:
  named_volume:

networks:
  custom_network:

Docker Compose Commands

# Start all services
docker-compose up

# Start in detached mode
docker-compose up -d

# Build images before starting
docker-compose up --build

# Stop all services
docker-compose down

# Stop and remove volumes
docker-compose down -v

# View logs
docker-compose logs
docker-compose logs -f service_name

# Scale a service
docker-compose up --scale service_name=3

# Execute command in a service
docker-compose exec service_name command

Example: Web Application with Redis

Create app.py:

from flask import Flask
import redis

app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)

@app.route('/')
def hello():
    count = cache.incr('hits')
    return f'Hello! This page has been viewed {count} times.'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Create requirements.txt:

flask
redis

Create Dockerfile:

FROM python:3.10-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

EXPOSE 5000

CMD ["python", "app.py"]

Create docker-compose.yml:

version: "3.8"

services:
  web:
    build: .
    ports:
      - "5000:5000"
    depends_on:
      - redis
    environment:
      - FLASK_ENV=development

  redis:
    image: redis:alpine
    volumes:
      - redis_data:/data

volumes:
  redis_data:

Run with:

docker-compose up --build

Service Dependencies and Health Checks

version: "3.8"

services:
  web:
    build: .
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
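
The web healthcheck above assumes the application exposes a /health route; a minimal sketch of such an endpoint in Flask (add it to your own app):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health():
    # Report service status; extend with real checks (DB connection, queue depth, etc.)
    return jsonify({"status": "ok"}), 200

Note that the healthcheck command runs inside the web container, so curl must be installed in that image (or the check replaced with a Python-based one).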

Questions - Exercise 3

Q3.1 Create a Docker Compose configuration for a data processing pipeline with:

  • A Python data generator service

  • A Redis service for caching

  • A data processor service that reads from Redis

  • Proper service dependencies

Q3.2 Modify the previous example to use:

  • Custom networks for service isolation

  • Environment files (.env)

  • Volume mounts for data persistence

Q3.3 Create a Docker Compose file that starts a Jupyter Notebook server with:

  • Pre-installed data science libraries (pandas, numpy, matplotlib, sklearn)

  • Persistent notebook storage

  • Access to a shared data volume


Exercise 4: Data Pipelines with Shared Volumes [★★]

Shared Volumes for Container Communication

Shared volumes allow containers to exchange data through the file system.

┌─────────────────┐       ┌─────────────────┐
│    Uploader     │       │    Processor    │
│    Container    │       │    Container    │
│                 │       │                 │
│   writes to     │       │   reads from    │
│   /shared       │       │   /shared       │
└────────┬────────┘       └────────┬────────┘
         │                         │
         └─────────┬───────────────┘
                   │
            ┌──────┴──────┐
            │   Shared    │
            │   Volume    │
            └─────────────┘

Example: File Processing Pipeline

Navigate to the SharedVolume folder in this practical:

cd SharedVolume

Examine the existing structure:

Uploader Service (Uploader/upload.py):

import time
from shutil import copyfile

def upload_file():
    while True:
        # Simulate uploading a new file every 5 seconds
        print("Uploading new file...")
        copyfile("sample.txt", "/shared/sample_uploaded.txt")
        time.sleep(5)

if __name__ == "__main__":
    upload_file()

Processor Service (Processor/process.py):

import time
import os

def process_files():
    while True:
        if os.path.exists("/shared/sample_uploaded.txt"):
            with open("/shared/sample_uploaded.txt", "r") as f:
                content = f.read()
            print(f"Processing: {content}")
            # Process the file...
            os.remove("/shared/sample_uploaded.txt")
        else:
            print("Waiting for files...")
        time.sleep(2)

if __name__ == "__main__":
    process_files()

docker-compose.yml:

version: "3.8"

services:
  uploader:
    build:
      context: ./uploader
    volumes:
      - ./shared:/shared
    depends_on:
      - processor

  processor:
    build:
      context: ./processor
    volumes:
      - ./shared:/shared

Run with:

docker-compose up --build

Enhanced Data Pipeline Example

Create a more sophisticated data pipeline:

data_generator.py:

import json
import time
import random
from datetime import datetime

def generate_data():
    counter = 0
    while True:
        data = {
            "id": counter,
            "timestamp": datetime.now().isoformat(),
            "sensor_id": f"sensor_{random.randint(1, 10)}",
            "temperature": round(random.uniform(20, 35), 2),
            "humidity": round(random.uniform(30, 80), 2)
        }
        
        filename = f"/shared/input/data_{counter}.json"
        with open(filename, 'w') as f:
            json.dump(data, f)
        
        print(f"Generated: {filename}")
        counter += 1
        time.sleep(2)

if __name__ == "__main__":
    import os
    os.makedirs("/shared/input", exist_ok=True)
    generate_data()

data_processor.py:

import json
import os
import time

def process_files():
    os.makedirs("/shared/output", exist_ok=True)
    
    while True:
        input_dir = "/shared/input"
        if os.path.exists(input_dir):
            files = [f for f in os.listdir(input_dir) if f.endswith('.json')]
            
            for filename in files:
                filepath = os.path.join(input_dir, filename)
                
                with open(filepath, 'r') as f:
                    data = json.load(f)
                
                # Process the data
                data['processed'] = True
                data['temp_fahrenheit'] = round(data['temperature'] * 9/5 + 32, 2)
                
                # Write to output
                output_path = f"/shared/output/processed_{filename}"
                with open(output_path, 'w') as f:
                    json.dump(data, f, indent=2)
                
                # Remove input file
                os.remove(filepath)
                print(f"Processed: {filename}")
        
        time.sleep(1)

if __name__ == "__main__":
    process_files()
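
A possible docker-compose.yml for this enhanced pipeline, assuming each script is packaged with its own Dockerfile in generator/ and processor/ directories (directory names are illustrative):

version: "3.8"

services:
  generator:
    build:
      context: ./generator
    volumes:
      - ./shared:/shared
    depends_on:
      - processor

  processor:
    build:
      context: ./processor
    volumes:
      - ./shared:/shared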

Questions - Exercise 4

Q4.1 Extend the SharedVolume example to:

  • Add a third service that aggregates processed files

  • Generate statistics (average temperature, humidity by sensor)

  • Output a summary report every minute

Q4.2 Implement error handling in the pipeline:

  • Move failed files to an “error” directory

  • Log errors with timestamps

  • Add a monitoring service that reports pipeline health

Q4.3 Create a parallel processing pipeline:

  • Multiple processor containers (use --scale)

  • Implement file locking to prevent duplicate processing

  • Measure throughput with different numbers of processors


Exercise 5: Producer-Consumer with Message Queues [★★]

Message Queue Pattern

Message queues decouple producers and consumers, enabling:

  • Asynchronous processing

  • Load balancing

  • Fault tolerance

┌──────────┐     ┌─────────────┐     ┌──────────┐
│ Producer │────►│   Message   │────►│ Consumer │
│    1     │     │    Queue    │     │    1     │
└──────────┘     │  (RabbitMQ) │     └──────────┘
┌──────────┐     │             │     ┌──────────┐
│ Producer │────►│             │────►│ Consumer │
│    2     │     └─────────────┘     │    2     │
└──────────┘                         └──────────┘

RabbitMQ Example

Navigate to the ProducerConsumerRabbitMQ folder:

cd ProducerConsumerRabbitMQ

producer/producer.py:

import pika
import time

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            print("Retrying connection to RabbitMQ...")
            time.sleep(2)
    raise Exception("Could not connect to RabbitMQ")

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)

for i in range(100):
    msg = f"Task #{i}"
    channel.basic_publish(
        exchange='',
        routing_key='task_queue',
        body=msg,
        properties=pika.BasicProperties(delivery_mode=2)  # Make message persistent
    )
    print(f"Sent: {msg}")
    time.sleep(1)

connection.close()

consumer/consumer.py:

import pika
import time

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            print("Retrying connection to RabbitMQ...")
            time.sleep(2)
    raise Exception("Could not connect to RabbitMQ")

def callback(ch, method, properties, body):
    print(f"Received: {body.decode()}")
    time.sleep(0.5)  # Simulate processing
    print(f"Processed: {body.decode()}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_qos(prefetch_count=1)  # Fair dispatch
channel.basic_consume(queue='task_queue', on_message_callback=callback)

print('Waiting for messages...')
channel.start_consuming()

docker-compose.yml:

services:
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"   # AMQP protocol
      - "15672:15672" # Management UI
    environment:
      RABBITMQ_DEFAULT_USER: guest
      RABBITMQ_DEFAULT_PASS: guest

  producer:
    build: ./producer
    depends_on:
      - rabbitmq

  consumer:
    build: ./consumer
    depends_on:
      - rabbitmq

Run with:

docker-compose up --build

# Scale consumers
docker-compose up --scale consumer=3

Access RabbitMQ management UI at: http://localhost:15672 (guest/guest)

Data Processing with Message Queues

Enhanced producer for data processing:

# data_producer.py
import pika
import json
import random
import time
from datetime import datetime

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            time.sleep(2)
    raise Exception("Could not connect")

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)

sensors = ['temperature', 'humidity', 'pressure']

while True:
    data = {
        'sensor_type': random.choice(sensors),
        'value': round(random.uniform(0, 100), 2),
        'timestamp': datetime.now().isoformat()
    }
    
    channel.basic_publish(
        exchange='',
        routing_key='data_queue',
        body=json.dumps(data),
        properties=pika.BasicProperties(delivery_mode=2)
    )
    
    print(f"Sent: {data}")
    time.sleep(0.5)
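
A matching consumer sketch (not part of the provided code) that parses the JSON payload before acknowledging; it follows the same structure as consumer.py above:

# data_consumer.py
import pika
import json
import time

def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except:
            time.sleep(2)
    raise Exception("Could not connect")

def callback(ch, method, properties, body):
    reading = json.loads(body)
    print(f"Received {reading['sensor_type']} = {reading['value']} at {reading['timestamp']}")
    ch.basic_ack(delivery_tag=method.delivery_tag)

connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='data_queue', on_message_callback=callback)
channel.start_consuming()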

Questions - Exercise 5

Q5.1 Extend the RabbitMQ example to:

  • Use topic-based routing (different queues for different data types)

  • Implement multiple consumer types (one for each sensor type)

  • Store processed data in a shared volume

Q5.2 Implement dead letter handling:

  • Configure a dead letter queue for failed messages

  • Add a retry mechanism (max 3 retries)

  • Create a monitoring consumer that alerts on DLQ messages

Q5.3 Compare RabbitMQ with Redis Pub/Sub:

  • Implement the same producer-consumer pattern with Redis

  • Measure message throughput

  • Document the trade-offs between the two approaches


Exercise 6: Application-Database Integration [★★]

Connecting Applications to Databases

Navigate to the AppDB folder:

cd AppDB

This example demonstrates a Flask application connected to PostgreSQL.

app/app.py:

from flask import Flask
import psycopg2

app = Flask(__name__)

@app.route("/")
def index():
    conn = psycopg2.connect(
        host="bd",  # Service name in Docker
        database="livres",
        user="postgres",
        password="postgres"
    )
    cur = conn.cursor()
    cur.execute("SELECT titre FROM livres")
    livres = cur.fetchall()
    cur.close()
    conn.close()
    return "<br>".join(title for (title,) in livres)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
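
The app image needs Flask and a PostgreSQL driver installed; a minimal requirements.txt for this container might look like the following (assumed here, check the repository's own file):

flask
psycopg2-binary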

init_bd/init.sql:

CREATE TABLE IF NOT EXISTS livres (
    id SERIAL PRIMARY KEY,
    titre VARCHAR(255) NOT NULL,
    auteur VARCHAR(255),
    annee INTEGER
);

INSERT INTO livres (titre, auteur, annee) VALUES
    ('Les Misérables', 'Victor Hugo', 1862),
    ('Le Petit Prince', 'Antoine de Saint-Exupéry', 1943),
    ('L''Étranger', 'Albert Camus', 1942);

docker-compose.yml:

services:
  app:
    build: ./app
    ports:
      - "5000:5000"
    depends_on:
      - bd

  bd:
    image: postgres:15
    environment:
      POSTGRES_DB: livres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./init_bd:/docker-entrypoint-initdb.d
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

volumes:
  postgres_data:

Run with:

docker-compose up --build

Access at: http://localhost:5000

Enhanced Example with SQLAlchemy

# app_enhanced.py
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy
import os

app = Flask(__name__)

# Database configuration from environment
db_host = os.environ.get('DB_HOST', 'bd')
db_name = os.environ.get('DB_NAME', 'livres')
db_user = os.environ.get('DB_USER', 'postgres')
db_pass = os.environ.get('DB_PASS', 'postgres')

app.config['SQLALCHEMY_DATABASE_URI'] = f'postgresql://{db_user}:{db_pass}@{db_host}/{db_name}'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False

db = SQLAlchemy(app)

class Book(db.Model):
    __tablename__ = 'livres'
    id = db.Column(db.Integer, primary_key=True)
    titre = db.Column(db.String(255), nullable=False)
    auteur = db.Column(db.String(255))
    annee = db.Column(db.Integer)

@app.route('/books')
def get_books():
    books = Book.query.all()
    return jsonify([{
        'id': b.id,
        'title': b.titre,
        'author': b.auteur,
        'year': b.annee
    } for b in books])

@app.route('/books', methods=['POST'])
def add_book():
    data = request.json
    book = Book(titre=data['title'], auteur=data['author'], annee=data['year'])
    db.session.add(book)
    db.session.commit()
    return jsonify({'id': book.id}), 201

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
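
The enhanced version additionally depends on Flask-SQLAlchemy, so the (assumed) requirements.txt would grow to:

flask
flask-sqlalchemy
psycopg2-binary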

Questions - Exercise 6

Q6.1 Extend the AppDB example to include:

  • CRUD operations (Create, Read, Update, Delete)

  • Input validation

  • Error handling with appropriate HTTP status codes

Q6.2 Add data analytics capabilities:

  • Endpoint to get books by year range

  • Statistics endpoint (count by author, books per decade)

  • Full-text search capability

Q6.3 Implement a data import service:

  • Create a separate container that imports CSV data into the database

  • Watch a shared volume for new CSV files

  • Log import results and errors


Exercise 7: Frontend-Backend Architectures [★★★]

Microservices Architecture

Navigate to the WebAppFrontBack folder:

cd WebAppFrontBack

This example demonstrates a React frontend with a Flask backend.

┌─────────────────┐      ┌─────────────────┐
│    Frontend     │      │    Backend      │
│    (React)      │─────►│    (Flask)      │
│   Port: 3000    │      │   Port: 5000    │
└─────────────────┘      └─────────────────┘

Backend API (Flask)

backend/app.py:

from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # Enable Cross-Origin requests

# In-memory data store
tasks = [
    {"id": 1, "title": "Learn Docker", "completed": True},
    {"id": 2, "title": "Build a pipeline", "completed": False}
]

@app.route('/api/tasks', methods=['GET'])
def get_tasks():
    return jsonify(tasks)

@app.route('/api/tasks', methods=['POST'])
def add_task():
    data = request.json
    new_task = {
        # Derive the next ID from existing tasks to avoid collisions after deletions
        "id": max((t['id'] for t in tasks), default=0) + 1,
        "title": data['title'],
        "completed": False
    }
    tasks.append(new_task)
    return jsonify(new_task), 201

@app.route('/api/tasks/<int:task_id>', methods=['PUT'])
def update_task(task_id):
    task = next((t for t in tasks if t['id'] == task_id), None)
    if task:
        data = request.json
        task['completed'] = data.get('completed', task['completed'])
        return jsonify(task)
    return jsonify({"error": "Task not found"}), 404

@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
    global tasks
    tasks = [t for t in tasks if t['id'] != task_id]
    return '', 204

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
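
Once the backend container is running, the API can be exercised directly with curl (illustrative commands, assuming the default port mapping of 5000):

# List tasks
curl http://localhost:5000/api/tasks

# Create a task
curl -X POST -H "Content-Type: application/json" \
     -d '{"title": "Write report"}' \
     http://localhost:5000/api/tasks

# Mark task 1 as completed
curl -X PUT -H "Content-Type: application/json" \
     -d '{"completed": true}' \
     http://localhost:5000/api/tasks/1

# Delete task 2
curl -X DELETE http://localhost:5000/api/tasks/2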

Docker Compose for Full Stack

docker-compose.yml:

version: "3.8"

services:
  frontend:
    build:
      context: ./frontend
    ports:
      - "3000:3000"
    depends_on:
      - backend
    environment:
      - REACT_APP_API_URL=http://localhost:5000

  backend:
    build:
      context: ./backend
    ports:
      - "5000:5000"
    volumes:
      - ./backend:/app
    environment:
      - FLASK_ENV=development

Adding Nginx as Reverse Proxy

For production deployments, use Nginx as a reverse proxy:

nginx.conf:

upstream frontend {
    server frontend:3000;
}

upstream backend {
    server backend:5000;
}

server {
    listen 80;

    location / {
        proxy_pass http://frontend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /api {
        proxy_pass http://backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

docker-compose.prod.yml:

version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf
    depends_on:
      - frontend
      - backend

  frontend:
    build:
      context: ./frontend
      dockerfile: Dockerfile.prod

  backend:
    build:
      context: ./backend

Questions - Exercise 7

Q7.1 Extend the frontend-backend example to include:

  • User authentication (login/logout)

  • Protected routes

  • JWT token handling

Q7.2 Add a database to the stack:

  • Replace in-memory storage with PostgreSQL

  • Add database migrations

  • Implement data persistence across restarts

Q7.3 Create a data visualization dashboard:

  • Backend API that serves analytics data

  • Frontend with charts (using Chart.js or similar)

  • Real-time updates using WebSockets


Exercise 8: Scaling and Monitoring Containers [★★★]

Container Scaling

# Scale a specific service
docker-compose up --scale worker=5

# View running containers
docker-compose ps

# View resource usage
docker stats

Load Balancing with Nginx

docker-compose.yml:

version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api

  api:
    build: .
    # No ports exposed - accessed through nginx
    deploy:
      replicas: 3

nginx.conf for load balancing:

events {
    worker_connections 1024;
}

http {
    upstream api_servers {
        least_conn;  # Load balancing method
        server api:5000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://api_servers;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
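
One way to check that requests are really being distributed: have the API include the container hostname in its response (a sketch below; in a container, the hostname is the container ID), then send repeated requests through Nginx and watch the hostname change:

# sketch of an API endpoint that reveals which replica answered
from flask import Flask
import socket

app = Flask(__name__)

@app.route('/')
def whoami():
    return f"Handled by {socket.gethostname()}\n"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

From the host, something like for i in $(seq 1 10); do curl -s http://localhost/; done should then show different replicas answering in turn.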

Monitoring with Prometheus and Grafana

docker-compose.monitoring.yml:

version: "3.8"

services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin

  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

volumes:
  prometheus_data:
  grafana_data:

prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

Resource Limits

version: "3.8"

services:
  api:
    build: .
    deploy:
      resources:
        limits:
          cpus: '0.50'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M

Questions - Exercise 8

Q8.1 Create a scalable data processing pipeline:

  • Producer service generating data

  • Worker services that can be scaled (1-10 instances)

  • Load balancer distributing work

  • Measure throughput with different numbers of workers

Q8.2 Set up monitoring for your application:

  • Configure Prometheus to collect metrics

  • Create Grafana dashboards for:

    • CPU and memory usage

    • Request rates and latencies

    • Error rates

Q8.3 Implement auto-scaling simulation:

  • Monitor CPU usage of worker containers

  • Create a script that scales workers based on load

  • Test with varying load patterns


Summary

In this practical, you learned:

  1. Docker Fundamentals: Images, containers, and basic commands

  2. Dockerfiles: Writing efficient Dockerfiles for Python applications

  3. Docker Compose: Orchestrating multi-container applications

  4. Shared Volumes: Building data pipelines with file-based communication

  5. Message Queues: Producer-consumer patterns with RabbitMQ

  6. Database Integration: Connecting applications to PostgreSQL

  7. Frontend-Backend: Building full-stack applications

  8. Scaling and Monitoring: Load balancing and observability

Key Takeaways

  • Use Docker Compose for development and testing

  • Implement proper health checks for service dependencies

  • Use volumes for data persistence

  • Choose the right communication pattern (files, messages, API)

  • Monitor and scale based on metrics

Next Steps

In Practical 7, you will learn about Kubernetes for:

  • Production-grade container orchestration

  • Declarative configuration management

  • Automatic scaling and self-healing

  • Service discovery and load balancing

Further Reading