Goals¶
This practical session introduces Docker containerization for building scalable and reproducible data processing pipelines. You will learn how to package applications, manage multi-container environments, and deploy data processing workflows.
Learning Objectives¶
Understand Docker architecture and containerization concepts
Write Dockerfiles to package Python applications
Use Docker Compose for multi-container orchestration
Implement data pipelines with shared volumes
Build producer-consumer patterns with message queues
Connect applications to databases in containers
Implement frontend-backend architectures
Deploy and scale data processing applications
Prerequisites¶
Completion of Practical 5 (Apache Spark)
Docker Desktop installed (Installation Guide)
Basic understanding of Linux commands
Python programming fundamentals
Installation¶
Verify Docker is installed:
docker --version
docker-compose --version
Exercises Overview¶
| Exercise | Topic | Difficulty |
|---|---|---|
| 1 | Docker Fundamentals and Basic Commands | ★ |
| 2 | Writing Dockerfiles for Python Applications | ★ |
| 3 | Docker Compose for Multi-Container Applications | ★★ |
| 4 | Data Pipelines with Shared Volumes | ★★ |
| 5 | Producer-Consumer with Message Queues | ★★ |
| 6 | Application-Database Integration | ★★ |
| 7 | Frontend-Backend Architectures | ★★★ |
| 8 | Scaling and Monitoring Containers | ★★★ |
Exercise 1: Docker Fundamentals and Basic Commands [★]¶
Docker Architecture¶
Docker uses a client-server architecture:
┌─────────────────────────────────────────────────────────────┐
│ Docker Host │
│ ┌─────────────┐ ┌─────────────────────────────────────┐ │
│ │ Docker │ │ Docker Daemon │ │
│ │ Client │◄──►│ ┌─────────┐ ┌─────────┐ │ │
│ │ (CLI) │ │ │Container│ │Container│ │ │
│ └─────────────┘ │ │ 1 │ │ 2 │ │ │
│ │ └─────────┘ └─────────┘ │ │
│ │ │ │ │ │
│ │ ┌────┴────────────┴────┐ │ │
│ │ │ Images │ │ │
│ │ └─────────────────────┘ │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Key Concepts¶
Image: Read-only template with instructions for creating a container
Container: Runnable instance of an image
Dockerfile: Text file with instructions to build an image
Registry: Storage for Docker images (e.g., Docker Hub)
Basic Docker Commands¶
Run the following commands in your terminal to familiarize yourself with Docker:
# Check Docker version
docker --version
# View system-wide information
docker info
# List available images
docker images
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
Running Your First Container¶
# Run a simple hello-world container
docker run hello-world
# Run an interactive Python container
docker run -it python:3.10 python
# Run a container with a specific command
docker run python:3.10 python -c "print('Hello from Docker!')"
# Run a container in the background (detached mode)
docker run -d --name my_python python:3.10 sleep 60
# Stop a running container
docker stop my_python
# Remove a container
docker rm my_python
Container Lifecycle¶
┌─────────┐ docker run ┌─────────┐ docker stop ┌─────────┐
│ Created │───────────────►│ Running │────────────────►│ Stopped │
└─────────┘ └─────────┘ └─────────┘
│ │ │
│ │ docker pause │
│ ▼ │
│ ┌─────────┐ │
│ │ Paused │ │
│ └─────────┘ │
│ │
└──────────────────────────────────────────────────────┘
                          docker rm
# View container logs
docker logs <container_id>
# Execute command in running container
docker exec -it <container_id> bash
# Copy files to/from container
docker cp local_file.txt <container_id>:/path/in/container/
docker cp <container_id>:/path/in/container/file.txt ./local_file.txt
# View container resource usage
docker stats
Questions - Exercise 1¶
Q1.1 Run a Python container that prints the system’s Python version, OS name, and current date/time. Capture the output.
Q1.2 Run an Ubuntu container interactively. Inside the container:
Update the package list
Install curl
Download a web page
Exit the container
Q1.3 Run three containers in detached mode with different names. Use docker ps to verify they’re running, then stop and remove all of them using a single command each.
Exercise 2: Writing Dockerfiles for Python Applications [★]¶
Dockerfile Basics¶
A Dockerfile is a script containing instructions to build a Docker image.
Common Dockerfile Instructions¶
| Instruction | Description |
|---|---|
| FROM | Base image to start from |
| WORKDIR | Set working directory |
| COPY | Copy files from host to image |
| RUN | Execute commands during build |
| ENV | Set environment variables |
| EXPOSE | Document which ports the container listens on |
| CMD | Default command when container starts |
| ENTRYPOINT | Configure container to run as executable |
Example: Simple Python Application¶
Create a file app.py:
# app.py
import sys
import platform
from datetime import datetime
def main():
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"Current time: {datetime.now()}")
print("Hello from Docker!")
if __name__ == "__main__":
main()
Create a Dockerfile:
# Use official Python image as base
FROM python:3.10-slim
# Set working directory
WORKDIR /app
# Copy application code
COPY app.py .
# Set the default command
CMD ["python", "app.py"]
Build and run:
# Build the image
docker build -t my-python-app .
# Run the container
docker run my-python-app
Example: Python Application with Dependencies¶
Create requirements.txt:
pandas==2.0.0
numpy==1.24.0
requests==2.28.0
Create data_processor.py:
import pandas as pd
import numpy as np
def process_data():
# Create sample data
data = {
'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
'value': np.random.randint(1, 100, 4)
}
df = pd.DataFrame(data)
print("Data Processing Results:")
print(df)
print(f"\nSum: {df['value'].sum()}")
print(f"Mean: {df['value'].mean():.2f}")
if __name__ == "__main__":
process_data()
Optimized Dockerfile:
FROM python:3.10-slim
WORKDIR /app
# Copy requirements first (for better layer caching)
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY data_processor.py .
CMD ["python", "data_processor.py"]
Multi-Stage Builds¶
Multi-stage builds help create smaller production images:
# Build stage
FROM python:3.10 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.10-slim
WORKDIR /app
# Copy installed packages from builder
COPY --from=builder /root/.local /root/.local
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
COPY app.py .
CMD ["python", "app.py"]
Best Practices for Dockerfiles¶
Use specific base image tags: python:3.10-slim instead of python:latest
Order instructions by frequency of change: Copy requirements before code
Use .dockerignore: Exclude unnecessary files
Minimize layers: Combine related RUN commands
Don’t run as root: Create a non-root user when possible
Use multi-stage builds: For smaller production images
Example .dockerignore:
__pycache__
*.pyc
*.pyo
.git
.gitignore
*.md
.env
venv/
.pytest_cache/
Questions - Exercise 2¶
Q2.1 Create a Dockerfile for a PySpark application that:
Uses bitnami/spark as the base image
Installs additional Python packages (pandas, matplotlib)
Copies a Spark script that processes CSV data
Runs the script when the container starts
Q2.2 Create a Dockerfile that:
Uses a non-root user for security
Implements health checks
Uses environment variables for configuration
Includes proper labeling (maintainer, version, description)
Q2.3 Compare the image sizes of:
A simple Dockerfile using python:3.10
The same application using python:3.10-slim
A multi-stage build version
Document the size differences and explain when each approach is appropriate.
Exercise 3: Docker Compose for Multi-Container Applications [★★]¶
Docker Compose Overview¶
Docker Compose allows you to define and run multi-container applications using a YAML file.
Basic docker-compose.yml Structure¶
version: "3.8"
services:
service_name:
image: image_name:tag
# OR build from Dockerfile
build: ./path/to/dockerfile
ports:
- "host_port:container_port"
volumes:
- ./local/path:/container/path
environment:
- VAR_NAME=value
depends_on:
- other_service
volumes:
named_volume:
networks:
custom_network:
Docker Compose Commands¶
# Start all services
docker-compose up
# Start in detached mode
docker-compose up -d
# Build images before starting
docker-compose up --build
# Stop all services
docker-compose down
# Stop and remove volumes
docker-compose down -v
# View logs
docker-compose logs
docker-compose logs -f service_name
# Scale a service
docker-compose up --scale service_name=3
# Execute command in a service
docker-compose exec service_name command
Example: Web Application with Redis¶
Create app.py:
from flask import Flask
import redis
app = Flask(__name__)
cache = redis.Redis(host='redis', port=6379)
@app.route('/')
def hello():
count = cache.incr('hits')
return f'Hello! This page has been viewed {count} times.'
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)
Create requirements.txt:
flask
redis
Create Dockerfile:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
EXPOSE 5000
CMD ["python", "app.py"]
Create docker-compose.yml:
version: "3.8"
services:
web:
build: .
ports:
- "5000:5000"
depends_on:
- redis
environment:
- FLASK_ENV=development
redis:
image: redis:alpine
volumes:
- redis_data:/data
volumes:
redis_data:
Run with:
docker-compose up --build
Service Dependencies and Health Checks¶
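The compose file below declares healthchecks; the web check probes an HTTP endpoint at /health, so the application itself must expose such a route. A minimal sketch, assuming the Flask app from the Redis example above (this route is an addition, not part of the original app.py):
# health check route (sketch) -- merge into app.py
from flask import Flask, jsonify
app = Flask(__name__)
@app.route('/health')
def health():
    # Report liveness; a fuller check could also ping Redis or the database
    return jsonify({"status": "ok"}), 200
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)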
version: "3.8"
services:
web:
build: .
depends_on:
db:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:5000/health"]
interval: 30s
timeout: 10s
retries: 3
db:
image: postgres:15
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 5
Questions - Exercise 3¶
Q3.1 Create a Docker Compose configuration for a data processing pipeline with:
A Python data generator service
A Redis service for caching
A data processor service that reads from Redis
Proper service dependencies
Q3.2 Modify the previous example to use:
Custom networks for service isolation
Environment files (.env)
Volume mounts for data persistence
Q3.3 Create a Docker Compose file that starts a Jupyter Notebook server with:
Pre-installed data science libraries (pandas, numpy, matplotlib, sklearn)
Persistent notebook storage
Access to a shared data volume
Exercise 4: Data Pipelines with Shared Volumes [★★]¶
Shared Volumes for Container Communication¶
Shared volumes allow containers to exchange data through the file system.
┌─────────────────┐ ┌─────────────────┐
│ Uploader │ │ Processor │
│ Container │ │ Container │
│ │ │ │
│ writes to │ │ reads from │
│ /shared │ │ /shared │
└────────┬────────┘ └────────┬────────┘
│ │
└─────────┬───────────────┘
│
┌──────┴──────┐
│ Shared │
│ Volume │
└─────────────┘
Example: File Processing Pipeline¶
Navigate to the SharedVolume folder in this practical:
cd SharedVolume
Examine the existing structure:
Uploader Service (Uploader/upload.py):
import time
from shutil import copyfile
def upload_file():
while True:
# Simulate uploading a new file every 5 seconds
print("Uploading new file...")
copyfile("sample.txt", "/shared/sample_uploaded.txt")
time.sleep(5)
if __name__ == "__main__":
upload_file()
Processor Service (Processor/process.py):
import time
import os
def process_files():
while True:
if os.path.exists("/shared/sample_uploaded.txt"):
with open("/shared/sample_uploaded.txt", "r") as f:
content = f.read()
print(f"Processing: {content}")
# Process the file...
os.remove("/shared/sample_uploaded.txt")
else:
print("Waiting for files...")
time.sleep(2)
if __name__ == "__main__":
process_files()
docker-compose.yml:
version: "3.8"
services:
uploader:
build:
context: ./uploader
volumes:
- ./shared:/shared
depends_on:
- processor
processor:
build:
context: ./processor
volumes:
- ./shared:/shared
Run with:
docker-compose up --build
Enhanced Data Pipeline Example¶
Create a more sophisticated data pipeline:
data_generator.py:
import json
import time
import random
from datetime import datetime
def generate_data():
counter = 0
while True:
data = {
"id": counter,
"timestamp": datetime.now().isoformat(),
"sensor_id": f"sensor_{random.randint(1, 10)}",
"temperature": round(random.uniform(20, 35), 2),
"humidity": round(random.uniform(30, 80), 2)
}
filename = f"/shared/input/data_{counter}.json"
with open(filename, 'w') as f:
json.dump(data, f)
print(f"Generated: {filename}")
counter += 1
time.sleep(2)
if __name__ == "__main__":
import os
os.makedirs("/shared/input", exist_ok=True)
generate_data()
data_processor.py:
import json
import os
import time
def process_files():
os.makedirs("/shared/output", exist_ok=True)
while True:
input_dir = "/shared/input"
if os.path.exists(input_dir):
files = [f for f in os.listdir(input_dir) if f.endswith('.json')]
for filename in files:
filepath = os.path.join(input_dir, filename)
with open(filepath, 'r') as f:
data = json.load(f)
# Process the data
data['processed'] = True
data['temp_fahrenheit'] = round(data['temperature'] * 9/5 + 32, 2)
# Write to output
output_path = f"/shared/output/processed_{filename}"
with open(output_path, 'w') as f:
json.dump(data, f, indent=2)
# Remove input file
os.remove(filepath)
print(f"Processed: {filename}")
time.sleep(1)
if __name__ == "__main__":
process_files()
Questions - Exercise 4¶
Q4.1 Extend the SharedVolume example to:
Add a third service that aggregates processed files
Generate statistics (average temperature, humidity by sensor)
Output a summary report every minute
Q4.2 Implement error handling in the pipeline:
Move failed files to an “error” directory
Log errors with timestamps
Add a monitoring service that reports pipeline health
Q4.3 Create a parallel processing pipeline:
Multiple processor containers (use --scale)
Implement file locking to prevent duplicate processing
Measure throughput with different numbers of processors
Exercise 5: Producer-Consumer with Message Queues [★★]¶
Message Queue Pattern¶
Message queues decouple producers and consumers, enabling:
Asynchronous processing
Load balancing
Fault tolerance
┌──────────┐ ┌─────────────┐ ┌──────────┐
│ Producer │────►│ Message │────►│ Consumer │
│ 1 │ │ Queue │ │ 1 │
└──────────┘ │ (RabbitMQ) │ └──────────┘
┌──────────┐ │ │ ┌──────────┐
│ Producer │────►│ │────►│ Consumer │
│ 2 │ └─────────────┘ │ 2 │
└──────────┘                         └──────────┘
RabbitMQ Example¶
Navigate to the ProducerConsumerRabbitMQ folder:
cd ProducerConsumerRabbitMQ
producer/producer.py:
import pika
import time
def connect():
for i in range(5):
try:
return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
except:
print("Retrying connection to RabbitMQ...")
time.sleep(2)
raise Exception("Could not connect to RabbitMQ")
connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
for i in range(100):
msg = f"Task #{i}"
channel.basic_publish(
exchange='',
routing_key='task_queue',
body=msg,
properties=pika.BasicProperties(delivery_mode=2) # Make message persistent
)
print(f"Sent: {msg}")
time.sleep(1)
connection.close()
consumer/consumer.py:
import pika
import time
def connect():
for i in range(5):
try:
return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
except:
print("Retrying connection to RabbitMQ...")
time.sleep(2)
raise Exception("Could not connect to RabbitMQ")
def callback(ch, method, properties, body):
print(f"Received: {body.decode()}")
time.sleep(0.5) # Simulate processing
print(f"Processed: {body.decode()}")
ch.basic_ack(delivery_tag=method.delivery_tag)
connection = connect()
channel = connection.channel()
channel.queue_declare(queue='task_queue', durable=True)
channel.basic_qos(prefetch_count=1) # Fair dispatch
channel.basic_consume(queue='task_queue', on_message_callback=callback)
print('Waiting for messages...')
channel.start_consuming()
docker-compose.yml:
services:
rabbitmq:
image: rabbitmq:3-management
ports:
- "5672:5672" # AMQP protocol
- "15672:15672" # Management UI
environment:
RABBITMQ_DEFAULT_USER: guest
RABBITMQ_DEFAULT_PASS: guest
producer:
build: ./producer
depends_on:
- rabbitmq
consumer:
build: ./consumer
depends_on:
- rabbitmq
Run with:
docker-compose up --build
# Scale consumers
docker-compose up --scale consumer=3
Access the RabbitMQ management UI at http://localhost:15672 (log in with guest / guest).
Data Processing with Message Queues¶
Enhanced producer for data processing:
# data_producer.py
import pika
import json
import random
import time
from datetime import datetime
def connect():
for i in range(5):
try:
return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
except:
time.sleep(2)
raise Exception("Could not connect")
connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)
sensors = ['temperature', 'humidity', 'pressure']
while True:
data = {
'sensor_type': random.choice(sensors),
'value': round(random.uniform(0, 100), 2),
'timestamp': datetime.now().isoformat()
}
channel.basic_publish(
exchange='',
routing_key='data_queue',
body=json.dumps(data),
properties=pika.BasicProperties(delivery_mode=2)
)
print(f"Sent: {data}")
time.sleep(0.5)
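The practical does not include a matching consumer for this producer; a minimal sketch of one, assuming it reads the same data_queue and only keeps per-sensor-type counts in memory (data_consumer.py is a hypothetical file name):
# data_consumer.py (sketch)
import json
import time
from collections import Counter
import pika
def connect():
    for i in range(5):
        try:
            return pika.BlockingConnection(pika.ConnectionParameters('rabbitmq'))
        except Exception:
            time.sleep(2)
    raise Exception("Could not connect")
counts = Counter()
def callback(ch, method, properties, body):
    data = json.loads(body)
    counts[data['sensor_type']] += 1
    print(f"Received {data['sensor_type']}={data['value']} (totals: {dict(counts)})")
    ch.basic_ack(delivery_tag=method.delivery_tag)
connection = connect()
channel = connection.channel()
channel.queue_declare(queue='data_queue', durable=True)
channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='data_queue', on_message_callback=callback)
channel.start_consuming()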
Questions - Exercise 5¶
Q5.1 Extend the RabbitMQ example to:
Use topic-based routing (different queues for different data types)
Implement multiple consumer types (one for each sensor type)
Store processed data in a shared volume
Q5.2 Implement dead letter handling:
Configure a dead letter queue for failed messages
Add a retry mechanism (max 3 retries)
Create a monitoring consumer that alerts on DLQ messages
Q5.3 Compare RabbitMQ with Redis Pub/Sub:
Implement the same producer-consumer pattern with Redis
Measure message throughput
Document the trade-offs between the two approaches
Exercise 6: Application-Database Integration [★★]¶
Connecting Applications to Databases¶
Navigate to the AppDB folder:
cd AppDB
This example demonstrates a Flask application connected to PostgreSQL.
app/app.py:
from flask import Flask
import psycopg2
app = Flask(__name__)
@app.route("/")
def index():
conn = psycopg2.connect(
host="bd", # Service name in Docker
database="livres",
user="postgres",
password="postgres"
)
cur = conn.cursor()
cur.execute("SELECT titre FROM livres")
livres = cur.fetchall()
cur.close()
conn.close()
return "<br>".join(title for (title,) in livres)
if __name__ == "__main__":
app.run(host="0.0.0.0", port=5000)
init_bd/init.sql:
CREATE TABLE IF NOT EXISTS livres (
id SERIAL PRIMARY KEY,
titre VARCHAR(255) NOT NULL,
auteur VARCHAR(255),
annee INTEGER
);
INSERT INTO livres (titre, auteur, annee) VALUES
('Les Misérables', 'Victor Hugo', 1862),
('Le Petit Prince', 'Antoine de Saint-Exupéry', 1943),
('L''Étranger', 'Albert Camus', 1942);
docker-compose.yml:
services:
app:
build: ./app
ports:
- "5000:5000"
depends_on:
- bd
bd:
image: postgres:15
environment:
POSTGRES_DB: livres
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
volumes:
- ./init_bd:/docker-entrypoint-initdb.d
- postgres_data:/var/lib/postgresql/data
ports:
- "5432:5432"
volumes:
postgres_data:
Run with:
docker-compose up --build
Access the application at http://localhost:5000.
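Note that depends_on only waits for the bd container to start, not for PostgreSQL to be ready to accept connections, so the very first requests may fail while the database initializes. A common workaround is a small retry loop when opening the connection; a sketch, where wait_for_db is a hypothetical helper rather than part of the provided example:
# wait_for_db.py (sketch)
import time
import psycopg2
def wait_for_db(retries=10, delay=2):
    # Return a connection, retrying until PostgreSQL accepts connections
    for attempt in range(retries):
        try:
            return psycopg2.connect(host="bd", database="livres", user="postgres", password="postgres")
        except psycopg2.OperationalError:
            print(f"Database not ready (attempt {attempt + 1}), retrying...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to the database")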
Enhanced Example with SQLAlchemy¶
# app_enhanced.py
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy
import os
app = Flask(__name__)
# Database configuration from environment
db_host = os.environ.get('DB_HOST', 'bd')
db_name = os.environ.get('DB_NAME', 'livres')
db_user = os.environ.get('DB_USER', 'postgres')
db_pass = os.environ.get('DB_PASS', 'postgres')
app.config['SQLALCHEMY_DATABASE_URI'] = f'postgresql://{db_user}:{db_pass}@{db_host}/{db_name}'
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
db = SQLAlchemy(app)
class Book(db.Model):
__tablename__ = 'livres'
id = db.Column(db.Integer, primary_key=True)
titre = db.Column(db.String(255), nullable=False)
auteur = db.Column(db.String(255))
annee = db.Column(db.Integer)
@app.route('/books')
def get_books():
books = Book.query.all()
return jsonify([{
'id': b.id,
'title': b.titre,
'author': b.auteur,
'year': b.annee
} for b in books])
@app.route('/books', methods=['POST'])
def add_book():
data = request.json
book = Book(titre=data['title'], auteur=data['author'], annee=data['year'])
db.session.add(book)
db.session.commit()
return jsonify({'id': book.id}), 201
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)
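Once the stack is running, the API can be exercised from the host, for instance with the requests library. A usage sketch (the book data is arbitrary and the script assumes port 5000 is published as in the compose file above):
# test_api.py (sketch)
import requests
BASE_URL = "http://localhost:5000"
# Add a book
new_book = {"title": "Candide", "author": "Voltaire", "year": 1759}
resp = requests.post(f"{BASE_URL}/books", json=new_book)
print("POST /books ->", resp.status_code, resp.json())
# List all books
resp = requests.get(f"{BASE_URL}/books")
for book in resp.json():
    print(book["id"], book["title"], "-", book["author"])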
Questions - Exercise 6¶
Q6.1 Extend the AppDB example to include:
CRUD operations (Create, Read, Update, Delete)
Input validation
Error handling with appropriate HTTP status codes
Q6.2 Add data analytics capabilities:
Endpoint to get books by year range
Statistics endpoint (count by author, books per decade)
Full-text search capability
Q6.3 Implement a data import service:
Create a separate container that imports CSV data into the database
Watch a shared volume for new CSV files
Log import results and errors
Exercise 7: Frontend-Backend Architectures [★★★]¶
Microservices Architecture¶
Navigate to the WebAppFrontBack folder:
cd WebAppFrontBack
This example demonstrates a React frontend with a Flask backend.
┌─────────────────┐ ┌─────────────────┐
│ Frontend │ │ Backend │
│ (React) │─────►│ (Flask) │
│ Port: 3000 │ │ Port: 5000 │
└─────────────────┘      └─────────────────┘
Backend API (Flask)¶
backend/app.py:
from flask import Flask, jsonify, request
from flask_cors import CORS
app = Flask(__name__)
CORS(app) # Enable Cross-Origin requests
# In-memory data store
tasks = [
{"id": 1, "title": "Learn Docker", "completed": True},
{"id": 2, "title": "Build a pipeline", "completed": False}
]
@app.route('/api/tasks', methods=['GET'])
def get_tasks():
return jsonify(tasks)
@app.route('/api/tasks', methods=['POST'])
def add_task():
data = request.json
new_task = {
"id": len(tasks) + 1,
"title": data['title'],
"completed": False
}
tasks.append(new_task)
return jsonify(new_task), 201
@app.route('/api/tasks/<int:task_id>', methods=['PUT'])
def update_task(task_id):
task = next((t for t in tasks if t['id'] == task_id), None)
if task:
data = request.json
task['completed'] = data.get('completed', task['completed'])
return jsonify(task)
return jsonify({"error": "Task not found"}), 404
@app.route('/api/tasks/<int:task_id>', methods=['DELETE'])
def delete_task(task_id):
global tasks
tasks = [t for t in tasks if t['id'] != task_id]
return '', 204
if __name__ == '__main__':
app.run(host='0.0.0.0', port=5000)
Docker Compose for Full Stack¶
docker-compose.yml:
version: "3.8"
services:
frontend:
build:
context: ./frontend
ports:
- "3000:3000"
depends_on:
- backend
environment:
- REACT_APP_API_URL=http://localhost:5000
backend:
build:
context: ./backend
ports:
- "5000:5000"
volumes:
- ./backend:/app
environment:
- FLASK_ENV=development
Adding Nginx as Reverse Proxy¶
For production deployments, use Nginx as a reverse proxy:
nginx.conf:
upstream frontend {
server frontend:3000;
}
upstream backend {
server backend:5000;
}
server {
listen 80;
location / {
proxy_pass http://frontend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
location /api {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
docker-compose.prod.yml:
version: "3.8"
services:
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/conf.d/default.conf
depends_on:
- frontend
- backend
frontend:
build:
context: ./frontend
dockerfile: Dockerfile.prod
backend:
build:
context: ./backend
Questions - Exercise 7¶
Q7.1 Extend the frontend-backend example to include:
User authentication (login/logout)
Protected routes
JWT token handling
Q7.2 Add a database to the stack:
Replace in-memory storage with PostgreSQL
Add database migrations
Implement data persistence across restarts
Q7.3 Create a data visualization dashboard:
Backend API that serves analytics data
Frontend with charts (using Chart.js or similar)
Real-time updates using WebSockets
Exercise 8: Scaling and Monitoring Containers [★★★]¶
Container Scaling¶
# Scale a specific service
docker-compose up --scale worker=5
# View running containers
docker-compose ps
# View resource usage
docker stats
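To compare configurations quantitatively, a small probe script can measure request throughput from the host. A sketch, assuming an HTTP service reachable at http://localhost (for example the nginx load balancer configured below); adapt the URL and request count to your setup:
# throughput_probe.py (sketch)
import time
from concurrent.futures import ThreadPoolExecutor
import requests
URL = "http://localhost"
N_REQUESTS = 200
CONCURRENCY = 10
def fetch(_):
    # One GET request; return the HTTP status code
    return requests.get(URL, timeout=5).status_code
start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    statuses = list(pool.map(fetch, range(N_REQUESTS)))
elapsed = time.time() - start
ok = statuses.count(200)
print(f"{ok}/{N_REQUESTS} successful in {elapsed:.2f}s ({N_REQUESTS / elapsed:.1f} req/s)")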
Load Balancing with Nginx¶
docker-compose.yml:
version: "3.8"
services:
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
depends_on:
- api
api:
build: .
# No ports exposed - accessed through nginx
deploy:
replicas: 3
nginx.conf for load balancing:
events {
worker_connections 1024;
}
http {
upstream api_servers {
least_conn; # Load balancing method
server api:5000;
}
server {
listen 80;
location / {
proxy_pass http://api_servers;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
}
}
Monitoring with Prometheus and Grafana¶
docker-compose.monitoring.yml:
version: "3.8"
services:
prometheus:
image: prom/prometheus
ports:
- "9090:9090"
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
grafana:
image: grafana/grafana
ports:
- "3000:3000"
volumes:
- grafana_data:/var/lib/grafana
environment:
- GF_SECURITY_ADMIN_PASSWORD=admin
cadvisor:
image: gcr.io/cadvisor/cadvisor
ports:
- "8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
volumes:
prometheus_data:
grafana_data:
prometheus.yml:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'cadvisor'
static_configs:
- targets: ['cadvisor:8080']
Resource Limits¶
version: "3.8"
services:
api:
build: .
deploy:
resources:
limits:
cpus: '0.50'
memory: 512M
reservations:
cpus: '0.25'
memory: 256M
Questions - Exercise 8¶
Q8.1 Create a scalable data processing pipeline:
Producer service generating data
Worker services that can be scaled (1-10 instances)
Load balancer distributing work
Measure throughput with different numbers of workers
Q8.2 Set up monitoring for your application:
Configure Prometheus to collect metrics
Create Grafana dashboards for:
CPU and memory usage
Request rates and latencies
Error rates
Q8.3 Implement auto-scaling simulation:
Monitor CPU usage of worker containers
Create a script that scales workers based on load
Test with varying load patterns
Summary¶
In this practical, you learned:
Docker Fundamentals: Images, containers, and basic commands
Dockerfiles: Writing efficient Dockerfiles for Python applications
Docker Compose: Orchestrating multi-container applications
Shared Volumes: Building data pipelines with file-based communication
Message Queues: Producer-consumer patterns with RabbitMQ
Database Integration: Connecting applications to PostgreSQL
Frontend-Backend: Building full-stack applications
Scaling and Monitoring: Load balancing and observability
Key Takeaways¶
Use Docker Compose for development and testing
Implement proper health checks for service dependencies
Use volumes for data persistence
Choose the right communication pattern (files, messages, API)
Monitor and scale based on metrics
Next Steps¶
In Practical 7, you will learn about Kubernetes for:
Production-grade container orchestration
Declarative configuration management
Automatic scaling and self-healing
Service discovery and load balancing