Performance Tuning Guide¶

Quick Start¶

For 10x Performance Gain¶

pip install multilingualprogramming[wasm]
# Then benchmark your workload; speedups are workload-dependent.

For Maximum Performance¶

pip install multilingualprogramming[performance]
# Adds NumPy optimization for fallback paths

Performance Optimization Levels¶

Level 0: Baseline (Python Only)¶

from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

# Force pure Python
selector = BackendSelector(prefer_backend=Backend.PYTHON)
result = selector.call_function("matrix_multiply", a, b)

Speed: Baseline (reference) Use When: Testing, debugging, or WASM unavailable

Level 1: WASM with Auto-Detection (Recommended)¶

from multilingualprogramming.runtime.backend_selector import BackendSelector

# Auto-detect WASM or fallback
selector = BackendSelector()  # Backend.AUTO is default
result = selector.call_function("matrix_multiply", a, b)

Speed: Can be significantly faster on compute-heavy ops (benchmark-dependent) Use When: Production code, maximum portability Requirement: pip install multilingualprogramming[wasm]

Level 2: NumPy-Accelerated Fallback¶

from multilingualprogramming.runtime.python_fallbacks import MatrixOperations

# Direct fallback with NumPy acceleration (if installed)
result = MatrixOperations.multiply(a, b)  # Uses NumPy internally

Speed: 5-10x faster than pure Python (fallback) Use When: WASM unavailable but NumPy installed Requirement: pip install numpy

Level 3: Hybrid Execution (WASM + NumPy)¶

# This is automatic with Backend.AUTO
from multilingualprogramming.runtime.backend_selector import BackendSelector

selector = BackendSelector()
# Uses WASM if available (100x)
# Falls back to NumPy (10x)
# Falls back to pure Python (baseline)
result = selector.call_function("matrix_multiply", a, b)

Speed: 100x (WASM) → 10x (NumPy) → 1x (Python) Use When: Production with varied environments Requirements: pip install multilingualprogramming[performance]

Detailed Optimization Guide¶

1. Choose the Right Operation¶

High speedup potential - Use WASM/NumPy:

# Matrix multiplication (n > 100)
result = matrix_multiply(a, b)  # Often much faster on large matrices

# Cryptographic operations
encrypted = xor_cipher(plaintext, key)  # Can be much faster for large payloads

# Scientific computing
pi = estimate_pi_monte_carlo(1000000)  # Can be much faster for large iterations

Moderate speedup (10x) - Use WASM/NumPy:

# JSON parsing (> 1MB)
data = parse_json_simple(large_json)  # 10x faster

# Image processing
blurred = blur_simple(image)  # 10x faster

Low speedup (<5x) - Avoid WASM overhead:

# Small operations (n < 10)
fib = fibonacci(5)  # Too small, overhead > benefit

# Simple string ops
rev = reverse("hello")  # Too small

2. Batch Operations¶

Instead of:

# Multiple small calls = multiple WASM calls = overhead overhead
for i in range(100):
    result = fibonacci(5)  # 100 small WASM calls

Do:

# Single large operation = single WASM call = amortize overhead
matrices = [generate_matrix(100) for _ in range(100)]
results = [matrix_multiply(m, m) for m in matrices]  # Much faster

3. Data Structure Optimization¶

Matrix Operations:

# For 100x100 matrices
a = [[1.0 for _ in range(100)] for _ in range(100)]
b = [[2.0 for _ in range(100)] for _ in range(100)]

# Use larger matrices to get better speedup
result = matrix_multiply(a, b)  # Often much faster on large matrices

# Small matrices have overhead
a_small = [[1, 2], [3, 4]]
b_small = [[5, 6], [7, 8]]
result = matrix_multiply(a_small, b_small)  # Maybe 2x faster (overhead high)

JSON Parsing:

# For large JSON (> 1MB)
large_json = json.dumps([{"id": i, "data": range(100)} for i in range(10000)])
data = parse_json_simple(large_json)  # 10x faster

# Small JSON has little benefit
tiny_json = '{"name": "Alice"}'
data = parse_json_simple(tiny_json)  # Parity or slower

4. Memory Optimization¶

Prevent Memory Bloat:

# Don't store intermediate results
result = matrix_multiply(
    matrix_multiply(a, b),  # Don't need to store this
    c
)

# Don't create large temporary arrays
result = matrix_multiply(
    [generate_row(1000) for _ in range(1000)],  # ← Can overflow WASM memory
    large_matrix
)

# Solution: Stream or chunk
def chunked_multiply(a, b, chunk_size=100):
    for i in range(0, len(a), chunk_size):
        chunk = [matrix_multiply(a[i:i+chunk_size], b) for _ in range(chunk_size)]
        yield chunk

Check Memory Usage:

import sys
a = [[1.0 for _ in range(10000)] for _ in range(10000)]
print(f"Matrix size: {sys.getsizeof(a) / 1024 / 1024:.1f} MB")
# If > 64MB, won't fit in WASM linear memory

5. Benchmark Your Code¶

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

def benchmark(operation_name, operation_func, *args):
    """Benchmark operation on both backends."""

    # Python fallback
    selector_py = BackendSelector(prefer_backend=Backend.PYTHON)
    start = time.perf_counter()
    result_py = selector_py.call_function(operation_name, *args)
    py_time = time.perf_counter() - start

    # WASM (if available)
    selector_wasm = BackendSelector(prefer_backend=Backend.WASM)
    start = time.perf_counter()
    result_wasm = selector_wasm.call_function(operation_name, *args)
    wasm_time = time.perf_counter() - start

    # Report
    speedup = py_time / wasm_time if wasm_time > 0 else float('inf')
    print(f"{operation_name}:")
    print(f"  Python:  {py_time*1000:8.2f} ms")
    print(f"  WASM:    {wasm_time*1000:8.2f} ms")
    print(f"  Speedup: {speedup:8.1f}x")

    return speedup

# Example
import numpy as np
a = np.random.random((500, 500)).tolist()
b = np.random.random((500, 500)).tolist()
benchmark("matrix_multiply", None, a, b)

6. Configuration Tuning¶

# Force specific backend via code
# Select backend explicitly in code
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
selector = BackendSelector(prefer_backend=Backend.WASM)   # force WASM
selector = BackendSelector(prefer_backend=Backend.PYTHON)  # force Python
selector = BackendSelector()                               # auto (default)

Note: The MULTILINGUAL_BACKEND environment variable and the enable_module_cache / enable_function_cache attributes on BackendSelector are not implemented in the current codebase. Use the prefer_backend constructor argument to select the backend.

Real-World Examples¶

Example 1: Matrix Multiplication¶

Problem: Multiply 1000×1000 matrices Baseline: 5 seconds (pure Python) Target: < 100ms

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector

# Create test data
size = 1000
a = [[i+j for j in range(size)] for i in range(size)]
b = [[i+j for j in range(size)] for i in range(size)]

# Auto-optimized (WASM if available)
selector = BackendSelector()

start = time.perf_counter()
result = selector.call_function("matrix_multiply", a, b)
elapsed = time.perf_counter() - start

if elapsed < 0.1:
    print(f"✓ PASSED: {elapsed*1000:.1f}ms (WASM enabled)")
else:
    print(f"⚠ SLOWER: {elapsed*1000:.1f}ms (Python fallback)")

Expected Results: - WASM enabled: 50ms ✓ - Python fallback: 5000ms (but still correct)

Example 2: JSON Data Processing¶

Problem: Parse 10MB JSON file, extract data, filter Baseline: 2 seconds Target: 200ms

import json
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector

# Create test data
data = [{"id": i, "value": i*2, "name": f"item{i}"} for i in range(100000)]
json_str = json.dumps(data)
print(f"JSON size: {len(json_str) / 1024 / 1024:.1f} MB")

# With WASM + fallback
selector = BackendSelector()

start = time.perf_counter()
parsed = selector.call_function("parse_json_simple", json_str)
elapsed = time.perf_counter() - start

if elapsed < 0.2:
    print(f"✓ GOOD: {elapsed*1000:.1f}ms")
elif elapsed < 2:
    print(f"⚠ OK: {elapsed*1000:.1f}ms (Python fallback)")
else:
    print(f"✗ SLOW: {elapsed*1000:.1f}ms (needs optimization)")

Example 3: Cryptographic Operations¶

Problem: Encrypt 100MB file Baseline: 100 seconds Target: 1 second

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector

# Create test data (simulated, don't actually create 100MB)
plaintext = "a" * 10000000  # 10MB
key = "secretkey"

selector = BackendSelector()

start = time.perf_counter()
encrypted = selector.call_function("xor_cipher", plaintext, key)
elapsed = time.perf_counter() - start

# Estimate for 100MB based on linear scaling
estimated_100mb = elapsed * 10

if estimated_100mb < 1:
    print(f"✓ EXCELLENT: {estimated_100mb:.1f}s for 100MB (WASM enabled)")
elif estimated_100mb < 10:
    print(f"⚠ ACCEPTABLE: {estimated_100mb:.1f}s (Python fallback)")
else:
    print(f"✗ TOO SLOW: {estimated_100mb:.1f}s (needs optimization)")

Performance Troubleshooting¶

Symptom: No Speedup Seen¶

from multilingualprogramming.runtime.backend_selector import BackendSelector

selector = BackendSelector()
print(f"WASM Available: {selector.is_wasm_available()}")

if not selector.is_wasm_available():
    print("Solution: pip install wasmtime")
    print("          or verify WASM files in package")

Silent fallback warning: When Backend.WASM is requested but no compiled .wasm binary exists for the requested function (e.g. because the WASM corpus is not yet built), BackendSelector silently falls through to the Python implementation with no error or warning. The call succeeds and returns a correct result, but WASM is not actually used. Always check selector.is_wasm_available() and confirm that the relevant .wasm file is present if you expect WASM execution.

Symptom: Slower Than Python¶

# This happens with small operations
# WASM call overhead (~0.031ms) > operation time (e.g. 0.01ms)

# Solution: Batch operations
# Instead of:
results = [fibonacci(5) for _ in range(100)]  # 100 small WASM calls

# Do:
# Compute something larger where WASM overhead is amortized
large_result = fibonacci(1000)  # Single call, overhead amortized
large_result = fibonacci(1000)  # Single heavy call, often faster

Symptom: Memory Errors with WASM¶

# WASM has 64MB linear memory limit (1024 pages × 64KB)
# If you hit this:
# 1. Check matrix sizes: max ~10000x10000 for floats
# 2. Stream processing instead of all-at-once
# 3. Use Python fallback for this operation
# 4. Use Backend.PYTHON for memory-intensive ops

from multilingualprogramming.runtime.backend_selector import Backend
selector = BackendSelector(prefer_backend=Backend.PYTHON)

Metrics to Monitor¶

Key Performance Indicators¶

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

class PerformanceMonitor:
    def __init__(self):
        self.measurements = []

    def measure(self, name, function, *args):
        """Measure operation performance."""
        selector = BackendSelector()

        # Warm up
        function(*args)

        # Measure
        start = time.perf_counter()
        result = function(*args)
        elapsed = time.perf_counter() - start

        self.measurements.append({
            'name': name,
            'time_ms': elapsed * 1000,
            'backend': 'WASM' if selector.is_wasm_available() else 'Python'
        })

        return result

    def report(self):
        """Print performance report."""
        print(f"\n{'Operation':<20} {'Time':<10} {'Backend':<10}")
        print("-" * 40)
        for m in self.measurements:
            print(f"{m['name']:<20} {m['time_ms']:>8.2f}ms {m['backend']:<10}")

# Usage
monitor = PerformanceMonitor()
monitor.measure("fibonacci(30)", fibonacci, 30)
monitor.measure("matrix_multiply(100x100)", matrix_multiply, a, b)
monitor.report()

Best Practices Summary¶

✅ DO: 1. Use WASM for operations > ~0.05ms 2. Batch operations to amortize overhead 3. Monitor actual performance with benchmarks 4. Use auto-detection in production 5. Test both backends 6. Profile before and after optimization

❌ DON'T: 1. Assume WASM is always faster 2. Over-optimize small operations 3. Ignore fallback path 4. Create massive data structures 5. Assume no overhead 6. Skip testing on target platform

Resources¶

Version: PyPI Distribution Final Status: Stable; validate performance in your target environment.