Performance Tuning Guide¶
Quick Start¶
For 10x Performance Gain¶
pip install multilingualprogramming[wasm]
# Then benchmark your workload; speedups are workload-dependent.
For Maximum Performance¶
pip install multilingualprogramming[performance]
# Enables hybrid execution (see Level 3 below); benchmark your workload.
Performance Optimization Levels¶
Level 0: Baseline (Python Only)¶
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
# Force pure Python
selector = BackendSelector(prefer_backend=Backend.PYTHON)
result = selector.call_function("matrix_multiply", a, b)
Speed: Baseline (reference)
Use When: Testing, debugging, or WASM unavailable
Level 1: WASM with Auto-Detection (Recommended)¶
from multilingualprogramming.runtime.backend_selector import BackendSelector
# Auto-detect WASM or fallback
selector = BackendSelector() # Backend.AUTO is default
result = selector.call_function("matrix_multiply", a, b)
Speed: Can be significantly faster on compute-heavy ops (benchmark-dependent)
Use When: Production code, maximum portability
Requirement: pip install multilingualprogramming[wasm]
Level 2: NumPy-Accelerated Fallback¶
from multilingualprogramming.runtime.python_fallbacks import MatrixOperations
# Direct fallback with NumPy acceleration (if installed)
result = MatrixOperations.multiply(a, b) # Uses NumPy internally
Speed: Typically 5-10x faster than pure Python (workload-dependent)
Use When: WASM unavailable but NumPy installed
Requirement: pip install numpy
Level 3: Hybrid Execution (WASM + NumPy)¶
# This is automatic with Backend.AUTO
from multilingualprogramming.runtime.backend_selector import BackendSelector
selector = BackendSelector()
# Uses WASM if available (fastest)
# Falls back to NumPy (accelerated)
# Falls back to pure Python (baseline)
result = selector.call_function("matrix_multiply", a, b)
Speed: WASM (fastest) → NumPy (accelerated) → pure Python (baseline); actual ratios are workload-dependent
Use When: Production with varied environments
Requirements: pip install multilingualprogramming[performance]
Detailed Optimization Guide¶
1. Choose the Right Operation¶
High speedup potential - Use WASM/NumPy:
# Matrix multiplication (n > 100)
result = matrix_multiply(a, b) # Often much faster on large matrices
# Cryptographic operations
encrypted = xor_cipher(plaintext, key) # Can be much faster for large payloads
# Scientific computing
pi = estimate_pi_monte_carlo(1000000) # Can be much faster for large iterations
Moderate speedup potential - Use WASM/NumPy:
# JSON parsing (> 1MB)
data = parse_json_simple(large_json)  # Often faster on multi-MB payloads
# Image processing
blurred = blur_simple(image)  # Often faster on large images
Low speedup (<5x) - Avoid WASM overhead:
# Small operations (n < 10)
fib = fibonacci(5) # Too small, overhead > benefit
# Simple string ops
rev = reverse("hello") # Too small
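The rule of thumb above can be captured in a small routing check. This is a sketch, not part of the library's API: `worth_offloading` and its thresholds are hypothetical and only illustrate the overhead-vs-speedup trade-off; calibrate both numbers with your own benchmarks.

```python
# Sketch: decide whether an operation is large enough to amortize
# the WASM call overhead. Both values below are illustrative.
CALL_OVERHEAD_MS = 0.031  # approximate per-call overhead quoted in this guide

def worth_offloading(estimated_python_ms: float, expected_speedup: float = 10.0) -> bool:
    """Offload only if the projected saving exceeds the call overhead."""
    projected_wasm_ms = estimated_python_ms / expected_speedup + CALL_OVERHEAD_MS
    return projected_wasm_ms < estimated_python_ms

# A 5ms matrix multiply is worth offloading; a 0.01ms fibonacci(5) is not.
print(worth_offloading(5.0))
print(worth_offloading(0.01))
```

A real router would estimate `estimated_python_ms` from input size (e.g. matrix dimensions), but the decision rule is the same.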
2. Batch Operations¶
Instead of:
# Many small calls = many WASM boundary crossings = overhead dominates
for i in range(100):
    result = fibonacci(5)  # 100 small WASM calls
Do:
# Fewer, larger operations = overhead amortized per call
matrices = [generate_matrix(100) for _ in range(100)]
results = [matrix_multiply(m, m) for m in matrices]  # Each call is large enough to amortize overhead
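The effect of batching can be seen with a stand-in backend that counts boundary crossings; the counter below models the fixed per-call overhead (`backend_call` is a made-up stub, not a library function):

```python
# Stand-in "backend" that counts how often the call boundary is crossed.
crossings = 0

def backend_call(payload):
    global crossings
    crossings += 1          # each call pays the fixed overhead once
    return [x * 2 for x in payload]

# Looped: 100 crossings for 100 items
crossings = 0
looped = [backend_call([i])[0] for i in range(100)]
looped_crossings = crossings

# Batched: 1 crossing for the same 100 items
crossings = 0
batched = backend_call(list(range(100)))
batched_crossings = crossings

print(looped_crossings, batched_crossings)  # 100 vs 1
assert looped == batched  # same result either way
```

The results are identical; only the number of overhead payments changes.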
3. Data Structure Optimization¶
Matrix Operations:
# For 100x100 matrices
a = [[1.0 for _ in range(100)] for _ in range(100)]
b = [[2.0 for _ in range(100)] for _ in range(100)]
# Use larger matrices to get better speedup
result = matrix_multiply(a, b) # Often much faster on large matrices
# Small matrices have overhead
a_small = [[1, 2], [3, 4]]
b_small = [[5, 6], [7, 8]]
result = matrix_multiply(a_small, b_small)  # Little or no speedup; call overhead dominates
JSON Parsing:
# For large JSON (> 1MB)
large_json = json.dumps([{"id": i, "data": list(range(100))} for i in range(10000)])
data = parse_json_simple(large_json)  # Often faster on multi-MB input
# Small JSON has little benefit
tiny_json = '{"name": "Alice"}'
data = parse_json_simple(tiny_json) # Parity or slower
4. Memory Optimization¶
Prevent Memory Bloat:
# Don't store intermediate results
result = matrix_multiply(
    matrix_multiply(a, b),  # Intermediate is consumed immediately, not stored
    c,
)
# Don't create large temporary arrays
result = matrix_multiply(
    [generate_row(1000) for _ in range(1000)],  # ← Can overflow WASM memory
    large_matrix,
)
# Solution: stream or chunk
def chunked_multiply(a, b, chunk_size=100):
    for i in range(0, len(a), chunk_size):
        yield matrix_multiply(a[i:i+chunk_size], b)  # one row-chunk at a time
Check Memory Usage:
a = [[1.0 for _ in range(10000)] for _ in range(10000)]
# sys.getsizeof only counts the outer list; estimate the numeric payload instead
approx_mb = len(a) * len(a[0]) * 8 / 1024 / 1024  # 8 bytes per float value
print(f"Matrix size: ~{approx_mb:.1f} MB")
# If > 64MB, it won't fit in WASM linear memory
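The check above generalizes to a small helper. This is a sketch, not a library API: it assumes a rectangular matrix and models each value as an 8-byte double (what WASM linear memory would hold), deliberately ignoring Python-side object overhead.

```python
# Estimate the numeric payload of an n×m matrix against the WASM budget.
def payload_mb(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Payload size in MB, modelling each value as bytes_per_value bytes."""
    return rows * cols * bytes_per_value / 1024 / 1024

WASM_LIMIT_MB = 64  # 1024 pages × 64KB, as noted in this guide

for n in (1000, 2000, 3000, 10000):
    mb = payload_mb(n, n)
    print(f"{n}×{n}: ~{mb:.0f} MB; fits: {mb <= WASM_LIMIT_MB}")
```

A 64MB budget holds about 8 million doubles, so square float matrices stop fitting somewhere below 3000×3000.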
5. Benchmark Your Code¶
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
def benchmark(operation_name, *args):
    """Benchmark an operation on both backends."""
    # Python fallback
    selector_py = BackendSelector(prefer_backend=Backend.PYTHON)
    start = time.perf_counter()
    result_py = selector_py.call_function(operation_name, *args)
    py_time = time.perf_counter() - start
    # WASM (if available)
    selector_wasm = BackendSelector(prefer_backend=Backend.WASM)
    start = time.perf_counter()
    result_wasm = selector_wasm.call_function(operation_name, *args)
    wasm_time = time.perf_counter() - start
    # Report
    speedup = py_time / wasm_time if wasm_time > 0 else float('inf')
    print(f"{operation_name}:")
    print(f"  Python:  {py_time*1000:8.2f} ms")
    print(f"  WASM:    {wasm_time*1000:8.2f} ms")
    print(f"  Speedup: {speedup:8.1f}x")
    return speedup
# Example
import numpy as np
a = np.random.random((500, 500)).tolist()
b = np.random.random((500, 500)).tolist()
benchmark("matrix_multiply", a, b)
6. Configuration Tuning¶
# Select backend explicitly in code
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
selector = BackendSelector(prefer_backend=Backend.WASM) # force WASM
selector = BackendSelector(prefer_backend=Backend.PYTHON) # force Python
selector = BackendSelector() # auto (default)
Note: The MULTILINGUAL_BACKEND environment variable and the enable_module_cache/enable_function_cache attributes on BackendSelector are not implemented in the current codebase. Use the prefer_backend constructor argument to select the backend.
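If you want environment-driven selection anyway, a thin wrapper is easy to write yourself. The sketch below runs standalone with a stand-in `Backend` enum (in real code, import `Backend` and `BackendSelector` from `multilingualprogramming.runtime.backend_selector`); the `MLP_BACKEND` variable name is invented for this example.

```python
import os
from enum import Enum

# Stand-in for the library's Backend enum so this sketch is self-contained.
class Backend(Enum):
    AUTO = "auto"
    WASM = "wasm"
    PYTHON = "python"

def backend_from_env(var: str = "MLP_BACKEND", default: Backend = Backend.AUTO) -> Backend:
    """Map an environment variable onto a Backend value, falling back on bad input."""
    value = os.environ.get(var, default.value).lower()
    try:
        return Backend(value)
    except ValueError:
        return default

os.environ["MLP_BACKEND"] = "python"
print(backend_from_env())  # Backend.PYTHON
```

You would then pass the result as `BackendSelector(prefer_backend=backend_from_env())`.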
Real-World Examples¶
Example 1: Matrix Multiplication¶
Problem: Multiply 1000×1000 matrices
Baseline: 5 seconds (pure Python)
Target: < 100 ms
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector
# Create test data
size = 1000
a = [[i+j for j in range(size)] for i in range(size)]
b = [[i+j for j in range(size)] for i in range(size)]
# Auto-optimized (WASM if available)
selector = BackendSelector()
start = time.perf_counter()
result = selector.call_function("matrix_multiply", a, b)
elapsed = time.perf_counter() - start
if elapsed < 0.1:
    print(f"✓ PASSED: {elapsed*1000:.1f}ms (WASM enabled)")
else:
    print(f"⚠ SLOWER: {elapsed*1000:.1f}ms (Python fallback)")
Expected Results:
- WASM enabled: ~50 ms ✓
- Python fallback: ~5000 ms (but still correct)
Example 2: JSON Data Processing¶
Problem: Parse a 10MB JSON file, extract data, filter
Baseline: 2 seconds
Target: 200 ms
import json
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector
# Create test data
data = [{"id": i, "value": i*2, "name": f"item{i}"} for i in range(100000)]
json_str = json.dumps(data)
print(f"JSON size: {len(json_str) / 1024 / 1024:.1f} MB")
# With WASM + fallback
selector = BackendSelector()
start = time.perf_counter()
parsed = selector.call_function("parse_json_simple", json_str)
elapsed = time.perf_counter() - start
if elapsed < 0.2:
    print(f"✓ GOOD: {elapsed*1000:.1f}ms")
elif elapsed < 2:
    print(f"⚠ OK: {elapsed*1000:.1f}ms (Python fallback)")
else:
    print(f"✗ SLOW: {elapsed*1000:.1f}ms (needs optimization)")
Example 3: Cryptographic Operations¶
Problem: Encrypt a 100MB file
Baseline: 100 seconds
Target: 1 second
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector
# Create 10MB of test data and extrapolate to 100MB below
plaintext = "a" * 10_000_000  # 10MB
key = "secretkey"
selector = BackendSelector()
start = time.perf_counter()
encrypted = selector.call_function("xor_cipher", plaintext, key)
elapsed = time.perf_counter() - start
# Estimate for 100MB based on linear scaling
estimated_100mb = elapsed * 10
if estimated_100mb < 1:
    print(f"✓ EXCELLENT: {estimated_100mb:.1f}s for 100MB (WASM enabled)")
elif estimated_100mb < 10:
    print(f"⚠ ACCEPTABLE: {estimated_100mb:.1f}s (Python fallback)")
else:
    print(f"✗ TOO SLOW: {estimated_100mb:.1f}s (needs optimization)")
Performance Troubleshooting¶
Symptom: No Speedup Seen¶
from multilingualprogramming.runtime.backend_selector import BackendSelector
selector = BackendSelector()
print(f"WASM Available: {selector.is_wasm_available()}")
if not selector.is_wasm_available():
    print("Solution: pip install wasmtime")
    print("          or verify WASM files in package")
Silent fallback warning: When Backend.WASM is requested but no compiled .wasm binary exists for the requested function (e.g. because the WASM corpus is not yet built), BackendSelector silently falls through to the Python implementation with no error or warning. The call succeeds and returns a correct result, but WASM is not actually used. Always check selector.is_wasm_available() and confirm that the relevant .wasm file is present if you expect WASM execution.
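To fail fast instead of silently falling back, you can wrap the availability check in a guard. `require_wasm` below is a hypothetical helper, not a library API; it works against anything exposing `is_wasm_available()`, demonstrated here with a stub standing in for `BackendSelector`:

```python
# Guard that raises instead of silently running on the Python fallback.
class WasmUnavailableError(RuntimeError):
    pass

def require_wasm(selector):
    """Return the selector unchanged, or raise if WASM is not usable."""
    if not selector.is_wasm_available():
        raise WasmUnavailableError(
            "WASM backend requested but unavailable; "
            "install with: pip install multilingualprogramming[wasm]"
        )
    return selector

# Stub standing in for BackendSelector in this sketch
class StubSelector:
    def __init__(self, wasm: bool):
        self._wasm = wasm
    def is_wasm_available(self) -> bool:
        return self._wasm

require_wasm(StubSelector(wasm=True))   # passes through
try:
    require_wasm(StubSelector(wasm=False))
except WasmUnavailableError as e:
    print(f"refused: {e}")
```

Call the guard once at startup so a misconfigured environment surfaces immediately rather than as a mysterious slowdown.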
Symptom: Slower Than Python¶
# This happens with small operations:
# WASM call overhead (~0.031ms) > operation time (e.g. 0.01ms)
# Solution: batch work into fewer, larger calls
# Instead of:
results = [fibonacci(5) for _ in range(100)]  # 100 small WASM calls
# Do:
large_result = fibonacci(1000)  # Single heavy call; overhead amortized
Symptom: Memory Errors with WASM¶
# WASM has a 64MB linear memory limit (1024 pages × 64KB)
# If you hit this:
# 1. Check matrix sizes: 64MB holds ~8M 8-byte floats (~2800×2800 square)
# 2. Stream or chunk the processing instead of all-at-once
# 3. Use the Python fallback for memory-intensive operations:
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
selector = BackendSelector(prefer_backend=Backend.PYTHON)
Metrics to Monitor¶
Key Performance Indicators¶
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
class PerformanceMonitor:
    def __init__(self):
        self.measurements = []

    def measure(self, name, function, *args):
        """Measure operation performance."""
        selector = BackendSelector()
        # Warm up
        function(*args)
        # Measure
        start = time.perf_counter()
        result = function(*args)
        elapsed = time.perf_counter() - start
        self.measurements.append({
            'name': name,
            'time_ms': elapsed * 1000,
            'backend': 'WASM' if selector.is_wasm_available() else 'Python',
        })
        return result

    def report(self):
        """Print performance report."""
        print(f"\n{'Operation':<20} {'Time':<10} {'Backend':<10}")
        print("-" * 40)
        for m in self.measurements:
            print(f"{m['name']:<20} {m['time_ms']:>8.2f}ms {m['backend']:<10}")
# Usage
monitor = PerformanceMonitor()
monitor.measure("fibonacci(30)", fibonacci, 30)
monitor.measure("matrix_multiply(100x100)", matrix_multiply, a, b)
monitor.report()
Best Practices Summary¶
✅ DO:
1. Use WASM for operations longer than ~0.05ms
2. Batch operations to amortize overhead
3. Monitor actual performance with benchmarks
4. Use auto-detection in production
5. Test both backends
6. Profile before and after optimization
❌ DON'T:
1. Assume WASM is always faster
2. Over-optimize small operations
3. Ignore the fallback path
4. Create massive data structures
5. Assume zero call overhead
6. Skip testing on the target platform
Resources¶
Version: PyPI Distribution Final
Status: Stable; validate performance in your target environment.