
Performance Tuning Guide

Quick Start

For WASM Acceleration

pip install multilingualprogramming[wasm]
# Then benchmark your workload; speedups are workload-dependent.

For Maximum Performance

pip install multilingualprogramming[performance]
# Adds NumPy optimization for fallback paths

Performance Optimization Levels

Level 0: Baseline (Python Only)

from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

# Force pure Python
selector = BackendSelector(prefer_backend=Backend.PYTHON)
result = selector.call_function("matrix_multiply", a, b)

Speed: Baseline (reference)
Use When: Testing, debugging, or WASM unavailable


Level 1: WASM with Automatic Fallback

from multilingualprogramming.runtime.backend_selector import BackendSelector

# Auto-detect WASM or fallback
selector = BackendSelector()  # Backend.AUTO is default
result = selector.call_function("matrix_multiply", a, b)

Speed: Can be significantly faster on compute-heavy ops (benchmark-dependent)
Use When: Production code, maximum portability
Requirement: pip install multilingualprogramming[wasm]


Level 2: NumPy-Accelerated Fallback

from multilingualprogramming.runtime.python_fallbacks import MatrixOperations

# Direct fallback with NumPy acceleration (if installed)
result = MatrixOperations.multiply(a, b)  # Uses NumPy internally

Speed: Often 5-10x faster than pure Python (benchmark-dependent)
Use When: WASM unavailable but NumPy installed
Requirement: pip install numpy
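The "use NumPy if it is installed, otherwise pure Python" pattern this level relies on can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the code inside `MatrixOperations`; `multiply_with_optional_numpy` is a hypothetical name.

```python
def multiply_with_optional_numpy(a, b):
    """Multiply two matrices given as lists of lists.

    Hypothetical helper for illustration (not the library's MatrixOperations):
    uses NumPy when it can be imported, otherwise falls back to pure Python.
    """
    try:
        import numpy as np
    except ImportError:
        np = None

    if np is not None:
        # Vectorized path: convert to arrays, multiply, convert back to lists
        return (np.asarray(a, dtype=float) @ np.asarray(b, dtype=float)).tolist()

    # Pure-Python path: dot product of each row of a with each column of b
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


result = multiply_with_optional_numpy([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

Both paths return the same values; the NumPy path returns floats (for example `19.0` instead of `19`).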


Level 3: Hybrid Execution (WASM + NumPy)

# This is automatic with Backend.AUTO
from multilingualprogramming.runtime.backend_selector import BackendSelector

selector = BackendSelector()
# Uses WASM if available (up to ~100x)
# Falls back to NumPy (~10x)
# Falls back to pure Python (baseline)
result = selector.call_function("matrix_multiply", a, b)

Speed: up to ~100x (WASM) → ~10x (NumPy) → 1x (Python); benchmark-dependent
Use When: Production with varied environments
Requirements: pip install multilingualprogramming[performance]
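The tiered fallback described above can be modeled as an ordered list of backends tried one after another. The sketch below is hypothetical (`call_with_fallback` and the stand-in backends are not part of the library) and assumes an unavailable backend signals itself by raising `ImportError` or `NotImplementedError`. Returning the name of the backend that actually ran, rather than falling through silently, makes fallbacks visible.

```python
def call_with_fallback(backends, *args):
    """Try each (name, func) pair in order; return (name, result) from the first that works.

    A backend reports itself unavailable by raising ImportError or
    NotImplementedError; any other exception propagates unchanged.
    """
    errors = []
    for name, func in backends:
        try:
            return name, func(*args)
        except (ImportError, NotImplementedError) as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no backend succeeded: " + "; ".join(errors))


# Stand-in backends (hypothetical, for illustration only)
def wasm_sum(values):
    raise ImportError("no compiled .wasm binary")


def numpy_sum(values):
    import numpy as np  # raises ImportError when NumPy is not installed
    return int(np.sum(values))


def python_sum(values):
    return sum(values)


backend, total = call_with_fallback(
    [("wasm", wasm_sum), ("numpy", numpy_sum), ("python", python_sum)], [1, 2, 3]
)
print(backend, total)  # "numpy 6" if NumPy is installed, otherwise "python 6"
```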


Detailed Optimization Guide

1. Choose the Right Operation

High speedup potential - Use WASM/NumPy:

# Matrix multiplication (n > 100)
result = matrix_multiply(a, b)  # Often much faster on large matrices

# Cryptographic operations
encrypted = xor_cipher(plaintext, key)  # Can be much faster for large payloads

# Scientific computing
pi = estimate_pi_monte_carlo(1000000)  # Can be much faster for large iterations

Moderate speedup (around 10x, workload-dependent) - Use WASM/NumPy:

# JSON parsing (> 1MB)
data = parse_json_simple(large_json)  # Can be ~10x faster

# Image processing
blurred = blur_simple(image)  # Can be ~10x faster

Low or negative speedup - WASM call overhead can dominate:

# Small operations (n < 10)
fib = fibonacci(5)  # Too small, overhead > benefit

# Simple string ops
rev = reverse("hello")  # Too small


2. Batch Operations

Instead of:

# Many small calls = many WASM calls = repeated per-call overhead
for i in range(100):
    result = fibonacci(5)  # 100 small WASM calls

Do:

# Fewer, heavier calls: each call does enough work to amortize its overhead
matrices = [[[1.0] * 100 for _ in range(100)] for _ in range(100)]  # 100 matrices, each 100x100
results = [matrix_multiply(m, m) for m in matrices]  # Overhead is small relative to each call's work
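When many independent inputs need the same operation, grouping them first lets you make one call per group instead of one per input. A minimal sketch, assuming the operation you call accepts a whole list (check its signature; `process_batch` below is hypothetical):

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` consecutive items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Usage: one call per batch of 1,000 inputs instead of one call per input.
# process_batch is hypothetical: it stands for any operation that accepts a list.
# for batch in batched(inputs, 1000):
#     process_batch(batch)
print(list(batched(range(5), 2)))
```

`itertools.batched` (Python 3.12+) does the same job but yields tuples; the hand-written version also runs on older interpreters.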


3. Data Structure Optimization

Matrix Operations:

# For 100x100 matrices
a = [[1.0 for _ in range(100)] for _ in range(100)]
b = [[2.0 for _ in range(100)] for _ in range(100)]

# Use larger matrices to get better speedup
result = matrix_multiply(a, b)  # Often much faster on large matrices

# Small matrices have overhead
a_small = [[1, 2], [3, 4]]
b_small = [[5, 6], [7, 8]]
result = matrix_multiply(a_small, b_small)  # Little or no gain; call overhead dominates

JSON Parsing:

import json

# For large JSON (> 1MB)
large_json = json.dumps([{"id": i, "data": list(range(100))} for i in range(10000)])
data = parse_json_simple(large_json)  # Can be ~10x faster

# Small JSON has little benefit
tiny_json = '{"name": "Alice"}'
data = parse_json_simple(tiny_json)  # Parity or slower
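One way to apply the "> 1MB" rule of thumb is to dispatch on payload size: small documents go to the standard library's `json.loads`, large ones to an accelerated parser you pass in. This is a sketch; `parse_json_by_size` is hypothetical, and the 1MB threshold is the guide's estimate rather than a measured break-even point.

```python
import json

ONE_MB = 1024 * 1024


def parse_json_by_size(text, accelerated_parser, threshold_bytes=ONE_MB):
    """Parse small payloads with json.loads and large ones with accelerated_parser."""
    size = len(text.encode("utf-8"))  # size in bytes, not characters
    if size < threshold_bytes:
        return json.loads(text)
    return accelerated_parser(text)


# Usage with the selector from this guide (assumed call shape):
# parsed = parse_json_by_size(text, lambda t: selector.call_function("parse_json_simple", t))
print(parse_json_by_size('{"name": "Alice"}', accelerated_parser=json.loads))
```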


4. Memory Optimization

Prevent Memory Bloat:

# Avoid keeping intermediate results alive longer than needed
result = matrix_multiply(
    matrix_multiply(a, b),  # Not stored in a variable; released after the outer call
    c
)

# Avoid building very large temporary arrays in one go
result = matrix_multiply(
    [generate_row(1000) for _ in range(1000)],  # ← May exceed WASM linear memory
    large_matrix
)

# Solution: Stream or chunk
# Rows i..i+chunk_size of the product depend only on the same rows of a
def chunked_multiply(a, b, chunk_size=100):
    for i in range(0, len(a), chunk_size):
        yield matrix_multiply(a[i:i+chunk_size], b)
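Row chunking is valid because each row of the product depends only on the matching row of `a` (and all of `b`). The self-contained check below uses a hypothetical pure-Python `matmul` as a stand-in for `matrix_multiply` and confirms that stitching the chunks back together reproduces the full product.

```python
def matmul(a, b):
    """Plain pure-Python matrix product (stand-in for matrix_multiply)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def chunked_multiply(a, b, chunk_size=100):
    """Yield the product a @ b in blocks of up to `chunk_size` rows."""
    for i in range(0, len(a), chunk_size):
        yield matmul(a[i:i + chunk_size], b)


a = [[i + j for j in range(5)] for i in range(7)]  # 7x5
b = [[i * j for j in range(3)] for i in range(5)]  # 5x3
stitched = [row for block in chunked_multiply(a, b, chunk_size=3) for row in block]
assert stitched == matmul(a, b)  # same 7x3 result, computed in blocks of 3, 3, and 1 rows
```

Only `a` and the output are split; `b` still has to fit in memory as a whole.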

Check Memory Usage:

# Estimate the raw data size, assuming 8-byte floats.
# (sys.getsizeof(a) would only measure the outer list, not the rows or values.)
rows, cols = 10000, 10000
print(f"Matrix size: {rows * cols * 8 / 1024 / 1024:.1f} MB")
# If > 64MB, it won't fit in WASM linear memory
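For a matrix multiplication, the inputs and the output may all need to live in linear memory at the same time. The sketch below estimates that footprint against the 64MB figure used in this guide; it assumes 8-byte floats and ignores the module's own memory use, so treat a "fits" result as optimistic. The function names are hypothetical.

```python
# 1024 pages x 64KB = 64MB, the limit stated in this guide
WASM_MEMORY_LIMIT = 1024 * 64 * 1024


def matmul_bytes(n, m, p, bytes_per_value=8):
    """Bytes needed for an (n x m) @ (m x p) product: both inputs plus the output."""
    return (n * m + m * p + n * p) * bytes_per_value


def fits_in_wasm_memory(n, m, p, bytes_per_value=8, limit=WASM_MEMORY_LIMIT):
    """True if inputs and output together stay within `limit` bytes."""
    return matmul_bytes(n, m, p, bytes_per_value) <= limit


print(fits_in_wasm_memory(1000, 1000, 1000))  # True
print(fits_in_wasm_memory(2000, 2000, 2000))  # False
```

Under these assumptions a 1000×1000 product needs 24,000,000 bytes and fits, while 2000×2000 needs 96,000,000 bytes and does not.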


5. Benchmark Your Code

import time
import numpy as np
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

def benchmark(operation_name, *args):
    """Time one call of operation_name on the Python and WASM backends."""

    # Python fallback
    selector_py = BackendSelector(prefer_backend=Backend.PYTHON)
    start = time.perf_counter()
    result_py = selector_py.call_function(operation_name, *args)
    py_time = time.perf_counter() - start

    # WASM (silently runs Python instead if no .wasm binary exists for this function)
    selector_wasm = BackendSelector(prefer_backend=Backend.WASM)
    if not selector_wasm.is_wasm_available():
        print("Warning: WASM unavailable; both timings below are Python")
    start = time.perf_counter()
    result_wasm = selector_wasm.call_function(operation_name, *args)
    wasm_time = time.perf_counter() - start

    # Report
    speedup = py_time / wasm_time if wasm_time > 0 else float('inf')
    print(f"{operation_name}:")
    print(f"  Python:  {py_time*1000:8.2f} ms")
    print(f"  WASM:    {wasm_time*1000:8.2f} ms")
    print(f"  Speedup: {speedup:8.1f}x")

    return speedup

# Example: two 500x500 random matrices
a = np.random.random((500, 500)).tolist()
b = np.random.random((500, 500)).tolist()
benchmark("matrix_multiply", a, b)
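A single `perf_counter` measurement is sensitive to background noise and to one-off costs on the first call. Repeating the measurement and keeping the fastest run is a common remedy; the helper below uses the standard library's `timeit.repeat`. `best_time` is a hypothetical helper, not part of the library.

```python
import timeit


def best_time(func, *args, repeat=5, number=1):
    """Return the fastest per-call time in seconds over `repeat` runs.

    Each run calls func(*args) `number` times. Taking the minimum keeps
    background noise and slow first calls out of the estimate.
    """
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


# Usage with the selector from this guide:
# best_time(selector_wasm.call_function, "matrix_multiply", a, b)
print(f"{best_time(sum, range(100_000)) * 1000:.3f} ms")
```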

6. Configuration Tuning

# Select the backend explicitly in code
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
selector = BackendSelector(prefer_backend=Backend.WASM)   # force WASM
selector = BackendSelector(prefer_backend=Backend.PYTHON)  # force Python
selector = BackendSelector()                               # auto (default)

Note: The MULTILINGUAL_BACKEND environment variable and the enable_module_cache / enable_function_cache attributes on BackendSelector are not implemented in the current codebase. Use the prefer_backend constructor argument to select the backend.
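If you still want environment-based selection, you can implement it in your own application code. In this sketch `APP_BACKEND` is a hypothetical variable that only your code reads (the library does not), and it assumes `Backend` exposes `AUTO`, `WASM`, and `PYTHON` as the examples in this guide suggest.

```python
import os

VALID_BACKENDS = ("AUTO", "WASM", "PYTHON")


def backend_name_from_env(var="APP_BACKEND", default="AUTO"):
    """Read a backend name from an application-defined environment variable.

    Values are case-insensitive; unknown or missing values fall back to `default`.
    """
    value = os.environ.get(var, default).strip().upper()
    return value if value in VALID_BACKENDS else default


print(backend_name_from_env())
```

Map the name to the enum in your own code, for example `BackendSelector(prefer_backend=getattr(Backend, backend_name_from_env()))`.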


Real-World Examples

Example 1: Matrix Multiplication

Problem: Multiply 1000×1000 matrices
Baseline: 5 seconds (pure Python)
Target: < 100ms

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector

# Create test data
size = 1000
a = [[i+j for j in range(size)] for i in range(size)]
b = [[i+j for j in range(size)] for i in range(size)]

# Auto-optimized (WASM if available)
selector = BackendSelector()

start = time.perf_counter()
result = selector.call_function("matrix_multiply", a, b)
elapsed = time.perf_counter() - start

if elapsed < 0.1:
    print(f"✓ PASSED: {elapsed*1000:.1f}ms (WASM likely enabled)")
else:
    print(f"⚠ SLOWER: {elapsed*1000:.1f}ms (likely Python fallback)")

Expected Results (approximate, hardware-dependent):
- WASM enabled: 50ms ✓
- Python fallback: 5000ms (but still correct)


Example 2: JSON Data Processing

Problem: Parse a 10MB JSON file, then extract and filter the data (the snippet below generates roughly 5MB of test data and times only the parse step)
Baseline: 2 seconds
Target: 200ms

import json
import time
from multilingualprogramming.runtime.backend_selector import BackendSelector

# Create test data
data = [{"id": i, "value": i*2, "name": f"item{i}"} for i in range(100000)]
json_str = json.dumps(data)
print(f"JSON size: {len(json_str) / 1024 / 1024:.1f} MB")

# With WASM + fallback
selector = BackendSelector()

start = time.perf_counter()
parsed = selector.call_function("parse_json_simple", json_str)
elapsed = time.perf_counter() - start

if elapsed < 0.2:
    print(f"✓ GOOD: {elapsed*1000:.1f}ms")
elif elapsed < 2:
    print(f"⚠ OK: {elapsed*1000:.1f}ms (likely Python fallback)")
else:
    print(f"✗ SLOW: {elapsed*1000:.1f}ms (needs optimization)")

Example 3: Cryptographic Operations

Problem: Encrypt a 100MB file
Baseline: 100 seconds
Target: 1 second

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector

# Create test data (simulated, don't actually create 100MB)
plaintext = "a" * 10000000  # 10MB
key = "secretkey"

selector = BackendSelector()

start = time.perf_counter()
encrypted = selector.call_function("xor_cipher", plaintext, key)
elapsed = time.perf_counter() - start

# Estimate for 100MB based on linear scaling
estimated_100mb = elapsed * 10

if estimated_100mb < 1:
    print(f"✓ EXCELLENT: {estimated_100mb:.1f}s for 100MB (WASM likely enabled)")
elif estimated_100mb < 10:
    print(f"⚠ ACCEPTABLE: {estimated_100mb:.1f}s for 100MB")
else:
    print(f"✗ TOO SLOW: {estimated_100mb:.1f}s for 100MB (likely Python fallback)")
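Scaling one 10MB measurement by 10 assumes the run time has no fixed component. Timing two payload sizes and fitting `time = fixed + per_unit × size` separates per-call overhead from per-byte cost before extrapolating. A minimal sketch; `extrapolate_linear` is hypothetical, and the estimate is only as good as the assumption that cost grows linearly with size.

```python
def extrapolate_linear(size_a, time_a, size_b, time_b, target_size):
    """Predict run time at target_size from two measurements.

    Assumes time = fixed + per_unit * size, and solves for both terms.
    """
    if size_a == size_b:
        raise ValueError("need measurements at two different sizes")
    per_unit = (time_b - time_a) / (size_b - size_a)
    fixed = time_a - per_unit * size_a
    return fixed + per_unit * target_size


# Usage: time xor_cipher on 5MB and 10MB payloads, then extrapolate to 100MB.
# estimate = extrapolate_linear(5, t_5mb, 10, t_10mb, 100)
print(extrapolate_linear(10, 1.1, 20, 2.1, 100))
```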

Performance Troubleshooting

Symptom: No Speedup Seen

from multilingualprogramming.runtime.backend_selector import BackendSelector

selector = BackendSelector()
print(f"WASM Available: {selector.is_wasm_available()}")

if not selector.is_wasm_available():
    print("Solution: pip install wasmtime")
    print("          or verify WASM files in package")

Silent fallback warning: When Backend.WASM is requested but no compiled .wasm binary exists for the requested function (e.g. because the WASM corpus is not yet built), BackendSelector silently falls through to the Python implementation with no error or warning. The call succeeds and returns a correct result, but WASM is not actually used. Always check selector.is_wasm_available() and confirm that the relevant .wasm file is present if you expect WASM execution.

Symptom: Slower Than Python

# This happens with small operations
# WASM call overhead (~0.031ms) > operation time (e.g. 0.01ms)

# Solution: Batch operations
# Instead of:
results = [fibonacci(5) for _ in range(100)]  # 100 small WASM calls

# Do:
# Compute something larger where WASM overhead is amortized
large_result = fibonacci(1000)  # One heavier call; per-call overhead is amortized
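Whether a faster backend pays off can be estimated with a simple model: if the work runs `speedup` times faster but each call adds a fixed `overhead_ms`, the accelerated call wins only when the pure-Python time exceeds `overhead_ms × speedup / (speedup - 1)`. The sketch assumes the speedup applies to the whole operation and the overhead is constant per call; `break_even_ms` is a hypothetical helper.

```python
def break_even_ms(overhead_ms, speedup):
    """Smallest pure-Python run time (ms) at which the accelerated call is faster.

    Model: t_fast = t_python / speedup + overhead_ms
    t_fast < t_python  <=>  t_python > overhead_ms * speedup / (speedup - 1)
    """
    if speedup <= 1:
        return float("inf")  # no speedup: the overhead can never be recovered
    return overhead_ms * speedup / (speedup - 1)


print(f"{break_even_ms(0.031, 10):.4f} ms")
```

With the ~0.031ms overhead quoted above and a 10x speedup, the break-even point is about 0.034ms, consistent with the rule of thumb of using WASM for operations above ~0.05ms.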

Symptom: Memory Errors with WASM

# WASM linear memory is limited to 64MB here (1024 pages × 64KB)
# If you hit this limit:
# 1. Check matrix sizes: assuming 8-byte floats, one matrix fits up to ~2,900x2,900
#    (64MB / 8 bytes ≈ 8.4M values); smaller still when inputs and output must fit together
# 2. Stream or chunk the data instead of processing it all at once
# 3. Use the Python backend for memory-intensive operations

from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend
selector = BackendSelector(prefer_backend=Backend.PYTHON)

Metrics to Monitor

Key Performance Indicators

import time
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

class PerformanceMonitor:
    def __init__(self):
        self.measurements = []

    def measure(self, name, function, *args):
        """Measure operation performance."""
        selector = BackendSelector()

        # Warm up
        function(*args)

        # Measure
        start = time.perf_counter()
        result = function(*args)
        elapsed = time.perf_counter() - start

        self.measurements.append({
            'name': name,
            'time_ms': elapsed * 1000,
            # Whether a WASM runtime is available, not proof that `function` ran in WASM
            'backend': 'WASM' if selector.is_wasm_available() else 'Python'
        })

        return result

    def report(self):
        """Print performance report."""
        print(f"\n{'Operation':<26} {'Time':<10} {'Backend':<10}")
        print("-" * 48)
        for m in self.measurements:
            print(f"{m['name']:<26} {m['time_ms']:>8.2f}ms {m['backend']:<10}")

# Usage
monitor = PerformanceMonitor()
monitor.measure("fibonacci(30)", fibonacci, 30)
monitor.measure("matrix_multiply(100x100)", matrix_multiply, a, b)
monitor.report()

Best Practices Summary

DO:
1. Use WASM for operations > ~0.05ms
2. Batch operations to amortize overhead
3. Monitor actual performance with benchmarks
4. Use auto-detection in production
5. Test both backends
6. Profile before and after optimization

DON'T:
1. Assume WASM is always faster
2. Over-optimize small operations
3. Ignore the fallback path
4. Create massive data structures
5. Assume calls have no overhead
6. Skip testing on the target platform



Version: PyPI Distribution Final
Status: Stable; validate performance in your target environment.