# Backend Testing Strategy
This document describes how the project ensures comprehensive testing of both WASM and Python fallback backends.
## Overview
The multilingual programming language supports two execution backends:
- WASM Backend: WebAssembly compilation for potentially large performance gains on compute-intensive operations
- Python Fallback: Pure Python implementations for guaranteed compatibility across all platforms
Both backends must produce identical results while potentially differing in performance.
## GitHub Actions Workflow

### Workflow: `wasm-backends-test.yml`
The workflow provides three layers of testing:
#### 1. Backend Matrix Testing (`backend-testing` job)

Tests both backends across multiple platforms and Python versions.

Test matrix:

- Platforms: Linux (Ubuntu), macOS, Windows
- Python versions: 3.12, 3.13
- Backends: WASM, Python fallback
- Total combinations: 3 × 2 × 2 = 12 test configurations
Test execution:

```bash
# WASM backend testing
pip install -e ".[wasm,performance]"
pytest tests/wasm_*.py -v

# Python fallback testing
pip install -e "."  # no WASM dependencies
pytest tests/wasm_comprehensive_test.py::FallbackTestSuite -v
```
Test categories:

- ✅ Correctness tests (identical output validation)
- ✅ Performance benchmarks (execution time measurement)
- ✅ Fallback mechanism tests (graceful degradation)
- ✅ Integration tests (component interaction)
- ✅ Platform compatibility tests (OS-specific behavior)
- ✅ Corpus project tests (real-world examples)
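As an illustration, an identical-output correctness test in this suite's style might look like the sketch below; both `fib` helpers are local stand-ins, not the project's actual backend modules.

```python
# Hedged sketch of an identical-output correctness test. fib_fallback
# stands in for the pure-Python path; fib_wasm stands in for what would
# dispatch to the compiled WASM module in the real suite.

def fib_fallback(n: int) -> int:
    """Iterative Fibonacci, standing in for the Python fallback."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_wasm(n: int) -> int:
    """Stand-in for the WASM-accelerated path (same algorithm here)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def test_fibonacci_parity() -> None:
    # Identical-output validation across a range of inputs.
    for n in range(31):
        assert fib_wasm(n) == fib_fallback(n)

test_fibonacci_parity()
```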
#### 2. Cross-Backend Parity Validation (`cross-backend-parity` job)

Validates semantic equivalence between backends.

Validation points:

- Matrix operations produce identical numerical results
- Cryptographic operations produce identical output
- Numeric operations (Fibonacci, factorial, GCD) are semantically identical
- JSON serialization/deserialization is consistent
```python
# Example parity test
plaintext = 'Hello World'
key = 'secret'
encrypted = CryptoOperations.xor_cipher(plaintext, key)
decrypted = CryptoOperations.xor_decipher(encrypted, key)
assert decrypted == plaintext  # must hold for both backends
```
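`CryptoOperations` comes from the project itself; for a self-contained picture of why the round trip must hold, here is a minimal pure-Python XOR cipher (a stand-in, not the project's implementation):

```python
def xor_cipher(text: str, key: str) -> bytes:
    """XOR each plaintext byte with the repeating key (illustrative only)."""
    k = key.encode("utf-8")
    return bytes(b ^ k[i % len(k)] for i, b in enumerate(text.encode("utf-8")))

def xor_decipher(data: bytes, key: str) -> str:
    """XOR is its own inverse, so deciphering reapplies the same key."""
    k = key.encode("utf-8")
    return bytes(b ^ k[i % len(k)] for i, b in enumerate(data)).decode("utf-8")

encrypted = xor_cipher("Hello World", "secret")
assert xor_decipher(encrypted, "secret") == "Hello World"
```

Because XOR is an involution, any two backends that implement it byte-for-byte will necessarily agree on both directions of the round trip.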
#### 3. Ecosystem Validation (`ecosystem-validation` job)

Tests real-world corpus projects with both backends.

Projects tested:

- Matrix Operations (multilingual, 4 languages)
- Cryptography (multilingual, 4 languages)
- JSON Parsing (multilingual, 4 languages)
- Scientific Computing (multilingual, 4 languages)
- Image Processing (multilingual, 4 languages)
## Running Tests Locally

### Running All Tests

```bash
# Test both backends (auto-detection)
pytest tests/

# Test only the WASM backend
WASM_BACKEND=wasm pytest tests/

# Test only the Python fallback
WASM_BACKEND=fallback pytest tests/
```
### Running Specific Test Categories

```bash
# Run only correctness tests
pytest -m correctness

# Run only performance benchmarks
pytest -m performance

# Run only fallback-specific tests
pytest -m fallback

# Run only WASM-specific tests
pytest -m wasm

# Run corpus projects (slow tests)
pytest -m corpus --timeout=120

# Run integration tests
pytest -m integration
```
### Running Tests with Coverage

```bash
# Full coverage report for both backends
pytest tests/ --cov=multilingualprogramming --cov-report=html

# Coverage report for fallback only
WASM_BACKEND=fallback pytest tests/ --cov=multilingualprogramming

# Coverage report for WASM only
WASM_BACKEND=wasm pytest tests/ --cov=multilingualprogramming
```
## Test Configuration

### `pytest.ini`

Configuration file defining:

- Test discovery patterns
- Test markers (`wasm`, `fallback`, `correctness`, `performance`, etc.)
- Test timeouts (60s default, 120s for corpus tests)
- Strict marker enforcement
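A configuration matching that description might look like the following fragment (values are illustrative and may differ from the repository's actual file; the `timeout` key assumes the pytest-timeout plugin):

```ini
[pytest]
testpaths = tests
markers =
    wasm: WASM-specific tests
    fallback: fallback-specific tests
    correctness: correctness validation
    performance: performance benchmarks
    integration: integration tests
    corpus: real-world corpus projects
    slow: long-running tests
# default per-test timeout in seconds (requires pytest-timeout);
# corpus tests would override this, e.g. via --timeout=120
timeout = 60
addopts = --strict-markers
```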
### `conftest.py`
Pytest fixtures and configuration:

Fixtures:

- `backend_preference`: Get backend preference from environment
- `is_wasm_available`: Check WASM availability
- `backend_selector`: `BackendSelector` instance with user preference
- `wasm_backend_selector`: Force WASM backend
- `fallback_backend_selector`: Force Python fallback
- `python_fallbacks`: All fallback implementations
- `language_variants`: Parameterized multilingual testing
- `performance_timer`: Measure operation duration
- `assert_speedup`: Validate WASM performance improvements
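For instance, the object a `performance_timer` fixture yields might resemble this minimal timer (an assumed shape, not the project's actual definition):

```python
import time
from contextlib import contextmanager

class Timer:
    """Minimal stand-in for what a performance_timer fixture could yield."""

    def __init__(self) -> None:
        self.elapsed = 0.0

    @contextmanager
    def measure(self):
        # Record the wall-clock duration of the enclosed block.
        start = time.perf_counter()
        try:
            yield self
        finally:
            self.elapsed = time.perf_counter() - start

# Usage, as a test body might use the fixture:
timer = Timer()
with timer.measure():
    sum(range(100_000))
assert timer.elapsed > 0.0
```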
Markers:

- `@pytest.mark.wasm`: WASM-specific tests
- `@pytest.mark.fallback`: Fallback-specific tests
- `@pytest.mark.correctness`: Correctness validation
- `@pytest.mark.performance`: Performance benchmarks
- `@pytest.mark.integration`: Integration tests
- `@pytest.mark.corpus`: Real-world corpus projects
- `@pytest.mark.slow`: Long-running tests
## Backend Detection

### Environment Variables

Control backend selection during testing:

```bash
# Use auto-detection (try WASM, fall back to Python)
export WASM_BACKEND=auto

# Force the WASM backend
export WASM_BACKEND=wasm

# Force the Python fallback
export WASM_BACKEND=fallback
```
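Translating the variable into a preference can be as simple as the sketch below (a hypothetical helper; the project's `conftest.py` may do this differently):

```python
import os

def backend_preference() -> str:
    """Read WASM_BACKEND from the environment, defaulting to auto-detection."""
    value = os.environ.get("WASM_BACKEND", "auto").lower()
    if value not in {"auto", "wasm", "fallback"}:
        raise ValueError(f"Unknown WASM_BACKEND value: {value!r}")
    return value

os.environ["WASM_BACKEND"] = "fallback"
assert backend_preference() == "fallback"
```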
### Runtime Detection

The `BackendSelector` automatically detects availability:

```python
from multilingualprogramming.runtime.backend_selector import BackendSelector, Backend

# Auto-detection: tries WASM first, falls back to Python if unavailable
selector = BackendSelector(prefer_backend=Backend.AUTO)

# Force WASM: raises an error if WASM is not available
selector = BackendSelector(prefer_backend=Backend.WASM)

# Force Python: uses the fallback (always available)
selector = BackendSelector(prefer_backend=Backend.PYTHON)
```
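The auto-detection behavior described in the comments can be approximated as follows (a sketch using `importlib`, not the `BackendSelector`'s actual code):

```python
import importlib.util

def detect_backend(prefer: str = "auto") -> str:
    """Pick a backend: WASM only when the wasmtime runtime is importable."""
    wasm_available = importlib.util.find_spec("wasmtime") is not None
    if prefer == "wasm":
        if not wasm_available:
            raise RuntimeError("WASM backend requested but wasmtime is not installed")
        return "wasm"
    if prefer == "fallback":
        return "fallback"
    # auto: prefer WASM, degrade gracefully to the Python fallback
    return "wasm" if wasm_available else "fallback"
```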
## Test Results Reporting

### Coverage Report

The workflow generates coverage reports for both backends:

```text
Coverage Summary
────────────────
WASM Backend (Python 3.12):    95.2% coverage
Python Fallback (Python 3.12): 94.8% coverage
Cross-platform average:        94.5%
```
### Performance Report

Automated performance tracking:

```text
Performance Benchmarks (v0.4.0)
───────────────────────────────
Matrix multiply (100×100):    850μs (WASM) vs 2.5ms (Python) → 2.9x speedup
Fibonacci(30):                125μs (WASM) vs 8.2ms (Python) → 65.6x speedup
XOR cipher (10KB):            45μs (WASM) vs 850μs (Python)  → 18.8x speedup
JSON stringify (100 objects): 380μs (WASM) vs 5.2ms (Python) → 13.7x speedup
```
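The speedup figures above are simple ratios of wall-clock times; a harness in that spirit might look like this (illustrative only — it times the same function twice, so the resulting ratio is near 1.0):

```python
import time

def bench(fn, *args, repeats: int = 5) -> float:
    """Return the best-of-N wall-clock time for fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def fib_iter(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

python_time = bench(fib_iter, 1000)  # stands in for the fallback timing
wasm_time = bench(fib_iter, 1000)    # stands in for the WASM timing
speedup = python_time / wasm_time    # >1.0 would mean the candidate is faster
```

Best-of-N timing reduces noise from scheduler jitter, which matters when the CI gate compares runs across machines.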
### Parity Validation

Cross-backend semantic checks:

```text
Parity Validation Results
─────────────────────────
✅ Matrix operations:      Identical outputs
✅ Cryptography:           Identical results (12/12 test cases)
✅ Numeric operations:     Identical values
✅ JSON operations:        Identical structures
✅ Platform compatibility: 3/3 platforms passing
```
## Continuous Integration Gates

### Required Checks

All PRs require:

1. ✅ All backend matrix tests passing (12 configurations)
2. ✅ Cross-backend parity validation passing
3. ✅ Minimum coverage thresholds met (>90%)
4. ✅ No performance regressions (>1.5x WASM speedup maintained)
### Failing Checks

A PR is blocked if:

- Any backend matrix test fails
- Cross-backend parity validation fails
- Code coverage drops below 90%
- WASM tests fail on more than 2 platforms
- Fallback tests fail (indicates a regression)
## Performance Expectations

### Fallback Backend (Python)

Baseline performance; all operations run in pure Python:

```text
Operation            Time (Python)
──────────────────────────────────
Matrix 10×10         0.12ms
Matrix 100×100       2.5ms
Fibonacci(30)        8.2ms
XOR cipher (10KB)    850μs
JSON 100 objects     5.2ms
```
### WASM Backend

Target performance (benchmark-dependent; varies by hardware and workload):

```text
Operation            Time (WASM)   Speedup
──────────────────────────────────────────
Matrix 10×10         150μs         0.8x*
Matrix 100×100       850μs         2.9x
Fibonacci(30)        125μs         65.6x
XOR cipher (10KB)    45μs          18.8x
JSON 100 objects     380μs         13.7x
```

*Small operations pay WASM call overhead; larger operations show significant gains.
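One practical consequence of that overhead is size-based dispatch: a selector can route small inputs to the fallback. The threshold below is illustrative, not a value from the project:

```python
# Assumed crossover point: below this matrix side length, WASM call
# overhead outweighs its compute advantage (hypothetical value).
WASM_MATRIX_THRESHOLD = 50

def choose_backend(matrix_side: int, wasm_available: bool) -> str:
    """Dispatch by problem size so small inputs skip the WASM overhead."""
    if wasm_available and matrix_side >= WASM_MATRIX_THRESHOLD:
        return "wasm"
    return "fallback"

assert choose_backend(10, wasm_available=True) == "fallback"   # small: overhead-bound
assert choose_backend(100, wasm_available=True) == "wasm"      # large: compute-bound
assert choose_backend(100, wasm_available=False) == "fallback"
```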
## Troubleshooting

### WASM Backend Not Available

If WASM tests are skipped or failing:

```bash
# Check WASM installation
python -c "import wasmtime; print('WASM available')"

# Install WASM dependencies
pip install -e ".[wasm,performance]"

# Force fallback testing
WASM_BACKEND=fallback pytest tests/
```
### Performance Degradation

If WASM performance regresses:

- Check for recent codegen changes
- Validate WASM module compilation
- Compare against the benchmark baseline
- Check system resource availability

```bash
# Run performance tests with verbose output
pytest tests/wasm_comprehensive_test.py::PerformanceBenchmarkSuite -vv

# Compare against previous run
pytest tests/wasm_comprehensive_test.py::PerformanceBenchmarkSuite --benchmark-only
```
### Cross-Platform Issues

If tests fail on specific platforms:

- Check platform-specific environment variables
- Verify dependencies are installed
- Check for OS-specific bugs (path separators, line endings)
- Review platform-specific code paths

```bash
# Run platform compatibility tests
pytest tests/wasm_comprehensive_test.py::PlatformCompatibilityTestSuite -vv
```
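Many cross-platform failures reduce to path-separator and line-ending differences; a minimal normalization check might look like this (a stand-in, not the project's actual platform suite):

```python
def normalize_path(path: str) -> str:
    """Normalize Windows separators so comparisons behave the same on every OS."""
    return path.replace("\\", "/")

def normalize_newlines(text: str) -> str:
    """Collapse CRLF line endings to LF before comparing captured outputs."""
    return text.replace("\r\n", "\n")

assert normalize_path("tests\\wasm_comprehensive_test.py") == "tests/wasm_comprehensive_test.py"
assert normalize_newlines("line1\r\nline2") == "line1\nline2"
```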
## Future Improvements
- Automated performance regression detection
- Machine learning-based performance prediction
- Flaky test detection and quarantine
- Distributed test execution across multiple machines
- Real-time performance dashboard
Last updated: 2026-02-22 · Maintainer: John Samuel

For questions about the testing strategy, see:

- WASM Architecture
- WASM Troubleshooting
- WASM FAQ