Parity testing answers a single question: does the new implementation produce the same outputs as the legacy system for the same inputs? This is the bridge between “we built it” and “we can ship it.” ModernizeSpec’s parity-tests.json captures the test cases, expected outputs, and confidence scores that determine when extraction is complete.
The concept of characterization tests — first described in Michael Feathers’ Working Effectively with Legacy Code (Chapter 13) — flips the usual testing assumption. Instead of testing what the code should do according to a specification, you test what it actually does in practice.
In a modernization context, characterization tests work like this:
The characterization test does not judge whether the behavior is correct. It captures reality. If the legacy system rounds tax to 2 decimal places when it should use 4, the characterization test asserts 2 decimal places. The new system must reproduce this behavior (or the team must explicitly decide to fix it and document the deviation).
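In code, a characterization test for the rounding example above might look like this (a sketch; `legacy_tax` is a stand-in for the real legacy routine):

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical stand-in for the legacy routine: it rounds tax to
# 2 decimal places even though the spec calls for 4.
def legacy_tax(amount: Decimal, rate: Decimal) -> Decimal:
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def test_characterizes_legacy_rounding():
    # Assert what the system DOES (2 dp), not what the spec says (4 dp).
    assert legacy_tax(Decimal("100.333"), Decimal("0.18")) == Decimal("18.06")

test_characterizes_legacy_rounding()
```

If the team later decides to fix the rounding, this test is updated deliberately and the change is documented as a deviation.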
| Approach | Tests Against | Risk |
|---|---|---|
| Specification tests | What the system should do (requirements docs) | Requirements may be outdated, incomplete, or wrong |
| Characterization tests | What the system actually does (runtime output) | Captures bugs as “expected” behavior |
For migration, characterization tests are safer. The legacy system has been running in production — its behavior, including its bugs, is what users depend on. Changing behavior during migration introduces risk that is separate from the extraction itself.
When a characterization test captures a known bug:

- Decide whether the new system should preserve the bug or fix it
- Record the decision in parity-tests.json with a knownDeviation field

The most scalable approach to parity testing is table-driven: a matrix of inputs and expected outputs, run through both implementations.
| Input | Legacy Output | New Output | Match |
|---|---|---|---|
| Invoice: 3 items, GST 18% | Total: 11,800.00, Tax: 1,800.00 | Total: 11,800.00, Tax: 1,800.00 | Pass |
| Invoice: 1 item, exempt | Total: 500.00, Tax: 0.00 | Total: 500.00, Tax: 0.00 | Pass |
| Invoice: discount + tax | Total: 9,440.00, Tax: 1,440.00 | Total: 9,440.00, Tax: 1,440.00 | Pass |
| Invoice: multi-currency | Total: 850.00 USD, Tax: 153.00 | Total: 850.00 USD, Tax: 153.00 | Pass |
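A minimal table-driven runner, assuming both implementations are callable from the same test process (`legacy_total` and `new_total` are illustrative stand-ins):

```python
# Table-driven parity: each row runs through both implementations and the
# outputs are compared against the captured expectation.

def legacy_total(items: list[float], tax_rate: float) -> float:
    subtotal = sum(items)
    return round(subtotal + subtotal * tax_rate, 2)

def new_total(items: list[float], tax_rate: float) -> float:
    subtotal = sum(items)
    return round(subtotal * (1 + tax_rate), 2)

CASES = [
    ([5000, 3000, 2000], 0.18, 11800.00),  # 3 items, GST 18%
    ([500], 0.0, 500.00),                  # 1 item, exempt
]

def run_parity(cases):
    results = []
    for items, rate, expected in cases:
        legacy, new = legacy_total(items, rate), new_total(items, rate)
        results.append(legacy == new == expected)
    return results

print(run_parity(CASES))  # every row should report True
```

Adding a scenario is one new row in the table, not a new test function.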
Extract real inputs and outputs from the legacy system’s database or logs:
Advantage: Captures real-world scenarios including edge cases you would never think to write.
Risk: Requires anonymization for PII.
Build test cases by hand based on business rules:
Advantage: Systematic coverage of known rules.
Risk: Misses unknown rules and implicit behaviors.
Generate random inputs within valid ranges and record legacy outputs:
Advantage: Discovers edge cases that manual testing misses.
Risk: May generate unrealistic combinations. Requires the legacy system to be callable programmatically.
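A property-based capture loop might look like this (a sketch; `legacy_tax` stands in for a programmatic call into the legacy system, and the fixed seed and GST slab list are illustrative assumptions):

```python
import random

# Stand-in for a programmatic call into the legacy system.
def legacy_tax(amount: float, rate: float) -> float:
    return round(amount * rate, 2)

def generate_cases(n: int, seed: int = 42) -> list[dict]:
    rng = random.Random(seed)  # fixed seed => reproducible corpus
    cases = []
    for i in range(n):
        amount = round(rng.uniform(0.01, 100000), 2)
        rate = rng.choice([0.0, 0.05, 0.12, 0.18, 0.28])  # valid tax slabs
        cases.append({
            "id": f"gen-{i:04d}",
            "input": {"amount": amount, "taxRate": rate},
            # Record the legacy output as the expectation, whatever it is.
            "expectedOutput": {"taxAmount": legacy_tax(amount, rate)},
        })
    return cases

corpus = generate_cases(100)
print(len(corpus), "cases recorded")
```

The fixed seed keeps the corpus reproducible; the recorded outputs become parity expectations for the new system.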
parity-tests.json

Each row in the table becomes an entry in parity-tests.json:
```json
{
  "id": "tax-calc-gst-18",
  "module": "taxation",
  "description": "Standard GST 18% on 3-item invoice",
  "input": {
    "items": [
      { "amount": 5000 },
      { "amount": 3000 },
      { "amount": 2000 }
    ],
    "taxRate": 0.18
  },
  "expectedOutput": {
    "subtotal": 10000.00,
    "taxAmount": 1800.00,
    "total": 11800.00
  },
  "source": "production-capture",
  "status": "passing"
}
```

Behavioral snapshots are a heavier-weight version of characterization tests. Instead of testing individual functions, they capture the full response of the legacy system to a realistic request.
| Artifact | How to Capture | Storage |
|---|---|---|
| API responses | Record HTTP response body, headers, status | JSON files |
| Database writes | Capture rows written after an operation | SQL or JSON fixtures |
| Computed values | Log intermediate calculations | Structured log entries |
| Side effects | Record emails sent, events emitted, files written | Event log |
Store snapshots as “golden files” — reference outputs that the new system must reproduce exactly.
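A minimal golden-file check over such fixtures might look like this (a sketch; `new_system` is a hypothetical stand-in for the extracted implementation):

```python
import json
from pathlib import Path

# Hypothetical stand-in for the extracted implementation under test.
def new_system(inp: dict) -> dict:
    subtotal = sum(item["amount"] for item in inp["items"])
    tax = round(subtotal * inp["taxRate"], 2)
    return {"subtotal": subtotal, "taxAmount": tax, "total": subtotal + tax}

def check_golden(fixture_dir: Path) -> list[str]:
    """Return the names of input fixtures whose output differs from golden."""
    failures = []
    for input_path in sorted(fixture_dir.glob("input-*.json")):
        golden_path = input_path.with_name(
            input_path.name.replace("input-", "golden-"))
        inp = json.loads(input_path.read_text())
        expected = json.loads(golden_path.read_text())
        if new_system(inp) != expected:
            failures.append(input_path.name)
    return failures
```

An empty failure list means the new system reproduced every captured legacy output exactly.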
```
fixtures/
├── tax-calculation/
│   ├── input-001.json     # Input to the function
│   ├── golden-001.json    # Expected output (captured from legacy)
│   ├── input-002.json
│   └── golden-002.json
└── gl-posting/
    ├── input-001.json
    └── golden-001.json    # Expected GL entries
```

The test runner:

- Loads each input-*.json file
- Runs it through the new implementation
- Compares the result against the matching golden-*.json file

When the new system intentionally deviates from legacy behavior (bug fixes, improvements), record the change in parity-tests.json with a knownDeviation entry.

Not all parity is equal. A module with 50 passing tests on happy paths but zero tests on error paths has limited real confidence. Confidence scoring quantifies how trustworthy the parity evidence is.
| Dimension | Weight | Measurement |
|---|---|---|
| Happy path coverage | 1x | Percentage of normal workflows tested |
| Error path coverage | 2x | Percentage of error/exception paths tested |
| Edge case coverage | 2x | Boundary values, empty inputs, maximum sizes |
| Data variety | 1.5x | Diversity of test inputs (currencies, date ranges, entity types) |
| Production traffic representation | 3x | How closely test inputs match actual production usage patterns |
Error paths and production representation are weighted highest because they are where surprises emerge in production.
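One plausible aggregation is a weighted mean over these dimensions (an assumption — the exact formula ModernizeSpec uses is not specified here, so the computed overall value may differ from a recorded one):

```python
# Assumed aggregation: weighted mean of per-dimension scores using the
# weights from the table above. Not a documented ModernizeSpec formula.
WEIGHTS = {
    "happyPath": 1.0,
    "errorPath": 2.0,
    "edgeCases": 2.0,
    "dataVariety": 1.5,
    "productionRepresentation": 3.0,
}

def overall_confidence(scores: dict[str, float]) -> float:
    weighted = sum(scores[dim] * w for dim, w in WEIGHTS.items())
    return round(weighted / sum(WEIGHTS.values()), 1)

print(overall_confidence({
    "happyPath": 95, "errorPath": 45, "edgeCases": 72,
    "dataVariety": 80, "productionRepresentation": 60,
}))
```

Because error paths and production representation carry 2x and 3x weights, weak scores there drag the overall number down sharply, which is the intended effect.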
| Score | Label | Meaning | Decision |
|---|---|---|---|
| 0-30 | Low | Minimal testing, major gaps | Do not proceed to shadow mode |
| 31-60 | Moderate | Core paths tested, gaps in edges | Proceed with caution, add tests |
| 61-85 | High | Comprehensive testing, few gaps | Ready for shadow mode |
| 86-100 | Very High | Exhaustive testing including production traffic replay | Ready for production cutover |
Confidence scores are recorded per module in parity-tests.json:
```json
{
  "module": "taxation",
  "confidence": {
    "overall": 78,
    "happyPath": 95,
    "errorPath": 45,
    "edgeCases": 72,
    "dataVariety": 80,
    "productionRepresentation": 60
  }
}
```

This makes confidence transparent to AI agents and team leads reviewing migration progress.
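Applied in code, the score bands translate into a simple gate (the function name is illustrative, not a ModernizeSpec API):

```python
# Map an overall confidence score to the decision bands defined above.
def readiness(score: int) -> str:
    if score <= 30:
        return "do not proceed to shadow mode"
    if score <= 60:
        return "proceed with caution, add tests"
    if score <= 85:
        return "ready for shadow mode"
    return "ready for production cutover"

print(readiness(78))  # e.g. the taxation module above -> ready for shadow mode
```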
Once parity is proven, the tests serve a second purpose: regression guards. Any future change to the new system that breaks an established parity test must be intentional and documented.
```
Capture baseline ──▶ Prove parity ──▶ Guard regressions ──▶ Retire
                                                              │
                                                 (when legacy is fully
                                                    decommissioned)
```

Parity tests are retired only after the legacy system is completely removed. Until then, they remain active as regression guards.
Run parity tests on every pull request that touches an extracted module:
- Changes under src/taxation/ → run the taxation parity tests
- A parity failure blocks the merge unless it is covered by a documented knownDeviation entry

Legacy code often resists testing because of hard-coded dependencies, framework coupling, and deeply nested call chains. Michael Feathers catalogs 24 dependency-breaking techniques in Working Effectively with Legacy Code (Chapter 25). The core strategies relevant to migration parity fall into three categories:
Introduce Abstraction Boundaries
Place an interface or protocol between concrete classes so both the legacy and new implementations can be tested through the same contract. This lets you run the same parity test against both systems.
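A sketch in Python, using a `typing.Protocol` as the shared contract (class and method names are illustrative):

```python
from typing import Protocol

# One contract, two implementations: legacy and new code are exercised
# through the same interface, so a single parity test covers both.
class TaxCalculator(Protocol):
    def tax(self, subtotal: float, rate: float) -> float: ...

class LegacyTaxCalculator:
    def tax(self, subtotal: float, rate: float) -> float:
        return round(subtotal * rate, 2)

class NewTaxCalculator:
    def tax(self, subtotal: float, rate: float) -> float:
        return round(subtotal * rate, 2)

def parity_case(calc: TaxCalculator) -> float:
    # The same test body runs against either implementation.
    return calc.tax(10000, 0.18)

assert parity_case(LegacyTaxCalculator()) == parity_case(NewTaxCalculator())
```

A structural `Protocol` has the advantage that the legacy class does not need to be modified to declare the interface.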
Isolate New Behavior
When adding recording hooks or comparison logic to a legacy method, write the new code in a separate method or wrapper rather than modifying the original. This preserves the original behavior while enabling side-by-side output capture.
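A sketch of such a wrapper (names are illustrative; the legacy function body is left untouched):

```python
# Side-by-side capture lives in a wrapper, not in the legacy method itself.
captured: list[dict] = []

def legacy_post_invoice(amount: float) -> float:
    return round(amount * 1.18, 2)   # original behavior, unmodified

def new_post_invoice(amount: float) -> float:
    return round(amount * 1.18, 2)   # extracted implementation (stand-in)

def post_invoice_with_capture(amount: float) -> float:
    legacy_result = legacy_post_invoice(amount)
    new_result = new_post_invoice(amount)
    captured.append({"input": amount, "legacy": legacy_result,
                     "new": new_result, "match": legacy_result == new_result})
    return legacy_result             # callers still receive the legacy output

post_invoice_with_capture(10000)
print(captured[-1])
```

Because the wrapper always returns the legacy result, production behavior is unchanged while mismatches accumulate in the capture log.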
Replace Hard-Coded Dependencies
Pass dependencies through constructors, factory methods, or configuration rather than instantiating them internally. During parity testing, swap in test doubles that capture intermediate state for comparison.
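A sketch of constructor injection with a recording double (class names are illustrative):

```python
# The ledger is passed in through the constructor rather than instantiated
# inside, so a recording double can be swapped in during parity runs.
class RealLedger:
    def post(self, entry: dict) -> None:
        ...  # would write to the database in production

class RecordingLedger:
    def __init__(self) -> None:
        self.entries: list[dict] = []
    def post(self, entry: dict) -> None:
        self.entries.append(entry)   # capture intermediate state instead

class InvoiceService:
    def __init__(self, ledger) -> None:
        self.ledger = ledger         # injected, not hard-coded
    def finalize(self, amount: float) -> None:
        self.ledger.post({"debit": amount, "credit": amount})

double = RecordingLedger()
InvoiceService(double).finalize(11800.0)
print(double.entries)
```

The recorded entries can then be diffed against the legacy system's GL writes without touching a real database.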
For the full catalog of techniques, see Feathers’ Working Effectively with Legacy Code, Chapter 25. The key insight for modernization: these techniques create seams for testing without modifying the legacy system’s behavior — which is exactly what you need when building characterization tests.
Team Zeta in the PearlThoughts internship independently achieved 100% parity on tax calculation using table-driven tests:
| Scenario | Python Output | Go Output | Match |
|---|---|---|---|
| GST 18% on single item | Tax: 1,800.00 | Tax: 1,800.00 | Pass |
| GST 18% + CESS 1% compound | Tax: 1,918.00 | Tax: 1,918.00 | Pass |
| Inclusive pricing (tax-in-price) | Net: 8,474.58 | Net: 8,474.58 | Pass |
| Multi-rate (5% + 18% items) | Tax: 1,150.00 | Tax: 1,150.00 | Pass |
| Zero-rated export | Tax: 0.00 | Tax: 0.00 | Pass |
They captured Python outputs first, then built Go implementations until every row matched. No specification documents were needed — the Python system was the specification.
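Two of these rows can be reproduced with the underlying arithmetic (a sketch; it assumes a 10,000 subtotal and CESS levied on the GST-inclusive amount, which matches the table's figures — the actual Python and Go implementations are not shown in this chapter):

```python
# Inclusive pricing: a tax-inclusive price of 10,000 at GST 18%
net = round(10000 / 1.18, 2)
print(net)            # 8474.58

# Compound CESS: GST 18% on 10,000, then CESS 1% on the GST-inclusive amount
gst = round(10000 * 0.18, 2)
cess = round(0.01 * (10000 + gst), 2)
print(gst + cess)     # 1918.0
```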