Random Data Generation for Software Testing
Generating realistic random test data improves code coverage and catches edge cases that manual test data misses. This guide covers strategies for different data types and testing scenarios.
Why Random Test Data
Static test fixtures exercise the same code paths on every run. Random data generation, the engine behind both fuzzing and property-based testing, explores a much wider input space and uncovers bugs that fixed test cases miss, such as buffer overflows, encoding errors, and mishandled boundary conditions.
Data Types and Strategies
Strings
Generate strings of varying lengths (empty, 1 char, max length, beyond max). Include unicode characters, emoji, null bytes, RTL text, and SQL injection patterns to test input handling.
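A minimal sketch using only Python's standard library: a fixed list of hand-picked edge-case strings, plus random strings at the lengths listed above. The function name `random_strings`, the sample alphabet, and the `max_len` parameter are illustrative assumptions; substitute the real length limit of the field under test.

```python
import random

# Hand-picked cases from the categories above; illustrative, not exhaustive.
STRING_EDGE_CASES = [
    "",                      # empty
    "a",                     # single character
    "héllo wörld",           # accented Latin characters
    "日本語テキスト",          # CJK characters
    "👍🏽 emoji",             # emoji plus skin-tone modifier (two code points)
    "abc\x00def",            # embedded null byte
    "\u202eRTL override",    # U+202E RIGHT-TO-LEFT OVERRIDE
    "مرحبا",                 # Arabic (right-to-left) text
    "' OR '1'='1",           # SQL injection pattern
]

def random_strings(rng: random.Random, max_len: int, count: int) -> list[str]:
    """Return the fixed edge cases plus random strings.

    Lengths always include 0, 1, max_len and max_len + 1 (one beyond the limit),
    followed by `count` random lengths in the range [0, max_len + 1].
    """
    # Mix of ASCII, whitespace, multi-byte characters and a zero-width space
    alphabet = "abcXYZ019 \t\n" + "éß日👍\u200b"
    lengths = [0, 1, max_len, max_len + 1]
    lengths += [rng.randint(0, max_len + 1) for _ in range(count)]
    generated = ["".join(rng.choice(alphabet) for _ in range(n)) for n in lengths]
    return STRING_EDGE_CASES + generated

# A seeded generator makes the same "random" strings appear on every run.
cases = random_strings(random.Random(42), max_len=255, count=20)
```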
Numbers
Test boundary values: 0, -1, MAX_INT, MIN_INT, NaN, Infinity, very large floats, and numbers with many decimal places. These edge cases frequently expose integer overflow, floating-point precision loss, and unhandled NaN or Infinity values.
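The boundary values can be collected the same way. This sketch assumes a 32-bit signed integer target for MAX_INT and MIN_INT; Python's own integers never overflow, so those limits matter when values end up in fixed-width storage such as database columns or native code. `random_numbers` is a hypothetical helper name.

```python
import math
import random
import sys

# Assumed target: 32-bit signed integers.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

NUMERIC_EDGE_CASES = [
    0, -1, 1,
    INT32_MAX, INT32_MIN,
    INT32_MAX + 1, INT32_MIN - 1,   # one step past each limit
    math.nan, math.inf, -math.inf,
    sys.float_info.max,             # largest finite double
    sys.float_info.min,             # smallest positive normal double
    0.1 + 0.2,                      # rounding artifact: 0.30000000000000004
    123.456789012345678,            # more digits than a double can represent exactly
]

def random_numbers(rng: random.Random, count: int) -> list:
    """Edge cases first, then `count` random values: roughly half near the
    int32 limits, half floats with magnitudes spread from 1e-300 to 1e300."""
    values = list(NUMERIC_EDGE_CASES)
    for _ in range(count):
        if rng.random() < 0.5:
            # Within three steps of a limit, on either side of it
            values.append(rng.choice([INT32_MAX, INT32_MIN]) + rng.randint(-3, 3))
        else:
            # Sampling the exponent keeps results finite (no overflow to inf)
            values.append(rng.choice([-1.0, 1.0]) * 10.0 ** rng.uniform(-300, 300))
    return values

nums = random_numbers(random.Random(42), count=50)
```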
Dates
Generate dates across time zones, leap days (Feb 29), century boundaries (2000 is a leap year but 2100 is not), and DST transitions. Include timestamps at the epoch (0), negative (pre-1970) timestamps, and far-future dates.
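A sketch of date and timestamp cases with the standard `datetime` module. To stay dependency-free it uses fixed UTC offsets only; exercising real DST transitions needs a time zone database (for example `zoneinfo` with IANA data), which is left out here. `random_datetimes` and the 1900 to 2200 range are illustrative choices.

```python
import random
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Fixed offsets at the extremes plus a non-whole-hour zone.
# Real DST transitions need a tz database (e.g. zoneinfo) and are omitted here.
OFFSETS = [
    timezone.utc,
    timezone(timedelta(hours=14)),              # UTC+14
    timezone(timedelta(hours=-12)),             # UTC-12
    timezone(timedelta(hours=5, minutes=45)),   # UTC+05:45
]

DATE_EDGE_CASES = [
    date(2024, 2, 29),                     # leap day
    date(2000, 2, 29),                     # 2000 is a leap year (divisible by 400)
    date(2100, 2, 28),                     # 2100 is not, so Feb 29, 2100 does not exist
    date(1999, 12, 31), date(2000, 1, 1),  # century rollover
    date.max,                              # 9999-12-31, Python's far-future limit
]

TIMESTAMP_EDGE_CASES = [
    0,           # the Unix epoch
    -1,          # one second before the epoch
    2**31 - 1,   # 2038-01-19 03:14:07 UTC, the signed 32-bit time_t limit
    2**31,       # one second past that limit
]

def random_datetimes(rng: random.Random, count: int) -> list[datetime]:
    """Timestamp edge cases, then random instants between 1900 and 2200 in random offsets."""
    # EPOCH + timedelta avoids platform limits of datetime.fromtimestamp on negative values
    results = [EPOCH + timedelta(seconds=ts) for ts in TIMESTAMP_EDGE_CASES]
    start = datetime(1900, 1, 1, tzinfo=timezone.utc)
    end = datetime(2200, 1, 1, tzinfo=timezone.utc)
    span = int((end - start).total_seconds())
    for _ in range(count):
        instant = start + timedelta(seconds=rng.randrange(span))
        results.append(instant.astimezone(rng.choice(OFFSETS)))
    return results

stamps = random_datetimes(random.Random(0), count=30)
```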
Structured Data
| Data Type | Strategy |
|---|---|
| Email addresses | Valid format + edge cases (very long local parts, special chars) |
| Phone numbers | Various international formats, with/without country codes |
| URLs | Valid + malformed + extremely long paths |
| JSON | Valid + deeply nested + large arrays; circular references in the object graph being serialized (JSON text itself cannot contain cycles) |
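Sample inputs for the structured types in the table, again as a hedged sketch: the addresses, phone numbers, and URLs are made-up examples, and the helper names are hypothetical. The circular-reference case shows that Python's `json.dumps` rejects a self-referencing object by default; a serializer under test should fail just as cleanly rather than recurse forever.

```python
import json

# Email addresses: valid forms plus boundary and malformed cases.
EMAIL_CASES = [
    "user@example.com",
    "a" * 64 + "@example.com",     # local part at the 64-octet limit (RFC 5321)
    "a" * 65 + "@example.com",     # one octet over the limit
    "first.last+tag@example.com",  # dots and plus addressing
    '"quoted local"@example.com',  # quoted local part: valid, but often rejected
    "no-at-sign.example.com",      # malformed: missing @
    "user@@example.com",           # malformed: double @
]

# Phone numbers: several international formats, with and without country codes.
PHONE_CASES = ["+1 202-555-0123", "(202) 555-0123", "2025550123",
               "+44 20 7946 0958", "+91 98765 43210", "12345"]

# URLs: valid, malformed, and extremely long.
URL_CASES = [
    "https://example.com/path?q=1#frag",
    "https://example.com/" + "a/" * 5000,   # roughly 10 KB path
    "htp:/broken",                          # misspelled scheme, single slash
    "https://exa mple.com/",                # space in host
]

def deeply_nested_json(depth: int) -> str:
    """Arrays nested `depth` levels deep, e.g. depth=3 gives "[[[]]]"."""
    return "[" * depth + "]" * depth

def large_array_json(length: int) -> str:
    return json.dumps(list(range(length)))

# JSON text cannot express a cycle, but the object handed to a serializer can.
circular: dict = {}
circular["self"] = circular
try:
    json.dumps(circular)           # check_circular=True is the default
    circular_rejected = False
except ValueError:                 # raised as "Circular reference detected"
    circular_rejected = True
```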
Tools and Libraries
- Faker (Python/JS): Realistic fake names, addresses, companies, text
- Hypothesis (Python): Property-based testing with automatic shrinking
- fast-check (TypeScript): Property-based testing for JS/TS
- QuickCheck (Haskell, ported to many languages): The original property-based testing library
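To make "automatic shrinking" concrete, here is a toy, standard-library-only version of the loop these libraries run: generate random inputs, and when a property fails, keep replacing the failing input with a smaller one that still fails. It is not the API of Hypothesis, fast-check, or QuickCheck, whose generators and shrinkers are far more capable; all names here are hypothetical.

```python
import random
from typing import Callable, Optional

Prop = Callable[[list[int]], bool]

def shrink_candidates(xs: list[int]) -> list[list[int]]:
    """Smaller variants of xs: drop one element, halve one, or step one toward zero."""
    candidates = [xs[:i] + xs[i + 1:] for i in range(len(xs))]
    for i, x in enumerate(xs):
        if x != 0:
            toward_zero = x - 1 if x > 0 else x + 1
            # int(x / 2) truncates toward zero, so |smaller| < |x| always holds
            for smaller in (int(x / 2), toward_zero):
                candidates.append(xs[:i] + [smaller] + xs[i + 1:])
    return candidates

def shrink(prop: Prop, failing: list[int]) -> list[int]:
    """Greedily move to the first smaller input that still fails; stop when none does."""
    progress = True
    while progress:
        progress = False
        for candidate in shrink_candidates(failing):
            if not prop(candidate):
                failing, progress = candidate, True
                break
    return failing

def check_property(prop: Prop, seed: int, runs: int = 200) -> Optional[list[int]]:
    """Run prop on random int lists; return a shrunk counterexample, or None if all pass."""
    rng = random.Random(seed)
    for _ in range(runs):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        if not prop(xs):
            return shrink(prop, xs)
    return None

# A deliberately false property: "no element is 500 or larger".
print(check_property(lambda xs: all(x < 500 for x in xs), seed=1))  # prints [500]
```

Without shrinking, the reported failure would be whichever random list of up to 20 numbers happened to break the property; shrinking reduces it to the single boundary value, which is what makes library-reported counterexamples easy to debug.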
Reproducibility
Always seed your random generator with a fixed value in CI. Log the seed so that any failure can be reproduced. A random test that cannot be reproduced provides no actionable information.
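One way to combine a fixed CI seed with seed logging, sketched under the assumption that the CI job sets an environment variable (the name `TEST_SEED` and the `make_rng` helper are hypothetical). When the variable is absent, for example on a developer machine, a fresh seed is drawn and printed so that a failing run can still be replayed.

```python
import os
import random
import secrets

def make_rng(env_var: str = "TEST_SEED") -> random.Random:
    """Return a seeded RNG: the fixed seed from env_var if set (e.g. in CI),
    otherwise a freshly drawn one. The seed is always printed to the test output."""
    raw = os.environ.get(env_var)
    seed = int(raw) if raw is not None else secrets.randbits(32)
    print(f"random test seed: {seed} (reproduce with {env_var}={seed})")
    return random.Random(seed)
```

Pinning `TEST_SEED=12345` in the CI job gives every run the same sequence, and copying a logged seed into a local run replays a failure exactly. Passing one `random.Random` instance through the tests, rather than calling the module-level `random` functions, keeps other code from consuming numbers out of the seeded stream.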