Introduction
VeriSynth is an open-source framework for verifiable synthetic data generation, designed to help teams safely share and analyze sensitive datasets without exposing real individuals. Built for developers, data scientists, and privacy researchers, VeriSynth transforms real datasets into statistically faithful synthetic versions, complete with cryptographic proof receipts that anyone can verify independently.
Why VeriSynth?
Synthetic data isn’t new, but trust is. Most synthetic data tools are black boxes. You’re told to “trust” that the data is private and realistic, but there’s no way to verify it. VeriSynth changes that.
✅ Verifiable: Each run produces a tamper-evident proof.json receipt with dataset hashes, model parameters, and metrics.
✅ Transparent: Built entirely in open source under the MIT license.
✅ Reproducible: Deterministic seeds make every result traceable and reproducible.
✅ Private by design: No real records are ever copied, reused, or identifiable.
How It Works (High Level)
VeriSynth learns the statistical relationships within your dataset, not the identities.
Example: If real data shows that older individuals with high BMI are more likely to have elevated blood pressure, VeriSynth preserves those patterns without retaining any real patient data.
The result: synthetic data that behaves like the real world, ready for analysis, testing, or AI model training.
Proof Receipts
Every synthesis run outputs a JSON proof file containing:
- Input/output dataset hashes (SHA-256)
- Correlation & privacy metrics
- Model version, parameters, and random seed
- Merkle root for integrity verification
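To make the receipt contents listed above more concrete, here is a minimal Python sketch of how such a receipt could be assembled and checked using only the standard library. Everything named here is an assumption made for illustration: the field names (`input_sha256`, `output_sha256`, `merkle_root`, and so on), the helper functions, the placeholder version string, the choice to treat each output line as a Merkle leaf, and the rule of duplicating the last node on odd levels. It is not VeriSynth's actual receipt schema or API.

```python
import hashlib
import json
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves: List[bytes]) -> str:
    """Compute a simple Merkle root over SHA-256 leaf hashes.

    Assumption for this sketch: when a level has an odd number of nodes,
    the last node is duplicated before pairing.
    """
    if not leaves:
        return sha256_hex(b"")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def build_receipt(real_csv: bytes, synth_csv: bytes, seed: int) -> Dict:
    """Assemble a receipt dictionary (hypothetical layout)."""
    return {
        "input_sha256": sha256_hex(real_csv),
        "output_sha256": sha256_hex(synth_csv),
        "model": {
            "name": "gaussian_copula",
            "version": "0.0.0-example",  # placeholder, not a real release
            "seed": seed,
        },
        "metrics": {},  # correlation and privacy metrics would go here
        "merkle_root": merkle_root(synth_csv.splitlines()),
    }


def verify_receipt(receipt: Dict, synth_csv: bytes) -> bool:
    """Recompute the output hash and Merkle root and compare to the receipt."""
    return (
        receipt["output_sha256"] == sha256_hex(synth_csv)
        and receipt["merkle_root"] == merkle_root(synth_csv.splitlines())
    )


if __name__ == "__main__":
    real = b"age,bp\n30,120\n40,130\n"
    synth = b"age,bp\n31,121\n39,129\n"
    receipt = build_receipt(real, synth, seed=42)
    print(json.dumps(receipt, indent=2))
    print("verified:", verify_receipt(receipt, synth))
```

Because the check only recomputes hashes, anyone holding the synthetic file and the receipt can run it without access to the original data. Changing a single byte of the synthetic file changes both the output hash and the Merkle root, so the comparison fails.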
Trusted Foundations
VeriSynth is powered by:
- Gaussian Copula for tabular synthesis
- SHA-256 / Merkle trees for integrity proofs
- Deterministic seeding for reproducibility
- Python-first design for extensibility and CLI simplicity
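As a rough illustration of how a Gaussian copula and deterministic seeding fit together, the sketch below synthesizes a two-column table in plain Python. It is a toy version under stated assumptions, not VeriSynth's implementation: it handles only two numeric columns, estimates the dependence from rank-based normal scores, and maps samples back through interpolated empirical quantiles. The function names (`normal_scores`, `pearson`, `empirical_quantile`, `synthesize_pairs`) are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def normal_scores(column: Sequence[float]) -> List[float]:
    """Map each value to a standard-normal score based on its rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), so inv_cdf is defined.
        scores[idx] = _STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def empirical_quantile(sorted_values: Sequence[float], u: float) -> float:
    """Linearly interpolated quantile of pre-sorted values for u in [0, 1]."""
    pos = u * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def synthesize_pairs(
    xs: Sequence[float], ys: Sequence[float], n_rows: int, seed: int
) -> List[Tuple[float, float]]:
    """Sample synthetic (x, y) rows from a bivariate Gaussian copula."""
    rng = random.Random(seed)  # same seed -> same synthetic rows
    rho = pearson(normal_scores(xs), normal_scores(ys))
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))  # guard against rounding
    sorted_x, sorted_y = sorted(xs), sorted(ys)
    rows = []
    for _ in range(n_rows):
        # Draw correlated standard normals with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual * rng.gauss(0.0, 1.0)
        # Convert to uniforms, then back to each column's own distribution.
        x = empirical_quantile(sorted_x, _STD_NORMAL.cdf(z1))
        y = empirical_quantile(sorted_y, _STD_NORMAL.cdf(z2))
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    demo_rng = random.Random(0)
    ages = [20 + i for i in range(60)]
    pressures = [100 + 0.5 * age + demo_rng.gauss(0, 5) for age in ages]
    synthetic = synthesize_pairs(ages, pressures, n_rows=5, seed=42)
    for row in synthetic:
        print(row)
```

Each synthetic row is drawn from the fitted dependence structure rather than copied from an input row, and rerunning with the same seed reproduces the same output. A production tool would also need to handle more than two columns, categorical fields, and missing values, which this sketch leaves out.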
Use Cases
- Training ML models without exposing real data
- Sharing medical or financial datasets securely
- Benchmarking AI systems with realistic-but-fake data
- Creating test environments for analytics pipelines
VeriSynth Core is open-source (MIT) and runs 100% locally. No signup · No GPUs needed · On-prem / VPC friendly.