Introduction

VeriSynth is an open-source framework for verifiable synthetic data generation — designed to help teams safely share and analyze sensitive datasets without exposing real individuals. Built for developers, data scientists, and privacy researchers, VeriSynth transforms real datasets into statistically faithful synthetic versions, complete with cryptographic proof receipts that anyone can verify independently.

Why VeriSynth?

Synthetic data isn’t new, but being able to verify it is. Most synthetic data tools are black boxes: you’re told to “trust” that the data is private and realistic, with no way to check. VeriSynth changes that.
Verifiable: Each run produces a tamper-evident proof.json receipt with dataset hashes, model parameters, and metrics.
Transparent: Built entirely in open source under the MIT license.
Reproducible: Deterministic seeds make every result traceable and reproducible.
Private by design: No real records are ever copied, reused, or identifiable.

How It Works (High Level)

VeriSynth learns the statistical relationships within your dataset, not the identities.
Example: If real data shows that older individuals with high BMI are more likely to have elevated blood pressure, VeriSynth preserves those patterns without retaining any real patient data.
The result: synthetic data that behaves like the real world, ready for analysis, testing, or AI model training.
# Install the CLI
pip install verisynth-core
# Generate 1M synthetic rows locally
verisynth data/patients.csv -o out/ --rows 1000000 --seed 42
This produces:
📁 out/synthetic.csv
🧾 out/proof.json
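
To make "learning relationships, not identities" concrete, here is a minimal, standard-library sketch of a Gaussian copula for numeric columns. It is an illustration under stated assumptions, not VeriSynth's actual implementation: the function names (`synthesize`, `normal_scores`, `quantile`, and so on) are hypothetical, only numeric columns are handled, and the marginals are simple interpolated empirical quantiles.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def normal_scores(column):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def quantile(sorted_values, u):
    """Linearly interpolated empirical quantile for u in [0, 1]."""
    pos = u * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def synthesize(columns, n_rows, seed):
    """Fit a Gaussian copula to numeric `columns` and sample `n_rows` rows.

    Assumes each column has at least two distinct values and that no two
    columns are perfectly correlated (otherwise the Cholesky step fails).
    """
    rng = random.Random(seed)  # all randomness flows from this one seed
    d = len(columns)
    # 1. Learn the dependence structure in normal-score space.
    scores = [normal_scores(col) for col in columns]
    corr = [[1.0 if i == j else pearson(scores[i], scores[j]) for j in range(d)]
            for i in range(d)]
    lower = cholesky(corr)
    # 2. Keep only each column's sorted values as its marginal distribution.
    marginals = [sorted(col) for col in columns]
    rows = []
    for _ in range(n_rows):
        # 3. Draw correlated normals, then map each back through its marginal.
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row = []
        for i in range(d):
            latent = sum(lower[i][k] * z[k] for k in range(i + 1))
            row.append(quantile(marginals[i], STD_NORMAL.cdf(latent)))
        rows.append(row)
    return rows
```

Calling `synthesize([ages, blood_pressures], 1000, seed=42)` twice returns identical rows, which is the property the `--seed` flag relies on. Each synthetic row is drawn from the fitted correlation structure rather than copied from an input row, although a sketch this simple makes no formal privacy guarantee.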

Proof Receipts

Every synthesis run outputs a JSON proof file containing:
  • Input/output dataset hashes (SHA-256)
  • Correlation & privacy metrics
  • Model version, parameters, and random seed
  • Merkle root for integrity verification
Example:
{
  "verisynth_version": "core-0.1.0",
  "metrics": {
    "corr_mean_abs_delta": 0.12,
    "naive_reid_risk": 0.01
  },
  "input":  { "rows": 10, "sha256": "…82b7" },
  "output": { "rows": 1000000, "sha256": "…acb9" },
  "proof": "merkle_root: …c31"
}
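
A receipt like the one above is only useful if someone re-checks it. The sketch below shows one way a recipient might do that for the output hash. It assumes that `output.sha256` in a real receipt holds the full hex digest of the raw bytes of `synthetic.csv` (the example above abbreviates the hashes); the helper names `sha256_file` and `output_matches_receipt` are hypothetical and not part of the VeriSynth API.

```python
import hashlib
import json


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_matches_receipt(receipt_path, csv_path):
    """Return True if the CSV's hash equals the receipt's output hash.

    Assumption: the receipt stores the hash under receipt["output"]["sha256"]
    and computes it over the file's raw bytes.
    """
    with open(receipt_path, encoding="utf-8") as fh:
        receipt = json.load(fh)
    return sha256_file(csv_path) == receipt["output"]["sha256"]
```

For example, `output_matches_receipt("out/proof.json", "out/synthetic.csv")` should return `False` if even one byte of the CSV changed after the receipt was written, provided the assumptions above hold.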

Trusted Foundations

VeriSynth is powered by:
  • Gaussian Copula for tabular synthesis
  • SHA-256 / Merkle trees for integrity proofs
  • Deterministic seeding for reproducibility
  • Python-first design for extensibility and CLI simplicity
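
For readers unfamiliar with Merkle trees, the following sketch shows how a single root hash can commit to many pieces of data. It is a generic illustration, not VeriSynth's construction: what the leaves are (rows, chunks, or receipt fields) and how odd levels are handled are assumptions here, and `merkle_root` is a hypothetical name.

```python
import hashlib


def merkle_root(leaves):
    """Compute a SHA-256 Merkle root over a non-empty list of byte strings.

    Assumptions for this sketch: each leaf is hashed once, parents hash the
    concatenation of their two children, and an odd node at the end of a
    level is paired with itself.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair the last node with itself
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Changing any leaf changes the root, so publishing the root is enough to detect tampering with any covered item. Production schemes often add distinct prefixes for leaf and internal hashes, as RFC 6962 does, to rule out ambiguity between the two; this sketch omits that for brevity.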

Use Cases

  • Training ML models without exposing real data
  • Sharing medical or financial datasets securely
  • Benchmarking AI systems with realistic-but-fake data
  • Creating test environments for analytics pipelines

VeriSynth Core is open-source (MIT) and runs 100% locally. No signup · No GPUs needed · On-prem / VPC friendly.