
Quickstart

Let’s generate your first verifiable synthetic dataset — right from your terminal. VeriSynth runs 100% locally, needs no GPU, and produces both a synthetic dataset and a cryptographic proof receipt for auditability.

1. Install VeriSynth Core

pip install verisynth-core
That’s all you need. VeriSynth ships as a self-contained Python package that depends only on standard data libraries.

2. Prepare your dataset

Start with any CSV file — for example, a small tabular dataset of patients, transactions, or IoT readings.
data/patients.csv
Example data:
patient_id  age  bmi   smoker  hba1c  systolic_bp  diastolic_bp
001         62   31.4  1       7.8    148          90
002         45   28.6  0       6.1    132          82
003         33   24.1  0       5.3    116          74
You can also grab sample data here: https://github.com/VeriSynthAI/verisynth-core/tree/main/data
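If you have no CSV at hand, the three example rows above can be written to disk with Python's standard library. This is only a sketch for getting the file format right: the helper name `write_sample_csv` is made up for illustration, and this guide does not say whether three rows are enough to fit a meaningful model.

```python
import csv
from pathlib import Path

# Column names and values copied from the example table in this guide.
HEADER = ["patient_id", "age", "bmi", "smoker", "hba1c", "systolic_bp", "diastolic_bp"]
ROWS = [
    ["001", 62, 31.4, 1, 7.8, 148, 90],
    ["002", 45, 28.6, 0, 6.1, 132, 82],
    ["003", 33, 24.1, 0, 5.3, 116, 74],
]


def write_sample_csv(path="data/patients.csv"):
    """Write the example rows to `path`, creating parent folders if needed.

    Hypothetical helper for illustration; it is not part of VeriSynth.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return target
```

Calling `write_sample_csv()` with no arguments creates data/patients.csv relative to the current directory, which is the path used in the commands in this guide.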

3. Generate synthetic data

Run VeriSynth directly from the CLI:
verisynth data/patients.csv -o out/ --rows 1000000 --seed 42
✅ This will produce two files:
📁 out/synthetic.csv   # Synthetic dataset
🧾 out/proof.json      # Cryptographic proof receipt

4. Inspect the proof receipt

Each run creates a verifiable JSON proof describing:
  • Dataset hashes (SHA-256)
  • Model configuration (Gaussian Copula)
  • Random seed (for reproducibility)
  • Privacy and correlation metrics
  • Merkle root for integrity checking
Example:
{
  "verisynth_version": "core-0.1.0",
  "metrics": { "corr_mean_abs_delta": 0.12, "naive_reid_risk": 0.01 },
  "input":  { "rows": 10, "sha256": "…82b7" },
  "output": { "rows": 1000000, "sha256": "…acb9" },
  "proof": "merkle_root: …c31"
}
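To see how a receipt could be checked by hand, the sketch below recomputes the SHA-256 of the synthetic file and compares it with the value stored in the receipt. It rests on two assumptions that this guide does not confirm: that a real receipt stores the full hex digest under `output.sha256` (the example above shows it truncated), and that the digest is taken over the raw bytes of the CSV file. The function names are illustrative and not part of VeriSynth.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_matches_receipt(csv_path, proof_path):
    """Compare the file's digest with receipt["output"]["sha256"].

    Assumption: the receipt holds the full digest of the raw file bytes.
    """
    receipt = json.loads(Path(proof_path).read_text(encoding="utf-8"))
    recorded = receipt["output"]["sha256"]
    return sha256_of_file(csv_path) == recorded
```

With the files from step 3, the call would be `output_matches_receipt("out/synthetic.csv", "out/proof.json")`. A result of `False` means either the file changed after generation or one of the assumptions above does not hold for your version.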

5. Explore your synthetic dataset

The generated data preserves the statistical structure of your original dataset — correlations, distributions, and realistic relationships — without containing any real records. Try viewing it in pandas:
import pandas as pd
df = pd.read_csv("out/synthetic.csv")
df.head()
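To check the claim about preserved correlations yourself, you can compare the correlation matrices of the original and synthetic data. The sketch below computes the mean absolute difference between pairwise Pearson correlations. The receipt's `corr_mean_abs_delta` metric has a similar name, but its exact definition is not given in this guide, so treat this as an independent check rather than a reimplementation. The function name is illustrative.

```python
import pandas as pd


def mean_abs_corr_delta(real, synth):
    """Mean absolute difference of pairwise Pearson correlations.

    Only numeric columns present in both frames are compared, and each
    column pair is counted once (the diagonal is skipped).
    """
    numeric = real.select_dtypes(include="number").columns
    cols = [c for c in numeric if c in synth.columns]
    if len(cols) < 2:
        raise ValueError("need at least two shared numeric columns")
    delta = (real[cols].corr() - synth[cols].corr()).abs()
    pairs = [(a, b) for i, a in enumerate(cols) for b in cols[i + 1:]]
    return float(sum(delta.loc[a, b] for a, b in pairs) / len(pairs))
```

For the files in this guide, you would load `data/patients.csv` and `out/synthetic.csv` with `pd.read_csv` and pass both frames in. Note that pandas reads an identifier column such as `patient_id` as numeric, so you may want to drop it first.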

6. Reproduce your results

Because VeriSynth uses deterministic seeding (--seed), you can always reproduce the exact same synthetic dataset and proof receipt from the same input file.
verisynth data/patients.csv -o out/ --rows 1000000 --seed 42
Running this command again with the same seed value yields identical Merkle roots and metrics, showing that the process is verifiable and consistent.
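One way to confirm this is to write two runs to different output folders and compare their receipts. The sketch below compares the `metrics` and `proof` fields, which are the key names shown in the example receipt in step 4. The folder names `out_a/` and `out_b/` and the function name are hypothetical.

```python
import json
from pathlib import Path


def receipts_agree(proof_a, proof_b, keys=("metrics", "proof")):
    """Report, per key, whether two receipts hold the same value.

    A key that is missing from either receipt counts as a mismatch.
    """
    a = json.loads(Path(proof_a).read_text(encoding="utf-8"))
    b = json.loads(Path(proof_b).read_text(encoding="utf-8"))
    return {key: key in a and key in b and a[key] == b[key] for key in keys}
```

After running the command above once with `-o out_a/` and once with `-o out_b/`, calling `receipts_agree("out_a/proof.json", "out_b/proof.json")` should report `True` for both keys if the runs are reproducible as described.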