# Introduction

> Introduction to VeriSynth — verifiable synthetic data generation for AI, analytics, and privacy.

**VeriSynth** is an open-source framework for **verifiable synthetic data generation** — designed to help teams safely share and analyze sensitive datasets without exposing real individuals.

Built for developers, data scientists, and privacy researchers, VeriSynth transforms real datasets into **statistically faithful** synthetic versions, complete with **cryptographic proof receipts** that anyone can verify independently.

***

## Why VeriSynth?

Synthetic data isn’t new, but being able to **trust** it is.

Most synthetic data tools are black boxes. You’re told to “trust” that the data is private and realistic, but there’s no way to verify it. VeriSynth changes that.

✅ **Verifiable**: Each run produces a tamper-evident `proof.json` receipt with dataset hashes, model parameters, and metrics.\
✅ **Transparent**: Built entirely in open source under the MIT license.\
✅ **Reproducible**: Deterministic seeds make every result traceable and reproducible.\
✅ **Private by design**: Synthetic rows are sampled from a fitted statistical model rather than copied from the input, and each receipt reports a re-identification risk metric.

***

## How It Works (High Level)

VeriSynth learns the **statistical relationships** within your dataset — not the identities.

Example:

> If real data shows that *older individuals with high BMI are more likely to have elevated blood pressure*, VeriSynth preserves those patterns without retaining any real patient data.

The result: **synthetic data that behaves like the real world**, ready for analysis, testing, or AI model training.

```bash
# Install VeriSynth Core
pip install verisynth-core
# Generate 1M synthetic rows locally from a CSV file
verisynth data/patients.csv -o out/ --rows 1000000 --seed 42
```

This produces:

```
📁 out/synthetic.csv
🧾 out/proof.json
```
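
One way to sanity-check that a synthetic table preserves the relationships described above is to compare pairwise correlations between the real and synthetic data. The sketch below is not VeriSynth's implementation: `pearson`, `mean_abs_corr_delta`, and `toy_table` are hypothetical helpers, the age/BMI/blood-pressure numbers are made up, and the metric is only assumed to be in the spirit of the `corr_mean_abs_delta` field shown in the proof receipt. It uses only the standard library so it runs anywhere.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def mean_abs_corr_delta(real, synth):
    """Mean absolute difference across all pairwise column correlations.

    `real` and `synth` map column name -> list of floats (same columns).
    """
    cols = sorted(real)
    deltas = [
        abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    ]
    return sum(deltas) / len(deltas)


def toy_table(rng, n):
    """Made-up data in which blood pressure rises with age and BMI."""
    age = [rng.uniform(20, 80) for _ in range(n)]
    bmi = [rng.gauss(27, 4) for _ in range(n)]
    bp = [90 + 0.5 * a + 1.2 * b + rng.gauss(0, 8) for a, b in zip(age, bmi)]
    return {"age": age, "bmi": bmi, "bp": bp}


real = toy_table(random.Random(1), 2000)
# Stand-in for a synthetic table: same process, different seed (assumption).
synth = toy_table(random.Random(2), 2000)
print(f"mean |delta corr| = {mean_abs_corr_delta(real, synth):.3f}")
```

A value near zero means the synthetic table reproduces the real table's linear relationships closely; it says nothing on its own about privacy.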

***

## Proof Receipts

Every synthesis run outputs a JSON proof file containing:

* Input/output dataset hashes (SHA-256)
* Correlation & privacy metrics
* Model version, parameters, and random seed
* Merkle root for integrity verification

Example:

```json
{
  "verisynth_version": "core-0.1.0",
  "metrics": {
    "corr_mean_abs_delta": 0.12,
    "naive_reid_risk": 0.01
  },
  "input":  { "rows": 10, "sha256": "…82b7" },
  "output": { "rows": 1000000, "sha256": "…acb9" },
  "proof": "merkle_root: …c31"
}
```
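
To illustrate how a receipt of this kind could be checked independently, the sketch below recomputes a file's SHA-256 digest and builds a Merkle root over a list of rows. This page does not specify VeriSynth's actual Merkle construction (leaf encoding, pairing order, handling of an odd node), so `sha256_file` and `merkle_root` are hypothetical helpers that follow one common convention, and the sample rows are invented.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def merkle_root(leaves):
    """Merkle root (hex) over a non-empty list of byte strings.

    Assumed convention: each leaf is SHA-256 hashed, adjacent pairs are
    concatenated and hashed, and an unpaired last node is duplicated.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Invented rows, used only to show that editing one row changes the root.
rows = [b"age,bmi,bp", b"54,31.2,142", b"37,22.8,118"]
root = merkle_root(rows)
tampered = merkle_root([rows[0], b"54,31.2,120", rows[2]])
print(root != tampered)
```

In practice the comparison would be between `sha256_file("out/synthetic.csv")` and the output hash stored in `proof.json`. The digests in the example receipt are truncated, so no expected value is shown here.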

***

## Trusted Foundations

VeriSynth is powered by:

* **Gaussian Copula** for tabular synthesis
* **SHA-256 / Merkle trees** for integrity proofs
* **Deterministic seeding** for reproducibility
* **Python-first design** for extensibility and CLI simplicity
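
As a rough illustration of how a Gaussian copula and a fixed seed fit together, here is a minimal two-column sketch. It is not VeriSynth's code: `normal_scores`, `empirical_quantile`, and `synthesize` are hypothetical names, the data is made up, and the rank-based fitting and interpolated marginals are assumptions chosen to keep the example within the standard library.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(values):
    """Map each value to a standard-normal score based on its rank."""
    n = len(values)
    order = sorted(range(n), key=values.__getitem__)
    scores = [0.0] * n
    for rank, idx in enumerate(order):
        scores[idx] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )


def empirical_quantile(sorted_values, u):
    """Value at quantile u in [0, 1], linearly interpolated."""
    pos = u * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def synthesize(xs, ys, rows, seed):
    """Fit a two-column Gaussian copula on (xs, ys) and sample new pairs."""
    rho = pearson(normal_scores(xs), normal_scores(ys))
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    sx, sy = sorted(xs), sorted(ys)
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    out = []
    for _ in range(rows):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + residual * rng.gauss(0, 1)  # correlated with z1
        out.append((
            empirical_quantile(sx, STD_NORMAL.cdf(z1)),
            empirical_quantile(sy, STD_NORMAL.cdf(z2)),
        ))
    return out


# Made-up input: blood pressure loosely increasing with age.
data_rng = random.Random(0)
age = [data_rng.uniform(20, 80) for _ in range(500)]
bp = [100 + 0.6 * a + data_rng.gauss(0, 8) for a in age]

run_a = synthesize(age, bp, rows=1000, seed=42)
run_b = synthesize(age, bp, rows=1000, seed=42)
print(run_a == run_b)  # same input and seed give the same rows
```

The model keeps only the fitted correlation and the per-column distributions, and new rows are drawn from those. Interpolating between observed values is a simplification here and should not be read as a privacy guarantee.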

***

## Use Cases

* Training ML models without exposing real data
* Sharing medical or financial datasets securely
* Benchmarking AI systems with realistic-but-fake data
* Creating test environments for analytics pipelines

***

> VeriSynth Core is open-source (MIT) and runs 100% locally.
> No signup · No GPUs needed · On-prem / VPC friendly.
