
CLI Commands

The VeriSynth CLI is a lightweight, Python-based command-line interface for generating synthetic datasets and producing verifiable proof receipts. All commands run offline and are designed to integrate easily with Python, CI/CD pipelines, and local analytics workflows.

Base Command: verisynth

The base command generates synthetic data and outputs both:
  • A synthetic dataset (synthetic.csv)
  • A proof receipt (proof.json)
Usage:
verisynth <input_csv> -o <output_dir> [options]
Example:
verisynth data/patients.csv -o out/ --rows 1000000 --seed 42
Output:
📁 out/synthetic.csv
🧾 out/proof.json
Result summary:
VeriSynth — Synthetic Data Report
========================================
Input: data/patients.csv
Output: out/synthetic.csv
Engine: GaussianCopula | Seed: 42
Fidelity: corr Δ=0.23 | Privacy risk=0.0
Proof: out/proof.json (Merkle verified)
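
Because the CLI is meant to be driven from Python and CI/CD jobs, the invocation above can be wrapped in a small helper. The following is a minimal sketch, not part of VeriSynth itself: build_command and run_verisynth are hypothetical names, and the sketch assumes the verisynth executable is on PATH and writes synthetic.csv and proof.json into the output directory as described above. It uses only the flags documented on this page.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(input_csv, output_dir, rows=None, seed=None, schema=None):
    """Assemble a verisynth argument list from the flags documented on this page."""
    cmd = ["verisynth", str(input_csv), "-o", str(output_dir)]
    if rows is not None:
        cmd += ["--rows", str(rows)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if schema is not None:
        cmd += ["--schema", str(schema)]
    return cmd


def run_verisynth(input_csv, output_dir, **options):
    """Run the CLI and return the two output paths it is documented to write."""
    # Assumption: the executable is installed and named "verisynth".
    if shutil.which("verisynth") is None:
        raise FileNotFoundError("verisynth executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit code,
    # which is usually what a CI job wants.
    subprocess.run(build_command(input_csv, output_dir, **options), check=True)
    out = Path(output_dir)
    return out / "synthetic.csv", out / "proof.json"
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.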

Global Options

Flag        Description
--input     Path to input CSV file
--output    Output directory for synthetic data and proof
--rows      Number of synthetic rows to generate (default: 1000)
--seed      Random seed for deterministic reproducibility
--schema    Path to YAML schema configuration file (optional)
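
One way to check the reproducibility that --seed is described as providing is to run the command twice with the same seed into two different output directories and compare the resulting files. The sketch below is an assumption-laden illustration: file_sha256 and same_output are hypothetical helper names, and this page does not state whether outputs are byte-identical across VeriSynth versions or platforms, so a mismatch there is not necessarily a bug.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(first_csv, second_csv):
    """True when the two files are byte-for-byte identical."""
    return file_sha256(first_csv) == file_sha256(second_csv)
```

For example, after two runs with --seed 42 into the hypothetical directories run_a/ and run_b/, same_output("run_a/synthetic.csv", "run_b/synthetic.csv") would be expected to return True if generation is fully deterministic.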

Schema Configuration

VeriSynth supports explicit field mapping and exclusion through YAML schema configuration files. This gives you fine-grained control over which fields to synthesize and how to handle data types.

Schema Configuration Format

exclude: ["patient_id", "address"]
types:
  age: int
  bmi: float
  smoker: bool
  hba1c: float
model:
  engine: GaussianCopula
  seed: 42

Configuration Options

  • exclude: List of field names to exclude from synthesis (e.g., IDs, addresses)
  • types: Explicit type mappings for fields (supports: int, float, bool, str)
  • model: Model configuration including engine and seed
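
The options above can be checked before a long run with a few lines of Python. This is only a sketch of the kind of validation one might do locally, not VeriSynth's built-in validator: check_schema is a hypothetical function, it operates on a schema that has already been parsed into a dict (for example by a YAML library), and the rules about unknown keys and about fields that are both excluded and typed are assumptions, not documented behavior.

```python
ALLOWED_TYPES = {"int", "float", "bool", "str"}  # types listed on this page
KNOWN_KEYS = {"exclude", "types", "model"}       # top-level keys listed on this page


def check_schema(config):
    """Return a list of problems found in a parsed schema mapping (empty if none)."""
    problems = []
    # Assumption: keys other than the three documented ones are mistakes.
    for key in config:
        if key not in KNOWN_KEYS:
            problems.append(f"unknown top-level key: {key}")
    exclude = config.get("exclude", [])
    if not isinstance(exclude, list) or not all(isinstance(n, str) for n in exclude):
        problems.append("exclude must be a list of field names")
        exclude = []
    types = config.get("types", {})
    if not isinstance(types, dict):
        problems.append("types must map field names to type names")
        types = {}
    for field, type_name in types.items():
        if not isinstance(type_name, str) or type_name not in ALLOWED_TYPES:
            problems.append(f"unsupported type for {field}: {type_name}")
        # Assumption: typing a field that is also excluded is likely a mistake.
        if field in exclude:
            problems.append(f"{field} is both excluded and typed")
    return problems


# The example schema from this page, as it would look after YAML parsing.
example = {
    "exclude": ["patient_id", "address"],
    "types": {"age": "int", "bmi": "float", "smoker": "bool", "hba1c": "float"},
    "model": {"engine": "GaussianCopula", "seed": 42},
}
print(check_schema(example))  # prints []
```

Returning a list of messages instead of raising on the first problem lets a CI job report every issue in one pass.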

Benefits

  • Privacy: Exclude sensitive identifiers and PII
  • Control: Explicit type handling instead of automatic detection
  • Reproducibility: Schema configuration is included in proof receipts
  • Validation: Built-in validation ensures configuration correctness

Example Workflows

Generate 1M rows from a healthcare dataset

verisynth data/patients.csv -o out/ --rows 1000000 --seed 42

Disable proof generation (for benchmarking only)

verisynth data/patients.csv -o out/ --rows 50000 --no-proof

Use a different model (future feature)

verisynth data/transactions.csv -o out/ --model ctgan --rows 250000