# Synthetic Data Generation Plugin

This plugin generates synthetic tabular data in Dataiku. It includes three complementary recipes:

1. **Privacy-Preserving Synthesis**: Generate differentially private synthetic datasets from sensitive data using the DP-CTGAN, PATE-CTGAN, and MWEM synthesizers provided by SmartNoise.

2. **Universal Data Generator**: Build synthetic datasets from scratch using a declarative schema with statistical distributions, categorical sampling, date ranges, Faker providers, and advanced correlation modeling for regression and classification targets.

3. **Conditional VAE Oversampling**: Rebalance binary classification datasets by learning minority-class patterns and appending synthetic minority records.

## Key Features

- **Differential privacy** guarantees for sensitive data synthesis
- **Class imbalance handling**: CVAE-based oversampling for minority classes in binary targets
- **Flexible column types**: Normal, Uniform, Integer, Categorical, Date, Faker providers
- **Correlated variables**: Create dependent columns with custom correlation weights
- **Binary classification targets**: Generate 0/1 labels with percentile-based thresholds
- **Numeric regression targets**: Create continuous targets with optional range rescaling
- **Categorical variable handling**: Categorical columns are automatically encoded and normalized before they are used in correlations
- **Reproducibility**: Seed-based generation, so the same seed reproduces the same output
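
To make the correlation, target, and seeding features above more concrete, here is a minimal, standalone sketch of the underlying ideas in plain Python. It is not the plugin's implementation or its schema format: the function `generate`, the column names (`age`, `income`, `label`, `amount`), and all parameters are hypothetical and chosen only for illustration. It assumes a dependent score built as a weighted sum of standardized features, a 0/1 label cut at a percentile of that score, and a continuous target rescaled into a chosen range.

```python
import random
import statistics


def zscore(values):
    """Standardize a numeric column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def generate(n_rows, seed, weights=(0.7, 0.3), positive_rate=0.2,
             target_range=(0.0, 100.0)):
    """Hypothetical generator: two features, a binary label, a numeric target."""
    rng = random.Random(seed)  # local RNG: the same seed yields the same rows

    # Independent feature columns: one Normal, one Uniform.
    age = [rng.gauss(40.0, 10.0) for _ in range(n_rows)]
    income = [rng.uniform(20_000.0, 120_000.0) for _ in range(n_rows)]

    # Dependent score: weighted sum of standardized features plus noise.
    noise = [rng.gauss(0.0, 0.5) for _ in range(n_rows)]
    score = [weights[0] * a + weights[1] * i + e
             for a, i, e in zip(zscore(age), zscore(income), noise)]

    # Binary target: the top `positive_rate` share of scores is labeled 1.
    n_positive = max(1, round(positive_rate * n_rows))
    threshold = sorted(score)[n_rows - n_positive]
    label = [1 if s >= threshold else 0 for s in score]

    # Regression target: min-max rescale the score into target_range.
    lo, hi = target_range
    s_min, s_max = min(score), max(score)
    amount = [lo + (s - s_min) / (s_max - s_min) * (hi - lo) for s in score]

    return {"age": age, "income": income, "label": label, "amount": amount}


if __name__ == "__main__":
    data = generate(n_rows=1000, seed=42)
    print("positive labels:", sum(data["label"]))
    print("amount range:", min(data["amount"]), max(data["amount"]))
```

Standardizing the features before weighting keeps the weights comparable across columns with very different scales, which is one plausible reason for the normalization step mentioned in the feature list.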
