# How does it work?

This file provides a very high-level explanation of:
- the algorithm that generates counterfactual explanations.
- the algorithm that performs outcome optimization.

## Definitions

**OO** is short for _Outcome optimization_.

**CF** is short for _Counterfactuals_ (or _Counterfactual explanations_).

**EMU** is the historical name of this engine. It is short for _Exploiting
Model Unconsciously_.

**Reference** designates the point that the counterfactuals must stay close to.


## Handlers

A handler is attached to exactly one feature of the dataset. Its purpose is to:
1. Generate useful random values for this feature.
2. Measure the distance between two values for this feature.

Some handlers work with numerical features. Some handlers work with categorical
features. One handler exists for frozen features (features for which we should
only generate the reference value).

Handlers must usually be fitted. For instance, some handlers need to learn the
distribution of the feature before generating values.
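
The two responsibilities above can be illustrated with a minimal sketch. The
class and method names (`NumericalHandler`, `FrozenHandler`, `fit`, `generate`,
`distance`) are assumptions made for this example and are not taken from the
engine. The uniform sampling and the range-normalized distance are also
illustrative choices only.

```python
import random


class NumericalHandler:
    """Hypothetical handler for one numerical feature."""

    def fit(self, values):
        # Learn the observed range of the feature.
        self.low = min(values)
        self.high = max(values)
        # Avoid a zero divisor when the feature is constant.
        self.span = (self.high - self.low) or 1.0
        return self

    def generate(self, rng):
        # Draw a value uniformly within the fitted range.
        return rng.uniform(self.low, self.high)

    def distance(self, a, b):
        # Absolute difference, normalized by the fitted range.
        return abs(a - b) / self.span


class FrozenHandler:
    """Hypothetical handler that only ever generates the reference value."""

    def __init__(self, reference_value):
        self.reference_value = reference_value

    def fit(self, values):
        # Nothing to learn for a frozen feature.
        return self

    def generate(self, rng):
        return self.reference_value

    def distance(self, a, b):
        return 0.0 if a == b else 1.0
```

For instance, `NumericalHandler().fit([0.0, 5.0, 10.0]).distance(2.0, 7.0)`
evaluates to `0.5`, because the two values are 5 apart on a fitted range of 10.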

## Algorithm for counterfactual explanations

1. Fit the handlers
2. Define two small hyperspheres centered on the reference point
3. Generate some points between these two hyperspheres using the handlers
4. Increase the radii of the hyperspheres and repeat step 3 until we find some
   counterfactuals, i.e. points whose prediction differs from the reference's.
5. Once some counterfactuals are found for a given min radius and max radius,
   try to find more valid counterfactuals between these two radii
6. At this point of the algorithm, some counterfactuals were found. Use the
   reducers to improve them, for example by simplifying the counterfactuals or
   by bringing them closer to the reference.
7. Choose which of the counterfactuals should be returned to the user
    - One strategy consists of simply returning the ones for which the distance
      to the reference is minimal
    - Another strategy consists of performing clustering and returning one
      counterfactual from each of the clusters to ensure diversity
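
Steps 2 to 4 and the first strategy of step 7 can be sketched as follows. This
is a minimal illustration under stated assumptions, not the engine's
implementation: `sample_in_shell`, `find_counterfactuals`, and their parameters
are hypothetical names, candidates are drawn isotropically instead of through
fitted handlers, the shell is assumed to slide outward by a fixed step, and
steps 5 and 6 are omitted.

```python
import math
import random


def sample_in_shell(reference, r_min, r_max, rng):
    """Draw a point whose Euclidean distance to reference lies in [r_min, r_max]."""
    # A normalized Gaussian vector gives a uniformly random direction.
    direction = [rng.gauss(0.0, 1.0) for _ in reference]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    radius = rng.uniform(r_min, r_max)
    return [x + radius * d / norm for x, d in zip(reference, direction)]


def find_counterfactuals(predict, reference, rng,
                         r_step=0.5, n_samples=200, max_rounds=50):
    """Grow the search shell until some prediction differs from the reference's."""
    target = predict(reference)
    r_min, r_max = 0.0, r_step
    for _ in range(max_rounds):
        candidates = [sample_in_shell(reference, r_min, r_max, rng)
                      for _ in range(n_samples)]
        found = [c for c in candidates if predict(c) != target]
        if found:
            # Closest counterfactuals first.
            return sorted(found, key=lambda c: math.dist(c, reference))
        # Assumption: move the shell outward by one step.
        r_min, r_max = r_max, r_max + r_step
    return []
```

With a toy model such as `predict = lambda p: int(p[0] + p[1] > 3.0)` and the
reference `[0.0, 0.0]`, every returned point lies beyond the decision boundary,
so its distance to the reference is at least `3 / sqrt(2)`.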

## Algorithm for outcome optimization

### Without diversity

#### Principle

It's a very simple genetic algorithm. The handlers are used to generate the new
candidate populations. The top individuals selected at each iteration are
simply the ones with the lowest loss.

#### Pseudocode

```
population = init_population_with_random_samples_from_dataset()

for 0..n_iterations
   population = concat(
      population,
      generate_values_with_perturbations_of_existing_population(population),
      generate_random_uniform_values()
   )
   population = keep_only_top_individuals(population)

return population
```
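
The pseudocode above could translate into Python roughly as follows. This is a
sketch under assumptions: `optimize`, `bounds`, `sigma`, and `population_size`
are hypothetical names, every feature is taken to be numerical, and Gaussian
noise stands in for the perturbations that the handlers would generate.

```python
import random


def optimize(loss, dataset, bounds, rng,
             n_iterations=50, population_size=20, sigma=0.1):
    """Keep the lowest-loss individuals among old, perturbed, and fresh points."""
    # Initial population: random samples from the dataset.
    population = rng.sample(dataset, min(population_size, len(dataset)))
    for _ in range(n_iterations):
        # Perturb every individual, clipping each feature to its bounds.
        perturbed = [
            [min(max(x + rng.gauss(0.0, sigma), lo), hi)
             for x, (lo, hi) in zip(individual, bounds)]
            for individual in population
        ]
        # Add uniformly random individuals to keep exploring.
        fresh = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(population_size)]
        # Keep only the top individuals, best first.
        candidates = population + perturbed + fresh
        population = sorted(candidates, key=loss)[:population_size]
    return population
```

Because the previous population always competes with the new candidates, the
best loss can never get worse from one iteration to the next.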

### With diversity

#### Principle

To keep the final optimized population diverse, an extra mechanism ensures that
the search for local optima continues even if the population already contains
a global optimum.

Each sample belongs to a certain genus.
- When a new sample is generated by a perturbation of another sample, both
  samples will share the same genus.
- When a new sample is generated ex-nihilo, it won't share any other sample's
  genus.

When we select the samples to keep in the next generation, we take their genus
into account:
- If, for a given genus, the new generation didn't produce a sample that
  outperforms every other sample sharing the same genus, we consider that the
  genus has reached its optimum, so we remove all of its samples from the
  population.
- If we found a new optimum for a given genus, then we ensure the survival of
  some of its samples, even if there are better samples that don't share the
  same genus.
- When a genus becomes extinct, its best sample is probably a local optimum, so
  it's saved in a "pantheon".

After a certain number of iterations, the pantheon contains the values that
must be returned by the algorithm. If the pantheon contains too many values,
k-means clustering with `k = number_of_samples_to_return` is performed to make
sure that the final samples are diverse.

_NB: The best sample will always be returned, regardless of its cluster._
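
The selection rules above can be sketched as a single function. Everything
here is an assumption made for illustration: the `Sample` record, its `is_new`
flag (true for samples generated during the current iteration), the
`survivors_per_genus` parameter, and the choice to let older samples win ties.

```python
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    genus: int
    values: tuple
    loss: float
    is_new: bool  # True if generated during the current iteration


def select(population, survivors_per_genus=2):
    """Return (survivors, best samples of the genera that just went extinct)."""
    by_genus = defaultdict(list)
    for sample in population:
        by_genus[sample.genus].append(sample)

    survivors, newly_extinct_best = [], []
    for members in by_genus.values():
        # Lowest loss first; on ties, older samples (is_new=False) come first.
        members.sort(key=lambda s: (s.loss, s.is_new))
        best = members[0]
        if best.is_new:
            # The genus found a new optimum: some of its samples survive,
            # whatever the losses of the other genera.
            survivors.extend(s._replace(is_new=False)
                             for s in members[:survivors_per_genus])
        else:
            # No improvement: the genus goes extinct and its best sample
            # is handed over to the pantheon.
            newly_extinct_best.append(best)
    return survivors, newly_extinct_best
```

A sample generated ex-nihilo is the only member of its genus, so it is that
genus's best sample and survives its first selection in this sketch.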

#### Pseudocode

```
population = init_population_with_random_samples_from_dataset()
pantheon = [ ]

for 0..n_iterations
   population = concat(
      population,
      generate_values_with_perturbations_of_existing_population(population),
      generate_random_uniform_values()
   )
   population, best_samples_from_newly_extinct_genuses = select(population)
   pantheon = concat(pantheon, best_samples_from_newly_extinct_genuses)

clusters = kmeans_clustering(pantheon)
return get_best_individual_from_each_cluster(clusters)
```
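
The final clustering step could look like the sketch below. It is written
under assumptions: `kmeans` and `pick_from_pantheon` are hypothetical names,
pantheon entries are plain tuples of numbers, and a small pure-Python Lloyd's
k-means with farthest-point initialization stands in for whichever k-means
implementation the engine uses.

```python
import math


def kmeans(points, k, n_iterations=20):
    """Minimal Lloyd's k-means; returns one cluster label per point."""
    # Farthest-point initialization: deterministic and well spread out.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(n_iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members (if it has any).
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels


def pick_from_pantheon(pantheon, loss, n_to_return):
    """Return the lowest-loss sample of each cluster, best first."""
    if len(pantheon) <= n_to_return:
        return sorted(pantheon, key=loss)
    labels = kmeans(pantheon, n_to_return)
    best_per_cluster = {}
    for point, label in zip(pantheon, labels):
        current = best_per_cluster.get(label)
        if current is None or loss(point) < loss(current):
            best_per_cluster[label] = point
    return sorted(best_per_cluster.values(), key=loss)
```

Taking the best individual of each cluster always includes the overall best
sample, since that sample is necessarily the best of its own cluster.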
