Selecting Clusters

A cluster expansion model is completely defined by i) a set of clusters, and ii) a set of effective cluster interactions.

In this section, we concentrate on techniques for determining the optimal set of clusters, i.e. item i).

This can be a challenging task, since two conflicting goals must be met: fitting the training data well and providing accurate predictions for new data.

While a very good fit to the training data can almost invariably be achieved by adding a large number of clusters to the model, this will generally yield poor predictions for new data, i.e. data that was not present in the training set. This is called overfitting.
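The effect described above can be reproduced with a small, self-contained sketch. It does not use the CELL API: the correlation matrix, the "true" interactions and the noise level are all synthetic assumptions, and leave-one-out cross-validation is used here only as one common way of estimating the error on new data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a training set (assumption, not CELL output):
# one row per structure, one column per candidate cluster.
n_structures, n_clusters = 30, 25
X = rng.uniform(-1.0, 1.0, size=(n_structures, n_clusters))

# Assumed "true" model: only the first three clusters carry an interaction.
true_ecis = np.zeros(n_clusters)
true_ecis[:3] = [1.0, -0.5, 0.25]
y = X @ true_ecis + rng.normal(scale=0.05, size=n_structures)


def fit_rmse(X, y):
    """RMSE of an ordinary least-squares fit on the data it was fitted to."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sqrt(np.mean((X @ coef - y) ** 2)))


def loo_cv_rmse(X, y):
    """Leave-one-out cross-validation RMSE of an ordinary least-squares fit."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # hold out structure i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append(y[i] - X[i] @ coef)      # error on the held-out structure
    return float(np.sqrt(np.mean(np.square(errors))))


# Fit nested models that use the first k candidate clusters.
results = {}
for k in (1, 3, 10, 25):
    results[k] = (fit_rmse(X[:, :k], y), loo_cv_rmse(X[:, :k], y))
    print(f"{k:2d} clusters: fit RMSE = {results[k][0]:.4f}, "
          f"CV RMSE = {results[k][1]:.4f}")
```

Because the models are nested, the fit error can only decrease as clusters are added, whereas the cross-validation error typically reaches a minimum near the true model size and then grows again. That growth is the signature of overfitting.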

The techniques explained in this section are aimed at avoiding overfitting.

In CELL, the tasks described in this section are performed by the Clusters Selector module.

Assigning the substitutions

List of all methods