This post demonstrates how models trained on Citrination can be applied to optimal experimental design.  Using machine learning to identify the experiments most likely to improve the objective reduces the number of experiments required to find high-ZT materials by nearly 3x.

Optimal experimental design is a class of procedures for designing experimental sequences that are maximally effective by some metric.  In materials informatics, we are interested in minimizing the number of experiments needed to find materials with extreme values of a property or combination of properties.  We can simulate optimal experimental design for the search for thermoelectric materials using an existing thermoelectric dataset taken from Gaultois et al. [1].  This dataset contains descriptive properties, like chemical formula and temperature, as well as measured ones, like the Seebeck coefficient and the thermoelectric figure of merit, ZT.  You can find the dataset on Citrination here.

To simulate an experimental search, we hide the measurements from the dataset, leaving only the descriptions of the materials, e.g. “n-type polycrystalline Mg2Si”.  When we perform an “experiment” on a material, we simply reveal its measurements and add the corresponding entries to the training data.  In this case, optimal experimental design is used to find the material with the greatest ZT in the fewest experiments.
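
The reveal-and-retrain loop can be sketched in a few lines of Python.  This is a minimal illustration, not the code behind this post: `simulate_search`, `choose_next`, `random_choice`, and the toy ZT values are hypothetical names and numbers introduced here.

```python
import random

def simulate_search(hidden_zt, choose_next, rng):
    """Count the simulated experiments needed to reveal the highest-ZT material.

    hidden_zt   -- dict mapping a material description to its hidden ZT
    choose_next -- hypothetical strategy: (known, unknown, rng) -> material
    rng         -- random.Random instance passed through to the strategy
    """
    target = max(hidden_zt, key=hidden_zt.get)  # the material we hope to find
    known = {}                                  # revealed measurements (training data)
    unknown = sorted(hidden_zt)                 # materials not yet measured
    while True:
        pick = choose_next(known, unknown, rng)
        unknown.remove(pick)
        known[pick] = hidden_zt[pick]           # the "experiment": reveal the value
        if pick == target:
            return len(known)

def random_choice(known, unknown, rng):
    """Guessing strategy: ignore the revealed data and pick at random."""
    return rng.choice(unknown)

# Toy values for illustration only; they are not taken from the dataset.
toy = {"A": 0.1, "B": 0.4, "C": 0.9, "D": 0.2}
n_experiments = simulate_search(toy, random_choice, random.Random(0))
```

Passing the strategy in as a function keeps the loop identical for guessing and for model-driven selection; only `choose_next` changes.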

Our baseline is guessing: if we pick randomly from the unknown materials, we expect, on average, to experiment on about half of them before we find the highest ZT.  In this case, there are 176 materials, so that’s about 88 guesses.
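
The baseline figure follows from a short calculation.  When materials are drawn at random without replacement, the best one is equally likely to turn up at any position from 1 to N, so the expected number of guesses is (N + 1) / 2, which is roughly half of the materials.  The sketch below checks this for N = 176; the function name is ours, not part of any library.

```python
from fractions import Fraction

def expected_random_guesses(n_materials):
    """Expected number of random draws (without replacement) to hit one target.

    The target is equally likely to sit at any position 1..n in a random
    ordering, so the expectation is the mean of those positions.
    """
    return sum(Fraction(k, n_materials) for k in range(1, n_materials + 1))

print(float(expected_random_guesses(176)))  # 88.5
```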

Instead of guessing, we can use Citrination to build a model for ZT based on whichever materials we’ve already performed experiments on.  In the beginning, we have no data so we’ll guess materials randomly until we have enough to build a model (12 in this case).  Once we have a model, we can use different strategies for selecting the next experiment.  Here, we consider two common ones:

  • maximum expected improvement (MEI)
  • maximum likelihood of improvement (MLI)

For MEI, we select the unknown material with the highest predicted ZT.  This makes a lot of sense: we’re looking for high ZT, so we want to learn more about a material that is expected to have a high one.  For MLI, we select the unknown material with the maximum likelihood of improving on the largest ZT in our training set.  This gives preference to materials with high uncertainty, because we are less confident that their ZT values are small.  Whichever material is selected, we reveal its value of ZT and add it to the training set for the next iteration.  The experimental sequence ends when we measure the material with the highest value of ZT, which is Bi2Te3 in this dataset.
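
To make the difference between the two strategies concrete, here is a minimal sketch.  It assumes the model returns a predicted mean and standard deviation for each unknown material and that the predictive distribution is Gaussian; that assumption is ours, since this post does not say how Citrination computes the likelihood of improvement.  The function names and the example numbers are hypothetical.

```python
import math

def select_mei(predictions):
    """MEI as described above: the candidate with the highest predicted ZT.

    predictions -- dict: material -> (predicted mean, predicted std. dev.)
    """
    return max(predictions, key=lambda m: predictions[m][0])

def prob_improvement(mean, std, best_so_far):
    """P(ZT > best_so_far) under an assumed Gaussian predictive distribution."""
    if std <= 0:
        return 1.0 if mean > best_so_far else 0.0
    z = (mean - best_so_far) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z

def select_mli(predictions, best_so_far):
    """MLI: the candidate most likely to beat the best ZT measured so far."""
    return max(
        predictions,
        key=lambda m: prob_improvement(*predictions[m], best_so_far),
    )

# Made-up predictions for two unmeasured materials (illustration only).
preds = {"X": (0.60, 0.05), "Y": (0.55, 0.40)}
best = 0.8  # largest ZT in the hypothetical training set

mei_pick = select_mei(preds)        # "X": highest predicted mean
mli_pick = select_mli(preds, best)  # "Y": lower mean, but far more uncertain
```

In this example neither prediction exceeds the current best, yet the two strategies disagree: MEI picks the material with the slightly higher mean, while MLI picks the one whose wide uncertainty leaves a real chance of beating 0.8.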

Both the MEI and MLI techniques are significantly better than guessing, finding the highest-ZT material, Bi2Te3, more than twice as quickly.  MLI outperforms MEI, indicating that the uncertainty estimates are useful for identifying the experiments that provide the most information.  The improvement with MLI is nearly 3x.

If you would like to request a topic or demonstration, please feel free to contact us!


  • The models used here are “standard” Citrination models: the chemical formula is enriched with means of elemental properties and “ZT” is predicted via a random forest with linear leaves.
  • The reported numbers of iterations are sample means and standard deviations over 32 random initial seeds of 12 materials each.  The starting 12 materials are the same for the MLI and MEI experiments.
  • The semiconductor type, i.e. n or p, can be inferred from the sign of the Seebeck coefficient.
  • Some materials have multiple entries with different crystallinity or type.  Therefore, we group the training data by chemical formula, of which there are 176 unique values.
[1] Gaultois, Michael W., et al. “Data-driven review of thermoelectric materials: performance and resource considerations.” Chemistry of Materials 25.15 (2013): 2911-2920.