Deep Learning on Microstructural Images

It’s well known that microstructure plays a key role in determining material properties.  One common way of assessing material microstructure is via Scanning Electron Microscopy (SEM) images.  On Citrination, we have the capability to use these microstructural images as inputs to our data-driven models.  

We have developed customized deep learning techniques to automatically detect which textures are present in the images.  Those textures can then be used as inputs to machine learning models to label the microstructure and predict material properties.  This framework is shown schematically in the figure below.  

This schematic illustrates the deep learning framework for featurizing SEM images. The SEM image on the left shows steel with pearlite microstructure. That image is transformed through deep learning into a vector of textures. A machine learning model is then able to correctly label the microstructure of this image with high confidence.

A tutorial video showing how SEM images can be ingested onto the platform and used to build models is available here. This tutorial uses data from the Ultra High Carbon Steel Database,1 which is accessible here on the public Citrination platform, complete with deep learning texture vectors.

This capability is an example of how Citrine’s platform provides cutting-edge artificial intelligence solutions specialized for materials science use cases.

Dr. Julia Ling, Principal Scientist at Citrine

1 DeCost, Brian L., et al. “UHCSDB: UltraHigh Carbon Steel Micrograph DataBase.” Integrating Materials and Manufacturing Innovation (2017): 1-9.

Mengfei Yuan Shares Her Researcher in Residence Experience

During my three weeks as a researcher-in-residence, I developed my research interests in combination with the Citrination tool. This study seeks to use machine learning algorithms through Citrination to establish a “fast-acting reduced-order crystal plasticity model” for polycrystalline materials. The system is usually underdetermined when the model has more fitting parameters than there are experiments available to calibrate it. At the preliminary design stage, it’s important to have an early approximation of mechanical properties and microstructure informatics based on the experimental stress-strain data and EBSD results of samples. The final texture and the optimal crystal plasticity model for given initial, boundary, and loading conditions can be predicted with the Citrination tool by learning the relationship between microscale texture development during deformation and physical properties. The overarching goal of this study can be extended to a “data-driven material design tool” for designing microstructures with desired crystal plasticity properties, given the initial texture, processing techniques, and conditions.

I also learned how to design training processes and evaluate the quality of data on the Citrination platform. Based on the machine learning results, I needed to adjust my dataset for better predictions and analyze the theoretical issues that might exist in my training dataset. I also worked through some exercises based on “learn-citrination”, such as writing PIFs from computational calculations, designing experiments for optimization problems, and predicting properties in batches using queries.

Mengfei Yuan, Ohio State University

Mengfei Yuan joined the Citrine Research team over the summer as part of the Researcher in Residence program. 

Learn Citrination to generate a useful data analysis

In this Learn Citrination tutorial, we’re going to learn to use Citrination to generate a useful data analysis called t-SNE. This data visualization technique lets you represent a high-dimensional set of data in fewer dimensions in a way that preserves the local structure of the data. In materials informatics, this allows you to create a two-dimensional plot of a set of materials in which points corresponding to similar materials are grouped together. More information on t-SNE here.
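To make the idea concrete outside of Citrination, here is a minimal sketch that uses scikit-learn's `TSNE` class rather than the Citrination workflow described in this tutorial. The descriptor matrix is synthetic and purely hypothetical: two groups of 15 "materials" with 10 made-up features each. The perplexity and seed values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical stand-in for material descriptors: two groups of 15 "materials",
# each described by 10 features, with the groups centered at different values.
group_a = rng.normal(loc=0.0, scale=0.5, size=(15, 10))
group_b = rng.normal(loc=5.0, scale=0.5, size=(15, 10))
features = np.vstack([group_a, group_b])

# perplexity must be smaller than the number of samples (30 here).
tsne = TSNE(n_components=2, perplexity=5.0, init="pca", random_state=0)

# Each row of `embedding` is the 2D position of one material.
embedding = tsne.fit_transform(features)
print(embedding.shape)
```

Plotting the two columns of `embedding` against each other gives the kind of two-dimensional map described above. With real data, the rows of `features` would be descriptors computed for each material.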

This tutorial will teach you to create and export a two-dimensional t-SNE plot for any data on Citrination. The first step is to create a data view on Citrination. Instructions for creating a data view can be found in this tutorial.

We’ll be using this data view (view id 787) for this tutorial. It includes a model predicting experimental band gaps based on data compiled by W.H. Strehlow and E.L. Cook, which can be viewed in this dataset.

See the full tutorial notebook with step-by-step instructions here.

– E Antono, Citrine Research

Cutting Edge Uncertainty Quantification for Data-Driven Materials Models

In many applications of machine learning, the machine learning model accuracy is the most important consideration, and knowing the uncertainty of those predictions is not critical.  For example, for a clothing recommendation engine, it is important that on average it suggests clothes that a customer would like to buy.  It is acceptable for it to occasionally recommend an article of clothing that a customer dislikes, as long as its average performance is high.

At Citrine, we recognize that building accurate models for materials properties is not enough.

In order for data-driven models to be useful in materials science applications, it is critical to have a reliable estimate of model uncertainty reported with every prediction.

For example, say that we have trained a model to predict band gap based on the Strehlow and Cook experimental dataset. We want to make predictions for the band gaps of a couple of new compounds, tin monoxide (SnO) and nickel oxide (NiO). Our model predicts values of 2.4 eV and 2.8 eV, respectively. The key question is, “How confident is our model in these predictions?”

There are many different sources of uncertainty in data-driven models.  If the model was fit to noisy training data, then that noise will cause uncertainty in the model.  If the model is fit to only a small number of data points, it will also have higher uncertainty.  Another important source of uncertainty is extrapolation.  For example, if we trained a model on the blue dots in the figure to the right, then tried to make a prediction at the red X, our prediction would have high uncertainty.  Similarly, data-driven models are unreliable at making predictions on materials that are significantly different from any of the materials in the training set.
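The extrapolation effect can be illustrated with a small, self-contained sketch. It is not Citrine's uncertainty quantification method. It swaps in a simple bootstrap ensemble of straight-line fits on hypothetical noisy data, and uses the spread of the ensemble's predictions as a rough stand-in for model uncertainty. All names and numbers below are assumptions made for illustration.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_spread(points, x_query, n_models=200, seed=0):
    """Standard deviation of predictions at x_query across bootstrap refits."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        # Resample the training set with replacement and refit the line.
        sample = [rng.choice(points) for _ in points]
        if len({x for x, _ in sample}) < 2:
            continue  # skip degenerate resamples that cannot define a line
        intercept, slope = fit_line(sample)
        predictions.append(intercept + slope * x_query)
    return statistics.stdev(predictions)


# Hypothetical noisy training data: x in [0, 1], y = 2x + Gaussian noise.
noise = random.Random(42)
train = [(i / 19, 2 * (i / 19) + noise.gauss(0, 0.2)) for i in range(20)]

spread_inside = bootstrap_spread(train, x_query=0.5)   # within the training range
spread_outside = bootstrap_spread(train, x_query=5.0)  # far outside it
print(f"spread at x=0.5: {spread_inside:.3f}")
print(f"spread at x=5.0: {spread_outside:.3f}")
```

The refitted lines agree closely near the center of the training data and fan out away from it, so the spread at the extrapolated point is larger than at the interpolated one.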

At Citrine, all our predictions come with uncertainty estimates. We have developed, implemented, and validated cutting-edge uncertainty quantification methods for data-driven materials models. For more details on our uncertainty quantification techniques and how they can be used to accelerate materials design, please see our recent paper.1

In the cases of SnO and NiO, our predictions are shown below.


These plots show the probability distribution function for each prediction. For example, in the case of SnO, the mean value of the distribution is 2.45 eV, and the uncertainty of 0.78 eV is the spread of the distribution at one standard deviation. Because the uncertainty estimates are based on the standard deviation of the distribution, they correspond to a 68% confidence interval: the probability that the true value is within 0.78 eV of the prediction (2.45 eV) is 68%.
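The 68% figure can be checked with a few lines of standard-library Python. This sketch assumes the predictive distribution is Gaussian, which the text does not state explicitly, and uses the SnO mean and standard deviation quoted above.

```python
from statistics import NormalDist

# SnO prediction from the text: mean 2.45 eV, one standard deviation 0.78 eV.
mean_ev = 2.45
sigma_ev = 0.78

# Assumption: the predictive distribution is Gaussian.
prediction = NormalDist(mu=mean_ev, sigma=sigma_ev)

# Interval of plus or minus one standard deviation around the mean.
lower = mean_ev - sigma_ev
upper = mean_ev + sigma_ev

# Probability mass that the Gaussian places inside that interval.
coverage = prediction.cdf(upper) - prediction.cdf(lower)
print(f"interval: {lower:.2f} to {upper:.2f} eV, coverage: {coverage:.3f}")
```

Under the Gaussian assumption, the one-standard-deviation interval runs from about 1.67 eV to 3.23 eV and holds roughly 68% of the probability.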

The model uncertainty for NiO (1.41 eV) is much higher than for SnO (0.78 eV), in part because the training set included far fewer compounds containing nickel than tin.  The higher uncertainty in the NiO predictions reflects the fact that the model is extrapolating at this point.  The true band gap for SnO is approximately 2.5 eV and for NiO is approximately 3.8 eV.2
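One way to read these numbers is to express each prediction error in units of the reported uncertainty. The sketch below does this with the values quoted in the text. It assumes the NiO predicted mean is the rounded 2.8 eV given earlier, and it takes the reference band gaps as the approximate values cited above.

```python
# Values quoted in the text; the NiO predicted mean is the rounded figure.
cases = {
    "SnO": {"predicted": 2.45, "sigma": 0.78, "reference": 2.5},
    "NiO": {"predicted": 2.8, "sigma": 1.41, "reference": 3.8},
}

# Standardized error: |reference - predicted| divided by the reported uncertainty.
z_scores = {
    name: abs(c["reference"] - c["predicted"]) / c["sigma"]
    for name, c in cases.items()
}

for name, z in z_scores.items():
    print(f"{name}: error is {z:.2f} standard deviations")
```

In both cases the error is smaller than one standard deviation, so each reference value falls inside the reported uncertainty band even though the NiO prediction is off by about 1 eV.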

At Citrine, we know that uncertainty estimates are critical for assessing model confidence when using data-driven models for real engineering applications.  We are proud to be leading the field by providing well-calibrated uncertainty estimates for all our predictions.3

J Ling, Citrine Research

  1. Ling, Julia, et al. “High-Dimensional Materials and Process Optimization using Data-driven Experimental Design with Well-Calibrated Uncertainty Estimates.” Integrating Materials and Manufacturing Innovation (2017).
  2. Wong, Terence K. S., et al. “Current status and future prospects of copper oxide heterojunction solar cells.” Materials 9.4 (2016): 271.
  3. This work was funded in part by Argonne National Laboratories through contract 6F-31341, associated with the R2R Manufacturing Consortium funded by the Department of Energy Advanced Manufacturing Office.

Citrine Research Spotlight: Vanessa Meschke, University of Wisconsin Madison

The Citrine Research Spotlight Series profiles members of our academic and research community on their research interests, methodologies, and beyond.

This week we profile Vanessa Meschke, one of our 2017 NextGen Fellows. Vanessa is a rising senior at the University of Wisconsin Madison. She is majoring in Materials Science and Engineering, with a minor in Computer Science. Under the guidance of Professor Dane Morgan, Vanessa is working on predicting metallic alloys’ ability to form bulk glasses through data mining and machine learning, and on implementing a clustering algorithm with Citrination.

Learn more about Vanessa and her impressive undergraduate research career thus far below.

How did you get into your current field of undergraduate research?
As a junior in high school, I attended Michigan Tech’s Women in Engineering summer program, which offered an overview of what different engineers do and what it’s like to be a woman in a STEM field. The presentations from the week I remember most were a hands-on experiment with shape-memory alloys and an introduction to some different types of microscopes by the MS&E department. I loved the combination of chemistry and physics this field offers, and coming to UW-Madison has only enhanced this!

What are some of the larger themes and questions that inform your work?
I take a lot of motivation for my work from NASA’s vision: “reach[ing] for new heights and reveal[ing] the unknown for the benefit of humankind.” Materials science is capable of taking people higher and bettering the world, whether it be through new materials discovery or advancing processing methods. Marrying computer science with materials research is a way to enhance and expedite these changes. You never know when someone is going to discover the material that makes solar panels reach near-perfect efficiency or a bulk metallic glass that makes medical implants safer and stronger, and I want to be a part of that benefit.

How have you developed your own methodology?
I’ve just tried to find ways to keep myself excited about my work while making progress on tasks for research or schoolwork. Prof. Morgan has been particularly helpful in this; he has well-defined results he wants to see and ideas to circumnavigate major obstacles you may encounter along the way, but he makes sure to let my team try things our own way and learn from our mistakes.

Generally, what are some of the challenges you face, data related or otherwise?
My biggest challenge is overcoming my frustrations with programming. For research, I work mainly in Python, so a lot of my time is spent looking up syntax and debugging programs.

What have been the most surprising or interesting findings in your past work or current work?
One of the most surprising pieces of information I’ve found during my current work is how important concrete is. My current project at the Skunkworks lab is predicting the mechanical properties of concrete, so I’ve done a fair amount of reading on what makes up concrete, how people measure its properties, and some of the concerns to keep in mind when designing a concrete mix. In cities, it’s easy to see how often concrete is used, but I had no idea how much work went into designing concrete before this project!

What do you enjoy the most about the work that you do?
My favorite part of my work is combining concepts I’ve learned in my different areas of study. I love being able to use an algorithm I’ve learned in a computer science class along with knowledge from my materials coursework to understand what kinds of features could be good inputs for a model. The same idea can be applied to analyzing a model’s output. It’s such a neat combination of two topics I learn about in the classroom!

What do you enjoy doing outside of undergraduate studies and research?
When I’m not working on schoolwork or research, I spend my time volunteering with the UW-Madison chapter of Circle K and participating in events sponsored by the Materials Advantage student organization. I also love baking and reading, and I’m currently training for a half marathon!


What has been your favorite part about participating in the Citrine NextGen Fellowship?
My favorite part of the NextGen Fellowship has been recognizing how materials science and machine learning can be combined in a meaningful way at both a business and academic level. Sometimes it’s hard to recognize areas where these two topics can combine since many people don’t understand what machine learning is or how it can be useful to them. NextGen has shown me a real-world example of a successful application of machine learning in an engineering setting.

My Internship Experience at Citrine

For a materials science student with a background in computer science, finding somewhere to combine work in these two fields is already an uncommon opportunity. Applying machine learning and data mining techniques to answer questions in materials science has been an opportunity that is truly unique to Citrine. The projects that I’ve worked on have paired materials intuition with an understanding of data science and machine learning to build models for various material properties. This goes hand in hand with learning how to visualize and communicate machine learning results to people in the materials science field who may be unfamiliar with machine learning concepts. In addition to various modeling projects, I’ve also been able to contribute to the continued development of our in-house machine learning infrastructure. Working as an intern in a small, fast-moving team has been especially rewarding because the projects I’ve worked on actually see the light of day. Work that I’ve done has been delivered to customers and demoed to investors and potential customers. It has been an honor and a rush to work at Citrine as we try to fundamentally change how data is used in the field of materials science.