Insights from the Intersection: Applying Data Science Thinking to Materials

I’ve spent the summer working at Citrine fresh out of an undergraduate degree at Stanford, where I studied both Materials Science and Computer Science. Though I thoroughly enjoyed studying both fields, I found limited opportunities to apply the two together until beginning work here. While companies in entertainment and shopping have reaped the benefits of massive data sets, many fields in the scientific community, notably materials science, have remained largely separate from data science even as they amass huge quantities of data. Working with materials data at Citrine has made me reflect on how differently data scientists and materials scientists can perceive data, and on how insights from data science can benefit materials research.

Data Highlight: Elastic Constants for Single-Crystal Oxides

Many thanks go out to David Teter, Ph.D., of Teter Engineering, for contributing a great data set of elastic constants for some single-crystal oxides. 

Anisotropic single-crystal materials have direction-dependent physical properties, such as thermal expansion or elasticity, that can’t adequately be expressed by scalar values. This is why you will sometimes see a matrix of values for a material property on Citrination. Anisotropy is particularly important when a material must endure exposure to extreme forces, temperatures, or a combination of the two. Jet engine components, for example, often require specific engineering of crystal orientations to achieve performance targets. The list of mechanical properties and constants contributed by Dr. Teter is exactly the kind of data that we strive to have readily accessible to the community for efficient materials selection and development. 
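
To make the "matrix of values" idea concrete, here is a minimal sketch, assuming the simplest anisotropic case of a cubic crystal, whose elastic stiffness is commonly written as a 6x6 matrix in Voigt notation with three independent constants (C11, C12, C44). The function names and the numbers are illustrative placeholders and are not taken from Dr. Teter's data set. The Zener ratio computed at the end is one common single-number measure of how far a cubic crystal departs from isotropy (a value of 1 means isotropic).

```python
def cubic_stiffness_matrix(c11, c12, c44):
    """Build the 6x6 Voigt stiffness matrix (nested lists) for a cubic crystal."""
    c = [[0.0] * 6 for _ in range(6)]
    # Upper-left 3x3 block: C11 on the diagonal, C12 off the diagonal.
    for i in range(3):
        for j in range(3):
            c[i][j] = c11 if i == j else c12
    # Lower-right 3x3 block: C44 on the diagonal (shear terms).
    for i in range(3, 6):
        c[i][i] = c44
    return c


def zener_ratio(c11, c12, c44):
    """Zener anisotropy ratio for a cubic crystal; 1.0 means elastically isotropic."""
    return 2.0 * c44 / (c11 - c12)


# Hypothetical round numbers in GPa, chosen only for illustration.
C11, C12, C44 = 300.0, 100.0, 150.0

stiffness = cubic_stiffness_matrix(C11, C12, C44)
anisotropy = zener_ratio(C11, C12, C44)

for row in stiffness:
    print(row)
print("Zener ratio:", anisotropy)
```

Lower-symmetry crystals need more independent constants than the three used here, which is part of why a full matrix, rather than a single modulus, is the natural way to report this kind of data.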

Lessons from the lab: ALL data matters

Every day, graduate students in science and engineering fields generate data of varying quality, most of which – especially negative results – is never published. Journal referees and editors are the primary arbiters of what is most interesting or novel to the research community, and the nature of the peer review and journal acceptance process inevitably leads to the exclusion of some potentially valuable results. Excluding a sometimes-significant portion of results from publications is a detriment to researchers and to research progress, because others can’t get a comprehensive view of all the work done and learn from past mistakes. 

Every chemistry graduate student pursuing a PhD must pass a candidacy exam to be considered a PhD candidate. This is usually in the form of a presentation or written paper that is reviewed by a committee of four or five professors. Passing this exam indicates that the committee has confidence in your abilities and direction to obtain a PhD, leaving you with the task of making a contribution to science over the next two to four years. My candidacy presentation started at 8am on a Monday. I spent an hour and a half being questioned by my five-professor committee, and was then promptly sent out of the room to allow for deliberation. Twenty long minutes later, I was told that I had conditionally passed my candidacy exam, with an emphasis on conditionally. My committee informed me that the few positive results I had presented were not indicative of two years of work. Years of reading positive results in the literature and seeing post-doctoral researchers successfully pump out fantastic results had shown me what was valued, so I had built my presentation around the few positive results I had obtained. In subsequent talks with committee members, I learned that they were expecting to see all the negative results that I had generated, and how I had overcome experimental hurdles to obtain my few positive results. Had I included a summary of my negative results as well, it could have been a very different exam. This experience changed my perception of the importance of negative results, and of the process by which you learn from them in pursuing positive results. This notion became even more apparent in the lab when I took on a project that required reproducing a previously published work from our research group. 

Reproducing results from past scientific publications is a common starting point for many research projects. It provides a basis for comparison and often validates a material or process for further application in the project. As a graduate student in chemistry, I started a solar-to-fuel conversion project by trying to reproduce a seminal paper published in our own research group a decade earlier. The fabrication method involved a number of steps that would produce uniquely shaped silver nanowires in an array that held promise as a light-harvesting material. My professors remembered the process as being robust, with straightforward methods that should take only a week or two to reproduce and extrapolate to other materials. My initial attempts to use the process to produce the structures were unsuccessful. Together with two undergraduates, I spent months changing half a dozen experimental parameters and purchasing fresh precursor materials, and we still were not able to reliably obtain the structures. Even speaking with the first author over the phone didn’t solve the problem. He said he didn’t remember it being particularly difficult and that there wasn’t any trick to consistently producing the structures. What he did make clear was that hundreds of samples had been produced over the course of perfecting the process before the results were published. Ultimately, the answer turned out to be a longer aging step than reported in the publication. In the end, reproducing the work cost hundreds of lab hours and thousands of dollars in microscopy characterization time, most of it spent figuring out how each parameter in the process affected the structures. If only I could have seen the data from the hundreds of samples analyzed in producing the original work, I might have gleaned some valuable insights into which variables to modify. 

All data matters: it is an essential part of the research process and should be accessible to anyone viewing a published work. In the early days, scientists and engineers came together to dispute and validate claims made by others in the field. Today, the digital revolution in data makes it easier than ever to communicate, organize, and access data. This leaves perception as the biggest barrier to change. Here at Citrine, we care deeply about transparency through research data, and provide a platform to store, organize, and access all the results generated in producing great research. 

Data Highlight: Plasmon-Enhanced Upconversion

Many thanks go out to Diane Wu, a PhD candidate in Stanford’s Department of Chemistry, for contributing a data set from her recent review paper on Plasmon-Enhanced Upconversion.

Upconversion is becoming a more commonly used method to improve light absorption in photovoltaic/photocatalytic systems as well as for background-free bioimaging. The concept of upconversion involves converting lower energy photons to higher energies using paired absorber and emitter materials. This is particularly valuable when the efficiency of photovoltaics or the utilization of light within biological samples can be improved by upconverting two lower energy photons into one higher energy photon. Improving the efficiency of this process has led to numerous unique methods, including plasmonic enhancement of the emitter via nano-structured gold or silver. Diane Wu’s review takes a very multidisciplinary area of materials research and comprehensively surveys the methods of improving upconversion efficiency and the current state of the art. Contributing the results of her extensive survey to Citrination benefits all those working in the field as well as those selecting optimal materials for device and imaging applications. 
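
The energy bookkeeping behind "two lower energy photons into one higher energy photon" can be sketched in a few lines. This is a minimal illustration, assuming an idealized, lossless process in which the two photon energies simply add; the function names are hypothetical, and the 980 nm input wavelength is an assumed example value rather than a figure from the review.

```python
# h*c expressed in eV·nm, so that E [eV] = HC_EV_NM / wavelength [nm].
HC_EV_NM = 1239.841984


def photon_energy_ev(wavelength_nm):
    """Energy of a single photon in eV for a wavelength given in nm."""
    return HC_EV_NM / wavelength_nm


def wavelength_from_energy_nm(energy_ev):
    """Wavelength in nm of a photon with the given energy in eV."""
    return HC_EV_NM / energy_ev


def ideal_upconverted_wavelength_nm(wavelength_a_nm, wavelength_b_nm):
    """Shortest possible output wavelength if the two photon energies add with no loss."""
    total_ev = photon_energy_ev(wavelength_a_nm) + photon_energy_ev(wavelength_b_nm)
    return wavelength_from_energy_nm(total_ev)


# Assumed example: two near-infrared photons at 980 nm.
single_ev = photon_energy_ev(980.0)
ideal_nm = ideal_upconverted_wavelength_nm(980.0, 980.0)
print(f"one photon: {single_ev:.3f} eV, ideal upconverted output: {ideal_nm:.1f} nm")
```

In this idealized picture, two 980 nm photons combine to the energy of a single 490 nm photon. That is an upper bound on the output energy; real emitters generally give up some energy along the way and so emit at somewhat longer wavelengths.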

Data Highlight: Temperature Programmed Desorption Data

Many thanks go out to Josh Buffon, from UCSB’s Department of Chemistry and Biochemistry, for contributing some temperature programmed desorption (TPD) data from his model catalyst research. 

This week’s user data highlight illustrates the increasing variety of data that users continue to submit to Citrination on a regular basis. Josh Buffon’s TPD data shows certain mass fractions of a model catalytic reaction as a function of temperature studied under ultra-high vacuum in a custom-built characterization setup. Having diverse data types such as these TPD results adds to the growing Citrination community and to the discussions that spark when such model studies are performed and shared. 

Data Highlight: Hydrotalcite Heterogeneous Catalyst

Many thanks go out to Jacob Barrett, from UCSB’s Department of Chemistry and Biochemistry, for contributing some recent powder X-ray diffraction patterns.

Heterogeneous catalysts continue to be one of the most widely applied and diverse areas of materials science at an industrial scale. From hydrocarbon cracking using zeolites to noble metal catalysts in your car’s exhaust system, the development of more effective catalysts continues to be an active area of research. Often, earth-abundant minerals serve as the inspiration for heterogeneous catalysts given their relative abundance and low cost. This week’s data highlight shows a great example of a hydrotalcite and doped hydrotalcite heterogeneous catalyst synthesized in the Ford Group at UCSB for use in biomass conversion to valuable chemical feedstocks. 

Data Highlight: Semiconductor bandgap, conduction band, and valence band values

Many thanks go out to Nirala Singh, from UCSB’s Department of Chemical Engineering, for contributing a curated semiconductor band gap data set.

Aqueous electrochemistry and photo-electrochemistry have become increasingly important areas of research for groups pursuing water splitting or solar-to-fuel conversion. The energy storage theme continues this week with valuable oxide, sulfide, and phosphide semiconductor materials data curated by Nirala Singh and others in the McFarland group at UCSB. This particular dataset highlights the conduction and valence band levels reported throughout the literature vs. vacuum as well as vs. the normal hydrogen electrode (NHE) at neutral pH. Exemplary group-wide efforts to generate data sets such as these truly benefit the greater research community on Citrination, saving others the time of searching for materials data throughout the literature. 
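
For readers unfamiliar with the two reference scales, here is a rough sketch of how a band edge quoted vs. vacuum relates to one quoted vs. NHE. It assumes the commonly used offset of about 4.44 eV between the vacuum level and the hydrogen electrode (some papers use 4.5 eV instead), and it ignores any pH-dependent shift of the band edges. The function names and the band-edge values are hypothetical and are not taken from the dataset.

```python
# Assumed offset between the vacuum level and the hydrogen electrode, in eV.
# About 4.44 eV is a commonly quoted value; some literature uses 4.5 eV.
NHE_ABSOLUTE_EV = 4.44


def vacuum_to_nhe(energy_vs_vacuum_ev):
    """Convert a band-edge energy vs. vacuum (eV, negative) to a potential vs. NHE (V)."""
    return -energy_vs_vacuum_ev - NHE_ABSOLUTE_EV


def nhe_to_vacuum(potential_vs_nhe_v):
    """Convert a potential vs. NHE (V) to an energy vs. vacuum (eV)."""
    return -potential_vs_nhe_v - NHE_ABSOLUTE_EV


# Hypothetical band edges vs. vacuum, for illustration only.
conduction_band_ev = -4.0
valence_band_ev = -7.0

cb_vs_nhe = vacuum_to_nhe(conduction_band_ev)
vb_vs_nhe = vacuum_to_nhe(valence_band_ev)
band_gap_ev = conduction_band_ev - valence_band_ev

print(f"CB: {cb_vs_nhe:+.2f} V vs. NHE, VB: {vb_vs_nhe:+.2f} V vs. NHE, gap: {band_gap_ev:.2f} eV")
```

Note the sign flip: a level deeper below vacuum (more negative energy) corresponds to a more positive potential on the electrochemical scale, while the band gap is the same on both scales.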

Data Highlight: Lithium ion battery electrode materials

Energy storage in the form of Li-ion batteries has become commonplace in our electronic devices and is now enabling transportation, with these batteries found in dozens of different car models. Research groups around the world are pushing the materials used for these batteries to improve capacity and cyclability. With hundreds of publications showcasing the latest electrode material performance improvements, it can be difficult to assess the state of these materials from performance and natural resource perspectives. The recent work of Leila Ghadbeigi, Jaye Harada, Bethany Lettier, and Professor Taylor Sparks, published in Energy & Environmental Science earlier this year, analyzes over 16,000 data points from 200+ publications to shed light on cutting-edge, data-driven battery material design. The authors have made this extensive dataset available to the public, and Jaye Harada has worked with the Citrine team to make these data readily searchable on the Citrination platform. Check it out.



Data Highlight: Enabling Organic Electronics

Organic electronics are appearing in more and more consumer devices, from OLEDs (organic light emitting diodes) in smartphones to OPVs (organic photovoltaics) for future flexible and inexpensive solar panels. The organic semiconductors that drive these devices are an active area of research, one that Citrine has started to support with new datasets such as the recently uploaded 1-D diffusion length and bandgap sets below. 

Organic Semiconductor Exciton Diffusion Lengths and Diffusion Coefficients 

Organic Semiconductor HOMO/LUMO/Transition State/Bandgap and Photoluminescence/Phosphorescence Lifetime Data Set 

Many thanks go out to Alex Mikhnenko, a Post-Doc in the Nguyen Group in UC Santa Barbara’s Department of Chemistry and Biochemistry, for contributing both of these fantastic datasets.