pySIPFENN

pySIPFENN, or python toolset for Structure-Informed Property and Feature Engineering with Neural Networks, implements a number of researcher-friendly tools for:

  • Calculating different vector representations of atomic structures for a number of applications, including supervised learning (e.g., predictive machine learning models) and unsupervised learning (e.g., clustering atomic structures by similarity or detecting anomalies). Notably, we utilize crystallographic information and some other techniques to make this process very efficient for the vast majority of use cases (see arXiv:2404.02849).

  • Efficient deployment of pre-trained ML models (not limited to neural networks) obtained from repositories like Zenodo (including some we trained) or trained locally on the user’s machine. The system is very plug-and-play thanks to the Open Neural Network Exchange (ONNX) format, which can be exported from nearly any machine learning framework.

  • Tuning pre-trained ML models to new domains, such as new chemical compositions, a different ab initio functional, or entirely new properties. Since v0.16, users can take advantage of integration with the OPTIMADE API, which allows one to tune models on DFT datasets like Materials Project, OQMD, AFLOW, or NIST-JARVIS in just 3 lines of code specifying which provider to use, what to query for, and the hyperparameters for tuning.
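As a rough illustration of the unsupervised use case in the first bullet, the sketch below ranks structures by the cosine similarity of their feature vectors. It is not pySIPFENN code: the three-component vectors are made-up placeholders (real descriptors are much longer and would come from a pySIPFENN featurizer), and `cosine_similarity` and `most_similar` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, library):
    """Return (name, similarity) of the library entry most similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    return max(scored, key=lambda pair: pair[1])


# Placeholder vectors standing in for descriptor output (values are made up).
library = {
    "structure_A": [1.0, 0.0, 2.0],
    "structure_B": [0.0, 3.0, 0.5],
}
query = [0.9, 0.1, 2.1]

best_name, best_score = most_similar(query, library)
print(best_name, round(best_score, 3))
```

The same pairwise similarity could feed a clustering routine or an anomaly check (for example, flagging a structure whose best match falls below a chosen threshold). In practice, features would usually be normalized first.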

The underlying methodology, efficiency optimizations, design choices, and implementation specifics are given in the following publications:

  • Adam M. Krajewski, Jonathan W. Siegel, Zi-Kui Liu, Efficient Structure-Informed Featurization and Property Prediction of Ordered, Dilute, and Random Atomic Structures, April 2024, arXiv:2404.02849

  • Adam M. Krajewski, Jonathan W. Siegel, Jinchao Xu, Zi-Kui Liu, Extensible Structure-Informed Prediction of Formation Energy with improved accuracy and usability employing neural networks, Computational Materials Science, Volume 208, 2022, 111254, DOI: 10.1016/j.commatsci.2022.111254

The source code and development discussions are available in the GitHub repository at (git.pysipfenn.org). You may also consider visiting our Phases Research Lab website at (phaseslab.org).

News

  • (v0.16.0) Three exciting pieces of news! (1) The all-new ModelAdjusters submodule automates tuning and can fetch data directly from the OPTIMADE API (https://www.optimade.org); (2) a new manuscript detailing the advantages of our featurization tools has been posted on arXiv (arXiv:2404.02849); and (3) the name of the software was updated to "python toolset for Structure-Informed Property and Feature Engineering with Neural Networks" to retain the pySIPFENN acronym while better reflecting our strengths and development direction.

  • (v0.15.0) A new descriptor (feature vector) calculator descriptorDefinitions.KS2022_randomSolutions has been implemented. It is used for structure informed featurization of compositions randomly occupying a lattice, spiritually similar to SQS generation, but also taking into account (1) chemical differences between elements and (2) structural effects. A full description will be given in the upcoming manuscript.

  • (v0.14.0) Users can now take advantage of a Prototype Library to obtain common structures from any Calculator instance c with a simple c.prototypeLibrary['BCC']['structure']. It can be easily updated or appended with the high-level API or by manually modifying its YAML here.

  • (v0.13.0) Model exports (and more!) to PyTorch, CoreML, and ONNX are now effortless thanks to the core.modelExporters module. Please note that you need to install pySIPFENN with the dev option (e.g., pip install "pysipfenn[dev]") to use it. See docs here.

  • (v0.12.2) Switch to LGPLv3, allowing for integration with proprietary software developed by the CALPHAD community while supporting the development of new pySIPFENN features for all. Many thanks to our colleagues from GTT-Technologies and other participants of CALPHAD 2023 (https://calphad.org/calphad-2023) for fruitful discussions.

  • (March 2023 Workshop) We would like to thank all of our amazing attendees for making our workshop, co-organized with the Materials Genome Foundation, such a success! Over 100 of you simultaneously followed all exercises and, at the peak, we loaded over 1,200GB of models into the HPC’s RAM.

Main Schematic

The figure below is the main schematic of the pySIPFENN framework, detailing the interplay of its internal components. The user interface provides a high-level API to process structural data within core.Calculator and pass it to featurization submodules in descriptorDefinitions to obtain vector representations, which are then passed to models defined in models.json and (typically) run automatically through all available models. All internal data of core.Calculator is accessible directly, enabling rapid customization. An auxiliary high-level API enables advanced users to operate and retrain the models.

mainSchematic
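The data flow in the schematic can be mimicked with a toy pipeline. This is a minimal sketch under stated assumptions, not pySIPFENN's implementation: `toy_descriptor`, `DESCRIPTORS`, `MODELS`, and `run_pipeline` are hypothetical names, the featurizer is a trivial stand-in, and the "models" are plain Python functions rather than ONNX networks. It only shows the routing idea: featurize each structure once, then run every registered model that expects that descriptor.

```python
from typing import Callable, Dict, List

Vector = List[float]


def toy_descriptor(structure: Dict[str, int]) -> Vector:
    """Stand-in featurizer: map {element: count} to [atom count, species count]."""
    return [float(sum(structure.values())), float(len(structure))]


# Featurizers by name, playing the role of the descriptorDefinitions submodules.
DESCRIPTORS: Dict[str, Callable[[Dict[str, int]], Vector]] = {
    "toy": toy_descriptor,
}

# Model registry, playing the role of models.json: each entry names the
# descriptor it expects. The predict functions are arbitrary placeholders.
MODELS = {
    "toy_model_sum": {"descriptor": "toy", "predict": lambda v: sum(v)},
    "toy_model_first": {"descriptor": "toy", "predict": lambda v: v[0]},
    "other_model": {"descriptor": "not_computed", "predict": lambda v: 0.0},
}


def run_pipeline(structures, descriptor_name):
    """Featurize all structures, then run every model matching the descriptor."""
    featurize = DESCRIPTORS[descriptor_name]
    features = [featurize(s) for s in structures]
    results = []
    for vec in features:
        results.append({
            name: spec["predict"](vec)
            for name, spec in MODELS.items()
            if spec["descriptor"] == descriptor_name
        })
    return results


structures = [{"Fe": 2}, {"Al": 1, "Ni": 3}]
predictions = run_pipeline(structures, "toy")
for structure, prediction in zip(structures, predictions):
    print(structure, prediction)
```

Keying models by the descriptor they expect means each structure is featurized once and the resulting vector is reused by all compatible models. Models that need a different descriptor are skipped.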

Applications

pySIPFENN is a very flexible tool that can, in principle, be used with very few modifications to predict any property of interest that depends on the atomic configuration. The models shipped by default are trained to predict formation energy because that is what our research group is interested in; however, if one wanted to predict Poisson’s ratio and trained a model based on the same features, adding it would take minutes. Simply add the model in the open ONNX format and link it using the models.json file, as described in the documentation.
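The linking step could look roughly like the sketch below, which adds one entry to a JSON registry file. It is an illustration only: the `register_model` helper, the model key, and the entry fields (`name`, `descriptor`) are assumptions made for this example and may not match the actual models.json schema, so the documentation should be treated as authoritative. The code writes to a temporary file and does not touch a real pySIPFENN installation.

```python
import json
import tempfile
from pathlib import Path


def register_model(registry_path, key, entry):
    """Add `entry` under `key` in a JSON registry file; return the updated registry."""
    registry_path = Path(registry_path)
    # Start from the existing registry if there is one, else from an empty dict.
    registry = json.loads(registry_path.read_text()) if registry_path.exists() else {}
    if key in registry:
        raise KeyError(f"model '{key}' is already registered")
    registry[key] = entry
    registry_path.write_text(json.dumps(registry, indent=2))
    return registry


# Work on a throwaway file so that no real models.json is modified.
with tempfile.TemporaryDirectory() as tmp:
    demo_registry = Path(tmp) / "models.json"
    registry = register_model(
        demo_registry,
        "MyPoissonRatioModel",  # hypothetical model key
        {
            "name": "Example Poisson's ratio model",  # assumed field
            "descriptor": "KS2022",  # assumed field: features the model expects
        },
    )
    reloaded = json.loads(demo_registry.read_text())

print(json.dumps(reloaded, indent=2))
```

Refusing to overwrite an existing key is a design choice of this sketch. It keeps a model shipped by default from being silently replaced by a custom one with the same name.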

Real-World Examples

In our line of work, pySIPFENN and the formation energies it predicts are usually used as a computational engine that generates proto-data for creation of thermodynamic databases (TDBs) using ESPEI (https://espei.org). The TDBs are then used through pycalphad (https://pycalphad.org) to predict phase diagrams and other thermodynamic properties.

Another of its uses in our research is guiding Density Functional Theory (DFT) calculations as a low-cost screening tool. Their efficient combination then drives experiments leading to the discovery of new materials, as presented in these two papers:

  • Sanghyeok Im, Shun-Li Shang, Nathan D. Smith, Adam M. Krajewski, Timothy Lichtenstein, Hui Sun, Brandon J. Bocklund, Zi-Kui Liu, Hojong Kim, Thermodynamic properties of the Nd-Bi system via emf measurements, DFT calculations, machine learning, and CALPHAD modeling, Acta Materialia, Volume 223, 2022, 117448, https://doi.org/10.1016/j.actamat.2021.117448.

  • Shun-Li Shang, Hui Sun, Bo Pan, Yi Wang, Adam M. Krajewski, Mihaela Banu, Jingjing Li & Zi-Kui Liu, Forming mechanism of equilibrium and non-equilibrium metallurgical phases in dissimilar aluminum/steel (Al–Fe) joints. Nature Scientific Reports 11, 24251 (2021). https://doi.org/10.1038/s41598-021-03578-0