pySIPFENN Core
pySIPFENN
- class Calculator(autoLoad=True, verbose=True)[source]
Bases:
object
pySIPFENN Calculator automatically initializes all functionalities, including identification and loading of all available models defined statically in the models.json file. It exposes methods for calculating predefined structure-informed descriptors (feature vectors) and predicting properties using models that utilize them.
- Parameters:
autoLoad (bool) – Automatically load all available ML models based on the models.json file. This will require significant memory and time if they are available, so for featurization and other non-model-requiring tasks, it is recommended to set this to False. Defaults to True.
verbose (bool) – Print initialization messages and several other non-critical messages during runtime procedures. Defaults to True.
- models[source]
Dictionary with all model information based on the models.json file in the modelsSIPFENN directory. The keys are the network names and the values are dictionaries with the model information.
- loadedModels[source]
Dictionary with all loaded models. The keys are the network names and the values are the loaded PyTorch models.
- descriptorData[source]
List of all descriptor data created during the last predictions run. The order of the list corresponds to the order of atomic structures given to models as input. The order of the list of descriptor data for each structure corresponds to the order of networks in the toRun list.
- predictions[source]
List of all predictions created during the last predictions run. The order of the list corresponds to the order of atomic structures given to models as input. The order of the list of predictions for each structure corresponds to the order of networks in the toRun list.
- inputFiles[source]
List of all input file names used during the last predictions run. The order of the list corresponds to the order of atomic structures given to models as input.
- appendPrototypeLibrary(customPath)[source]
Parses a custom prototype library YAML file and permanently appends it to the internal prototypeLibrary of the pySIPFENN package. The appended prototypes are persisted for future use and, by default, loaded automatically when instantiating the Calculator object, similar to your custom models.
- Parameters:
customPath (str) – Path to the prototype library YAML file to be appended to the internal self.prototypeLibrary of the Calculator object.
- Return type:
None
- Returns:
None
- calculate_KS2022(structList, mode='serial', max_workers=8)[source]
Calculates KS2022 descriptors for a list of structures. The calculation can be done in serial or parallel mode. In parallel mode, the number of workers can be specified. The results are stored in the descriptorData attribute. The function returns the list of descriptors as well.
- Parameters:
structList (List[Structure]) – List of structures to calculate descriptors for. The structures must be initialized with the pymatgen Structure class.
mode (str) – Mode of calculation. Defaults to 'serial'. Options are 'serial' and 'parallel'.
max_workers (int) – Number of workers to use in parallel mode. Defaults to 8. If None, the number of workers will be set to the number of available CPU cores. If set to 0, 1 worker will be used.
- Return type:
list
- Returns:
List of KS2022 descriptors (feature vectors), one for each structure.
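The max_workers rules described above (None means all available CPU cores, 0 means a single worker) can be sketched as a small helper. This is only an illustration of the documented behavior; resolve_workers is a hypothetical name and is not part of pySIPFENN.

```python
import os
from typing import Optional


def resolve_workers(max_workers: Optional[int]) -> int:
    """Hypothetical helper mirroring the documented max_workers rules."""
    if max_workers is None:
        # None: use every available CPU core (fall back to 1 if unknown).
        return os.cpu_count() or 1
    if max_workers == 0:
        # 0 is documented as "1 worker will be used".
        return 1
    return max_workers


print(resolve_workers(8), resolve_workers(0), resolve_workers(None))
```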
- calculate_KS2022_dilute(structList, baseStruct='pure', mode='serial', max_workers=8)[source]
Calculates KS2022 descriptors for a list of dilute structures (based either on pure elements or on custom base structures, e.g. TCP endmember configurations) that contain a single alloying atom. Speed increases are substantial compared to the KS2022 descriptor, which is more general and can be used on any structure. The calculation can be done in serial or parallel mode. In parallel mode, the number of workers can be specified. The results are stored in the self.descriptorData attribute. The function returns the list of descriptors as well.
- Parameters:
structList (List[Structure]) – List of structures to calculate descriptors for. The structures must be dilute structures (based either on pure elements or on custom base structures, e.g. TCP endmember configurations) that contain a single alloying atom. The structures must be initialized with the pymatgen Structure class.
baseStruct (Union[str, List[Structure]]) – Non-diluted references for the dilute structures. Defaults to 'pure', which assumes that the structures are based on pure elements and generates references automatically. Alternatively, a list of structures can be provided, which can be either pure elements or custom base structures (e.g. TCP endmember configurations).
mode (str) – Mode of calculation. Defaults to 'serial'. Options are 'serial' and 'parallel'.
max_workers (int) – Number of workers to use in parallel mode. Defaults to 8. If None, the number of workers will be set to the number of available CPU cores. If set to 0, 1 worker will be used.
- Return type:
List[ndarray]
- Returns:
List of KS2022 descriptors (feature vectors), one np.ndarray for each structure.
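To make the "single alloying atom" requirement concrete for the 'pure' base case, the sketch below checks a list of per-site element symbols and reports the base element and the solute. It is a plain-Python illustration only; split_dilute is a hypothetical name, and pySIPFENN itself works on pymatgen Structure objects rather than symbol lists.

```python
from collections import Counter
from typing import List, Tuple


def split_dilute(species: List[str]) -> Tuple[str, str]:
    """Return (base_element, solute_element) for a dilute cell built on a pure element.

    Raises ValueError unless exactly one site is occupied by a second element.
    """
    counts = Counter(species)
    if len(counts) != 2:
        raise ValueError("a dilute cell based on a pure element has exactly two elements")
    # most_common sorts by count, so the base element comes first.
    (base, n_base), (solute, n_solute) = counts.most_common(2)
    if n_solute != 1 or n_base < 2:
        raise ValueError("exactly one alloying atom is required")
    return base, solute


# 15 Fe sites and a single Cr site.
print(split_dilute(["Fe"] * 15 + ["Cr"]))
```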
- calculate_KS2022_randomSolutions(baseStructList, compList, minimumSitesPerExpansion=50, featureConvergenceCriterion=0.005, compositionConvergenceCriterion=0.01, minimumElementOccurrences=10, plotParameters=False, printProgress=False, mode='serial', max_workers=8)[source]
Calculates KS2022 descriptors corresponding to random solid solutions occupying base structure / lattice sites for a list of compositions, using the method described in the descriptorDefinitions.KS2022_randomSolutions submodule. The results are stored in the descriptorData attribute. The function returns the list of descriptors in numpy format as well.
- Parameters:
baseStructList (Union[str, Structure, List[str], List[Structure], List[Union[Composition, str]]]) – The base structure used to generate a random solid solution (RSS). It does not need to be a simple Bravais lattice, such as BCC, but can be any Structure object or a list of them, if you need to define them on a per-case basis. In addition to Structure objects, you can use "magic" strings corresponding to one of the structures in the library found in the pysipfenn.misc directory or loaded under the self.prototypeLibrary attribute. The magic strings include, but are not limited to: 'BCC', 'FCC', 'HCP', 'DHCP', 'Diamond', and so on. You can invoke them by their name, e.g. BCC, or by passing self.prototypeLibrary['BCC']['structure'] directly. If you pass a list to baseStructList, you are allowed to mix-and-match Structure objects and magic strings.
compList (Union[str, List[str], Composition, List[Composition], List[Union[Composition, str]]]) – The composition to populate the supercell with until the KS2022 descriptor converges. You can use pymatgen's Composition objects or strings of valid chemical formulas (symbol - atomic fraction pairs), like 'Fe0.5Ni0.3Cr0.2', 'Fe50 Ni30 Cr20', or 'Fe5 Ni3 Cr2'. You can either pass a single entity, in which case it will be used for all structures (use this to run the same composition for different base structures), or a list of entities, in which case pairs will be used in the order of the list. If you pass a list to compList, you are allowed to mix-and-match Composition objects and composition strings.
minimumSitesPerExpansion (int) – The minimum number of sites that the base structure will be expanded to (doubling dimension-by-dimension) before it is used as the expansion step/batch in each iteration of adding local chemical environment information to the global ensemble. The optimal value will depend on the number of species and their relative fractions in the composition. Generally, low values (<20ish) will result in slower convergence, as some extreme local chemical environments will have a strong influence on the global ensemble, and too high values (>150ish) will result in needlessly slow computation for non-complex compositions, as at least two iterations will be processed. The default value is 50 and works well for simple cases.
featureConvergenceCriterion (float) – The maximum difference between any feature belonging to the current iteration (statistics based on the global ensemble of local chemical environments) and the previous iteration (before the last expansion), expressed as a fraction of the maximum value of each feature found in the OQMD database at the time of SIPFENN creation (see the KS2022_randomSolutions.maxFeaturesInOQMD array). The default value is 0.005, corresponding to 0.5% of the maximum value.
compositionConvergenceCriterion (float) – The maximum average difference between any element fraction belonging to the current composition (net of all expansions) and the target composition (comp). The default value is 0.01, corresponding to 1% deviation, whose interpretation will depend on the number of elements in the composition.
minimumElementOccurrences (int) – The minimum number of times all elements must occur in the composition before it is considered converged. This setting prevents the algorithm from converging before very dilute elements, like C in low-carbon steel, have had a chance to occur. The default value is 10.
plotParameters (bool) – If True, the convergence history will be plotted using plotly. The default value is False, but tracking them is recommended; they will be accessible in the metas attribute of the Calculator under the key 'RSS'.
printProgress (bool) – If True, the progress will be printed to the console. The default value is False.
mode (str) – Mode of calculation. Options are 'serial' (default) and 'parallel'.
max_workers (int) – Number of workers to use in parallel mode. Defaults to 8.
- Return type:
List[ndarray]
- Returns:
A list of numpy.ndarray objects containing the KS2022 descriptor, just like the ordinary KS2022. Please note the stochastic nature of this algorithm. The result will likely vary slightly between runs and parameters, so if convergence is critical, verify it with a test matrix of minimumSitesPerExpansion, featureConvergenceCriterion, and compositionConvergenceCriterion values.
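The three convergence controls described above can be read as simple checks on running statistics. The sketch below is one plausible interpretation in plain Python, not the code used by KS2022_randomSolutions; the function names, the toy feature vectors, and the reference maxima are all assumptions made for illustration.

```python
from typing import Dict, List


def feature_change(current: List[float], previous: List[float],
                   max_features: List[float]) -> float:
    """Largest per-feature change between iterations, scaled by each feature's reference maximum."""
    return max(abs(c - p) / m for c, p, m in zip(current, previous, max_features))


def composition_deviation(counts: Dict[str, int], target: Dict[str, float]) -> float:
    """Mean absolute difference between accumulated and target element fractions."""
    total = sum(counts.values())
    elements = set(counts) | set(target)
    return sum(abs(counts.get(e, 0) / total - target.get(e, 0.0)) for e in elements) / len(elements)


def is_converged(current, previous, max_features, counts, target,
                 feature_criterion=0.005, composition_criterion=0.01,
                 minimum_occurrences=10) -> bool:
    """Combine the three documented criteria; all must hold at once."""
    return (feature_change(current, previous, max_features) < feature_criterion
            and composition_deviation(counts, target) < composition_criterion
            and min(counts.get(e, 0) for e in target) >= minimum_occurrences)


# Toy numbers: two features, three elements accumulated over 200 sites.
target = {"Fe": 0.5, "Ni": 0.3, "Cr": 0.2}
counts = {"Fe": 101, "Ni": 60, "Cr": 39}
print(is_converged([1.02, 5.00], [1.00, 5.01], [10.0, 20.0], counts, target))
```

Tightening either criterion, or raising minimum_occurrences for very dilute elements, makes the check fail for longer, which corresponds to more expansion iterations.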
- calculate_Ward2017(structList, mode='serial', max_workers=4)[source]
Calculates Ward2017 descriptors for a list of structures. The calculation can be done in serial or parallel mode. In parallel mode, the number of workers can be specified. The results are stored in the self.descriptorData attribute. The function returns the list of descriptors as well.
- Parameters:
structList (List[Structure]) – List of structures to calculate descriptors for. The structures must be initialized with the pymatgen Structure class.
mode (str) – Mode of calculation. Defaults to 'serial'. Options are 'serial' and 'parallel'.
max_workers (int) – Number of workers to use in parallel mode. Defaults to 4. If None, the number of workers will be set to the number of available CPU cores. If set to 0, 1 worker will be used.
- Return type:
list
- Returns:
List of Ward2017 descriptors (feature vectors), one for each structure.
- destroy()[source]
Deallocates all loaded models and clears all data from the Calculator object.
- Return type:
None
- downloadModels(network='all')[source]
Downloads ONNX models. By default, all available models are downloaded. If a model is already available on disk, it is skipped. If a specific network is given, only that network is downloaded, possibly overwriting the existing one. If the network name is not recognized, a message will be printed.
- Parameters:
network (str) – Name of the network to download. Defaults to 'all'.
- Return type:
None
- findCompatibleModels(descriptor)[source]
Finds all models compatible with a given descriptor based on the descriptor definitions loaded from the models.json file.
- Parameters:
descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. Available default descriptors are: 'Ward2017', 'KS2022'.
- Return type:
List[str]
- Returns:
List of strings corresponding to compatible models.
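The lookup described here amounts to filtering the model dictionary by descriptor name. The sketch below shows that idea on a stand-in dictionary; the 'descriptor' key and the descriptor assigned to each example network are assumptions for illustration and may not match the real models.json entries.

```python
from typing import Dict, List


def find_compatible(models: Dict[str, dict], descriptor: str) -> List[str]:
    """Return the network names whose metadata lists the given descriptor."""
    return [name for name, info in models.items() if info.get("descriptor") == descriptor]


# Stand-in for the parsed models.json content (assumed layout).
models = {
    "SIPFENN_Krajewski2020_NN9": {"descriptor": "Ward2017"},
    "SIPFENN_Krajewski2022_NN30": {"descriptor": "KS2022"},
}
print(find_compatible(models, "KS2022"))
```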
- get_resultDicts()[source]
Returns a list of dictionaries with the predictions for each network. The keys of the dictionaries are the names of the networks. The order of the dictionaries is the same as the order of the input structures passed through the runModels() functions.
- Return type:
List[dict]
- Returns:
List of dictionaries with the predictions.
- get_resultDictsWithNames()[source]
Returns a list of dictionaries with the predictions for each network. The keys of the dictionaries are the names of the networks and the names of the input structures. The order of the dictionaries is the same as the order of the input structures passed through the runModels() functions. Note that this function requires self.inputFiles to be set, which is done automatically when using runFromDirectory() or runFromDirectory_dilute() but not when using runModels() or runModels_dilute(), as the input structures are passed directly to the function and names have to be provided separately by assigning them to self.inputFiles.
- Return type:
List[dict]
- Returns:
List of dictionaries with the predictions.
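A minimal sketch of how such named result dictionaries can be assembled from three parallel lists follows. The 'name' key, the network names, and the file names are assumptions for illustration; the lists stand in for self.inputFiles, the list of networks that were run, and self.predictions.

```python
from typing import List


def result_dicts_with_names(input_files: List[str], to_run: List[str],
                            predictions: List[List[float]]) -> List[dict]:
    """Pair each structure name with its per-network predictions."""
    if len(input_files) != len(predictions):
        raise ValueError("one name is needed per predicted structure")
    return [dict(zip(["name"] + to_run, [name] + list(pred)))
            for name, pred in zip(input_files, predictions)]


results = result_dicts_with_names(
    ["1-Fe.POSCAR", "2-Al.POSCAR"],
    ["netA", "netB"],
    [[0.1, 0.2], [0.3, 0.4]],
)
print(results[0])
```

When structures are passed directly to runModels(), the first list is the part that has to be supplied by hand, as noted above.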
- loadModelCustom(networkName, modelName, descriptor, modelDirectory='.')[source]
Load a custom ONNX model from a custom directory specified by the user. The primary use case for this function is to load models that are not included in the package and cannot be placed in the package directory because of write permissions (e.g. on restrictive HPC systems) or storage allocations.
- Parameters:
modelDirectory (str) – Directory where the model is located. Defaults to the current directory.
networkName (str) – Name of the network. This is the name used to refer to the ONNX network. It has to be unique, not contain any spaces, and correspond to the name of the ONNX file (excluding the .onnx extension).
modelName (str) – Name of the model. This is the name that will be displayed in the model selection menu. It can be any string desired.
descriptor (str) – Descriptor/feature vector used by the model. pySIPFENN currently supports the following descriptors: 'KS2022' and 'Ward2017'.
- Return type:
None
- loadModels(network='all')[source]
Loads a model or models into the memory of the Calculator class. The models are loaded from the modelsSIPFENN directory inside the package. Its location can be seen by calling print() on the Calculator. The models are stored in the self.loadedModels attribute as a dictionary with the network string as key and the PyTorch model as value.
- Note:
This function only works with models that are stored in the modelsSIPFENN directory inside the package, are in ONNX format, and have corresponding entries in models.json. For all others, you will need to use loadModelCustom().
- Parameters:
network (str) – Default is 'all', which loads all models detected as available. Alternatively, a specific model can be loaded by its corresponding key in models.json, e.g. 'SIPFENN_Krajewski2020_NN9' or 'SIPFENN_Krajewski2022_NN30'. The key is the same as the network argument in downloadModels().
- Raises:
ValueError – If the network name is not recognized or if the model is not available in the modelsSIPFENN directory.
- Return type:
None
- Returns:
None. It updates the loadedModels attribute of the Calculator class.
- makePredictions(models, toRun, dataInList)[source]
Makes predictions using PyTorch networks listed in toRun and provided in models dictionary. Shared among all “predict” functions.
- Parameters:
models (Dict[str, Module]) – Dictionary of models to use. Keys are network names and values are PyTorch models loaded from ONNX with loadModels() / loadModelCustom() or manually (fairly simple!).
toRun (List[str]) – List of networks to run. It must be a subset of models.keys().
dataInList (List[Union[List[float], array]]) – List of data to make predictions for. Each element of the list should be a descriptor accepted by all networks in toRun. Can be a list of lists of floats or a list of numpy ndarray objects.
- Return type:
List[list]
- Returns:
List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the networks in toRun.
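The input and output layout described above can be sketched with plain callables standing in for the PyTorch networks. This only illustrates the ordering of the returned nested list (structures on the outside, networks in toRun order on the inside); it is not the library's implementation, and the two toy "networks" are made up.

```python
from typing import Callable, Dict, List, Sequence


def make_predictions(models: Dict[str, Callable[[Sequence[float]], float]],
                     to_run: List[str],
                     data_in_list: List[Sequence[float]]) -> List[List[float]]:
    """Apply every network in to_run to every descriptor, keeping both orders."""
    missing = [name for name in to_run if name not in models]
    if missing:
        raise KeyError(f"networks not present in models: {missing}")
    return [[models[name](descriptor) for name in to_run] for descriptor in data_in_list]


# Toy stand-ins for loaded networks: each maps a descriptor to one number.
toy_models = {
    "sumNet": lambda x: float(sum(x)),
    "maxNet": lambda x: float(max(x)),
}
predictions = make_predictions(toy_models, ["sumNet", "maxNet"], [[1.0, 2.0], [3.0, 0.5]])
print(predictions)
```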
- parsePrototypeLibrary(customPath='default', verbose=False, printCustomLibrary=False)[source]
Parses the prototype library YAML file in the misc directory, interprets its entries as pymatgen Structure objects, and stores them in the self.prototypeLibrary dict attribute of the Calculator object. You can also use it to temporarily append a custom prototype library (by providing a path), which will live as long as the Calculator. For permanent changes, use appendPrototypeLibrary().
- Parameters:
customPath (str) – Path to the prototype library YAML file. Defaults to the magic string "default", which loads the default prototype library included in the package in the misc directory.
verbose (bool) – If True, it prints the number of prototypes loaded. Defaults to False, but note that the Calculator class automatically initializes with verbose=True.
printCustomLibrary (bool) – If True, it prints the name and POSCAR of each prototype being added to the prototype library. Has no effect if customPath is 'default'. Defaults to False.
- Return type:
None
- Returns:
None
- runFromDirectory(directory, descriptor, mode='serial', max_workers=4)[source]
Runs all loaded models on a list of Structures it automatically imports from a specified directory. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted using the natsort library, so the order of the structures in the directory, as defined by the operating system, is not important. Natural sorting, for example, will sort the structures in the following order: '1-Fe', '2-Al', '10-xx', '11-xx', '20-xx', '21-xx', '11111-xx', etc. This is useful when the structures are named using a numbering system. The order of the predictions is the same as the order of the input structures. The order of the networks in a prediction is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.
- Parameters:
directory (str) – Directory containing the structures to run the models on. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted as described above.
descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors.
mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.
- Return type:
List[list]
- Returns:
List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.
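The natural ordering used for the directory contents can be illustrated without the natsort dependency. The sketch below is a standard-library stand-in for that behavior, not the natsort implementation; natural_key is a hypothetical helper, and the file names are the ones from the description above.

```python
import re
from typing import List


def natural_key(name: str) -> list:
    """Split a name into text and digit runs so digit runs compare as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]


names = ["10-xx", "2-Al", "11111-xx", "1-Fe", "21-xx", "11-xx", "20-xx"]
natural: List[str] = sorted(names, key=natural_key)
print(natural)
print(sorted(names))  # plain lexicographic order, for comparison
```

With plain string sorting, '10-xx' would land before '2-Al', so predictions would no longer line up with a numbering scheme used in the file names.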
- runFromDirectory_dilute(directory, descriptor, baseStruct='pure', mode='serial', max_workers=8)[source]
Runs all loaded models on a list of dilute Structures it automatically imports from a specified directory. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted using the natsort library, so the order of the structures in the directory, as defined by the operating system, is not important. Natural sorting, for example, will sort the structures in the following order: '1-Fe', '2-Al', '10-xx', '11-xx', '20-xx', '21-xx', '11111-xx', etc. This is useful when the structures are named using a numbering system. The order of the predictions is the same as the order of the input structures. The order of the networks in a prediction is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.
- Parameters:
directory (str) – Directory containing the structures to run the models on. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted as described above. The structures must be dilute structures, i.e. they must contain only one alloying element.
descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors.
baseStruct (str) – Non-diluted references for the dilute structures. Defaults to 'pure', which assumes that the structures are based on pure elements and generates references automatically. Alternatively, a list of structures can be provided, which can be either pure elements or custom base structures (e.g. TCP endmember configurations).
mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 8. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.
- Return type:
None
- Returns:
List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.
- runModels(descriptor, structList, mode='serial', max_workers=4)[source]
Runs all loaded models on a list of Structures using the specified descriptor. Supports serial and parallel computation modes. If parallel mode is selected, max_workers determines the number of processes handling the featurization of structures (90-99+% of computational intensity); the models are then run in series.
- Parameters:
descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. Available default descriptors are: 'Ward2017', 'KS2022'.
structList (List[Structure]) – List of pymatgen Structure objects to run the models on.
mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.
- Return type:
List[List[float]]
- Returns:
List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list. If a network is not compatible with the selected descriptor, it will not be included in the list.
- runModels_dilute(descriptor, structList, baseStruct='pure', mode='serial', max_workers=4)[source]
Runs all loaded models on a list of Structures using the specified descriptor. A critical difference from runModels() is that this function will call a dilute-specific featurizer, e.g. KS2022_dilute when 'KS2022' is provided as input, which can only be used on dilute structures (both based on pure elements and on custom base structures, e.g. TCP endmember configurations) that contain a single alloying atom. Speed increases are substantial compared to the KS2022 descriptor, which is more general and can be used on any structure. Supports serial and parallel modes in the same way as runModels().
- Parameters:
descriptor (str) – Descriptor to use for predictions. Must be one of the descriptors which support dilute structures (i.e. *_dilute). See pysipfenn.descriptorDefinitions to see available modules or add yours here. The only default dilute descriptor currently available is 'KS2022'. The 'KS2022' descriptor can also be called from the runModels() function, but this is not recommended for dilute alloys, as it negates the speed increase of the dilute structure featurizer.
structList (List[Structure]) – List of pymatgen Structure objects to run the models on. Must be dilute structures as described above.
baseStruct (Union[str, List[Structure]]) – Non-diluted references for the dilute structures. Defaults to 'pure', which assumes that the structures are based on pure elements and generates references automatically. Alternatively, a list of structures can be provided, which can be either pure elements or custom base structures (e.g. TCP endmember configurations).
mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.
- Return type:
List[List[float]]
- Returns:
List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list. If a network is not compatible with the selected descriptor, it will not be included in the list.
- runModels_randomSolutions(descriptor, baseStructList, compList, minimumSitesPerExpansion=50, featureConvergenceCriterion=0.005, compositionConvergenceCriterion=0.01, minimumElementOccurrences=10, plotParameters=False, printProgress=False, mode='serial', max_workers=8)[source]
A top-level convenience wrapper for the calculate_KS2022_randomSolutions function. It passes all the arguments to that function directly (except for descriptor) and uses its result to run all applicable models. The result is a list of predictions for all run networks.
- Parameters:
descriptor (str) – Descriptor to use for predictions. Must be one of the descriptors which support random solid solution structures. See pysipfenn.descriptorDefinitions to see available modules or add yours here. As of v0.15.0, the only available descriptor is 'KS2022' through its KS2022_randomSolutions submodule.
baseStructList (Union[str, Structure, List[str], List[Structure], List[Union[Composition, str]]]) – See calculate_KS2022_randomSolutions for details. You can mix-and-match Structure objects and magic strings, either individually (to use the same entity for all calculations) or in a list.
compList (Union[str, List[str], Composition, List[Composition], List[Union[Composition, str]]]) – See calculate_KS2022_randomSolutions for details. You can mix-and-match Composition objects and composition strings, either individually (to use the same entity for all calculations) or in a list.
minimumSitesPerExpansion (int) – See calculate_KS2022_randomSolutions.
featureConvergenceCriterion (float) – See calculate_KS2022_randomSolutions.
compositionConvergenceCriterion (float) – See calculate_KS2022_randomSolutions.
minimumElementOccurrences (int) – See calculate_KS2022_randomSolutions.
plotParameters (bool) – See calculate_KS2022_randomSolutions.
printProgress (bool) – See calculate_KS2022_randomSolutions.
mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.
- Return type:
List[List[float]]
- Returns:
List of predictions. They will correspond to the order of the networks in self.toRun established by the findCompatibleModels() function. If a network is not available, it will not be included in the list.
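The composition strings accepted by compList, such as 'Fe0.5Ni0.3Cr0.2', 'Fe50 Ni30 Cr20', and 'Fe5 Ni3 Cr2', all describe the same atomic fractions. The sketch below shows that equivalence with a small regular-expression parser. It is a simplified stand-in written for illustration (no parentheses or other formula features); in practice pymatgen's Composition class handles the parsing, and parse_fractions is a hypothetical name.

```python
import math
import re
from typing import Dict


def parse_fractions(formula: str) -> Dict[str, float]:
    """Parse simple 'symbol amount' pairs and normalize the amounts to atomic fractions."""
    amounts: Dict[str, float] = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)\s*(\d*\.?\d+)?", formula):
        # A symbol without a number counts as one atom.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(amounts.values())
    return {element: value / total for element, value in amounts.items()}


a = parse_fractions("Fe0.5Ni0.3Cr0.2")
b = parse_fractions("Fe50 Ni30 Cr20")
c = parse_fractions("Fe5 Ni3 Cr2")
same = all(math.isclose(a[el], b[el]) and math.isclose(a[el], c[el]) for el in a)
print(a, same)
```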
- updateModelAvailability()[source]
Updates availability of models based on the pysipfenn.modelsSIPFENN directory contents. Works only for current ONNX model definitions.
- Return type:
None
- writeDescriptorsToCSV(descriptor, file='descriptorData.csv')[source]
Writes the descriptor data to a CSV file. The first column is the name of the structure. If the self.inputFiles attribute is populated automatically by runFromDirectory() or set manually, the names of the structures will be used. Otherwise, the names will be '1', '2', '3', etc. The remaining columns are the descriptor values. The order of the columns is the same as the order of the labels in the descriptor definition file.
- Parameters:
descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors, such as 'KS2022' and 'Ward2017'. It provides the labels for the descriptor values.
file (str) – Name of the file to write the results to. If the file already exists, it will be overwritten. If the file does not exist, it will be created. The file must have a '.csv' extension to be recognized correctly.
- Return type:
None
- writeResultsToCSV(file)[source]
Writes the results to a CSV file. The first column is the name of the structure. If the self.inputFiles attribute is populated automatically by runFromDirectory() or set manually, the names of the structures will be used. Otherwise, the names will be '1', '2', '3', etc. The remaining columns are the predictions for each network. The order of the columns is the same as the order of the networks in self.network_list_available.
- Parameters:
file (str) – Name of the file to write the results to. If the file already exists, it will be overwritten. If the file does not exist, it will be created. The file must have a '.csv' extension to be recognized correctly.
- Return type:
None
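The column layout and the fallback naming ('1', '2', '3', ...) described for the CSV output can be sketched with the standard csv module. This is an illustration writing to an in-memory string rather than the library's writer; the 'name' header, the network names, and the function name are assumptions.

```python
import csv
import io
from typing import List, Optional


def results_to_csv(predictions: List[List[float]], networks: List[str],
                   input_files: Optional[List[str]] = None) -> str:
    """Build CSV text: one row per structure, one column per network."""
    # Fall back to 1-based counters when no structure names are available.
    names = input_files if input_files else [str(i + 1) for i in range(len(predictions))]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name"] + networks)
    for name, row in zip(names, predictions):
        writer.writerow([name] + list(row))
    return buffer.getvalue()


text = results_to_csv([[0.1, 0.2], [0.3, 0.4]], ["netA", "netB"])
print(text)
```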
- overwritePrototypeLibrary(prototypeLibrary)[source]
Destructively overwrites the prototype library with a custom one. Used by the appendPrototypeLibrary() function to persist its changes. The other main use is to restore the default library to its original state based on a backup made earlier (see tests for an example).
- Return type:
None
- string2prototype(c, prototype)[source]
Converts a prototype string to a pymatgen Structure object.
- Parameters:
c (Calculator) – Calculator object with the prototypeLibrary.
prototype (str) – Prototype string.
- Return type:
Structure
- Returns:
Structure object.
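Resolving a "magic" prototype string comes down to a dictionary lookup in the prototype library, with already-built structures passed through unchanged. The sketch below shows that pattern on stand-in objects; FakeStructure and resolve_base are hypothetical, and only the ['structure'] key is taken from the prototypeLibrary usage shown in this reference.

```python
from typing import Dict, List, Union


class FakeStructure:
    """Stand-in for a pymatgen Structure, used only for this illustration."""

    def __init__(self, label: str) -> None:
        self.label = label


def resolve_base(entry: Union[str, FakeStructure],
                 prototype_library: Dict[str, dict]) -> FakeStructure:
    """Turn a magic string into its library structure; pass structures through."""
    if isinstance(entry, str):
        if entry not in prototype_library:
            raise KeyError(f"unknown prototype: {entry}")
        return prototype_library[entry]["structure"]
    return entry


library = {"BCC": {"structure": FakeStructure("bcc")},
           "FCC": {"structure": FakeStructure("fcc")}}
custom = FakeStructure("custom")
# A mixed list of magic strings and structure objects.
resolved: List[FakeStructure] = [resolve_base(e, library) for e in ["BCC", custom, "FCC"]]
print([s.label for s in resolved])
```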
- ward2ks2022(ward2017)[source]
Converts a Ward2017 descriptor to a KS2022 descriptor (which is its subset).
- Parameters:
ward2017 (ndarray) – Ward2017 descriptor. Must be a 1D np.ndarray of length 271.
- Return type:
ndarray
- Returns:
KS2022 descriptor array.
modelExporters
- class CoreMLExporter(calculator)[source]
Bases:
object
Export models to the CoreML format to allow for easy loading and inference in CoreML in other projects, particularly valuable for Apple devices, as pySIPFENN models can be run using the Neural Engine accelerator with minimal power consumption and neat optimizations.
Note: Some of the dependencies (coremltools) are not installed by default. If you need them, you have to install pySIPFENN in dev mode like: pip install "pysipfenn[dev]", or like pip install -e ".[dev]".
- Parameters:
calculator (Calculator) – A Calculator object with loaded models.
- export(model, append='')[source]
Export a loaded model to CoreML format. Models will be saved as {model}.mlpackage in the current working directory. Models will be annotated with the feature vector name (Ward2017 or KS2022) and the output will be named "property". The latter behavior will be adjusted in the future when model output name and unit will be added to the model JSON metadata.
- Parameters:
model (str) – The name of the model to export. It must be loaded in the Calculator and must have a descriptor (Ward2017 or KS2022) defined in the calculator.models dictionary created when the Calculator was initialized.
append (str) – A string to append to the exported model name after the model name. Useful for adding a version number or other information to the exported model name.
- Return type:
None
- Returns:
None
- class ONNXExporter(calculator)[source]
Bases:
object
Export models to the ONNX format (what they ship in by default) to allow (1) exporting modified pySIPFENN models, (2) simplifying the models using the ONNX optimizer, and (3) converting them to FP16 precision, cutting the size in half.
Note: Some of the dependencies (onnxconverter_common and onnxsim) are not installed by default. If you need them, you have to install pySIPFENN in dev mode like: pip install "pysipfenn[dev]", or like pip install -e ".[dev]".
- Parameters:
calculator (Calculator) – A Calculator object that has loaded PyTorch models (this happens automatically when the autoLoad argument is kept at its default value of True when initializing the Calculator). During initialization, the loaded PyTorch models are converted back to ONNX (in memory), to be later persisted to disk.
- export(model, append='')[source]
Export a loaded model to ONNX format.
- Parameters:
model (str) – The name of the model to export (must be loaded in the Calculator).
append (str) – A string to append to the exported model name after the model name, simplification marker, and FP16 marker. Useful for adding a version number or other information to the exported model name.
- Return type:
None
- Returns:
None
- exportAll(append='')[source]
Export all loaded models to ONNX format with the export function. An append string can be passed; it is forwarded to the export function and appended to each exported model name.
- Return type:
None
- simplify(model)[source]
Simplify a loaded model using the ONNX optimizer.
- Parameters:
model (str) – The name of the model to simplify (must be loaded in the Calculator).
- Return type:
None
- Returns:
None
- class TorchExporter(calculator)[source]
Bases:
object
Export models to the PyTorch PT format to allow for easy loading and inference in PyTorch in other projects.
- Parameters:
calculator (Calculator) – A Calculator object with loaded models.
- export(model, append='')[source]
Export a loaded model to PyTorch PT format. Models are exported in eval mode (no dropout) and saved in the current working directory.
- Parameters:
model (str) – The name of the model to export. It must be loaded in the Calculator and must have a descriptor (Ward2017 or KS2022) defined in the Calculator.models dictionary created when the Calculator was initialized.
append (str) – A string to append to the exported model name after the model name. Useful for adding a version number or other information to the exported model name.
- Return type:
None
- Returns:
None