pySIPFENN Core

pySIPFENN

class Calculator(autoLoad=True, verbose=True)

Bases: object

pySIPFENN Calculator automatically initializes all functionality, including identifying and loading all available models defined statically in the models.json file. It exposes methods for calculating predefined structure-informed descriptors (feature vectors) and for predicting properties using models that utilize them.

Parameters:
  • autoLoad (bool) – Automatically load all available ML models based on the models.json file. If models are available, loading them requires significant memory and time, so for featurization and other tasks that do not require models, setting this to False is recommended. Defaults to True.

  • verbose (bool) – Print initialization messages and several other non-critical messages during runtime procedures. Defaults to True.

models

Dictionary with all model information based on the models.json file in the modelsSIPFENN directory. The keys are the network names and the values are dictionaries with the model information.

loadedModels

Dictionary with all loaded models. The keys are the network names and the values are the loaded PyTorch models.

descriptorData

List of all descriptor data created during the last prediction run. The order of the list corresponds to the order of atomic structures given to models as input. The order of the list of descriptor data for each structure corresponds to the order of networks in the toRun list.

predictions

List of all predictions created during the last prediction run. The order of the list corresponds to the order of atomic structures given to models as input. The order of the list of predictions for each structure corresponds to the order of networks in the toRun list.

inputFiles

List of all input file names used during the last prediction run. The order of the list corresponds to the order of atomic structures given to models as input.

appendPrototypeLibrary(customPath)

Parses a custom prototype library YAML file and permanently appends it to the internal prototypeLibrary of the pySIPFENN package. The appended prototypes are persisted for future use and, by default, are loaded automatically when instantiating the Calculator object, similar to your custom models.

Parameters:

customPath (str) – Path to the prototype library YAML file to be appended to the internal self.prototypeLibrary of the Calculator object.

Return type:

None

Returns:

None

calculate_KS2022(structList, mode='serial', max_workers=8)

Calculates KS2022 descriptors for a list of structures. The calculation can be done in serial or parallel mode. In parallel mode, the number of workers can be specified. The results are stored in the descriptorData attribute. The function returns the list of descriptors as well.

Parameters:
  • structList (List[Structure]) – List of structures to calculate descriptors for. The structures must be initialized with the pymatgen Structure class.

  • mode (str) – Mode of calculation. Defaults to 'serial'. Options are 'serial' and 'parallel'.

  • max_workers (int) – Number of workers to use in parallel mode. Defaults to 8. If None, the number of workers will be set to the number of available CPU cores. If set to 0, 1 worker will be used.

Return type:

list

Returns:

List of KS2022 descriptor (feature vector) for each structure.
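The serial/parallel switch and the max_workers rules described above (None means all CPU cores, 0 means a single worker) can be illustrated with the standard library. This is a minimal sketch, not pySIPFENN's code: resolve_workers and featurize_all are hypothetical names, and a thread pool is used only to keep the example self-contained; for CPU-bound featurization a process pool is the more usual choice.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def resolve_workers(max_workers: Optional[int]) -> int:
    """Map max_workers to a worker count following the documented rules."""
    if max_workers is None:
        # None: use all available CPU cores (os.cpu_count() may return None).
        return os.cpu_count() or 1
    if max_workers == 0:
        # 0: fall back to a single worker.
        return 1
    return max_workers


def featurize_all(featurizer: Callable[[T], R], structList: List[T],
                  mode: str = "serial", max_workers: Optional[int] = 8) -> List[R]:
    """Apply a featurizer to every structure, preserving the input order."""
    if mode == "serial":
        return [featurizer(s) for s in structList]
    if mode == "parallel":
        with ThreadPoolExecutor(max_workers=resolve_workers(max_workers)) as pool:
            # Executor.map yields results in input order, not completion order.
            return list(pool.map(featurizer, structList))
    raise ValueError(f"Unknown mode: {mode!r}")


print(featurize_all(lambda x: x * x, [1, 2, 3], mode="parallel", max_workers=0))
# [1, 4, 9]
```

Because Executor.map returns results in input order, the descriptor list lines up with structList regardless of which worker finishes first.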

calculate_KS2022_dilute(structList, baseStruct='pure', mode='serial', max_workers=8)

Calculates KS2022 descriptors for a list of dilute structures (based either on pure elements or on custom base structures, e.g. TCP endmember configurations) that contain a single alloying atom. This is substantially faster than the general KS2022 descriptor, which can be used on any structure. The calculation can be done in serial or parallel mode. In parallel mode, the number of workers can be specified. The results are stored in the self.descriptorData attribute. The function returns the list of descriptors as well.

Parameters:
  • structList (List[Structure]) – List of structures to calculate descriptors for. They must be dilute structures (based either on pure elements or on custom base structures, e.g. TCP endmember configurations) that contain a single alloying atom, initialized with the pymatgen Structure class.

  • baseStruct (Union[str, List[Structure]]) – Non-diluted references for the dilute structures. Defaults to 'pure', which assumes that the structures are based on pure elements and generates references automatically. Alternatively, a list of structures can be provided, which can be either pure elements or custom base structures (e.g. TCP endmember configurations).

  • mode (str) – Mode of calculation. Defaults to 'serial'. Options are 'serial' and 'parallel'.

  • max_workers (int) – Number of workers to use in parallel mode. Defaults to 8. If None, the number of workers will be set to the number of available CPU cores. If set to 0, 1 worker will be used.

Return type:

List[ndarray]

Returns:

List of KS2022 descriptor (feature vector) np.ndarray for each structure.
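For pure-element bases, the "single alloying atom" requirement can be checked from the list of site species. The sketch below is a hypothetical helper (is_dilute_on_pure is not part of pySIPFENN), and it assumes the species are given as plain element symbols, e.g. taken from the sites of a Structure; dilute structures built on custom base structures such as TCP endmembers would instead need to be compared against their base structure.

```python
from collections import Counter
from typing import Sequence


def is_dilute_on_pure(species: Sequence[str]) -> bool:
    """Return True if every site but one holds the same host element."""
    counts = Counter(species)
    if len(counts) != 2:
        # Exactly one host element and one alloying element are expected.
        return False
    host_count, solute_count = sorted(counts.values(), reverse=True)
    # The alloying element must occupy exactly one site, and the host more than one.
    return solute_count == 1 and host_count > 1


print(is_dilute_on_pure(["Fe"] * 15 + ["Ni"]))        # True
print(is_dilute_on_pure(["Fe"] * 14 + ["Ni", "Ni"]))  # False
```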

calculate_KS2022_randomSolutions(baseStructList, compList, minimumSitesPerExpansion=50, featureConvergenceCriterion=0.005, compositionConvergenceCriterion=0.01, minimumElementOccurrences=10, plotParameters=False, printProgress=False, mode='serial', max_workers=8)

Calculates KS2022 descriptors corresponding to random solid solutions occupying base structure / lattice sites for a list of compositions, using the method described in the descriptorDefinitions.KS2022_randomSolutions submodule. The results are stored in the descriptorData attribute. The function also returns the list of descriptors in numpy format.

Parameters:
  • baseStructList (Union[str, Structure, List[str], List[Structure], List[Union[Composition, str]]]) – The base structure on which to generate a random solid solution (RSS). It does not need to be a simple Bravais lattice, such as the BCC lattice; it can be any Structure object, or a list of them if you need to define them on a per-case basis. In addition to Structure objects, you can use "magic" strings corresponding to one of the structures in the library found under the pysipfenn.misc directory or loaded under the self.prototypeLibrary attribute. The magic strings include, but are not limited to: 'BCC', 'FCC', 'HCP', 'DHCP', 'Diamond', and so on. You can invoke them by their name, e.g. 'BCC', or by passing self.prototypeLibrary['BCC']['structure'] directly. If you pass a list to baseStructList, you can mix and match Structure objects and magic strings.

  • compList (Union[str, List[str], Composition, List[Composition], List[Union[Composition, str]]]) – The composition to populate the supercell with until the KS2022 descriptor converges. You can use pymatgen's Composition objects or strings of valid chemical formulas (pairs of element symbols and relative amounts), like 'Fe0.5Ni0.3Cr0.2', 'Fe50 Ni30 Cr20', or 'Fe5 Ni3 Cr2'. You can either pass a single entity, in which case it is used for all structures (e.g. to run the same composition on different base structures), or a list of entities, in which case they are paired with the base structures in the order of the list. If you pass a list to compList, you can mix and match Composition objects and composition strings.

  • minimumSitesPerExpansion (int) – The minimum number of sites that the base structure is expanded to (by doubling it dimension by dimension) before it is used as the expansion step/batch in each iteration of adding local chemical environment information to the global ensemble. The optimal value depends on the number of species and their relative fractions in the composition. Generally, low values (below ~20) result in slower convergence, as some extreme local chemical environments strongly influence the global ensemble, while high values (above ~150) result in needlessly slow computation for simple compositions, as at least two iterations will be processed. The default value of 50 works well for simple cases.

  • featureConvergenceCriterion (float) – The maximum difference between any feature belonging to the current iteration (statistics based on the global ensemble of local chemical environments) and the previous iteration (before the last expansion), expressed as a fraction of the maximum value of each feature found in the OQMD database at the time of SIPFENN creation (see the KS2022_randomSolutions.maxFeaturesInOQMD array). The default value is 0.005, corresponding to 0.5% of the maximum value.

  • compositionConvergenceCriterion (float) – The maximum average difference between any element fraction belonging to the current composition (net of all expansions) and the target composition from compList. The default value is 0.01, corresponding to a 1% deviation, whose interpretation depends on the number of elements in the composition.

  • minimumElementOccurrences (int) – The minimum number of times every element must occur in the composition before it is considered converged. This setting prevents the algorithm from converging before very dilute elements, like C in low-carbon steel, have had a chance to occur. The default value is 10.

  • plotParameters (bool) – If True, the convergence history will be plotted using plotly. The default value is False, but enabling it is recommended; the tracked parameters will be accessible in the metas attribute of the Calculator under the key 'RSS'.

  • printProgress (bool) – If True, the progress will be printed to the console. The default value is False.

  • mode (str) – Mode of calculation. Options are 'serial' (default) and 'parallel'.

  • max_workers (int) – Number of workers to use in parallel mode. Defaults to 8.
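The expansion controlled by minimumSitesPerExpansion ("doubling dimension by dimension") can be sketched as below. This is an assumption-laden illustration rather than the KS2022_randomSolutions code: the function name is hypothetical, and cycling through the a, b, c directions in that order is assumed.

```python
from typing import Tuple


def expansion_scaling(n_base_sites: int, minimum_sites: int = 50) -> Tuple[int, ...]:
    """Double the cell one lattice direction at a time until it has enough sites."""
    if n_base_sites < 1:
        raise ValueError("The base structure must contain at least one site.")
    scaling = [1, 1, 1]
    axis = 0
    while n_base_sites * scaling[0] * scaling[1] * scaling[2] < minimum_sites:
        # Assumed order: cycle through a, b, c so the supercell grows roughly isotropically.
        scaling[axis] *= 2
        axis = (axis + 1) % 3
    return tuple(scaling)


print(expansion_scaling(1))  # (4, 4, 4): 64 sites for a 1-site base cell
print(expansion_scaling(2))  # (4, 4, 2): 64 sites for a 2-site base cell
```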

Return type:

List[ndarray]

Returns:

A list of numpy.ndarray objects containing the KS2022 descriptor, just like the ordinary KS2022. Please note the stochastic nature of this algorithm: the result will likely vary slightly between runs and with different parameters, so if convergence is critical, verify it with a test matrix of minimumSitesPerExpansion, featureConvergenceCriterion, and compositionConvergenceCriterion values.
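One possible reading of the three stopping rules described in the parameters of calculate_KS2022_randomSolutions (feature change scaled by per-feature maxima, average composition deviation, and minimum element occurrences) is sketched below. The function and its inputs are hypothetical stand-ins; in particular, interpreting the composition criterion as a mean absolute deviation over the target elements is an assumption, and max_features plays the role of the KS2022_randomSolutions.maxFeaturesInOQMD array.

```python
from typing import Dict, Sequence


def has_converged(prev_features: Sequence[float],
                  curr_features: Sequence[float],
                  max_features: Sequence[float],
                  current_comp: Dict[str, float],
                  target_comp: Dict[str, float],
                  occurrences: Dict[str, int],
                  featureConvergenceCriterion: float = 0.005,
                  compositionConvergenceCriterion: float = 0.01,
                  minimumElementOccurrences: int = 10) -> bool:
    # Largest change of any feature since the previous iteration, expressed as a
    # fraction of that feature's maximum (left unscaled if the maximum is zero).
    feature_change = max(
        abs(c - p) / m if m else abs(c - p)
        for p, c, m in zip(prev_features, curr_features, max_features)
    )
    # Assumed reading: mean absolute deviation of element fractions from the target.
    composition_error = sum(
        abs(current_comp.get(el, 0.0) - frac) for el, frac in target_comp.items()
    ) / len(target_comp)
    # Every target element must have occurred often enough.
    enough_occurrences = all(
        occurrences.get(el, 0) >= minimumElementOccurrences for el in target_comp
    )
    return (feature_change <= featureConvergenceCriterion
            and composition_error <= compositionConvergenceCriterion
            and enough_occurrences)
```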

calculate_Ward2017(structList, mode='serial', max_workers=4)

Calculates Ward2017 descriptors for a list of structures. The calculation can be done in serial or parallel mode. In parallel mode, the number of workers can be specified. The results are stored in the self.descriptorData attribute. The function returns the list of descriptors as well.

Parameters:
  • structList (List[Structure]) – List of structures to calculate descriptors for. The structures must be initialized with the pymatgen Structure class.

  • mode (str) – Mode of calculation. Defaults to 'serial'. Options are 'serial' and 'parallel'.

  • max_workers (int) – Number of workers to use in parallel mode. Defaults to 4. If None, the number of workers will be set to the number of available CPU cores. If set to 0, 1 worker will be used.

Return type:

list

Returns:

List of Ward2017 descriptor (feature vector) for each structure.

destroy()

Deallocates all loaded models and clears all data from the Calculator object.

Return type:

None

downloadModels(network='all')

Downloads ONNX models. By default, all available models are downloaded, skipping any model that is already on disk. If a specific network is given, only that network is downloaded, possibly overwriting the existing one. If the network name is not recognized, a message is printed.

Parameters:

network (str) – Name of the network to download. Defaults to 'all'.

Return type:

None
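The download rules above can be summarized as a small planning function. This is a hypothetical sketch (plan_downloads is not a pySIPFENN function), with known standing in for the network names listed in models.json and on_disk for the models already present in the modelsSIPFENN directory.

```python
from typing import Iterable, List, Set


def plan_downloads(network: str, known: Iterable[str], on_disk: Set[str]) -> List[str]:
    """Return the network names that would be downloaded."""
    known = list(known)
    if network == "all":
        # Default: fetch every known model that is not already on disk.
        return [name for name in known if name not in on_disk]
    if network in known:
        # A named network is always (re)downloaded, even if it already exists.
        return [network]
    print(f"Network {network!r} not recognized.")
    return []


print(plan_downloads("all", ["netA", "netB", "netC"], on_disk={"netB"}))
# ['netA', 'netC']
```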

findCompatibleModels(descriptor)

Finds all models compatible with a given descriptor based on the descriptor definitions loaded from the models.json file.

Parameters:

descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. Available default descriptors are: 'Ward2017', 'KS2022'.

Return type:

List[str]

Returns:

List of strings corresponding to compatible models.
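Conceptually, the lookup filters the models dictionary by descriptor. The sketch below assumes, hypothetically, that each entry stores its descriptor under a 'descriptor' key; the actual models.json field name is not shown in this reference, and the network names are placeholders.

```python
from typing import Dict, List


def find_compatible(models: Dict[str, dict], descriptor: str) -> List[str]:
    """Return the names of models whose entry declares the given descriptor."""
    # The result follows the dictionary's insertion order.
    return [name for name, info in models.items() if info.get("descriptor") == descriptor]


models = {
    "netA": {"name": "Model A", "descriptor": "Ward2017"},
    "netB": {"name": "Model B", "descriptor": "KS2022"},
    "netC": {"name": "Model C", "descriptor": "KS2022"},
}
print(find_compatible(models, "KS2022"))  # ['netB', 'netC']
```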

get_resultDicts()

Returns a list of dictionaries with the predictions for each network. The keys of the dictionaries are the names of the networks. The order of the dictionaries is the same as the order of the input structures passed to the runModels() functions.

Return type:

List[dict]

Returns:

List of dictionaries with the predictions.

get_resultDictsWithNames()

Returns a list of dictionaries with the predictions for each network. Each dictionary is keyed by network name and also includes the name of the corresponding input structure. The order of the dictionaries is the same as the order of the input structures passed to the runModels() functions. Note that this function requires self.inputFiles to be set. This happens automatically when using runFromDirectory() or runFromDirectory_dilute(), but not when using runModels() or runModels_dilute(), where the input structures are passed directly and their names have to be provided separately by assigning them to self.inputFiles.

Return type:

List[dict]

Returns:

List of dictionaries with the predictions.
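The relationship between the predictions list, the network order in toRun, and the dictionaries returned by get_resultDicts() and get_resultDictsWithNames() can be sketched as below. The function is hypothetical, and the 'name' key used for the structure name is an assumption; the key pySIPFENN actually uses may differ.

```python
from typing import Dict, List, Optional


def result_dicts(toRun: List[str], predictions: List[List[float]],
                 inputFiles: Optional[List[str]] = None) -> List[Dict[str, object]]:
    """Pair each structure's predictions with network names, optionally adding its name."""
    dicts: List[Dict[str, object]] = []
    for i, preds in enumerate(predictions):
        # predictions[i][j] belongs to structure i and network toRun[j].
        entry: Dict[str, object] = dict(zip(toRun, preds))
        if inputFiles is not None:
            # Hypothetical 'name' key; pySIPFENN's actual key may differ.
            entry = {"name": inputFiles[i], **entry}
        dicts.append(entry)
    return dicts


print(result_dicts(["netA", "netB"], [[1.0, 2.0], [3.0, 4.0]], ["a.POSCAR", "b.cif"]))
# [{'name': 'a.POSCAR', 'netA': 1.0, 'netB': 2.0}, {'name': 'b.cif', 'netA': 3.0, 'netB': 4.0}]
```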

loadModelCustom(networkName, modelName, descriptor, modelDirectory='.')

Load a custom ONNX model from a custom directory specified by the user. The primary use case for this function is to load models that are not included in the package and cannot be placed in the package directory because of write permissions (e.g. on restrictive HPC systems) or storage allocations.

Parameters:
  • modelDirectory (str) – Directory where the model is located. Defaults to the current directory.

  • networkName (str) – Name of the network. This is the name used to refer to the ONNX network. It has to be unique, not contain any spaces, and correspond to the name of the ONNX file (excluding the .onnx extension).

  • modelName (str) – Name of the model. This is the name that will be displayed in the model selection menu. It can be any string desired.

  • descriptor (str) – Descriptor/feature vector used by the model. pySIPFENN currently supports the following descriptors: 'KS2022' and 'Ward2017'.

Return type:

None

loadModels(network='all')

Loads one model or all models into the memory of the Calculator. The models are loaded from the modelsSIPFENN directory inside the package, whose location can be seen by calling print() on the Calculator. The models are stored in the self.loadedModels attribute as a dictionary with the network name as the key and the PyTorch model as the value.

Note:

This function only works with models that are stored in the modelsSIPFENN directory inside the package, are in ONNX format, and have corresponding entries in models.json. For all others, you will need to use loadModelCustom().

Parameters:

network (str) – Default is 'all', which loads all models detected as available. Alternatively, a specific model can be loaded by its corresponding key in models.json. E.g. 'SIPFENN_Krajewski2020_NN9' or 'SIPFENN_Krajewski2022_NN30'. The key is the same as the network argument in downloadModels().

Raises:

ValueError – If the network name is not recognized or if the model is not available in the modelsSIPFENN directory.

Return type:

None

Returns:

None. It updates the loadedModels attribute of the Calculator class.

makePredictions(models, toRun, dataInList)

Makes predictions using the PyTorch networks listed in toRun and provided in the models dictionary. Shared among all "predict" functions.

Parameters:
  • models (Dict[str, Module]) – Dictionary of models to use. Keys are network names and values are PyTorch models loaded from ONNX with loadModels() / loadModelCustom() or manually (fairly simple!).

  • toRun (List[str]) – List of networks to run. It must be a subset of models.keys().

  • dataInList (List[Union[List[float], array]]) – List of data to make predictions for. Each element of the list should be a descriptor accepted by all networks in toRun. Can be a list of lists of floats or a list of numpy ndarrays.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the networks in toRun.
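The nesting of the returned list (one inner list per input descriptor, one value per network in toRun order) is shown by the sketch below, in which plain Python callables stand in for the PyTorch models. It is a hypothetical illustration of the data layout only and does not reproduce how pySIPFENN feeds data to PyTorch.

```python
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]


def make_predictions(models: Dict[str, Model], toRun: List[str],
                     dataInList: List[Sequence[float]]) -> List[List[float]]:
    """Run every network in toRun on every descriptor in dataInList."""
    missing = [name for name in toRun if name not in models]
    if missing:
        # toRun must be a subset of models.keys().
        raise ValueError(f"Networks not in models: {missing}")
    # Outer list: one entry per input; inner list: one value per network, in toRun order.
    return [[models[name](data) for name in toRun] for data in dataInList]


stand_in_models = {"sum": sum, "maxv": max}  # callables standing in for PyTorch modules
print(make_predictions(stand_in_models, ["maxv", "sum"], [[1, 2], [3, 4]]))
# [[2, 3], [4, 7]]
```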

parsePrototypeLibrary(customPath='default', verbose=False, printCustomLibrary=False)

Parses the prototype library YAML file in the misc directory, converts its entries into pymatgen Structure objects, and stores them in the self.prototypeLibrary dict attribute of the Calculator object. You can also use it to temporarily append a custom prototype library (by providing a path), which will live as long as the Calculator. For permanent changes, use appendPrototypeLibrary().

Parameters:
  • customPath (str) – Path to the prototype library YAML file. Defaults to the magic string "default", which loads the default prototype library included in the package in the misc directory.

  • verbose (bool) – If True, it prints the number of prototypes loaded. Defaults to False, but note that the Calculator class automatically initializes with verbose=True.

  • printCustomLibrary (bool) – If True, it prints the name and POSCAR of each prototype being added to the prototype library. Has no effect if customPath is 'default'. Defaults to False.

Return type:

None

Returns:

None

runFromDirectory(directory, descriptor, mode='serial', max_workers=4)

Runs all loaded models on a list of Structures it automatically imports from a specified directory. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted using natsort library, so the order of the structures in the directory, as defined by the operating system, is not important. Natural sorting, for example, will sort the structures in the following order: '1-Fe', '2-Al', '10-xx', '11-xx', '20-xx', '21-xx', '11111-xx', etc. This is useful when the structures are named using a numbering system. The order of the predictions is the same as the order of the input structures. The order of the networks in a prediction is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.

Parameters:
  • directory (str) – Directory containing the structures to run the models on. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted as described above.

  • descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors.

  • mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.

  • max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.
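pySIPFENN uses the natsort library for this ordering. The standard-library sketch below reproduces the documented example order with a simple natural-sort key; it is a simplified stand-in, not natsort's algorithm.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and digit runs so digit runs compare numerically."""
    # re.split with a capturing group alternates text and digit parts, so when
    # two keys are compared, str is compared with str and int with int.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


files = ["11111-xx", "2-Al", "20-xx", "10-xx", "1-Fe", "21-xx", "11-xx"]
print(sorted(files, key=natural_key))
# ['1-Fe', '2-Al', '10-xx', '11-xx', '20-xx', '21-xx', '11111-xx']
```

A plain sorted(files) would instead place '10-xx' and '11111-xx' before '2-Al', because strings compare character by character.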

runFromDirectory_dilute(directory, descriptor, baseStruct='pure', mode='serial', max_workers=8)

Runs all loaded models on a list of dilute Structures it automatically imports from a specified directory. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted using natsort library, so the order of the structures in the directory, as defined by the operating system, is not important. Natural sorting, for example, will sort the structures in the following order: '1-Fe', '2-Al', '10-xx', '11-xx', '20-xx', '21-xx', '11111-xx', etc. This is useful when the structures are named using a numbering system. The order of the predictions is the same as the order of the input structures. The order of the networks in a prediction is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.

Parameters:
  • directory (str) – Directory containing the structures to run the models on. The directory must contain only atomic structures in formats such as 'poscar', 'cif', 'json', 'mcsqs', etc., or a mix of these. The structures are automatically sorted as described above. The structures must be dilute structures, i.e. they must contain only one alloying element.

  • descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors.

  • baseStruct (str) – Non-diluted references for the dilute structures. Defaults to 'pure', which assumes that the structures are based on pure elements and generates references automatically. Alternatively, a list of structures can be provided, which can be either pure elements or custom base structures (e.g. TCP endmember configurations).

  • mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.

  • max_workers (int) – Number of workers to use in parallel mode. Default is 8. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.

runModels(descriptor, structList, mode='serial', max_workers=4)

Runs all loaded models on a list of Structures using the specified descriptor. Supports serial and parallel computation modes. If parallel is selected, max_workers determines the number of processes handling the featurization of structures (which accounts for 90-99+% of the computational cost), and the models are then run in series.

Parameters:
  • descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. Available default descriptors are: 'Ward2017', 'KS2022'.

  • structList (List[Structure]) – List of pymatgen Structure objects to run the models on.

  • mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.

  • max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[List[float]]

Returns:

List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list. If a network is not compatible with the selected descriptor, it will not be included in the list.
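The filtering and ordering rules in the Returns description of runModels() can be expressed as a short helper. This is a hypothetical sketch: networks_to_run is not a pySIPFENN function, and loaded and compatible stand in for the loaded models and the output of findCompatibleModels().

```python
from typing import Iterable, List


def networks_to_run(network_list_available: List[str],
                    loaded: Iterable[str],
                    compatible: Iterable[str]) -> List[str]:
    """Select networks that are both loaded and descriptor-compatible."""
    loaded_set, compatible_set = set(loaded), set(compatible)
    # Preserve the order of network_list_available; skip networks that are
    # not loaded or that do not accept the selected descriptor.
    return [name for name in network_list_available
            if name in loaded_set and name in compatible_set]


print(networks_to_run(["A", "B", "C", "D"], loaded=["D", "A", "C"], compatible=["A", "B", "D"]))
# ['A', 'D']
```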

runModels_dilute(descriptor, structList, baseStruct='pure', mode='serial', max_workers=4)

Runs all loaded models on a list of Structures using the specified descriptor. A critical difference from runModels() is that this function calls a dilute-specific featurizer, e.g. KS2022_dilute when 'KS2022' is provided as input, which can only be used on dilute structures (based either on pure elements or on custom base structures, e.g. TCP endmember configurations) that contain a single alloying atom. This is substantially faster than the general KS2022 descriptor, which can be used on any structure. Supports serial and parallel modes in the same way as runModels().

Parameters:
  • descriptor (str) – Descriptor to use for predictions. Must be one of the descriptors which support dilute structures (i.e. *_dilute). See pysipfenn.descriptorDefinitions to see available modules or add yours here. The only default dilute descriptor currently available is 'KS2022'. 'KS2022' can also be used with the runModels() function, but this is not recommended for dilute alloys, as it forfeits the speedup of the dilute featurizer.

  • structList (List[Structure]) – List of pymatgen Structure objects to run the models on. Must be dilute structures as described above.

  • baseStruct (Union[str, List[Structure]]) – Non-diluted references for the dilute structures. Defaults to 'pure', which assumes that the structures are based on pure elements and generates references automatically. Alternatively, a list of structures can be provided, which can be either pure elements or custom base structures (e.g. TCP endmember configurations).

  • mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.

  • max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[List[float]]

Returns:

List of predictions. Each element of the list is a list of predictions for all run networks. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list. If a network is not compatible with the selected descriptor, it will not be included in the list.
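With baseStruct='pure', the reference for each dilute structure is generated automatically; presumably it is the same lattice occupied entirely by the host element. The sketch below illustrates that idea on plain lists of element symbols. pure_reference is a hypothetical helper that assumes its input is already a valid dilute structure, and it does not reproduce how pySIPFENN builds pymatgen references.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def pure_reference(species: Sequence[str]) -> Tuple[List[str], int]:
    """Return the pure-host site list and the index of the alloying atom."""
    # The host is the majority species of a dilute structure.
    host, _ = Counter(species).most_common(1)[0]
    # The alloying atom is the only site that differs from the host.
    solute_index = next(i for i, el in enumerate(species) if el != host)
    return [host] * len(species), solute_index


print(pure_reference(["Fe", "Fe", "Ni", "Fe"]))
# (['Fe', 'Fe', 'Fe', 'Fe'], 2)
```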

runModels_randomSolutions(descriptor, baseStructList, compList, minimumSitesPerExpansion=50, featureConvergenceCriterion=0.005, compositionConvergenceCriterion=0.01, minimumElementOccurrences=10, plotParameters=False, printProgress=False, mode='serial', max_workers=8)

A top-level convenience wrapper for the calculate_KS2022_randomSolutions function. It passes all the arguments to that function directly (except for descriptor) and uses its result to run all applicable models. The result is a list of predictions for all run networks.

Parameters:
  • descriptor (str) – Descriptor to use for predictions. Must be one of the descriptors which support random solid solution structures. See pysipfenn.descriptorDefinitions to see available modules or add yours here. As of v0.15.0, the only available descriptor is 'KS2022' through its KS2022_randomSolutions submodule.

  • baseStructList (Union[str, Structure, List[str], List[Structure], List[Union[Composition, str]]]) – See calculate_KS2022_randomSolutions for details. You can mix-and-match Structure objects and magic strings, either individually (to use the same entity for all calculations) or in a list.

  • compList (Union[str, List[str], Composition, List[Composition], List[Union[Composition, str]]]) – See calculate_KS2022_randomSolutions for details. You can mix-and-match Composition objects and composition strings, either individually (to use the same entity for all calculations) or in a list.

  • minimumSitesPerExpansion (int) – See calculate_KS2022_randomSolutions.

  • featureConvergenceCriterion (float) – See calculate_KS2022_randomSolutions.

  • compositionConvergenceCriterion (float) – See calculate_KS2022_randomSolutions.

  • minimumElementOccurrences (int) – See calculate_KS2022_randomSolutions.

  • plotParameters (bool) – See calculate_KS2022_randomSolutions.

  • printProgress (bool) – See calculate_KS2022_randomSolutions.

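  • mode (str) – Computation mode. 'serial' or 'parallel'. Default is 'serial'. Parallel mode is not recommended for small datasets.

  • max_workers (int) – Number of workers to use in parallel mode. Defaults to 8. See calculate_KS2022_randomSolutions.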

Return type:

List[List[float]]

Returns:

List of predictions. They will correspond to the order of the networks in self.toRun established by the findCompatibleModels() function. If a network is not available, it will not be included in the list.
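The composition strings accepted by compList (see calculate_KS2022_randomSolutions) such as 'Fe0.5Ni0.3Cr0.2', 'Fe50 Ni30 Cr20', and 'Fe5 Ni3 Cr2' describe the same alloy once their amounts are normalized to atomic fractions. The sketch below shows that normalization with a regular expression. It is a simplified, hypothetical stand-in for pymatgen's Composition parsing: it handles element symbols followed by optional amounts, but not parentheses or other formula syntax.

```python
import re
from typing import Dict

# An element symbol, optional whitespace, then an optional integer or decimal amount.
_PAIR = re.compile(r"([A-Z][a-z]?)\s*(\d+(?:\.\d+)?)?")


def normalize_composition(formula: str) -> Dict[str, float]:
    """Convert a formula string into atomic fractions that sum to 1."""
    amounts: Dict[str, float] = {}
    for symbol, amount in _PAIR.findall(formula):
        # A missing amount (e.g. "FeNi") counts as 1.
        value = float(amount) if amount else 1.0
        amounts[symbol] = amounts.get(symbol, 0.0) + value
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError(f"No elements found in {formula!r}")
    return {symbol: value / total for symbol, value in amounts.items()}


print(normalize_composition("Fe5 Ni3 Cr2"))
# {'Fe': 0.5, 'Ni': 0.3, 'Cr': 0.2}
```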

updateModelAvailability()

Updates availability of models based on the pysipfenn.modelsSIPFENN directory contents. Works only for current ONNX model definitions.

Return type:

None

writeDescriptorsToCSV(descriptor, file='descriptorData.csv')

Writes the descriptor data to a CSV file. The first column is the name of the structure. If the self.inputFiles attribute is populated automatically by runFromDirectory() or set manually, the names of the structures will be used. Otherwise, the names will be '1', '2', '3', etc. The remaining columns are the descriptor values. The order of the columns is the same as the order of the labels in the descriptor definition file.

Parameters:
  • descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors, such as 'KS2022' and 'Ward2017'. It provides the labels for the descriptor values.

  • file (str) – Name of the file to write the results to. If the file already exists, it will be overwritten. If the file does not exist, it will be created. The file must have a '.csv' extension to be recognized correctly.

Return type:

None

writeResultsToCSV(file)

Writes the results to a CSV file. The first column is the name of the structure. If the self.inputFiles attribute is populated automatically by runFromDirectory() or set manually, the names of the structures will be used. Otherwise, the names will be '1', '2', '3', etc. The remaining columns are the predictions for each network. The order of the columns is the same as the order of the networks in self.network_list_available.

Parameters:

file (str) – Name of the file to write the results to. If the file already exists, it will be overwritten. If the file does not exist, it will be created. The file must have a '.csv' extension to be recognized correctly.

Return type:

None
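The CSV layout described above (a name column followed by one column per network, with '1', '2', '3', ... as fallback names) can be sketched with Python's csv module. The function name and the "Name" header label are assumptions, not necessarily what pySIPFENN writes; an in-memory buffer stands in for the output file.

```python
import csv
import io
from typing import List, Optional, Sequence, TextIO


def results_to_csv(stream: TextIO, network_names: List[str],
                   predictions: List[Sequence[float]],
                   inputFiles: Optional[List[str]] = None) -> None:
    """Write one row per structure: its name, then one prediction per network."""
    writer = csv.writer(stream)
    # Hypothetical header label "Name" for the first column.
    writer.writerow(["Name", *network_names])
    for i, row in enumerate(predictions):
        # Fall back to '1', '2', '3', ... when no input file names are set.
        name = inputFiles[i] if inputFiles else str(i + 1)
        writer.writerow([name, *row])


buffer = io.StringIO()
results_to_csv(buffer, ["netA", "netB"], [[0.1, -0.2], [0.3, 0.4]])
print(buffer.getvalue())
```

When writing to a real file, open it with open(file, "w", newline="") so that the csv module controls the line endings.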

overwritePrototypeLibrary(prototypeLibrary)

Destructively overwrites the prototype library with a custom one. Used by the appendPrototypeLibrary() function to persist its changes. The other main use is to restore the default library to its original state from a backup made earlier (see tests for an example).

Return type:

None

string2prototype(c, prototype)

Converts a prototype string to a pymatgen Structure object.

Parameters:
  • c (Calculator) – Calculator object with the prototypeLibrary.

  • prototype (str) – Prototype string.

Return type:

Structure

Returns:

Structure object.
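Since prototypeLibrary entries expose their structure under the 'structure' key (as in self.prototypeLibrary['BCC']['structure'], described for calculate_KS2022_randomSolutions), the lookup can be sketched as follows. The function name, the error type, and the string placeholders standing in for pymatgen Structure objects are hypothetical.

```python
from typing import Any, Dict


def string_to_prototype(prototypeLibrary: Dict[str, Dict[str, Any]], prototype: str) -> Any:
    """Look up a prototype by name and return its structure entry."""
    if prototype not in prototypeLibrary:
        available = ", ".join(sorted(prototypeLibrary))
        raise KeyError(f"Unknown prototype {prototype!r}; available: {available}")
    return prototypeLibrary[prototype]["structure"]


# Strings stand in for pymatgen Structure objects in this sketch.
library = {"BCC": {"structure": "bcc-structure"}, "FCC": {"structure": "fcc-structure"}}
print(string_to_prototype(library, "BCC"))  # bcc-structure
```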

ward2ks2022(ward2017)

Converts a Ward2017 descriptor to a KS2022 descriptor (KS2022 is a subset of Ward2017).

Parameters:

ward2017 (ndarray) – Ward2017 descriptor. Must be a 1D np.ndarray of length 271.

Return type:

ndarray

Returns:

KS2022 descriptor array.
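Since KS2022 is a subset of Ward2017, the conversion amounts to selecting features by index. The sketch below shows this on plain lists; keep_indices is a hypothetical argument, because the actual indices of the retained features are defined by pySIPFENN's descriptor definitions and are not listed here. With NumPy, the same selection can be written as ward2017[keep_indices] using integer-array indexing.

```python
from typing import List, Sequence

WARD2017_LENGTH = 271  # documented length of the Ward2017 descriptor


def select_subset(ward2017: Sequence[float], keep_indices: Sequence[int]) -> List[float]:
    """Keep the features at keep_indices (hypothetical indices), in that order."""
    if len(ward2017) != WARD2017_LENGTH:
        raise ValueError(f"Expected {WARD2017_LENGTH} features, got {len(ward2017)}")
    return [ward2017[i] for i in keep_indices]


print(select_subset(list(range(271)), [0, 5, 270]))  # [0, 5, 270]
```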

wrapper_KS2022_dilute_generate_descriptor(args)

Wraps the KS2022_dilute.generate_descriptor function for parallel processing.

wrapper_KS2022_randomSolutions_generate_descriptor(args)

Wraps the KS2022_randomSolutions.generate_descriptor function for parallel processing.

modelExporters

class CoreMLExporter(calculator)

Bases: object

Exports models to the CoreML format for easy loading and inference in other projects. This is particularly valuable on Apple devices, where pySIPFENN models can run on the Neural Engine accelerator with minimal power consumption and benefit from CoreML's optimizations.

Note: Some of the dependencies (coremltools) are not installed by default. If you need them, install pySIPFENN in dev mode, e.g. pip install "pysipfenn[dev]" or pip install -e ".[dev]".

Parameters:

calculator (Calculator) – A Calculator object with loaded models.

calculator[source]

A Calculator object with loaded models.

export(model, append='')[source]

Export a loaded model to CoreML format. Models are saved as {model}.mlpackage in the current working directory. Models are annotated with the feature vector name (Ward2017 or KS2022), and the output is named “property”. The latter behavior will be adjusted in the future, once the model output name and unit are added to the model JSON metadata.

Parameters:
  • model (str) – The name of the model to export (must be loaded in the Calculator) and it must have a descriptor (Ward2017 or KS2022) defined in the calculator.models dictionary created when the Calculator was initialized.

  • append (str) – A string to append to the exported model name after the model name. Useful for adding a version number or other information to the exported model name.

Return type:

None

Returns:

None

exportAll(append='')[source]

Export all loaded models to CoreML format with the export function. The append string is passed to export and added to every exported model name.

Return type:

None
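The exportAll behavior amounts to one export call per loaded model with the same append suffix. The sketch below models that loop with a caller-supplied export function so it runs without coremltools. The filename rule {model}{append}.mlpackage is an assumption read from the export description above (suffix placed directly after the model name), and export_all and coreml_filename are hypothetical names, not pySIPFENN methods.

```python
from typing import Callable, Iterable, List


def coreml_filename(model: str, append: str = "") -> str:
    """Assumed naming: model name, then the optional suffix, then .mlpackage."""
    return f"{model}{append}.mlpackage"


def export_all(
    loaded_models: Iterable[str],
    export: Callable[[str, str], None],
    append: str = "",
) -> List[str]:
    """Call `export` once per loaded model, forwarding the same `append` suffix.

    Returns the filenames the exports are expected to produce.
    """
    written = []
    for model in loaded_models:
        export(model, append)
        written.append(coreml_filename(model, append))
    return written
```

Passing a stub such as lambda m, a: None as export makes it easy to check which files a real run would write to the working directory.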

class ONNXExporter(calculator)[source]

Bases: object

Export models to the ONNX format (the format they ship in by default) to allow (1) exporting modified pySIPFENN models, (2) simplifying the models using the ONNX optimizer, and (3) converting them to FP16 precision, which cuts their size in half.

Note: Some of the dependencies (onnxconverter_common and onnxsim) are not installed by default. If you need them, install pySIPFENN with the dev extras, e.g., pip install "pysipfenn[dev]", or, from a local copy of the source, pip install -e ".[dev]".

Parameters:

calculator (Calculator) – A Calculator object with loaded PyTorch models (this happens automatically when the autoLoad argument is kept at its default value of True when initializing the Calculator). During initialization, the loaded PyTorch models are converted back to ONNX in memory, to be later persisted to disk.

calculator[source]

A Calculator object whose loaded models have been converted to ONNX.

simplifiedDict[source]

A boolean dictionary of models that have been simplified.

fp16Dict[source]

A boolean dictionary of models that have been converted to FP16.

export(model, append='')[source]

Export a loaded model to ONNX format.

Parameters:
  • model (str) – The name of the model to export (must be loaded in the Calculator).

  • append (str) – A string to append to the exported model name after the model name, simplification marker, and FP16 marker. Useful for adding a version number or other information to the exported model name.

Return type:

None

Returns:

None

exportAll(append='')[source]

Export all loaded models to ONNX format with the export function. The append string is passed to export and added to every exported model name.

Return type:

None

simplify(model)[source]

Simplify a loaded model using the ONNX optimizer.

Parameters:

model (str) – The name of the model to simplify (must be loaded in the Calculator).

Return type:

None

Returns:

None

simplifyAll()[source]

Simplify all loaded models with the simplify function.

toFP16(model)[source]

Convert a loaded model to FP16 precision.

Parameters:

model (str) – The name of the model to convert to FP16 (must be loaded in the Calculator).

Return type:

None

Returns:

None

toFP16All()[source]

Convert all loaded models to FP16 precision with the toFP16 function.
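The size reduction from FP16 conversion follows from storage width: each weight takes 2 bytes instead of 4, at the cost of a small rounding error. The numpy sketch below illustrates only that trade-off on random stand-in weights. It does not use ONNX, onnxconverter_common, or pySIPFENN, and compare_precisions is a hypothetical name.

```python
import numpy as np


def compare_precisions(n_weights: int = 1000, seed: int = 0):
    """Return (fp32 bytes, fp16 bytes, max abs rounding error) for random weights."""
    rng = np.random.default_rng(seed)
    w32 = rng.standard_normal(n_weights).astype(np.float32)
    w16 = w32.astype(np.float16)  # round each weight to half precision
    err = float(np.max(np.abs(w32 - w16.astype(np.float32))))
    return w32.nbytes, w16.nbytes, err


print(compare_precisions())  # first two values: 4000 2000
```

Half precision keeps about 11 significant bits, so for weights of order one the rounding error stays around 1e-3 or below; whether that affects a model's predictions should be checked on the converted model itself.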

class TorchExporter(calculator)[source]

Bases: object

Export models to the PyTorch PT format to allow for easy loading and inference in PyTorch in other projects.

Parameters:

calculator (Calculator) – A Calculator object with loaded models.

calculator[source]

A Calculator object with loaded models.

export(model, append='')[source]

Export a loaded model to PyTorch PT format. Models are exported in eval mode (no dropout) and saved in the current working directory.

Parameters:
  • model (str) – The name of the model to export (must be loaded in the Calculator) and it must have a descriptor (Ward2017 or KS2022) defined in the Calculator.models dictionary created when the Calculator was initialized.

  • append (str) – A string to append to the exported model name after the model name. Useful for adding a version number or other information to the exported model name.

Return type:

None

Returns:

None
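Exporting in eval mode matters because dropout behaves differently in training and evaluation. The numpy sketch below follows the documented rule of torch.nn.Dropout (inverted dropout): in training it zeroes each element with probability p and scales the survivors by 1/(1-p); in eval mode it is the identity, so predictions are deterministic. It is a stand-in for illustration, not PyTorch or pySIPFENN code.

```python
import numpy as np


def dropout(x: np.ndarray, p: float, training: bool, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: random masking in training, identity in eval mode."""
    if not training or p == 0.0:
        return x  # eval mode: deterministic, no units dropped
    mask = rng.random(x.shape) >= p  # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)  # rescale so the expected activation is unchanged
```

With x = np.ones(8) and p = 0.5, training-mode outputs contain only 0.0 and 2.0 and change between calls, while eval-mode outputs always equal x.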

exportAll(append='')[source]

Exports all loaded models to PyTorch PT format with the export function. append can be passed to the export function to append to all exported model names.

Return type:

None