pysipfenn.core package

Submodules

pysipfenn.core.pysipfenn module

class pysipfenn.Calculator[source]

Bases: object

The pySIPFENN Calculator automatically initializes all functionalities, including identifying and loading all available models defined statically in the models.json file.

calculate_KS2022(structList, mode='serial', max_workers=10)[source]

calculate_KS2022_dilute(structList, baseStruct='pure', mode='serial', max_workers=10)[source]

calculate_Ward2017(structList, mode='serial', max_workers=10)[source]

Calculates Ward2017 descriptors for a list of structures.

Parameters:

structList (List[Structure]) – List of structures to calculate descriptors for.
mode (str, optional) – Mode of calculation. Defaults to ‘serial’.
max_workers (int, optional) – Number of workers to use in parallel mode. Defaults to 10.

Returns:

List of descriptors.

Return type:

list

downloadModels(network='all')[source]

Downloads ONNX models. By default, all models are downloaded.

Parameters:

network (str, optional) – Name of the network to download. Defaults to ‘all’.

downloadModels_legacyMxNet(network='all')[source]

Legacy function. Downloads MxNet models.

findCompatibleModels(descriptor)[source]

Finds all models compatible with a given descriptor based on the descriptor definitions loaded from the models.json file.

Parameters:

descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. Available default descriptors are: ‘Ward2017’, ‘KS2022’, ‘KS2022_dilute’.

Return type:

List[str]

Returns:

List of compatible models.
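The lookup described above can be pictured with a minimal sketch. It is not the library's implementation: the registry contents, the model names, and the "descriptor" field name are all assumptions made for illustration, and the helper find_compatible_models is hypothetical.

```python
from typing import Dict, List

# Hypothetical excerpt of a models.json-style registry. The "descriptor"
# field name and all three entries are assumptions made for illustration.
models: Dict[str, Dict[str, str]] = {
    "model_A": {"name": "Example model A", "descriptor": "Ward2017"},
    "model_B": {"name": "Example model B", "descriptor": "KS2022"},
    "model_C": {"name": "Example model C", "descriptor": "Ward2017"},
}


def find_compatible_models(registry: Dict[str, Dict[str, str]],
                           descriptor: str) -> List[str]:
    """Return the names of all models whose declared descriptor matches."""
    return [name for name, entry in registry.items()
            if entry.get("descriptor") == descriptor]


compatible = find_compatible_models(models, "Ward2017")
print(compatible)
```

A descriptor that no model declares simply yields an empty list in this sketch.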

get_resultDicts()[source]

Returns a list of dictionaries with the predictions for each network. The keys of each dictionary are the names of the networks. The dictionaries follow the order of the input structures passed to the runModels() family of functions.

Return type:

List[dict]

Returns:

List of dictionaries with the predictions.
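As a rough illustration of the returned shape, the sketch below pairs each structure's list of predictions with the network names. The helper to_result_dicts, the network names, and the numbers are hypothetical; only the "one dictionary per structure, keyed by network name" layout comes from the description above.

```python
from typing import Dict, List


def to_result_dicts(network_names: List[str],
                    predictions: List[List[float]]) -> List[Dict[str, float]]:
    """Pair each structure's prediction list with the network names."""
    return [dict(zip(network_names, row)) for row in predictions]


# Two hypothetical networks and made-up predictions for three structures.
names = ["net_1", "net_2"]
preds = [[0.10, 0.12], [-0.05, -0.04], [0.30, 0.28]]

result_dicts = to_result_dicts(names, preds)
print(result_dicts[1])
```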

get_resultDictsWithNames()[source]

Returns a list of dictionaries with the predictions for each network. Each dictionary holds the predictions keyed by network name, together with the name of the corresponding input structure. The dictionaries follow the order of the input structures passed to the runModels() family of functions.

This function requires self.inputFiles to be set. runFromDirectory() and runFromDirectory_dilute() set it automatically. runModels() and runModels_dilute() do not, because the input structures are passed to them directly; in that case, the names have to be provided separately by assigning them to self.inputFiles.

Return type:

List[dict]

Returns:

List of dictionaries with the predictions.
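The need for self.inputFiles can be seen in a small sketch: without one name per structure, there is nothing to attach to each prediction row. The helper to_named_result_dicts and the "name" key are assumptions made for illustration, not the library's confirmed behavior.

```python
from typing import Dict, List, Union


def to_named_result_dicts(
        input_names: List[str],
        network_names: List[str],
        predictions: List[List[float]],
) -> List[Dict[str, Union[str, float]]]:
    """Attach a structure name to each row of per-network predictions."""
    if len(input_names) != len(predictions):
        raise ValueError("one name is needed for every predicted structure")
    return [
        # "name" is a hypothetical key chosen for this sketch.
        {"name": name, **dict(zip(network_names, row))}
        for name, row in zip(input_names, predictions)
    ]


# Hypothetical file names, network names, and predictions.
named = to_named_result_dicts(
    ["1-Fe.POSCAR", "2-Al.POSCAR"],
    ["net_1", "net_2"],
    [[0.10, 0.12], [-0.05, -0.04]],
)
print(named[0])
```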

loadModels()[source]: Fill a dictionary of available models with loaded model neural networks in self.loadedModels.

makePredictions(models, toRun, dataInList)[source]

makePredictions_legacyMxNet(mxnet_networks, dataInList)[source]

Makes predictions using legacy MxNet networks. This is a legacy function and will be removed in future versions. Compatibility with legacy networks is not guaranteed. Use at your own risk.

Parameters:

mxnet_networks (List[str]) – List of networks to use.
dataInList (List[List[float]]) – List of data to make predictions for. Each element of the list should be a list of descriptors.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions from all networks that were run.

runFromDirectory(directory, descriptor, mode='serial', max_workers=4)[source]

Runs all loaded models on a list of Structures that it automatically imports from a specified directory. The directory must contain only atomic structures in formats such as ‘poscar’, ‘cif’, ‘json’, ‘mcsqs’, etc., or a mix of these. The structures are automatically sorted using the natsort library, so the order of the files in the directory, as defined by the operating system, does not matter. Natural sorting will, for example, order the structures as ‘1-Fe’, ‘2-Al’, ‘10-xx’, ‘11-xx’, ‘20-xx’, ‘21-xx’, ‘11111-xx’, etc., which is useful when the structures are named using a numbering system. The order of the predictions is the same as the order of the input structures. The order of the networks in a prediction is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.

Parameters:

directory (str) – Directory containing the structures to run the models on. The directory must contain only atomic structures in formats such as ‘poscar’, ‘cif’, ‘json’, ‘mcsqs’, etc., or a mix of these. The structures are automatically sorted as described above.
descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions for a list of available descriptors.
mode (str) – Computation mode. ‘serial’ or ‘parallel’. Default is ‘serial’. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions from all networks that were run. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list.
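The natural ordering mentioned above can be approximated with the standard library alone. The sketch below is not natsort itself, only a simplified stand-in that compares runs of digits as integers; natural_key is a hypothetical helper, and the file names are the ones used in the example above.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into digit and non-digit runs so numbers compare numerically."""
    return [int(token) if token.isdigit() else token.lower()
            for token in re.split(r"(\d+)", name)]


files = ["10-xx", "1-Fe", "21-xx", "2-Al", "11111-xx", "11-xx", "20-xx"]

natural = sorted(files, key=natural_key)   # numeric-aware ordering
lexicographic = sorted(files)              # plain string ordering

print(natural)
print(lexicographic)
```

Plain string sorting places ‘10-xx’ directly after ‘1-Fe’, which is why a numeric-aware key matters when files are numbered.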

runFromDirectory_dilute(directory, descriptor, baseStruct='pure', mode='serial', max_workers=4)[source]

runModels(descriptor, structList, mode='serial', max_workers=4)[source]

Runs all loaded models on a list of Structures using the specified descriptor. Supports serial and parallel computation modes. If parallel mode is selected, max_workers determines the number of processes handling the featurization of structures (90-99+% of computational intensity); the models are then run in series.

Parameters:

descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. Available default descriptors are: ‘Ward2017’, ‘KS2022’.
structList (List[Structure]) – List of pymatgen Structure objects to run the models on.
mode (str) – Computation mode. ‘serial’ or ‘parallel’. Default is ‘serial’. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions from all networks that were run. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list. If a network is not compatible with the selected descriptor, it will not be included in the list.
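The split between parallel featurization and serial model evaluation, along with the documented max_workers rules, can be sketched as follows. This is a toy under stated assumptions: featurize, run_models, resolve_workers, and the two "models" are hypothetical stand-ins, and a thread pool is swapped in for the process pool described above so that the example stays portable.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def resolve_workers(max_workers: Optional[int]) -> int:
    """Apply the documented rule: None means all cores, 0 means 1 core."""
    if max_workers is None:
        return os.cpu_count() or 1
    return max(1, max_workers)


def featurize(structure: List[float]) -> List[float]:
    """Toy stand-in for a descriptor calculation (the expensive step)."""
    return [sum(structure), max(structure) - min(structure)]


def run_models(structures: List[List[float]],
               models: Dict[str, Callable[[List[float]], float]],
               mode: str = "serial",
               max_workers: Optional[int] = 4) -> List[List[float]]:
    if mode == "parallel":
        # Threads replace processes in this sketch; map() keeps input order.
        with ThreadPoolExecutor(resolve_workers(max_workers)) as pool:
            features = list(pool.map(featurize, structures))
    elif mode == "serial":
        features = [featurize(s) for s in structures]
    else:
        raise ValueError("mode must be 'serial' or 'parallel'")
    # Models are applied one after another, in dictionary order.
    return [[model(f) for model in models.values()] for f in features]


toy_models = {"sum_net": lambda f: f[0], "range_net": lambda f: f[1]}
toy_structures = [[1.0, 2.0, 3.0], [4.0, 4.0], [0.0, 10.0]]

serial = run_models(toy_structures, toy_models, mode="serial")
parallel = run_models(toy_structures, toy_models, mode="parallel", max_workers=0)
print(serial)
```

Both modes return predictions in input order, so switching modes changes only how the featurization work is scheduled.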

runModels_dilute(descriptor, structList, baseStruct='pure', mode='serial', max_workers=4)[source]

Runs all loaded models on a list of Structures using the specified descriptor. The critical difference from runModels() is that this function supports the KS2022_dilute descriptor, which can only be used on dilute structures that contain a single alloying atom. These can be based either on pure elements or on custom base structures, e.g. TCP endmember configurations. The speed increase is substantial compared to the KS2022 descriptor, which is more general and can be used on any structure. Supports serial and parallel modes in the same way as runModels().

Parameters:

descriptor (str) – Descriptor to use. Must be one of the available descriptors. See pysipfenn.descriptorDefinitions to see available modules or add yours. The available default descriptor is ‘KS2022_dilute’. ‘KS2022’ should also work, but is not recommended, as it negates the speed increase of the dilute descriptor.
structList (List[Structure]) – List of pymatgen Structure objects to run the models on. Must be dilute structures as described above.
baseStruct (Union[str, Structure]) – Base structure to use for the dilute descriptor. Can be a Structure object or a string. If a string, must be ‘pure’ indicating that the dilute structures given as input are pure elements alloyed with a single atom. If the base structure is not pure, it must be a Structure object which differs from the input Structures by one atom.
mode (str) – Computation mode. ‘serial’ or ‘parallel’. Default is ‘serial’. Parallel mode is not recommended for small datasets.
max_workers (int) – Number of workers to use in parallel mode. Default is 4. Ignored in serial mode. If set to None, will use all available cores. If set to 0, will use 1 core.

Return type:

List[list]

Returns:

List of predictions. Each element of the list is a list of predictions from all networks that were run. The order of the predictions is the same as the order of the input structures. The order of the networks is the same as the order of the networks in self.network_list_available. If a network is not available, it will not be included in the list. If a network is not compatible with the selected descriptor, it will not be included in the list.
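For the baseStruct=‘pure’ case, the "single alloying atom" requirement can be checked from the list of site species alone. The sketch below is a hypothetical pre-check written for illustration, not part of pySIPFENN; it works on plain element symbols rather than pymatgen Structure objects, and split_dilute is an assumed name.

```python
from collections import Counter
from typing import List, Tuple


def split_dilute(species: List[str]) -> Tuple[str, str]:
    """Return (base element, alloying element) for a dilute structure
    built from a pure element with exactly one substituted atom."""
    counts = Counter(species)
    if len(counts) != 2:
        raise ValueError("expected exactly two elements")
    solutes = [element for element, n in counts.items() if n == 1]
    if len(solutes) != 1:
        raise ValueError("expected exactly one alloying atom")
    solute = solutes[0]
    base = next(element for element in counts if element != solute)
    return base, solute


# A 32-site cell of Fe with a single Cr atom substituted in.
supercell = ["Fe"] * 31 + ["Cr"]
print(split_dilute(supercell))
```

A cell with two or more alloying atoms, or with none, is rejected in this sketch, mirroring the restriction stated above.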

updateModelAvailability()[source]: Updates availability of models based on the pysipfenn.modelsSIPFENN directory contents. Works only for current ONNX model definitions. Legacy support for MxNet models is retained in other functions, but they have to be manually added here.

writeDescriptorsToCSV(descriptor, file)[source]

writeResultsToCSV(file)[source]

pysipfenn.ward2ks2022(ward2017)[source]

Return type:

ndarray
