The StructuresSet class

Initialization and methods

class clusterx.structures_set.StructuresSet(parent_lattice=None, filepath=None, json_db_filepath=None, calculator=None, quick_parse=False, **sset_opts)

StructuresSet class

Objects of this class contain a set of structures. This set can be used for various purposes, for instance as a training data set for cluster expansion, or as a validation set for cross validation. All the structures contained in a StructuresSet object must derive from a single ParentLattice object.

Parameters:

parent_lattice: ParentLattice object

All the structures in a structures set must derive from the same parent lattice, given here. This argument can be omitted if parsing from file (see below).

filepath: String

If provided, the structures set is initialized from a structures_set file, as created by StructuresSet.serialize() or StructuresSet.write_input_files(). In this case, the parent_lattice argument can be omitted (if present, it is overridden).

json_db_filepath: String

Deprecated, use filepath instead. If set, overrides filepath

calculator: ASE calculator object (default: None)

quick_parse: Boolean (default: False)

If True, it is assumed that, in the JSON file to be parsed (see filepath), the atom indices of the structures are the same as those of the supercell. Otherwise, the atom positions of structures and supercell are verified for every structure in the structures set being parsed. This makes parsing slower, but it is safer when it is unclear how the file was built.

Deprecated parameters:

db_fname: replaced by json_db_filepath

Examples:

Methods:

add_structure(structure, folder='', **props)

Add a structure to the StructuresSet object

Parameters:

structure: Structure object

Structure object for the structure to be added.

folder: string (default: "")

Optionally, path of the folder containing ab-initio runs. Paths are created automatically when calling StructuresSet.write_input_files(). See related documentation for more details.

props: keyword arguments

keyword arguments to be stored in the properties dictionary of a StructuresSet object.

add_structures(structures=None, sort_key=None)

Add structures to the StructuresSet object

Parameters:

structures: list of Structure objects, path to JSON file, or StructuresSet object

Structures to be added

sort_key: list of three integers (default:None)

Only relevant if structures is a JSON file. Sort atomic positions after reading. For example, the value (2,1,0) will sort as: increasing z-coordinate first, increasing y-coordinate second, increasing x-coordinate third. Useful to get well ordered slab structures, for instance.
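As an illustration of the sort_key ordering, the following sketch sorts a few positions with plain Python. It assumes that the first entry of sort_key is the primary sort axis, as the (2,1,0) example above suggests. The helper sort_positions is hypothetical and not part of the StructuresSet API.

```python
def sort_positions(positions, sort_key=(2, 1, 0)):
    """Sort (x, y, z) tuples; sort_key lists the axes from primary to tertiary."""
    return sorted(positions, key=lambda p: tuple(p[axis] for axis in sort_key))


# Four sites of a hypothetical two-layer slab, in arbitrary order.
positions = [(0.5, 0.0, 1.0), (0.0, 0.5, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0)]

# With (2, 1, 0): lowest z first, ties broken by y, then by x.
sorted_positions = sort_positions(positions)
for p in sorted_positions:
    print(p)
```

With this ordering, all sites of the bottom layer (z = 0.0) come before those of the top layer (z = 1.0).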

calculate_energies(calculator, structure_fname='geometry.json')

Perform ab-initio calculation of energies using an ASE calculator.

The list of folders returned by StructuresSet.get_folders() is iterated. In each iteration, the current working directory (cwd) is set to the current folder. The structure in the file structure_fname is converted to an Atoms object, whose calculator is set to calculator. The Atoms.get_potential_energy() method is called and the resulting total energy is stored in the file cwd/energy.dat.

compute_property_values(property_name='energy', property_calc=None, rm_vacancies=True, update_json_db=True, **kwargs)

Return array of calculated property for all structures in the structures set.

Parameters:

property_name: string (default: “energy”)

Name of the property to be calculated. This is used as a key for the self._props dictionary. The property values can be recovered by calling the method StructuresSet.get_property_values(property_name) (see documentation).

property_calc: function (default: None)

If None, the property value is calculated with the calculator object assigned to the structures set with the method StructuresSet.set_calculator(). If not None, it must be a function with the following signature:

my_function(i, structure, **kwargs)    

where i is the structure index, structure is the structure object for structure index i, and **kwargs are any additional keyword arguments. The function must return a number.

rm_vacancies: Boolean (default:True)

Only takes effect if property_calc is None, i.e., when an ASE calculator (or derived calculator) is used. If True, the Atoms.get_potential_energy() method is applied to a copy of the Structure.atoms object with vacancy sites removed, i.e., atom positions containing species with species number 0 or species symbol "X".

update_json_db: Boolean (default:True)

Whether to update the json database file (in case one is attached to the sset instance).

**kwargs: keyword argument list, arbitrary length

keyword arguments passed directly to the property_calc function. You may call this method as:

sset_instance.compute_property_values(property_name="my_prop", property_calc=my_func, arg1=arg1, ..., argN=argN)

where arg1 to argN are the keyword arguments passed to the my_func(i, structure, **kwargs) function.
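A function with the signature described above might look like the following sketch. It does not use clusterx: a plain dictionary stands in for the Structure object, and the loop at the end only imitates how the structure index, the structure and the keyword arguments would reach the function. The names species_fraction and symbol are hypothetical.

```python
def species_fraction(i, structure, **kwargs):
    """Hypothetical property function: fraction of sites occupied by `symbol`.

    i         -- structure index (unused here, but part of the signature)
    structure -- stand-in for a Structure object; here a dict of symbols
    kwargs    -- extra keyword arguments forwarded by the caller
    """
    symbol = kwargs.get("symbol", "Si")
    symbols = structure["symbols"]
    return symbols.count(symbol) / len(symbols)


# Two stand-in structures (assumption: real code would use Structure objects).
structures = [
    {"symbols": ["Si", "Si", "Ge", "Ge"]},
    {"symbols": ["Si", "Ge", "Ge", "Ge"]},
]

# Imitates the per-structure loop: one number is returned for each structure.
values = [species_fraction(i, s, symbol="Ge") for i, s in enumerate(structures)]
print(values)
```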

energy_parser(folder, structure=None, **kwargs)

Read value stored in energy.dat file.

This is the default value of the property_parser parameter of the StructuresSet.parse_property_values() method. It can be used as a template for writing parsers of other properties to be passed to StructuresSet.parse_property_values().

Parameters:

i: integer

folder number

folder: string

absolute or relative path of the folder containing the file/s to be read.

structure: Structure object

structure object for structure index i

**kwargs: keyword arguments

Extra arguments needed for the property reading. See documentation of StructuresSet.parse_property_values().
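As a sketch of how this method could serve as a template, the parser below follows the signature shown in the method header, (folder, structure=None, **kwargs), and reads a different property from a differently formatted file. The file name results.txt, its "key value" line format, and the keyword arguments fname and key are all assumptions made for the example.

```python
import os
import tempfile


def band_gap_parser(folder, structure=None, **kwargs):
    """Hypothetical parser: return the number following `key` in a text file."""
    fname = kwargs.get("fname", "results.txt")
    key = kwargs.get("key", "band_gap")
    with open(os.path.join(folder, fname)) as f:
        for line in f:
            fields = line.split()
            if len(fields) == 2 and fields[0] == key:
                return float(fields[1])
    raise ValueError(f"{key} not found in {os.path.join(folder, fname)}")


# Self-contained demonstration with a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "results.txt"), "w") as f:
        f.write("total_energy -10.5\nband_gap 1.12\n")
    gap = band_gap_parser(folder)
    energy = band_gap_parser(folder, key="total_energy")

print(gap, energy)
```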

get_calculator()

Get Calculator object associated to the structures set.

get_concentrations(site_type=0, sigma=1)

Get concentration values for a given site type

get_db()

Get JSON database object corresponding to the list of folders containing structure files for ab-initio calculations, as created by StructuresSet.write_input_files().

get_db_fname()

Get file name of the JSON database corresponding to the list of folders containing structure files for ab-initio calculations, as created by StructuresSet.write_input_files().

get_folders()

Get list of folders containing structure files for ab-initio calculations, as created by StructuresSet.write_input_files().

get_images(rm_vac=True, n=None)

Return array of Atoms objects from structures set.

Parameters:

rm_vac: Boolean

Whether to remove vacancies, i.e. atoms with species number 0 or chemical symbol X, from the returned Atoms objects. If True, vacancy sites are eliminated in the returned Atoms objects.

n: integer

return the first n structures. If None, return all structures.
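The effect of rm_vac can be pictured with the following sketch, which works on plain lists instead of Atoms objects. It only illustrates the rule stated above (sites with species number 0 are dropped); strip_vacancies is a hypothetical helper, not the library's implementation.

```python
def strip_vacancies(numbers, positions):
    """Drop every site whose atomic number is 0 (a vacancy)."""
    kept = [(z, p) for z, p in zip(numbers, positions) if z != 0]
    return [z for z, _ in kept], [p for _, p in kept]


# Stand-in for one structure: Si, vacancy, Ge, vacancy.
numbers = [14, 0, 32, 0]
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.5, 0.5, 0.0)]

clean_numbers, clean_positions = strip_vacancies(numbers, positions)
print(clean_numbers)
print(clean_positions)
```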

get_natoms()

Return array with the number of atoms of every structure in the structures set.

get_nstr()

Return number of structures in the structures set.

get_parent_lattice()

Get ParentLattice object of structures set.

Returns the ParentLattice object from which all the structures in the StructuresSet object derive.

get_predictions(cemodel)

Get predictions of CE model on structures set

Applies the given cluster expansion model to every structure in the structures set and returns an array with the computed values.

Parameters:

cemodel: Model object

Cluster expansion model used to compute the predictions.

get_property()

Get property dictionary of StructuresSet object

All the properties in a StructuresSet object are stored in a dictionary with the following structure:

{"prop_name_1": [p10, p11, ...], "prop_name_2": [p20,p21, ..], ...}

where "prop_name_i" is the name of the property i, and pij is the value of property i for structure j.

This dictionary is returned by this method.
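A dictionary with this layout can be handled with ordinary Python, as in the sketch below. The property names and values are made up for the example, and values_for_structure is a hypothetical helper.

```python
# Made-up property dictionary for a set of three structures.
props = {
    "total_energy": [-10.0, -9.5, -9.8],
    "band_gap": [1.1, 0.9, 1.0],
}


def values_for_structure(props, j):
    """Collect the value of every property for structure j."""
    return {name: values[j] for name, values in props.items()}


# p_ij: value of property i ("band_gap") for structure j (index 2).
p_ij = props["band_gap"][2]
second = values_for_structure(props, 1)
print(p_ij, second)
```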

get_property_names()

Return list of stored property names.

get_property_values(property_name)

Return list of property values.

Parameters:

property_name: String

Name of the property. If not sure, a list of property names can be obtained with StructuresSet.get_property_names().

Returns:

props: python array

A python array with the property values

get_scell_indices()

Return array of supercell indices.

Every structure in a structures set is a ("decorated") supercell. The index of a supercell is an integer number, equal to the supercell volume in units of the parent cell volume.

This method returns an array of supercell indices, corresponding to each structure in the structures set.
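The volume ratio that defines the supercell index can be worked out by hand. The sketch below assumes that a supercell is specified by a 3x3 integer transformation matrix acting on the parent cell vectors, in which case the ratio of the volumes is the absolute value of its determinant. The helpers det3 and supercell_index are hypothetical and only illustrate the definition.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def supercell_index(transformation):
    """Supercell volume in units of the parent cell volume."""
    return abs(det3(transformation))


# A 2x2x1 supercell and a rotated cell with the same volume.
index_diagonal = supercell_index([[2, 0, 0], [0, 2, 0], [0, 0, 1]])
index_rotated = supercell_index([[1, 1, 0], [-1, 1, 0], [0, 0, 2]])
print(index_diagonal, index_rotated)
```

Both cells contain four parent cells, so they share the same supercell index even though their shapes differ.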

get_structure(sid)

Get one structure of the set

Parameters:

sid: integer

index of structure in the structure set.

Returns:

Structure object.

get_structure_atoms(sid)

Return Atoms object for db row sid.

get_structures()

Get all structures of the set

Returns:

list of Structure objects.

get_subset(structure_indices=[], transfer_properties=True)

Return structures set instance containing a subset of structures of the original structures set

Parameters:

structure_indices: list or array

indices of the structures in the original StructuresSet to be included in the subset.

transfer_properties: Boolean

if True (default), copy the properties from the original StructuresSet to the subset.

parse_property_values(property_name='total_energy', write_to_file=True, property_parser=<function StructuresSet.energy_parser>, root='', update_json_db=True, **kwargs)

Read calculated property values from ab-initio output files

Read property values from ab-initio code output files. These files are contained in paths:

[[root] /] [prefix] id [suffix] / file_to_read

as created by StructuresSet.write_input_files(). The folders to be searched for energy values are those returned by StructuresSet.get_folders(). These can also be obtained directly from the "metadata":{"folders":[ ... ]} elements of the JSON database file.

The property value read is stored in the JSON database of the StructuresSet object (i.e., the file returned by StructuresSet.get_db_fname()), in the "data": {"properties": { ... }} dictionary of every structure. For instance:

"data": {"properties": {"formation_energy_per_site": -0.05788398131602701, "total_energy": -9824740.09590308}},

where "formation_energy_per_site" and "total_energy" are the string values of the property_name parameter in the calls to parse_property_values().
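Stored values with this layout can be read back with the standard json module. The sketch below wraps the fragment shown above in braces so that it is valid JSON on its own; a real database file contains additional keys around it, which are not reproduced here.

```python
import json

# The "data" fragment from the example above, wrapped as a standalone object.
row_text = (
    '{"data": {"properties": {'
    '"formation_energy_per_site": -0.05788398131602701, '
    '"total_energy": -9824740.09590308}}}'
)

row = json.loads(row_text)
properties = row["data"]["properties"]

# Each key is a property name; each value is the number stored for it.
for name, value in sorted(properties.items()):
    print(name, value)
```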

Parameters:

property_name: string

key for the self._props dictionary of property values

write_to_file: Boolean

Whether to write property values to a file with name property_name.dat.

property_parser: function

Function to extract property value from ab-initio files. Return value must be scalar and signature is:

property_parser(i, folder_path, structure=None, **kwargs)

where i is the structure index, folder_path is the path of the folder containing the relevant ab-initio files, structure is the structure object for structure index i, and **kwargs are any additional keyword arguments.

root: string

the root folder containing the subfolders with ab-initio data. See description above.

update_json_db: Boolean (default:True)

Whether to update the json database file (in case one is attached to the sset instance).

**kwargs: keyword argument list, arbitrary length

keyword arguments passed directly to the property_parser function. You may call this method as:

sset_instance.parse_property_values(property_parser=my_parser, arg1=arg1, ..., argN=argN)

where my_parser is the parser function and arg1 to argN are the keyword arguments passed on to it.

serialize(filepath='sset.json', path=None, overwrite=False, rm_vac=False)

Serialize StructuresSet object

The serialization creates a JSON ASE database object and writes a JSON file. This file can be used to reconstruct a StructuresSet object, by initializing with:

StructuresSet(filepath="sset.json")

where "sset.json" is the file written to filepath.

Parameters:

filepath: string

Output file name.

path: string

DEPRECATED, use filepath instead. Output file name.

set_calculator(calc)

Assign calculator object to every structure in the structures set.

Parameters:

calc: Calculator object

set_property_values(property_name='total_energy', property_vals=[], update_json_db=True)

Set property values

Set the property values.

If a JSON database of folders (as created by StructuresSet.write_input_files()) exists, it is updated.

Parameters:

property_name: string

Name of the property

property_vals: array

Array of property values

update_json_db: Boolean (default:True)

Whether to update the json database file (in case one is attached to the sset instance).

set_property_values_from_files(property_name='property', property_file_name='property.dat', cwd='./')

Set property values read from files

Consider a StructuresSet object named sset.

The list of folders sset.get_folders() is iterated and the value stored in the file named property_file_name is parsed and assigned to the corresponding sample in the sset. The name of the property is property_name and can be recovered by calling sset.get_property_values(property_name).

If an associated json database exists, it is updated with the new property.

Parameters:

property_name: string

The name used to label the property in the structures set. This label is then listed in sset.get_property_names() and the property values for this label can be obtained by calling sset.get_property_values(property_name)

property_file_name: string

In every folder of the list sset.get_folders() there must be a file named property_file_name containing a real number with the value of the property
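The folder iteration described above can be sketched without clusterx as follows. Temporary folders stand in for the list returned by sset.get_folders(), and read_values is a hypothetical helper that parses one real number per file. The folder names are invented for the example.

```python
import os
import tempfile


def read_values(folders, property_file_name="property.dat"):
    """Read one real number from property_file_name in each folder."""
    values = []
    for folder in folders:
        with open(os.path.join(folder, property_file_name)) as f:
            values.append(float(f.read().strip()))
    return values


with tempfile.TemporaryDirectory() as root:
    folders = []
    for i, value in enumerate([1.5, -2.0, 0.25]):
        folder = os.path.join(root, f"str-{i}")  # invented folder name
        os.makedirs(folder)
        with open(os.path.join(folder, "property.dat"), "w") as f:
            f.write(f"{value}\n")
        folders.append(folder)
    # One value per folder, in the same order as the folder list.
    values = read_values(folders)

print(values)
```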

write_input_files(root='.', prefix='', suffix='', fnames=None, formats=[], overwrite=True, rm_vac=False)

Create folders containing structure input files for ab-initio calculations.

Structure files are written to files with path:

[[root] /] [prefix] id [suffix] / [filename]

where root, prefix, suffix, and filename are explained below, and id + 1 is the structure id in a JSON database created at the path:

[[root] /] [prefix]id0-idN[suffix] . json

where id0 and idN are the smallest and largest id indices. The paths of the created folders are stored in the JSON database created by StructuresSet.write_input_files(), for instance under:

{
...
"key_value_pairs": {"folder": "./random_strs-14"},
...
}
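The path patterns above can be spelled out with os.path.join, as in the sketch below. It is only a reading of the documented patterns, not the library's code. The helpers folder_path and db_path are hypothetical, and the prefix "random_strs-" is inferred from the example folder name.

```python
import os


def folder_path(root, prefix, index, suffix):
    """[[root] /] [prefix] id [suffix]"""
    return os.path.join(root, f"{prefix}{index}{suffix}")


def db_path(root, prefix, first, last, suffix):
    """[[root] /] [prefix]id0-idN[suffix] . json"""
    return os.path.join(root, f"{prefix}{first}-{last}{suffix}.json")


# Folder of structure index 14 and the file written inside it.
folder = folder_path(".", "random_strs-", 14, "")
structure_file = os.path.join(folder, "geometry.json")

# Database covering structure indices 0 to 14.
database = db_path(".", "random_strs-", 0, 14, "")
print(folder, structure_file, database)
```

On a POSIX system the first path is ./random_strs-14, matching the "folder" entry in the example above.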

Parameters:

root: String

path to the root folder containing the set of created folders

prefix: String

prefix for name of folder containing the files

suffix: String

suffix for name of folder containing the files

fnames: array of Strings

Array of file names for the files containing the structure. If not set, defaults to geometry.json.

formats: array of Strings, optional

Array of file formats corresponding to the file names in fnames. Possible formats are listed in ase.io.write. If entirely omitted, or if an element of the array is None, the format is guessed from the corresponding file name.

overwrite: boolean

Whether to overwrite the content of existing folders.

rm_vac: Boolean

Vacancies are represented with chemical symbol X and atomic number 0. By default, the output files contain lines with the atomic positions of vacancies. To leave them out of the files, set rm_vac to True.