StructuresSet class
Objects of this class contain a set of structures. This set can be used for various purposes, for instance as a training data set for cluster expansion, or as a validation set for cross validation. All the structures contained in a StructuresSet object must derive from a single ParentLattice object.
Parameters:
parent_lattice
: ParentLattice object. All the structures in a structures set must derive from the same parent lattice given here. This argument can be omitted if parsing from a file (see below).
filepath
: String. If provided, the structures set is initialized from a structures_set file, as created by StructuresSet.serialize() or StructuresSet.write_files(). In this case, the parent_lattice argument can be omitted (if present, it is overridden).
json_db_filepath
: String. Deprecated, use filepath instead. If set, overrides filepath.
calculator
: ASE calculator object (default: None)
quick_parse
: Boolean (default: False). If True, it assumes that, in the json file to be parsed (see db_fname), the atom indices of the structures are the same as those of the supercell. Otherwise, the atom positions of structures and supercell are verified for every structure in the structures set being parsed. This leads to slower parsing, but it is safer if you are not sure how the file was built.
Deprecated parameters:
db_fname
: replaced by json_db_filepath
Examples:
Methods:
Add a structure to the StructuresSet object
Parameters:
structure
: Structure object. Structure object for the structure to be added.
folder
: string (default: ""). Optionally, path of the folder containing ab-initio runs. Paths are created automatically when calling StructuresSet.write_files(). See related documentation for more details.
props
: keyword arguments. Keyword arguments to be stored in the properties dictionary of a StructuresSet object.
Add structures to the StructuresSet object
Parameters:
structures
: list of Structure objects, path to JSON file, or StructuresSet object. Structures to be added.
sort_key
: list of three integers (default: None). Only relevant if structures is a JSON file. Sort atomic positions after reading. For example, the value (2,1,0) will sort as: increasing z-coordinate first, increasing y-coordinate second, increasing x-coordinate third. Useful to get well-ordered slab structures, for instance.
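The ordering described for sort_key can be pictured with plain Python tuples. This is only a sketch of the sorting rule, not the library's implementation; the helper name sort_positions and the sample coordinates are made up for illustration.

```python
def sort_positions(positions, sort_key=(2, 1, 0)):
    """Sort (x, y, z) tuples: sort_key[0] is the primary axis, and so on."""
    return sorted(positions, key=lambda p: tuple(p[axis] for axis in sort_key))


# Hypothetical atomic positions of a small slab, as (x, y, z) tuples.
positions = [
    (0.5, 0.0, 1.0),
    (0.0, 0.0, 0.0),
    (0.0, 0.5, 1.0),
    (0.5, 0.5, 0.0),
]

# z is compared first, then y, then x.
sorted_positions = sort_positions(positions, sort_key=(2, 1, 0))
for x, y, z in sorted_positions:
    print(x, y, z)
```

With this key, all atoms of the lowest layer (smallest z) come first, which is why the option is convenient for slabs.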
Perform ab-initio calculation of energies using an ASE calculator.
The folders list as returned by StructuresSet.get_folders() is iterated. The current working directory (cwd) is set to the current folder in the loop. The structure in the file structure_fname is converted to an Atoms object, whose calculator is set to calculator. The Atoms.get_potential_energy() method is called and the resulting total energy is stored in the file cwd/energy.dat.
Return array of calculated property for all structures in the structures set.
Parameters:
property_name
: string (default: "energy"). Name of the property to be calculated. This is used as a key for the self._props dictionary. The property values can be recovered by calling the method StructuresSet.get_property_values(property_name) (see documentation).
property_func
: function (default: None). If None, the property value is calculated with the calculator object assigned to the structures set with the method StructuresSet.set_calculator(). If not None, it must be a function with the following signature:
my_function(i, structure, **kwargs)
where i is the structure index, structure is the structure object for structure index i, and **kwargs are any additional keyword arguments. The function must return a number.
rm_vacancies
: Boolean (default: True). Only takes effect if property_func is None, i.e., when an ASE calculator (or derived calculator) is used. If True, the Atoms.get_potential_energy() method is applied to a copy of the Structure.atoms object with vacancy sites removed, i.e., atom positions containing species with species number 0 or species symbol "X".
update_json_db
: Boolean (default: True). Whether to update the json database file (in case one is attached to the sset instance).
**kwargs
: keyword argument list, arbitrary length. Keyword arguments directly passed to the property_func function.
You may call this method as:
sset_instance.calculate_property(property_name="my_prop", property_func=my_func, arg1=arg1, ..., argN=argN)
where arg1 to argN are the keyword arguments passed to the my_func(i, structure, **kwargs) function.
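A user-defined function with the signature described above could look like the sketch below. Only the signature (i, structure, **kwargs) comes from this documentation; the function name vacancy_fraction, the scale keyword, and the FakeStructure stand-in (used so that the example runs without the library) are hypothetical.

```python
class FakeStructure:
    """Stand-in for a Structure object; only used to make the sketch runnable."""

    def __init__(self, symbols):
        self.symbols = symbols


def vacancy_fraction(i, structure, **kwargs):
    """Return the (optionally scaled) fraction of sites with symbol "X"."""
    scale = kwargs.get("scale", 1.0)  # hypothetical extra keyword argument
    n_vac = sum(1 for s in structure.symbols if s == "X")
    return scale * n_vac / len(structure.symbols)


# Apply the function to every structure, passing the index and the extra keyword.
structures = [FakeStructure(["Si", "X", "Si", "Ge"]), FakeStructure(["Si", "Si"])]
values = [vacancy_fraction(i, s, scale=100.0) for i, s in enumerate(structures)]
print(values)
```

The function returns a single number per structure, as required, and reads any extra settings from the keyword arguments that the calling method forwards.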
Read value stored in energy.dat file.
This is to be used as the default argument for the read_property parameter of the StructuresSet.read_property_values() method. It can also be used as a template for functions that read other properties, to be passed to StructuresSet.read_property_values().
Parameters:
i
: integer. Folder number.
folder
: string. Absolute or relative path of the folder containing the file(s) to be read.
structure
: Structure object. Structure object for structure index i.
**kwargs
: keyword arguments. Extra arguments needed for the property reading. See documentation of StructuresSet.read_property_values().
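A reader following this parameter list might be sketched as follows. It assumes that energy.dat holds a single number as its first whitespace-separated token; the function name read_energy_sketch is hypothetical and this is not the library's own implementation.

```python
import os
import tempfile


def read_energy_sketch(i, folder, structure=None, **kwargs):
    """Return the number stored in <folder>/energy.dat as a float."""
    with open(os.path.join(folder, "energy.dat")) as f:
        return float(f.read().split()[0])


# Demonstration with a temporary folder standing in for an ab-initio run folder.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "energy.dat"), "w") as f:
        f.write("-123.456\n")
    energy = read_energy_sketch(0, folder)

print(energy)
```

A reader for another property can keep the same arguments and change only the file name and the parsing step.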
Get Calculator object associated with the structures set.
Get concentration values for a given site type
Get json database object corresponding to the list of folders containing structure files for ab-initio calculations, as created by StructuresSet.write_files()
Get file name of json database corresponding to the list of folders containing structure files for ab-initio calculations, as created by StructuresSet.write_files()
Get list of folders containing structure files for ab-initio calculations, as created by StructuresSet.write_files()
Return array of Atoms objects from structures set.
Parameters:
rm_vac
: Boolean. Whether to remove vacancies, i.e., atoms with species number 0 or chemical symbol X, from the returned Atoms objects. If True, vacancy sites are eliminated in the returned Atoms objects.
n
: integer. Return the first n structures. If None, return all structures.
Return array with the number of atoms of every structure in the structures set.
Return number of structures in the structures set.
Get ParentLattice object of structures set.
Returns the ParentLattice object from which all the structures in the StructuresSet object derive.
Get predictions of CE model on structures set
Applies the given cluster expansion model to every structure in the structures set and returns an array with the computed values.
Parameters:
cemodel
: Model object. Cluster expansion model for which predictions are to be computed.
Get property dictionary of StructuresSet object
All the properties in a StructuresSet object are stored in a dictionary with the following structure:
{"prop_name_1": [p10, p11, ...], "prop_name_2": [p20, p21, ...], ...}
where "prop_name_i" is the name of the property i, and pij is the value of property i for structure j.
This dictionary is returned by this method.
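As a small illustration of this layout, the sketch below builds such a dictionary by hand and looks up one value. The property names and numbers are invented for the example; no library call is involved.

```python
# Hypothetical property values for a set of three structures.
props = {
    "total_energy": [-10.2, -11.7, -9.8],
    "formation_energy_per_site": [0.00, -0.05, 0.02],
}

# Value of property "total_energy" for structure j = 1.
j = 1
p_ij = props["total_energy"][j]

# Every list holds one entry per structure in the set.
n_structures = len(next(iter(props.values())))
print(p_ij, n_structures)
```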
Return list of stored property names.
Return list of property values.
Parameters:
property_name
: String. Name of the property. If not sure, a list of property names can be obtained with StructuresSet.get_property_names().
Returns:
props
: python array. A python array with the property values.
Return array of supercell indices.
Every structure in a structures set is a ("decorated") supercell. The index of a supercell is an integer number, equal to the supercell volume in units of the parent cell volume.
This method returns an array of supercell indices, corresponding to each structure in the structures set.
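Since the index is a volume ratio, it can be checked by hand from the cell vectors. The sketch below computes it with a 3x3 determinant in plain Python; the cell matrices are made-up examples, the helper names det3 and supercell_index are hypothetical, and the library may obtain the index differently.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, k) = m
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)


def supercell_index(parent_cell, super_cell):
    """Supercell volume in units of the parent cell volume, as an integer."""
    return round(abs(det3(super_cell)) / abs(det3(parent_cell)))


# Rows are the cell vectors; the supercell doubles the first two vectors.
parent_cell = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
super_cell = [[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 3.0]]

index = supercell_index(parent_cell, super_cell)
print(index)
```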
Get one structure of the set
Parameters:
sid
: integer. Index of the structure in the structures set.
Returns:
Structure object.
Return Atoms object for db row sid.
Get all structures of the set
Returns:
list of Structure objects.
Return structures set instance containing a subset of structures of the original structures set
Parameters:
structure_indices
: list or array. Indices of the structures in the original StructuresSet to be included in the subset.
transfer_properties
: Boolean. If True (default), copy the properties from the original StructuresSet to the subset.
Read calculated property values from ab-initio output files
Read property values from ab-initio code output files. These files are contained in paths:
[[root] /] [prefix] id [suffix] / file_to_read
as created by StructuresSet.write_input_files(). The folders to be searched for property values are those returned by StructuresSet.get_folders(). These can also be obtained directly from the "metadata":{"folders":[ ... ]} elements of the json database file.
The value read is stored in the json database of the StructuresSet object (i.e., that obtained from StructuresSet.get_db_fname()), under the key "data": {"properties": { ... }} dictionary for every structure. For instance:
"data": {"properties": {"formation_energy_per_site": -0.05788398131602701, "total_energy": -9824740.09590308}},
where "formation_energy_per_site" and "total_energy" are the string values of the parameter property_name in the corresponding calls to read_property_values().
Parameters:
property_name
: string. Key for the self._props dictionary of property values.
write_to_file
: Boolean. Whether to write property values to a file with name property_name.dat.
read_property
: function. Function to extract the property value from ab-initio files. The return value must be a scalar, and the signature is:
read_property(i, folder_path, structure=None, **kwargs)
where i is the structure index, folder_path is the path of the folder containing the relevant ab-initio files, structure is the structure object for structure index i, and **kwargs are any additional keyword arguments.
root
: string. The root folder containing the subfolders with ab-initio data. See description above.
update_json_db
: Boolean (default: True). Whether to update the json database file (in case one is attached to the sset instance).
**kwargs
: keyword argument list, arbitrary length. Keyword arguments directly passed to the read_property function.
You may call this method as:
sset_instance.read_property_values(property_name, read_property=read_property, arg1=arg1, ..., argN=argN)
where arg1 to argN are the keyword arguments passed to the read_property(i, folder_path, structure=None, **kwargs) function.
Serialize StructuresSet object
The serialization creates a JSON ASE database object and writes a json file. This file can be used to reconstruct a StructuresSet object, by initializing with:
StructuresSet(filepath="sset.json")
where "sset.json" is the file written in filepath.
Parameters:
filepath
: string. Output file name.
path
: string. DEPRECATED, use filepath instead. Output file name.
Assign calculator object to every structure in the structures set.
Parameters:
calc
: Calculator object
Set property values
Set the property values. If a folders' json database (as created by StructuresSet.write_files()) exists, it is updated.
Parameters:
property_name
: string. Name of the property.
property_vals
: array. Array of property values.
update_json_db
: Boolean (default: True). Whether to update the json database file (in case one is attached to the sset instance).
Set property values read from files
Consider a StructuresSet object named sset.
The list of folders sset.get_folders() is iterated, and the value stored in the file named property_file_name is parsed and assigned to the corresponding sample in sset.
The name of the property is property_name, and its values can be recovered by calling sset.get_property_values(property_name).
If an associated json database exists, it is updated with the new property.
Parameters:
property_name
: string. The name used to label the property in the structures set. This label is then listed in sset.get_property_names(), and the property values for this label can be obtained by calling sset.get_property_values(property_name).
property_file_name
: string. In every folder of the list sset.get_folders() there must be a file named property_file_name containing a real number with the value of the property.
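The folder iteration described for this method can be pictured as follows. The sketch uses temporary folders in place of sset.get_folders() so that it runs without the library; the helper name read_values_from_folders, the folder names, and the file name my_prop.dat are hypothetical.

```python
import os
import tempfile


def read_values_from_folders(folders, property_file_name):
    """Read one real number per folder from the file property_file_name."""
    values = []
    for folder in folders:
        with open(os.path.join(folder, property_file_name)) as f:
            values.append(float(f.read().strip()))
    return values


with tempfile.TemporaryDirectory() as root:
    # Create three folders, each holding a file with a single real number.
    folders = []
    for k, value in enumerate([1.5, -2.0, 0.25]):
        folder = os.path.join(root, "str-%d" % k)
        os.makedirs(folder)
        with open(os.path.join(folder, "my_prop.dat"), "w") as f:
            f.write("%r\n" % value)
        folders.append(folder)
    values = read_values_from_folders(folders, "my_prop.dat")

print(values)
```

The resulting list has one entry per folder, in folder order, which matches the one-value-per-structure layout of the stored properties.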
Create folders containing structure input files for ab-initio calculations.
Structure files are written to files with path:
[[root] /] [prefix] id [suffix] / [filename]
where root, prefix, suffix, and filename are explained below, and id + 1 is the structure id in a created JSON database with path:
[[root] /] [prefix]id0-idN[suffix] . json
where id0 and idN are the smallest and largest id indices.
The paths of the created folders are stored in the json database created by StructuresSet.write_files(), under, for instance:
{
...
"key_value_pairs": {"folder": "./random_strs-14"},
...
}
Parameters:
root
: String. Path to the root folder containing the set of created folders.
prefix
: String. Prefix for the name of the folder containing the files.
suffix
: String. Suffix for the name of the folder containing the files.
fnames
: array of Strings. Array of file names for the files containing the structure. If not set, defaults to geometry.json.
formats
: array of Strings, optional. Array of file formats corresponding to the file names in fnames. Possible formats are listed in ase.io.write. If entirely omitted, or if an element of the array is None, the format is guessed from the corresponding file name.
overwrite
: Boolean. Whether to overwrite the content of existing folders.
remove_vacancies
: Boolean. Vacancies are represented with chemical symbol X and atomic number 0. Unless removed, the output files will contain lines with atomic positions corresponding to vacancies. If you want them absent in the files, set remove_vacancies to True.
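The path patterns given in the description of this method can be reproduced with a few lines of string formatting. The sketch below is only an illustration of those patterns, assuming integer ids and no extra separators; the helper names folder_path and db_path are hypothetical and the library may build the names differently.

```python
import os


def folder_path(root, prefix, suffix, idx):
    """Folder for structure idx: [[root] /] [prefix] id [suffix]."""
    return os.path.join(root, "%s%d%s" % (prefix, idx, suffix))


def db_path(root, prefix, suffix, id0, idN):
    """JSON database path: [[root] /] [prefix]id0-idN[suffix].json."""
    return os.path.join(root, "%s%d-%d%s.json" % (prefix, id0, idN, suffix))


# Three structures with the default file name geometry.json in each folder.
root, prefix, suffix = ".", "random_strs-", ""
files = [
    os.path.join(folder_path(root, prefix, suffix, i), "geometry.json")
    for i in range(3)
]
database = db_path(root, prefix, suffix, 0, 2)
print(files)
print(database)
```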