Utils

This module contains utility functions that are used across the package and may also be useful for external applications.

class clusterx.utils.Exponential(exponent, coefficient=1)

Basic exponential object of the form coefficient * x ^ exponent. It is evaluated numerically with the method evaluate(x).

Parameters:

exponent: exponent of the exponential

coefficient: coefficient of the exponential

class clusterx.utils.PolynomialBasis(max_order=10, symmetric=False)

Polynomial basis, constructed from several PolynomialFunction() objects. Orthonormal basis sets are constructed using scalar_product. On initialization, all orthonormal basis sets up to the order max_order are generated.

Parameters:

max_order: Maximal order to which the basis set is initialized.

symmetric: Defines whether the scalar product uses sigma values symmetrized around 0 or ascending from 0.

scalar_product(function1, function2, m)

Returns the scalar product \(\frac{1}{m}\sum_{\sigma} f_1(\sigma)\, f_2(\sigma)\).
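
The scalar product above can be illustrated with a minimal sketch. The function name scalar_product_sketch and the two sigma sets are assumptions made for illustration; they are not taken from the CELL source, which may choose its sigma values differently.

```python
def scalar_product_sketch(f1, f2, sigmas):
    """Discrete scalar product: (1/m) * sum over sigma of f1(sigma) * f2(sigma)."""
    m = len(sigmas)
    return sum(f1(s) * f2(s) for s in sigmas) / m

# Hypothetical sigma sets for m = 3 (assumed for illustration only).
ascending = [0, 1, 2]   # "ascending from 0"
symmetric = [-1, 0, 1]  # "symmetrized around 0"

def constant(s):
    return 1.0

def linear(s):
    return float(s)

sp_ascending = scalar_product_sketch(constant, linear, ascending)  # (0 + 1 + 2) / 3
sp_symmetric = scalar_product_sketch(constant, linear, symmetric)  # (-1 + 0 + 1) / 3
```

In this example the constant and linear functions are orthogonal for the symmetric sigma set but not for the ascending one, which is why the choice of sigma values affects the resulting orthonormal basis.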

class clusterx.utils.PolynomialFunction

Polynomial function, built from Exponential() objects.

exception clusterx.utils.SupercellError

clusterx.utils.add_noise(v, noise_level)

Add randomly distributed noise to vector coordinates

To each coordinate of the input vector v, random noise uniformly distributed between -noise_level and noise_level is added. The input vector v is left unchanged. The modified vector is returned.

Parameters:

v: list of floats

The input vector

noise_level: float

Half-width of the uniform distribution: the noise added to each coordinate lies between -noise_level and noise_level.
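
The behavior described above can be sketched as follows. This is a minimal stand-in written with the standard library, not the CELL implementation; the name add_noise_sketch and the rng argument are assumptions added for reproducibility.

```python
import random

def add_noise_sketch(v, noise_level, rng=random):
    """Return a new list with uniform noise in [-noise_level, noise_level] added."""
    # A new list is built, so the input vector v is left unchanged.
    return [x + rng.uniform(-noise_level, noise_level) for x in v]

v = [0.0, 1.0, 2.0]
noisy = add_noise_sketch(v, 0.1, rng=random.Random(0))
```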

clusterx.utils.atat_to_cell(file_path='lat.in', interpret_as='parent_lattice', parent_lattice=None, pbc=None, wrap=True)

Parse a lat.in or str.out file from ATAT.

ATAT users may use their ATAT input files to perform a cluster expansion with CELL. This function converts a lat.in input file from ATAT to a ParentLattice object in CELL. It can also read str.out files, which are converted to Structure objects in CELL.

Parameters:

file_path: string

Path of the file to be parsed. The file must follow the format of a lat.in or str.out file from ATAT.

interpret_as: string or None

Indicate how to interpret the file in file_path. Possible values are:

  • None:

    Three arrays are returned: cell, r, and species.

  • parent_lattice:

    The recommended value if file_path corresponds to a lat.in input file from ATAT. A ParentLattice CELL object will be returned.

  • super_cell:

    The recommended value if file_path corresponds to a lat.in file with a matrix of lattice vectors (u,v,w in the ATAT documentation) different from the identity, or if it is known that the structure is a periodic repetition of a parent lattice. For this option, the parent lattice must be provided (see the parent_lattice parameter below). A SuperCell object will be returned.

  • structure:

    The recommended value if file_path corresponds to a str.out file from ATAT. For this option, the parent lattice must be provided (see parent_lattice parameter below). In this case a Structure object will be returned.

parent_lattice: ParentLattice object

If interpret_as is super_cell or structure, a parent lattice must be provided. This must be compatible with the information in the lat.in file that was used to create the str.out files. See the examples below.

pbc: one or three bool (same as ASE’s Atoms object)

Periodic boundary conditions flags. Examples: True, False, 0, 1, (1, 1, 0), (True, False, False). Default value: False. The returned CELL objects will have these pbc’s set up.

wrap: boolean (default:True)

Wrap atomic coordinates. If pbc is None, pbc is set to (1,1,1). Set wrap to False if structure corresponds to a supercell, i.e., if the second matrix of the structure definition in either the lat.in or str.out file is different from the identity matrix.

Returns:

Depending on the value of interpret_as, the returned object can be Python arrays (interpret_as=None), a ParentLattice (interpret_as="parent_lattice"), a SuperCell (interpret_as="super_cell"), or a Structure (interpret_as="structure").

Examples:

clusterx.utils.atoms_equivalence_check(atoms, to_primitive=True, pretty_print=False)

Find equivalent structures in an array of Atoms objects

Equivalence is determined in terms of symmetry between structures

The SymmetryEquivalenceCheck tool of ASE is used.

Parameters:

atoms: array of Atoms objects

The structures to be analyzed.

to_primitive: Boolean (default: True)

If True, the structures are reduced to their primitive cells. This feature requires spglib to be installed (cf. ASE’s SymmetryEquivalenceCheck).

Returns: A dictionary. The keys (k) are the indices of unique representative structures, and the values (v) are arrays of integers listing all structure indices equivalent to k (including k itself). For instance, the dictionary:

{"0": [0, 1, 3, 8, 9],
 "2": [2, 5, 6],
 "4": [4, 7]}

indicates that in the structures set with indices [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] there are just three distinct structures. These can be represented by structures 0, 2 and 4. The structures [0, 1, 3, 8, 9] are all equivalent, etc.

Note that equivalence is used here in the sense explained above, i.e., symmetry equivalence between structures.

clusterx.utils.calculate_trafo_matrix(pcell, scell, rnd=5)

Calculate integer transformation matrix given a primitive cell and a super-cell

If \(S\) and \(V\) are, respectively, a matrix whose rows are the Cartesian coordinates of the super-cell lattice vectors and a matrix whose rows are the Cartesian coordinates of the parent lattice vectors, then this function returns the matrix \(P=SV^{-1}\), so that \(S=PV\). If the resulting matrix is not an integer matrix (see the rnd parameter), then None is returned.

Parameters:

pcell: 3x3 array of float

The rows of this matrix are the Cartesian coordinates of the parent lattice vectors.

scell: 3x3 array of float

The rows of this matrix are the Cartesian coordinates of the supercell lattice vectors.

rnd: integer (optional)

The matrix \(P=SV^{-1}\) is rounded to rnd decimal places and then checked for being integer-valued.
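
The calculation can be sketched with NumPy as follows. This is a minimal sketch, not the CELL implementation: it assumes that the rows of pcell and scell are lattice vectors and that scell = P pcell, so P is obtained as scell times the inverse of pcell. The name trafo_matrix_sketch is hypothetical.

```python
import numpy as np

def trafo_matrix_sketch(pcell, scell, rnd=5):
    """Return the integer matrix P with scell = P @ pcell, or None if P is not integer."""
    pcell = np.asarray(pcell, dtype=float)
    scell = np.asarray(scell, dtype=float)
    # Rows are lattice vectors, so scell = P @ pcell and P = scell @ inv(pcell).
    p = np.round(scell @ np.linalg.inv(pcell), rnd)
    # After rounding to rnd decimals, every entry must be a whole number.
    if not np.array_equal(p, np.rint(p)):
        return None
    return np.rint(p).astype(int)

# fcc primitive cell and a 2x2x1 repetition of it
pcell = np.array([[0.0, 1.805, 1.805], [1.805, 0.0, 1.805], [1.805, 1.805, 0.0]])
P = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]])

recovered = trafo_matrix_sketch(pcell, P @ pcell)   # recovers P
not_integer = trafo_matrix_sketch(pcell, 1.5 * pcell)  # 1.5 is not an integer
```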

clusterx.utils.decorate_supercell(scell, atoms)

Create a Structure instance by decorating a SuperCell with an Atoms object from ASE.

clusterx.utils.dict_compare(d1, d2, tol=None)

Compare two dictionaries containing mutable objects.

Two dictionaries are considered equal even if the order of their keys differs. Mutable values in the dictionaries are handled. Some parts are taken from:

https://stackoverflow.com/questions/4527942/comparing-two-dictionaries-in-python

Parameters:

d1,d2: python dictionaries

dictionaries to be compared

tol: float

a small float number. If not None, the comparison of dictionary values is regarded as a vector comparison and done with utils.isclose(). For the meaning of tol, read the documentation for rtol parameter of utils.isclose().

Returns: boolean

True if dicts are equal, False if they are different.
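
A minimal sketch of such a comparison is given below. It is not the CELL implementation: the name dict_compare_sketch is hypothetical, and when tol is given the values are assumed to be numeric sequences of equal length, compared through their Euclidean distance.

```python
import math

def dict_compare_sketch(d1, d2, tol=None):
    """Compare two dictionaries, ignoring key order."""
    if d1.keys() != d2.keys():
        return False
    for key in d1:
        a, b = d1[key], d2[key]
        if tol is None:
            # Exact comparison; works for lists and other mutable values.
            if a != b:
                return False
        elif math.dist(a, b) >= tol:
            # Vector comparison: values differ if their distance reaches tol.
            return False
    return True

same = dict_compare_sketch({"a": [1, 2], "b": [3]}, {"b": [3], "a": [1, 2]})
close = dict_compare_sketch({"a": [1.0, 2.0]}, {"a": [1.0, 2.00001]}, tol=1e-3)
different = dict_compare_sketch({"a": [1.0, 2.0]}, {"a": [1.0, 2.5]}, tol=1e-3)
```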

clusterx.utils.get_cl_idx_sc(cl, sc, method=0, tol=0.001)

Return atom indexes of cluster points in SuperCell

Parameters:

cl: npoints x 3 matrix

Matrix of Cartesian or scaled coordinates of the cluster points. Cluster positions are expected to be wrapped inside the supercell sc.

sc: natoms x 3 matrix

Matrix of Cartesian or scaled coordinates of the supercell atomic positions. The type of coordinates (either Cartesian or scaled) must coincide with that of cl.

method: integer

Method to use. 0: (slow) nested for loop using numpy allclose. 1: (fast) calculates all distances from the points in cl to the atoms in sc, and returns the indices for which the distance is zero.

tol: real positive number

Tolerance used to determine whether a cluster position and an atom position are the same.
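
The two methods can be sketched with NumPy as follows. This is an illustrative stand-in, not the CELL implementation; the name cluster_indices_sketch is hypothetical, and each cluster point is assumed to coincide with exactly one atom of the supercell.

```python
import numpy as np

def cluster_indices_sketch(cl, sc, method=0, tol=1e-3):
    """Return, for each cluster point, the index of the matching atom in sc."""
    cl = np.asarray(cl, dtype=float)
    sc = np.asarray(sc, dtype=float)
    if method == 0:
        # Slow: nested loop, comparing positions pairwise with allclose.
        idxs = []
        for point in cl:
            for i, atom in enumerate(sc):
                if np.allclose(point, atom, rtol=0.0, atol=tol):
                    idxs.append(i)
                    break
        return np.array(idxs, dtype=int)
    # Fast: all point-atom distances at once, shape (npoints, natoms).
    dists = np.linalg.norm(cl[:, None, :] - sc[None, :, :], axis=2)
    # Column indices of the (numerically) zero distances, ordered by cluster point.
    return np.nonzero(dists < tol)[1]

sc = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.5, 0.5, 0.0]]
cl = [[0.0, 0.5, 0.0], [0.5, 0.0, 0.0]]

slow = cluster_indices_sketch(cl, sc, method=0)
fast = cluster_indices_sketch(cl, sc, method=1)
```

Both calls return the same indices; the second avoids the Python-level double loop at the cost of building the full distance matrix.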

clusterx.utils.isclose(r1, r2, rtol=0.0001)

Determine whether two vectors are similar

Parameters:

r1,r2: 1D arrays of integer or float

Vectors to be compared

rtol: float

Tolerance.

Returns:

Boolean: True if the Euclidean distance between r1 and r2 is smaller than rtol, False otherwise.
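
The criterion above amounts to a single distance test. A minimal sketch using only the standard library (the name isclose_sketch is hypothetical):

```python
import math

def isclose_sketch(r1, r2, rtol=1e-4):
    """True if the Euclidean distance between r1 and r2 is smaller than rtol."""
    return math.dist(r1, r2) < rtol

near = isclose_sketch([0.0, 0.0, 0.0], [0.0, 0.0, 5e-5])
# Each component differs by rtol, but the distance is sqrt(2) * rtol.
far = isclose_sketch([0.0, 0.0, 0.0], [1e-4, 1e-4, 0.0])
```

Note that, despite its name, rtol acts here as an absolute threshold on the distance rather than as a relative tolerance.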

clusterx.utils.list_integer_named_folders(root='.', prefix='', suffix='', containing_files=[], not_containing_files=[])

Return array of integer named folders.

Scans the folders in root and detects those whose name is an integer number. It then returns a sorted list of the integers found. For example, if the folder structure is:

root/
    1/
    2/
    5/
    7/
    8/
    run_2/
    run_3/
    run_4/
    run_5/
    run_old/
    folderx/
    foldery/

then the following array is returned:

[1,2,5,7,8].

If prefix is set to some string, then it returns a (sorted) list of strings. For example, if prefix is set to "run_", then this will return the array:

["run_2","run_3","run_4","run_5"]

Parameters:

root: string

path of the root folder in which to scan for integer named folders.

prefix: string

scan for folders whose name starts with the string prefix followed by an integer number (and possibly the string suffix).

suffix: string

scan for folders whose name ends with the string suffix and is preceded by an integer number (and possibly the string prefix).

containing_files: array of strings

a list of file names that should be contained in the returned folders.

not_containing_files: array of strings

a list of file names that should not be contained in the returned folders.

Returns:

array of integers if default value for prefix is used. Otherwise returns an array of strings
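
The scan can be sketched with the standard library as follows. This is a simplified stand-in, not the CELL implementation: the name integer_named_folders_sketch is hypothetical, the containing_files and not_containing_files filters are omitted, only non-negative integers are matched, and sorting by integer value in the prefixed case is an assumption.

```python
import os
import re
import tempfile

def integer_named_folders_sketch(root=".", prefix="", suffix=""):
    """Return folders in root named <prefix><integer><suffix>, sorted by the integer."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    found = []
    for name in os.listdir(root):
        match = pattern.fullmatch(name)
        if match and os.path.isdir(os.path.join(root, name)):
            found.append((int(match.group(1)), name))
    found.sort()
    if prefix == "":
        return [number for number, _ in found]  # integers for the default prefix
    return [name for _, name in found]          # folder names otherwise

# Reproduce part of the folder structure shown above in a temporary directory.
names = ["1", "2", "5", "7", "8", "run_2", "run_3", "run_4", "run_5", "run_old", "folderx"]
with tempfile.TemporaryDirectory() as root:
    for name in names:
        os.mkdir(os.path.join(root, name))
    integers = integer_named_folders_sketch(root)
    runs = integer_named_folders_sketch(root, prefix="run_")
```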

clusterx.utils.make_supercell(prim, P, wrap=True, tol=1e-05)

Generate a supercell by applying a general transformation (P) to the input configuration (prim).

This function is a modified version of ASE's build/supercells.py. The modification fixes a bug in ASE's implementation, introduced in ASE version 3.18.0: for certain transformation matrices, the determinant of the matrix is negative, and the function exits with an error because a negative number of atoms is obtained. An example script demonstrating the error is:

from ase.atoms import Atoms
from ase.build import make_supercell

# fcc Cu primitive cell
prim = Atoms(symbols='Cu',
             pbc=True,
             cell=[[0.0, 1.805, 1.805], [1.805, 0.0, 1.805], [1.805, 1.805, 0.0]])

# The determinant of this transformation matrix is negative (-32)
P = [[2, 2, -2], [2, -2, 2], [-2, 2, 2]]
sc = make_supercell(prim, P, wrap=True, tol=1e-05)

Release 3.22.0 of ASE still contains the bug. It was reported to the ASE developers on 3.7.2021.

The transformation is described by a 3x3 integer matrix \(\mathbf{P}\). Specifically, the new cell metric \(\mathbf{h}\) is given in terms of the metric of the input configuration \(\mathbf{h}_p\) by \(\mathbf{P}\mathbf{h}_p = \mathbf{h}\).

Parameters:

prim: ASE Atoms object

Input configuration.

P: 3x3 integer matrix

Transformation matrix \(\mathbf{P}\).

wrap: bool

Whether to wrap the atomic positions into the new cell at the end.

tol: float

Tolerance used when wrapping.
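
The relation between the cell metrics and the sign problem mentioned above can be checked with NumPy alone. The sketch below uses the Cu primitive cell and the transformation matrix from the example script; it only evaluates the new cell and the determinant and does not call ASE. The variable names are chosen for illustration.

```python
import numpy as np

# Metric of the input configuration (rows are cell vectors) and transformation matrix.
h_p = np.array([[0.0, 1.805, 1.805], [1.805, 0.0, 1.805], [1.805, 1.805, 0.0]])
P = np.array([[2, 2, -2], [2, -2, 2], [-2, 2, 2]])

# New cell metric: h = P h_p
h = P @ h_p

# The determinant of P is negative for this matrix. The supercell contains
# |det P| copies of the input cell, so a count based on the signed
# determinant would be negative.
det_P = int(round(np.linalg.det(P)))
n_cells = abs(det_P)
```

For this matrix the new cell is cubic with edge 7.22, and its vectors form a left-handed set, which is what makes the determinant negative.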

clusterx.utils.mgrep(fpath, search_array, prepend='', root='.')

Grep strings in file and return matching lines.

Parameters:

fpath: string

File path to grep

search_array: array of strings

Each element of the array is a string to grep in fpath.

prepend: string

String to prepend to each matching line.

root: string

File to grep should be in root/fpath.
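
A minimal sketch of this kind of multi-string grep is shown below. It is not the CELL implementation: the name mgrep_sketch is hypothetical, plain substring matching is assumed, and the matching lines are returned as a list, whereas the actual return format is not specified above.

```python
import os
import tempfile

def mgrep_sketch(fpath, search_array, prepend="", root="."):
    """Return the lines of root/fpath that contain any of the search strings."""
    matches = []
    with open(os.path.join(root, fpath)) as handle:
        for line in handle:
            if any(s in line for s in search_array):
                matches.append(prepend + line.rstrip("\n"))
    return matches

# Write a small file to a temporary folder and grep two strings in it.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "run.log"), "w") as handle:
        handle.write("energy: -1.5\nforces: 0.0\nvolume: 11.76\n")
    found = mgrep_sketch("run.log", ["energy", "volume"], prepend="1/ ", root=root)
```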

clusterx.utils.parent_lattice_to_atat(plat, out_fname='lat.in', for_str=False)

Serializes ParentLattice object to ATAT input file

Parameters:

plat: ParentLattice object

ParentLattice object to be serialized

out_fname: string

Output file path

clusterx.utils.poppush(x, val)

Left-shifts the array by one position and writes the new value to the rightmost position. Returns the average of the array.

Parameters:

x: numpy array

val: int or float
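
This operation is useful for keeping a running average over a fixed-size window. A minimal sketch, assuming that the array is modified in place and that the average is taken after the new value is written (the name poppush_sketch is hypothetical):

```python
import numpy as np

def poppush_sketch(x, val):
    """Shift x one position to the left, append val on the right, return the mean."""
    x[:-1] = x[1:]  # drop the leftmost element by shifting everything left
    x[-1] = val     # write the new value to the rightmost position
    return x.mean()

window = np.array([1.0, 2.0, 3.0])
average = poppush_sketch(window, 4.0)  # window is now [2.0, 3.0, 4.0]
```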

clusterx.utils.remove_vacancies(at)

Remove every Atom containing ‘X’ as species symbol or 0 as atomic number from an Atoms object and return the resulting Atoms object.

Parameters:

at: Atoms object

clusterx.utils.report_sset_equivalence_check(sset, sset_equivalence_check_output, property_name=None, tol=0.0)

Generate report of equivalent structures

Writes to files: sset_unique_sym.json and sset_unique_gss.json.

The first contains the structures whose indices are given by the keys of the dictionary returned by sset_equivalence_check().

The second contains the structures of every equivalence subset where the value of property_name is minimal. So, if the property is an energy, the final set will contain all lowest energy structures of every equivalence subset.

clusterx.utils.sort_atoms(atoms, key=(2, 1, 0))

Return atoms object with sorted atomic coordinates

The default sorting is: increasing z-coordinate first, increasing y-coordinate second, increasing x-coordinate third. Useful to get well ordered slab structures, for instance. Sorting can be changed by appropriately setting the key argument, with the same effect as in:

from operator import itemgetter
sp = sorted(p, key=itemgetter(2,1,0))

where p is an Nx3 array of vector coordinates.
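
The effect of the key argument can be seen on a small list of coordinates. The example below uses plain Python lists in place of an Atoms object; the coordinates are made up for illustration.

```python
from operator import itemgetter

# Four points given as [x, y, z]
p = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

# Default ordering: z first, then y, then x
by_zyx = sorted(p, key=itemgetter(2, 1, 0))

# Alternative ordering: x first, then y, then z
by_xyz = sorted(p, key=itemgetter(0, 1, 2))
```

With key=(2, 1, 0) the three points in the z = 0 plane come first, which is the layer-by-layer ordering that is convenient for slabs.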

clusterx.utils.sset_equivalence_check(sset, to_primitive=True, cpool=None, basis='trigonometric', comat=None, pretty_print=False)

Find equivalent structures in a StructuresSet object

Equivalence is determined i) in terms of symmetry between structures or ii) in terms of cluster basis representation (if a ClustersPool object or a correlation matrix is given).

In the first case, the SymmetryEquivalenceCheck tool of ASE is used.

Parameters:

sset: StructuresSet object

The structures set object to be analyzed.

to_primitive: Boolean (default: True)

If True, the structures are reduced to their primitive cells. This feature requires spglib to be installed (cf. ASE’s SymmetryEquivalenceCheck).

cpool: ClustersPool object (default: None)

This parameter is optional. If given, the equivalence of a pair of structures is determined according to their cluster basis representation: two structures with the same cluster correlations for the clusters in cpool are considered equivalent.

basis: string (default: "trigonometric")

Only used if cpool is not None. Site basis functions used in the determination of cluster correlations.

comat: 2D array of floats (default: None)

It overrides cpool and basis. If a correlation matrix for the structures set sset was pre-computed, the equivalence in terms of cluster basis representation is performed by comparing the rows of this matrix (each row must correspond to a structure in the structures set).

Returns: A dictionary. The keys (k) are the indices of unique representative structures, and the values (v) are arrays of integers listing all structure indices equivalent to k (including k itself). For instance, the dictionary:

{"0": [0, 1, 3, 8, 9],
 "2": [2, 5, 6],
 "4": [4, 7]}

indicates that in the structures set with indices [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] there are just three distinct structures. These can be represented by structures 0, 2 and 4. The structures [0, 1, 3, 8, 9] are all equivalent, etc.

Note that equivalence is used here in the sense explained above: it is symmetry equivalence only if cpool and comat are None.

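
When a correlation matrix is given, the grouping reduces to collecting structures with equal rows. The following is a minimal NumPy sketch of that idea, not the CELL implementation: the name group_equal_rows_sketch and the tolerance tol are assumptions, and integer keys are used instead of the string keys shown above.

```python
import numpy as np

def group_equal_rows_sketch(comat, tol=1e-8):
    """Group row indices of comat whose rows agree within tol."""
    comat = np.asarray(comat, dtype=float)
    groups = {}
    for i, row in enumerate(comat):
        for rep in groups:
            # Compare against the representative (first member) of each group.
            if np.allclose(row, comat[rep], rtol=0.0, atol=tol):
                groups[rep].append(i)
                break
        else:
            # No existing group matches: structure i starts a new group.
            groups[i] = [i]
    return groups

# Hypothetical correlations of five structures for two clusters
comat = [[1.0, 0.5], [1.0, 0.5], [1.0, -0.5], [1.0, 0.5], [1.0, -0.5]]
groups = group_equal_rows_sketch(comat)
```

Here structures 0, 1 and 3 share the same correlations, as do structures 2 and 4, so two representatives remain.
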
clusterx.utils.sub_folders(root)

Return list of subfolders

Parameters:

root: string

Absolute path of the folder whose subfolders are to be listed.

clusterx.utils.super_structure(struc0, d)

Create a super structure

This function takes a Structure instance (struc0) and creates a new structure which is obtained as the periodic repetition of the original structure along its unit cell vectors. The number of repetitions along each cell vector is given by the three components of the input integer vector d.

Parameters:

struc0: Structure object

Original structure for the superstructure.

d: int, three-component integer array, or 3x3 integer array

The super structure is obtained by the transformation d S, with d a 3x3 matrix of integers and S the supercell cell vectors.
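
The three accepted forms of d can all be mapped to a 3x3 integer matrix before applying the transformation d S. The sketch below shows one way to do this with NumPy. It is an assumption about how the forms relate, not the CELL implementation: a single integer is taken to mean the same number of repetitions along each cell vector, and a three-component vector is placed on the diagonal. The name repetition_matrix_sketch is hypothetical.

```python
import numpy as np

def repetition_matrix_sketch(d):
    """Map an int, a 3-vector, or a 3x3 matrix to a 3x3 integer matrix."""
    d = np.asarray(d, dtype=int)
    if d.ndim == 0:
        return int(d) * np.eye(3, dtype=int)  # same repetition along all vectors
    if d.shape == (3,):
        return np.diag(d)                     # one repetition count per vector
    if d.shape == (3, 3):
        return d                              # general transformation
    raise ValueError("d must be an int, a 3-vector, or a 3x3 matrix")

# Cell vectors (rows) of a cubic cell with edge 4.0
S = np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]])

isotropic = repetition_matrix_sketch(2)
new_cell = repetition_matrix_sketch([1, 2, 3]) @ S
```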