Output Data Formats
The dftio parse command can save the processed data in several formats, controlled by the --format (or -f) option. Each format has its own advantages depending on the use case.
dat (Directory Format)
This is the default format (-f dat). It saves the parsed data into a well-organized directory structure, making it easy to inspect and use with simple scripts.
A new directory is created for each parsed calculation, named {formula}.{index} (e.g., Si2.0).
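When scanning an output folder from a script, it can help to recover the formula and index from each directory name. The following is a minimal sketch based only on the {formula}.{index} pattern described above; the helper name split_calc_dirname is hypothetical and is not part of dftio.

```python
from typing import Tuple


def split_calc_dirname(name: str) -> Tuple[str, int]:
    """Split a directory name such as 'Si2.0' into ('Si2', 0).

    Hypothetical helper: it assumes the index is the part after the
    last dot and that it consists only of digits.
    """
    formula, sep, index = name.rpartition(".")
    if not sep or not formula or not index.isdigit():
        raise ValueError(f"not a '{{formula}}.{{index}}' name: {name!r}")
    return formula, int(index)


print(split_calc_dirname("Si2.0"))  # ('Si2', 0)
```

Splitting at the last dot rather than the first keeps the helper tolerant of formulas that might themselves contain a dot.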
Structure Data: Saved as plain text files.
cell.dat: Lattice vectors for each frame.
positions.dat: Atomic positions for each frame.
atomic_numbers.dat: A list of atomic numbers for the system.
pbc.dat: A boolean array indicating periodic boundary conditions (e.g., [True, True, True]).
Eigenvalue Data: Saved as NumPy binary files (.npy).
kpoints.npy: The coordinates of each k-point.
eigenvalues.npy: A 3D array containing the eigenvalues for each frame, k-point, and band index.
Matrix Data (Hamiltonian, Overlap, etc.): Saved in HDF5 format (.h5).
hamiltonians.h5: Contains the real-space Hamiltonian matrices.
overlaps.h5: Contains the overlap matrices.
density_matrices.h5: Contains the density matrices.
Each HDF5 file is organized by frame number, with datasets inside corresponding to the matrix blocks (e.g., 0_0_0_0_0).
basis.dat: A text file describing the atomic basis set used.
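As an illustration of how the .npy files in this layout might be read back, here is a minimal sketch using NumPy. The function name load_eigenvalues is hypothetical, and the consistency check assumes that the first axis of kpoints.npy runs over k-points; only the 3D (frame, k-point, band) layout of eigenvalues.npy is taken from the description above.

```python
from pathlib import Path

import numpy as np


def load_eigenvalues(calc_dir):
    """Return (kpoints, eigenvalues) from a dat-format calculation directory.

    Hypothetical helper, not part of dftio.
    """
    calc_dir = Path(calc_dir)
    kpoints = np.load(calc_dir / "kpoints.npy")
    eigenvalues = np.load(calc_dir / "eigenvalues.npy")

    # Documented layout: (frame, k-point, band).
    if eigenvalues.ndim != 3:
        raise ValueError(f"expected a 3D array, got shape {eigenvalues.shape}")

    # Assumption: kpoints has one row per k-point.
    if eigenvalues.shape[1] != kpoints.shape[0]:
        raise ValueError("number of k-points differs between the two files")

    return kpoints, eigenvalues
```

With a directory such as Si2.0, a call like load_eigenvalues("Si2.0") would return both arrays, and eigenvalues[0, :, 0] would then select the lowest stored band of the first frame across all k-points.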
ase (Atomic Simulation Environment) Format
This format (-f ase) is similar to dat, but it stores the structural information in a standard ASE trajectory file, which is useful for direct integration with the Atomic Simulation Environment.
Structure Data:
xdat.traj: An ASE trajectory file containing the structure (atoms, positions, cell) for every frame.
Eigenvalue and Matrix Data:
All other data (eigenvalues, matrices) are saved in the same way as in the dat format, using .npy and .h5 files within the same output directory.
lmdb (Lightning Memory-Mapped Database) Format
This format (-f lmdb) is designed for high-performance applications where quick access to individual data frames from a very large dataset is required. It stores all information in a single binary database file.
Database File:
A single file named data.{pid}.lmdb is created, where {pid} is the process ID.
Structure:
The database contains key-value pairs. Each value is a pickled Python dictionary that represents a single frame of the calculation.
Each dictionary contains all data for that frame: positions, cell, atomic_numbers, eigenvalues, hamiltonian, etc.
This structure avoids having to manage thousands of individual files and can significantly speed up data loading.
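To make the key-value layout concrete, the sketch below decodes pickled frame dictionaries from an iterable of (key, value) byte pairs. It uses only the standard library, and a small in-memory list stands in for the database. The name iter_frames and the key b"0" are hypothetical; the actual key format used by dftio is not specified here.

```python
import pickle
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_frames(
    pairs: Iterable[Tuple[bytes, bytes]],
) -> Iterator[Tuple[bytes, Dict[str, Any]]]:
    """Yield (key, frame_dict) for each pickled value.

    Hypothetical helper. Only unpickle files you trust: pickle.loads
    can execute arbitrary code from a malicious payload.
    """
    for key, value in pairs:
        frame = pickle.loads(value)
        if not isinstance(frame, dict):
            raise TypeError(f"value for key {key!r} is not a dictionary")
        yield key, frame


# In-memory stand-in for the database; the key b"0" is an assumption.
fake_db = [
    (b"0", pickle.dumps({"atomic_numbers": [14, 14], "pbc": [True, True, True]})),
]

for key, frame in iter_frames(fake_db):
    print(key, sorted(frame))
```

With the third-party lmdb package, the pairs would typically come from iterating a cursor obtained inside a read-only transaction, so the same function could be applied to a real data.{pid}.lmdb file without changes.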