Output Data Formats

Output Data Formats#

The dftio parse command can save the processed data in several formats, controlled by the --format (or -f) option. Each format has its own advantages depending on the use case.

`dat` (Directory Format)#

This is the default format (-f dat). It saves the parsed data into a well-organized directory structure, making it easy to inspect and use with simple scripts.

A new directory is created for each parsed calculation, named {formula}.{index} (e.g., Si2.0).

Structure Data: Saved as plain text files.
- cell.dat: Lattice vectors for each frame.
- positions.dat: Atomic positions for each frame.
- atomic_numbers.dat: A list of atomic numbers for the system.
- pbc.dat: A boolean array indicating periodic boundary conditions (e.g., [True, True, True]).
Eigenvalue Data: Saved as NumPy binary files (.npy).
- kpoints.npy: The coordinates of each k-point.
- eigenvalues.npy: A 3D array containing the eigenvalues for each frame, k-point, and band index.
Matrix Data (Hamiltonian, Overlap, etc.): Saved in HDF5 format (.h5).
- hamiltonians.h5: Contains the real-space Hamiltonian matrices.
- overlaps.h5: Contains the overlap matrices.
- density_matrices.h5: Contains the density matrices. Each HDF5 file is organized by frame number, with datasets inside corresponding to the matrix blocks (e.g., 0_0_0_0_0).
- basis.dat: A text file describing the atomic basis set used.

`ase` (Atomic Simulation Environment) Format#

This format (-f ase) is similar to dat, but it stores the structural information in a standard ASE trajectory file, which is useful for direct integration with the Atomic Simulation Environment.

Structure Data:
- xdat.traj: An ASE trajectory file containing the structure (atoms, positions, cell) for every frame.
Eigenvalue and Matrix Data:
- All other data (eigenvalues, matrices) are saved in the same way as the dat format, using .npy and .h5 files within the same output directory.

`lmdb` (Lightning DB) Format#

This format (-f lmdb) is designed for high-performance applications where quick access to individual data frames from a very large dataset is required. It stores all information in a single binary database file.

Database File:
- A single file named data.{pid}.lmdb is created, where {pid} is the process ID.
Structure:
- The database contains key-value pairs. Each value is a pickled Python dictionary that represents a single frame of the calculation.
- Each dictionary contains all data for that frame: positions, cell, atomic_numbers, eigenvalues, hamiltonian, etc.
- This structure avoids having to manage thousands of individual files and can significantly speed up data loading.

Output Data Formats

Contents

Output Data Formats#

dat (Directory Format)#

ase (Atomic Simulation Environment) Format#

lmdb (Lightning DB) Format#

`dat` (Directory Format)#

`ase` (Atomic Simulation Environment) Format#

`lmdb` (Lightning DB) Format#