heat.io

Enables parallel I/O with data on disk.

Module Contents

supports_netcdf() -> bool

Returns True if Heat supports reading from and writing to netCDF4 files, False otherwise.

supports_hdf5() -> bool

Returns True if Heat supports reading from and writing to HDF5 files, False otherwise.

load(path: str, *args: List[object] | None, **kwargs: Dict[str, object] | None) -> heat.core.dndarray.DNDarray

Attempts to load data from a file stored on disk, auto-detecting the file format from the file extension. Supports at least CSV files; HDF5 and netCDF4 are additionally available if the corresponding libraries are installed.

Parameters:
  • path (str) – Path to the file to be read.

  • args (list, optional) – Additional options passed to the particular functions.

  • kwargs (dict, optional) – Additional options passed to the particular functions.

Raises:
  • ValueError – If the file extension is not understood or known.

  • RuntimeError – If the optional dependency for a file extension is not available.

Examples

>>> ht.load("data.h5", dataset="DATA")
DNDarray([ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.load("data.nc", variable="DATA")
DNDarray([ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.load("my_data.zarr", variable="RECEIVER_1/DATA")
DNDarray([ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=0)
>>> ht.load("my_data.zarr", variable="RECEIVER_*/DATA")
DNDarray([[ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981],
            [ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981],
            [ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981]], dtype=ht.float32, device=cpu:0, split=0)
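
The extension-based auto-detection described above can be pictured as a lookup from file suffix to loader. The following is a minimal sketch, not Heat's actual implementation: the function name `pick_loader` and the set of extensions in the table are assumptions made for illustration only.

```python
from pathlib import Path

# Hypothetical extension table; the extensions Heat actually recognizes may differ.
LOADERS = {
    ".csv": "load_csv",
    ".h5": "load_hdf5",
    ".nc": "load_netcdf",
    ".zarr": "load_zarr",
}


def pick_loader(path: str) -> str:
    """Return the name of the loader matching the file extension of ``path``."""
    suffix = Path(path).suffix.lower()
    try:
        return LOADERS[suffix]
    except KeyError:
        # Mirrors the documented behavior: an unknown extension raises ValueError.
        raise ValueError(f"Unsupported file extension {suffix!r}") from None


print(pick_loader("data.h5"))       # load_hdf5
print(pick_loader("my_data.zarr"))  # load_zarr
```

A table-driven lookup keeps the error path simple: any suffix missing from the table ends up in the single `ValueError` branch.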

See also

load_csv()

Loads data from a CSV file.

load_csv_from_folder()

Loads multiple .csv files into one DNDarray which will be returned.

load_hdf5()

Loads data from an HDF5 file.

load_netcdf()

Loads data from a netCDF4 file.

load_npy_from_path()

Loads multiple .npy files into one DNDarray which will be returned.

load_zarr()

Loads data stored in Zarr format into a DNDarray, which will be returned.

load_csv(path: str, header_lines: int = 0, sep: str = ',', dtype: heat.core.types.datatype = types.float32, encoding: str = 'utf-8', split: int | None = None, device: str | None = None, comm: heat.core.communication.Communication | None = None) -> heat.core.dndarray.DNDarray

Loads data from a CSV file. The data will be distributed along axis 0.

Parameters:
  • path (str) – Path to the CSV file to be read.

  • header_lines (int, optional) – The number of lines at the beginning of the file that should not be considered as data.

  • sep (str, optional) – The single char or str that separates the values in each row.

  • dtype (datatype, optional) – Data type of the resulting array.

  • encoding (str, optional) – The encoding used to interpret the lines of the CSV file as strings.

  • split (int or None, optional) – Axis along which the resulting array should be split. Default is None, which means each node will have the full array.

  • device (str, optional) – The device id on which to place the data, defaults to globally set default device.

  • comm (Communication, optional) – The communication to use for the data distribution, defaults to the global default.

Raises:
  • TypeError – If any of the input parameters are not of correct type.
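
To make the roles of `header_lines` and `sep` concrete, here is a minimal, serial sketch using only the Python standard library. It is not how Heat reads files in parallel; the helper name `parse_csv` and the sample data are hypothetical, and the standard `csv` module only accepts a single-character delimiter.

```python
import csv


def parse_csv(text: str, header_lines: int = 0, sep: str = ",") -> list[list[float]]:
    """Skip ``header_lines`` leading lines, then parse the remaining rows as floats."""
    lines = text.splitlines()[header_lines:]
    # csv.reader accepts any iterable of strings; ``sep`` must be one character here.
    reader = csv.reader(lines, delimiter=sep)
    return [[float(value) for value in row] for row in reader if row]


raw = "sepal;petal\n5.1;1.4\n4.9;1.3\n"
rows = parse_csv(raw, header_lines=1, sep=";")
print(rows)  # [[5.1, 1.4], [4.9, 1.3]]
```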

Examples

>>> import heat as ht
>>> a = ht.load_csv("data.csv", split=0)
>>> a.shape
[0/3] (150, 4)
[1/3] (150, 4)
[2/3] (150, 4)
[3/3] (150, 4)
>>> a.lshape
[0/3] (38, 4)
[1/3] (38, 4)
[2/3] (37, 4)
[3/3] (37, 4)
>>> b = ht.load_csv("data.csv", header_lines=10, split=0)
>>> b.shape
[0/3] (140, 4)
[1/3] (140, 4)
[2/3] (140, 4)
[3/3] (140, 4)
>>> b.lshape
[0/3] (35, 4)
[1/3] (35, 4)
[2/3] (35, 4)
[3/3] (35, 4)
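
The local shapes printed above follow from dividing the rows as evenly as possible among the processes. The sketch below reproduces that arithmetic in plain Python. It assumes the leftover rows go to the lowest ranks, which matches the output shown but is not taken from Heat's source; `local_rows` is a hypothetical helper.

```python
def local_rows(total_rows: int, n_procs: int, rank: int) -> int:
    """Rows held by ``rank`` when ``total_rows`` are split as evenly as possible."""
    base, remainder = divmod(total_rows, n_procs)
    # The first ``remainder`` ranks each take one extra row.
    return base + (1 if rank < remainder else 0)


print([local_rows(150, 4, r) for r in range(4)])  # [38, 38, 37, 37]
print([local_rows(140, 4, r) for r in range(4)])  # [35, 35, 35, 35]
```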

save_csv(data: heat.core.dndarray.DNDarray, path: str, header_lines: Iterable[str] = None, sep: str = ',', decimals: int = -1, encoding: str = 'utf-8', comm: heat.core.communication.Communication | None = None, truncate: bool = True)

Saves data to CSV files. Only 2D data is supported; any split axis may be used.

Parameters:
  • data (DNDarray) – The DNDarray to be saved to CSV.

  • path (str) – The path as a string.

  • header_lines (Iterable[str]) – Optional iterable of str to prepend at the beginning of the file. No pound sign or any other comment marker will be inserted.

  • sep (str) – The separator character used in this CSV.

  • decimals (int) – Number of digits after decimal point.

  • encoding (str) – The encoding to be used in this CSV.

  • comm (Optional[Communication]) – An optional object of type Communication to be used.

  • truncate (bool) – Whether to truncate an existing file before writing, i.e. fully overwrite it. The sane default is True. Setting it to False will not shorten files if needed and thus may leave garbage at the end of existing files.
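
The effect described for `truncate` can be reproduced with ordinary file I/O. The sketch below does not call Heat; it only illustrates, under the assumption that `truncate=False` behaves like an in-place overwrite, why a shorter write into a longer existing file leaves trailing content behind. The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Start with a file that is longer than what we write next.
with open(path, "w", encoding="utf-8") as f:
    f.write("1,2,3\n4,5,6\n7,8,9\n")

# Overwrite in place WITHOUT truncating (analogous to truncate=False).
with open(path, "r+", encoding="utf-8") as f:
    f.write("0,0,0\n")

with open(path, encoding="utf-8") as f:
    without_truncate = f.read()

# Overwrite WITH truncation (analogous to truncate=True).
with open(path, "w", encoding="utf-8") as f:
    f.write("0,0,0\n")

with open(path, encoding="utf-8") as f:
    with_truncate = f.read()

print(repr(without_truncate))  # '0,0,0\n4,5,6\n7,8,9\n'
print(repr(with_truncate))     # '0,0,0\n'
```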

save(data: heat.core.dndarray.DNDarray, path: str, *args: List[object] | None, **kwargs: Dict[str, object] | None)

Attempts to save data from a DNDarray to disk. An auto-detection based on the file format extension is performed.

Parameters:
  • data (DNDarray) – The array holding the data to be stored.

  • path (str) – Path to the file to be stored.

  • args (list, optional) – Additional options passed to the particular functions.

  • kwargs (dict, optional) – Additional options passed to the particular functions.

Raises:
  • ValueError – If the file extension is not understood or known.

  • RuntimeError – If the optional dependency for a file extension is not available.

Examples

>>> x = ht.arange(100, split=0)
>>> ht.save(x, "data.h5", "DATA", mode="a")

load_npy_from_path(path: str, dtype: heat.core.types.datatype = types.int32, split: int = 0, device: str | None = None, comm: heat.core.communication.Communication | None = None) -> heat.core.dndarray.DNDarray

Loads multiple .npy files into one DNDarray which will be returned. The data will be concatenated along the split axis provided as input.

Parameters:
  • path (str) – Path to the directory in which .npy-files are located.

  • dtype (datatype, optional) – Data type of the resulting array.

  • split (int) – Along which axis the loaded arrays should be concatenated.

  • device (str, optional) – The device id on which to place the data, defaults to globally set default device.

  • comm (Communication, optional) – The communication to use for the data distribution, default is ‘heat.MPI_WORLD’.
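
Concatenating the per-file arrays along the split axis means the resulting shape sums the file sizes on that axis and keeps every other axis unchanged. The following plain-Python sketch shows only that shape arithmetic; `concatenated_shape` and the example file shapes are hypothetical and are not part of Heat.

```python
def concatenated_shape(shapes: list[tuple[int, ...]], split: int = 0) -> tuple[int, ...]:
    """Shape obtained by concatenating arrays of the given shapes along ``split``."""
    first = shapes[0]
    for shape in shapes[1:]:
        # All axes except the concatenation axis must agree.
        if len(shape) != len(first) or any(
            a != b for i, (a, b) in enumerate(zip(first, shape)) if i != split
        ):
            raise ValueError(f"Incompatible shapes: {first} and {shape}")
    result = list(first)
    result[split] = sum(shape[split] for shape in shapes)
    return tuple(result)


# Three hypothetical .npy files holding 40, 60, and 50 rows of 4 columns each.
print(concatenated_shape([(40, 4), (60, 4), (50, 4)], split=0))  # (150, 4)
```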

supports_zarr() -> bool

Returns True if zarr is installed, False otherwise.