heat.io

Enables parallel I/O with data on disk.

Module Contents

supports_hdf5() → bool

Returns True if Heat supports reading from and writing to HDF5 files, False otherwise.

supports_netcdf() → bool

Returns True if Heat supports reading from and writing to netCDF4 files, False otherwise.
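
Both flags come down to whether an optional backend library is installed. The following is a minimal sketch of such an availability check, not Heat's own code. It assumes the backends are the `h5py` and `netCDF4` Python packages (the text above only says that HDF5 and netCDF4 support is optional), and `module_available` is a hypothetical helper introduced purely for illustration.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec(name) is not None


# Hypothetical stand-ins for the two flags; Heat's own check may differ.
HDF5_AVAILABLE = module_available("h5py")        # assumed backend package name
NETCDF_AVAILABLE = module_available("netCDF4")   # assumed backend package name
```

Calling code can branch on flags like these before choosing a file format, instead of catching an error from a failed load or save.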

load(path: str, *args: List[object] | None, **kwargs: Dict[str, object] | None) → heat.core.dndarray.DNDarray

Attempts to load data from a file stored on disk. The file format is auto-detected from the file extension. CSV files are always supported; HDF5 and netCDF4 are additionally supported if the corresponding libraries are installed.

Parameters:
  • path (str) – Path to the file to be read.

  • args (list, optional) – Additional positional options passed on to the format-specific load function.

  • kwargs (dict, optional) – Additional keyword options passed on to the format-specific load function.

Raises:
  • ValueError – If the file extension is not understood or known.

  • RuntimeError – If the optional dependency for a file extension is not available.

Examples

>>> import heat as ht
>>> ht.load('data.h5', dataset='DATA')
DNDarray([ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None)
>>> ht.load('data.nc', variable='DATA')
DNDarray([ 1.0000,  2.7183,  7.3891, 20.0855, 54.5981], dtype=ht.float32, device=cpu:0, split=None)
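
The extension-based dispatch and the two error cases described above can be sketched in plain Python. This is an illustration under stated assumptions, not Heat's implementation: `detect_format`, its extension table, and the `hdf5_ok` / `netcdf_ok` switches are hypothetical names, and the set of extensions Heat actually accepts may be larger.

```python
import os


def detect_format(path: str, hdf5_ok: bool = True, netcdf_ok: bool = True) -> str:
    """Map a file path to a format name based on its extension."""
    # Hypothetical extension table; the extensions Heat accepts may differ.
    formats = {".csv": "csv", ".h5": "hdf5", ".nc": "netcdf"}
    extension = os.path.splitext(path)[-1].lower()
    if extension not in formats:
        # Unknown extension: mirrors the documented ValueError.
        raise ValueError(f"Unsupported file extension {extension!r}")
    fmt = formats[extension]
    # An optional format is only usable if its library is installed:
    # mirrors the documented RuntimeError.
    if (fmt == "hdf5" and not hdf5_ok) or (fmt == "netcdf" and not netcdf_ok):
        raise RuntimeError(f"{fmt} support is required for {extension!r} files")
    return fmt
```

In this sketch, `detect_format('data.h5')` yields `'hdf5'`, an unknown extension such as `.txt` raises `ValueError`, and `detect_format('data.nc', netcdf_ok=False)` raises `RuntimeError`.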

load_csv(path: str, header_lines: int = 0, sep: str = ',', dtype: heat.core.types.datatype = types.float32, encoding: str = 'utf-8', split: int | None = None, device: str | None = None, comm: heat.core.communication.Communication | None = None) → heat.core.dndarray.DNDarray

Loads data from a CSV file. When split is 0, the rows are distributed across the processes along axis 0.

Parameters:
  • path (str) – Path to the CSV file to be read.

  • header_lines (int, optional) – The number of lines at the beginning of the file that should not be considered as data.

  • sep (str, optional) – The character or string that separates the values in each row.

  • dtype (datatype, optional) – Data type of the resulting array.

  • encoding (str, optional) – The encoding used to interpret the lines of the CSV file as strings.

  • split (int or None, optional) – Axis along which the resulting array is split. Default is None, which means each process holds the full array.

  • device (str, optional) – The device id on which to place the data, defaults to globally set default device.

  • comm (Communication, optional) – The communication to use for the data distribution, defaults to the global default.
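
To make the roles of `header_lines` and `sep` concrete, here is a serial, pure-Python sketch of the parsing step. It is not how Heat reads the file (Heat distributes the work across processes); `parse_csv_text` is a hypothetical helper, and the sample data is made up for illustration.

```python
from typing import List


def parse_csv_text(text: str, header_lines: int = 0, sep: str = ",") -> List[List[float]]:
    """Parse CSV text into rows of floats, skipping the leading header lines."""
    rows = []
    # Drop the first `header_lines` lines, then split each remaining line on `sep`.
    for line in text.splitlines()[header_lines:]:
        if not line.strip():
            continue  # ignore blank lines
        rows.append([float(value) for value in line.split(sep)])
    return rows


sample = "width;height\n1.5;2\n3;4.25\n"
print(parse_csv_text(sample, header_lines=1, sep=";"))
# prints [[1.5, 2.0], [3.0, 4.25]]
```

Note that `header_lines` counts lines (rows) of the file, so a file with a one-line column header needs `header_lines=1`.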

Raises:
  • TypeError – If any of the input parameters is not of the correct type.

Examples

>>> import heat as ht
>>> a = ht.load_csv('data.csv', split=0)
>>> a.shape
[0/3] (150, 4)
[1/3] (150, 4)
[2/3] (150, 4)
[3/3] (150, 4)
>>> a.lshape
[0/3] (38, 4)
[1/3] (38, 4)
[2/3] (37, 4)
[3/3] (37, 4)
>>> b = ht.load_csv('data.csv', header_lines=10, split=0)
>>> b.shape
[0/3] (140, 4)
[1/3] (140, 4)
[2/3] (140, 4)
[3/3] (140, 4)
>>> b.lshape
[0/3] (35, 4)
[1/3] (35, 4)
[2/3] (35, 4)
[3/3] (35, 4)
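
The local shapes in the example above are consistent with a balanced block distribution of rows over four processes, in which the first `n_rows % n_procs` ranks receive one extra row. The sketch below reproduces those numbers; `local_row_counts` is a hypothetical helper, and Heat's internal chunking logic may be implemented differently.

```python
from typing import List


def local_row_counts(n_rows: int, n_procs: int) -> List[int]:
    """Split n_rows into n_procs contiguous blocks that differ by at most one row."""
    base, remainder = divmod(n_rows, n_procs)
    # The first `remainder` ranks take one extra row each.
    return [base + 1 if rank < remainder else base for rank in range(n_procs)]


print(local_row_counts(150, 4))  # prints [38, 38, 37, 37]
print(local_row_counts(140, 4))  # prints [35, 35, 35, 35]
```

The global `shape` is the same on every process, while `lshape` reports only the block held locally.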

save_csv(data: heat.core.dndarray.DNDarray, path: str, header_lines: Iterable[str] = None, sep: str = ',', decimals: int = -1, encoding: str = 'utf-8', comm: heat.core.communication.Communication | None = None, truncate: bool = True)

Saves data to a CSV file. Only 2D data is supported; the data may be split along any axis.

Parameters:
  • data (DNDarray) – The DNDarray to be saved to CSV.

  • path (str) – Path to the CSV file to be written.

  • header_lines (Iterable[str]) – Optional iterable of str to prepend at the beginning of the file. No pound sign or any other comment marker will be inserted.

  • sep (str) – The separator character used in this CSV.

  • decimals (int) – Number of digits after the decimal point.

  • encoding (str) – The encoding to be used in this CSV.

  • comm (Optional[Communication]) – An optional object of type Communication to be used.

  • truncate (bool) – Whether to truncate an existing file before writing, i.e. fully overwrite it. Defaults to True. If set to False, an existing file that is longer than the new content is not shortened, so stale data may remain at the end of the file.
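
The effect of `truncate` can be illustrated with a serial, standard-library sketch. It does not use Heat and says nothing about how Heat writes files internally; `write_text` is a hypothetical helper that only demonstrates what happens to the file length when an existing file is overwritten with shorter content.

```python
import os
import tempfile


def write_text(path: str, text: str, truncate: bool = True) -> None:
    """Write `text` at the start of `path`, optionally keeping old trailing bytes."""
    # O_TRUNC empties an existing file first; without it, old bytes beyond
    # the newly written region survive.
    flags = os.O_WRONLY | os.O_CREAT | (os.O_TRUNC if truncate else 0)
    fd = os.open(path, flags, 0o644)
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


def read_text(path: str) -> str:
    """Read the whole file back without newline translation."""
    with open(path, encoding="utf-8", newline="") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")
    write_text(path, "1.0,2.0\n3.0,4.0\n")      # existing, longer file
    write_text(path, "9.0\n", truncate=False)   # shorter content, no truncation
    leftover = read_text(path)
    write_text(path, "9.0\n", truncate=True)    # shorter content, truncated first
    clean = read_text(path)

print(repr(leftover))  # prints '9.0\n2.0\n3.0,4.0\n'
print(repr(clean))     # prints '9.0\n'
```

Without truncation, the tail of the old file survives after the new content, which is the stale data the parameter description warns about.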

save(data: heat.core.dndarray.DNDarray, path: str, *args: List[object] | None, **kwargs: Dict[str, object] | None)

Attempts to save data from a DNDarray to disk. The file format is auto-detected from the file extension.

Parameters:
  • data (DNDarray) – The array holding the data to be stored.

  • path (str) – Path to the file to be written.

  • args (list, optional) – Additional positional options passed on to the format-specific save function.

  • kwargs (dict, optional) – Additional keyword options passed on to the format-specific save function.

Raises:
  • ValueError – If the file extension is not understood or known.

  • RuntimeError – If the optional dependency for a file extension is not available.

Examples

>>> import heat as ht
>>> x = ht.arange(100, split=0)
>>> ht.save(x, 'data.h5', 'DATA', mode='a')