:mod:`heat.nn.data_parallel`
============================

.. py:module:: heat.nn.data_parallel

.. autoapi-nested-parse::

   General data parallel neural network classes.


Module Contents
---------------

.. py:class:: DataParallel(module: torch.nn.Module, comm: heat.core.communication.MPICommunication, optimizer: Union[heat.optim.DataParallelOptimizer, List, Tuple], blocking_parameter_updates: bool = False)

   Bases: :class:`torch.nn.Module`

   Implements data parallelism across multiple processes: the same model is run locally on each process.
   Creating the model is similar to PyTorch; the only change is that HeAT layers (``ht.nn.layer``) are used
   when initializing the network and optimizer. If no HeAT layer exists, the PyTorch layer of the same name
   is used instead. The same is true for the optimizer. More than one optimizer can be used, but communication
   during parameter updates is then limited to blocking. The same limitation applies when passing an optimizer
   that does not deal exactly with the set of the model's parameters. For the given model, both the
   ``__init__()`` and ``forward()`` functions must be defined in the class defining the network.

   An example of this is shown in ``examples/mnist.py``.

   It is highly recommended that a HeAT DataLoader is used, see :func:`ht.utils.data.DataLoader`.
   By default (``blocking_parameter_updates=False``), parameter updates are non-blocking. The blocking scheme
   averages the model parameters during the backwards step, synchronizing them before the next model iteration.
   Using more than one optimizer forces parameter updates to use blocking MPI communication.


   :ivar module: The local module
   :vartype module: torch.nn.Module
   :ivar comm: Communicator to use
   :vartype comm: MPICommunication
   :ivar optimizer: Individual or sequence of DataParallelOptimizers to be used
   :vartype optimizer: heat.DataParallelOptimizer, List, Tuple
   :ivar blocking_parameter_updates: Flag indicating the usage of blocking communications for parameter updates.
                                     Default: non-blocking updates (``False``)
   :vartype blocking_parameter_updates: bool, optional

   .. attribute:: module

   .. attribute:: comm

   .. attribute:: blocking_parameter_updates
      :annotation: = False

   .. attribute:: _dp_optimizers
      :annotation: = []

   .. attribute:: _layer_wait_handles

   .. attribute:: _fwd_hook_handles
      :annotation: = []

   .. attribute:: _active_layers

   .. attribute:: _param_slices

   .. attribute:: _param_indices

   .. role:: raw-html(raw)
      :format: html

   .. method:: __setattr__(name: str, value: Union[torch.nn.Module, torch.Tensor, Any]) -> None

      Overrides ``torch.nn.Module.__setattr__`` so that it auto-detects the end of an epoch's training phase
      and finalizes wait handles (only relevant for non-blocking updates).

   .. method:: forward(*inputs: tuple, **kwargs: dict) -> torch.Tensor

      Do the forward step for the network, receive the parameters from the last

   .. method:: _iparam_update(param_slice: slice = None, layer_names: List[str] = None) -> None

      Update parameters asynchronously via wait handles.

      :param param_slice: Slice object for creating a view onto the optimizer's params list.
                          By default, the whole params list is used (``None``)
      :type param_slice: slice, optional
      :param layer_names: List of names of the layers whose parameters will be updated; must match ``param_slice``.
                          By default, all layers are updated (``None``)
      :type layer_names: list(str), optional

   .. method:: _blocking_hook(grad_loc: torch.Tensor) -> torch.Tensor

      Add a blocking hook to the PyTorch DAG for all of the backwards calls.

      :param grad_loc: The local gradient
      :type grad_loc: torch.Tensor

      .. rubric:: References

      [1] (cf. https://pytorch.org/docs/stable/tensors.html#torch.Tensor.register_hook).

   .. method:: _nonblocking_hook(layer_name: str, param_name: str) -> Callable

      Add a nonblocking hook to send and receive the averaged parameters after the backwards step.

      :param layer_name: Name of the layer
      :type layer_name: str
      :param param_name: Name of the parameter
      :type param_name: str

   .. method:: _forward_hook(layer_name: str) -> Callable

      Add a forward hook to update parameters during the forward step. This returns a hook which can be added
      using the ``submodule.register_forward_pre_hook`` command.

      :param layer_name: Name of the layer
      :type layer_name: str

   .. method:: _reset_parameters(module: torch.nn.Module) -> None

      Reset the parameters of the given torch submodule. Only works for basic module types that provide a
      ``reset_parameters`` function.

      :param module: Submodule whose parameters are to be reset
      :type module: torch.nn.Module


.. py:class:: DataParallelMultiGPU(module: torch.nn.Module, optimizer: heat.optim.DASO, comm: heat.core.communication.MPICommunication = MPI_WORLD)

   Bases: :class:`torch.nn.Module`

   Creates data parallel networks local to each node using PyTorch's distributed class. This does NOT do any
   global synchronizations. To make optimal use of this structure, use :func:`ht.optim.DASO`.

   .. rubric:: Notes

   The PyTorch distributed process group must already exist before this class is initialized.

   :param module: An implemented PyTorch model
   :type module: torch.nn.Module
   :param optimizer: A DASO optimizer. Other optimizers are not yet implemented. The DASO optimizer should be
                     defined prior to calling this class.
   :type optimizer: optim.DASO
   :param comm: A global communicator. Default: :func:`MPICommunication`
   :type comm: MPICommunication, optional

   .. attribute:: module

   .. attribute:: comm

   .. method:: forward(*inputs: Tuple, **kwargs: Dict) -> torch.Tensor

      Calls the forward method for the torch model.

   .. method:: _reset_parameters(module: torch.nn.Module) -> None

      Reset the parameters of the given torch submodule. Only works for basic module types that provide a
      ``reset_parameters`` function.

      :param module: Submodule whose parameters are to be reset
      :type module: torch.nn.Module
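
The blocking scheme described for ``DataParallel`` synchronizes all processes before the next model iteration. The following is a minimal single-process sketch of that idea and is not HeAT code: it simulates several processes with plain Python lists, and the names ``average_gradients`` and ``blocking_step`` are hypothetical. It assumes that the averaged quantity is the local gradient (as the ``grad_loc`` argument of ``_blocking_hook`` suggests) and that a plain gradient-descent update follows. In the real class the averaging would go through MPI communication rather than a local loop.

```python
from typing import List


def average_gradients(local_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of one gradient vector per simulated process.

    Stands in for a sum-reduction across processes followed by a
    division by the number of processes.
    """
    n_procs = len(local_grads)
    if n_procs == 0:
        raise ValueError("need at least one simulated process")
    length = len(local_grads[0])
    if any(len(grad) != length for grad in local_grads):
        raise ValueError("all processes must hold gradients of the same length")
    # zip(*...) groups the i-th entry of every process's gradient together
    return [sum(column) / n_procs for column in zip(*local_grads)]


def blocking_step(
    replicas: List[List[float]], local_grads: List[List[float]], lr: float
) -> List[List[float]]:
    """Apply the same averaged gradient to every replica (assumed plain SGD).

    Nothing is updated until the average is available, which mirrors the
    blocking behaviour: every replica waits for the synchronized value.
    """
    avg = average_gradients(local_grads)
    return [[p - lr * g for p, g in zip(params, avg)] for params in replicas]


# Three simulated processes start from identical parameters but see
# different local gradients (for example, from different data shards).
replicas = [[1.0, 2.0] for _ in range(3)]
local_grads = [[1.0, 4.0], [2.0, 6.0], [3.0, 8.0]]

updated = blocking_step(replicas, local_grads, lr=0.5)
print(updated)
```

Because every replica starts from the same parameters and applies the same averaged gradient, the replicas are still identical after the step, which is the synchronization the class description refers to.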