:mod:`heat.decomposition`
=========================

.. py:module:: heat.decomposition

.. autoapi-nested-parse::

   Add the decomposition functions to the ht.decomposition namespace.


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   dmd/index.rst
   pca/index.rst


Package Contents
----------------

.. function:: _isvd(new_data: heat.core.dndarray.DNDarray, U_old: heat.core.dndarray.DNDarray, S_old: heat.core.dndarray.DNDarray, Vt_old: Optional[heat.core.dndarray.DNDarray] = None, maxrank: Optional[int] = None, old_matrix_size: Optional[int] = None, old_rowwise_mean: Optional[heat.core.dndarray.DNDarray] = None) -> Union[Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray], Tuple[heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray, heat.core.dndarray.DNDarray]]

   Helper function for iSVD and iPCA; roughly follows the "incremental PCA with mean update" (Fig. 1) in:
   David A. Ross, Jongwoo Lim, Ruei-Sung Lin, Ming-Hsuan Yang. Incremental Learning for Robust Visual Tracking. IJCV, 2008.

   Either incremental SVD / PCA or incremental SVD / PCA with mean subtraction is performed.

   :param new_data: New data as DNDarray.
   :type new_data: DNDarray
   :param U_old: "Old" SVD factor U.
   :type U_old: DNDarray
   :param S_old: "Old" SVD factor S.
   :type S_old: DNDarray
   :param Vt_old: "Old" SVD factor Vt. If no Vt_old is provided, only U and S are computed (PCA).
   :type Vt_old: DNDarray, optional
   :param maxrank: Rank to which the new SVD should be truncated.
   :type maxrank: int, optional
   :param old_matrix_size: Size of the old matrix; this does not need to be identical to Vt_old.shape[1], as the "old" SVD might have been truncated.
   :type old_matrix_size: int, optional
   :param old_rowwise_mean: Row-wise mean of the old matrix; if not provided, no mean subtraction is performed.
   :type old_rowwise_mean: DNDarray, optional


.. py:class:: PCA(n_components: Optional[Union[int, float]] = None, copy: bool = True, whiten: bool = False, svd_solver: str = 'hierarchical', tol: Optional[float] = None, iterated_power: Union[str, int] = 0, n_oversamples: int = 10, power_iteration_normalizer: str = 'qr', random_state: Optional[int] = None)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Principal Component Analysis (PCA).

   Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.
   The input data is centered but not scaled for each feature before applying the SVD.

   :param n_components: Number of components to keep. If n_components is not set, all components are kept.
                        If n_components is an integer, it specifies the number of components to keep.
                        If n_components is a float between 0 and 1, it specifies the fraction of variance explained by the components to keep.
   :type n_components: int, float, None, default=None
   :param copy: In-place operations are not yet supported. Please set copy=True.
   :type copy: bool, default=True
   :param whiten: Not yet supported.
   :type whiten: bool, default=False
   :param svd_solver: 'full' : Full SVD is performed. In general, this is more accurate, but also slower. So far, this is only supported for tall-skinny or short-fat data.
                      'hierarchical' : Hierarchical SVD, i.e., an algorithm for computing an approximate, truncated SVD, is performed. Only available for data split along axis no. 0.
                      'randomized' : Randomized SVD is performed.
   :type svd_solver: {'full', 'hierarchical', 'randomized'}, default='hierarchical'
   :param tol: Not yet necessary as iterative methods for PCA are not yet implemented.
   :type tol: float, default=None
   :param iterated_power: If svd_solver='randomized', this parameter is the number of iterations for the power method.
                          Choosing `iterated_power > 0` can lead to better results in the case of slowly decaying singular values but is computationally more expensive.
   :type iterated_power: int, default=0
   :param n_oversamples: If svd_solver='randomized', this parameter is the number of additional random vectors used to sample the range of X so that the range of X can be approximated more accurately.
   :type n_oversamples: int, default=10
   :param power_iteration_normalizer: If svd_solver='randomized', this parameter is the normalization form of the iterated power method. So far, only QR is supported.
   :type power_iteration_normalizer: {'qr'}, default='qr'
   :param random_state: If svd_solver='randomized', this parameter sets the seed for the random number generator.
   :type random_state: int, default=None

   :ivar components_: Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_.
   :vartype components_: DNDarray of shape (n_components, n_features)
   :ivar explained_variance_: The amount of variance explained by each of the selected components.
                              Not supported by svd_solver='hierarchical' and svd_solver='randomized'.
   :vartype explained_variance_: DNDarray of shape (n_components,)
   :ivar explained_variance_ratio_: Percentage of variance explained by each of the selected components.
                                    Not supported by svd_solver='hierarchical' and svd_solver='randomized'.
   :vartype explained_variance_ratio_: DNDarray of shape (n_components,)
   :ivar total_explained_variance_ratio_: The percentage of total variance explained by the selected components together.
                                          For svd_solver='hierarchical', a lower estimate for this quantity is provided; see :func:`ht.linalg.hsvd_rtol` and :func:`ht.linalg.hsvd_rank` for details.
                                          Not supported by svd_solver='randomized'.
   :vartype total_explained_variance_ratio_: float
   :ivar singular_values_: The singular values corresponding to each of the selected components.
                           Not supported by svd_solver='hierarchical' and svd_solver='randomized'.
   :vartype singular_values_: DNDarray of shape (n_components,)
   :ivar mean_: Per-feature empirical mean, estimated from the training set.
   :vartype mean_: DNDarray of shape (n_features,)
   :ivar n_components_: The estimated number of components.
   :vartype n_components_: int
   :ivar n_samples_: Number of samples in the training data.
   :vartype n_samples_: int
   :ivar noise_variance_: Not yet implemented.
   :vartype noise_variance_: float

   .. rubric:: Notes

   Hierarchical SVD (`svd_solver = "hierarchical"`) computes an approximate, truncated SVD. Thus, the results are in general not exact unless the chosen `n_components` is larger than the actual (matrix) rank of the underlying data; see :func:`ht.linalg.hsvd_rank` and :func:`ht.linalg.hsvd_rtol` for details.
   Randomized SVD (`svd_solver = "randomized"`) is a stochastic algorithm that computes an approximate, truncated SVD.

   .. attribute:: n_components
      :annotation: = None

   .. attribute:: copy
      :annotation: = True

   .. attribute:: whiten
      :annotation: = False

   .. attribute:: svd_solver
      :annotation: = 'hierarchical'

   .. attribute:: tol
      :annotation: = None

   .. attribute:: iterated_power
      :annotation: = 0

   .. attribute:: n_oversamples
      :annotation: = 10

   .. attribute:: power_iteration_normalizer
      :annotation: = 'qr'

   .. attribute:: random_state
      :annotation: = None

   .. attribute:: components_
      :annotation: = None

   .. attribute:: explained_variance_
      :annotation: = None

   .. attribute:: explained_variance_ratio_
      :annotation: = None

   .. attribute:: total_explained_variance_ratio_
      :annotation: = None

   .. attribute:: singular_values_
      :annotation: = None

   .. attribute:: mean_
      :annotation: = None

   .. attribute:: n_components_
      :annotation: = None

   .. attribute:: n_samples_
      :annotation: = None

   .. attribute:: noise_variance_
      :annotation: = None

   .. role:: raw-html(raw)
       :format: html

   .. method:: fit(X: heat.DNDarray, y=None) -> Self

      Fit the PCA model with data X.

      :param X: Data set of which PCA has to be computed.
      :type X: DNDarray of shape (n_samples, n_features)
      :param y: Not used, present for API consistency by convention.
      :type y: Ignored

   .. method:: transform(X: heat.DNDarray) -> heat.DNDarray

      Apply dimensionality reduction based on PCA to X.

      :param X: Data set to be transformed.
      :type X: DNDarray of shape (n_samples, n_features)

   .. method:: inverse_transform(X: heat.DNDarray) -> heat.DNDarray

      Transform data back to its original space.

      :param X: Data set to be transformed back.
      :type X: DNDarray of shape (n_samples, n_components)


.. py:class:: IncrementalPCA(n_components: Optional[int] = None, copy: bool = True, whiten: bool = False, batch_size: Optional[int] = None)

   Bases: :class:`heat.TransformMixin`, :class:`heat.BaseEstimator`

   Incremental Principal Component Analysis (PCA).

   This class allows for incremental updates of the PCA model. This is especially useful for large data sets that do not fit into memory.

   An example of how to apply this class is given in, e.g., `benchmarks/cb/decomposition.py`.

   :param n_components: Number of components to keep. If `n_components` is not set, all components are kept (default).
   :type n_components: int, optional
   :param copy: In-place operations are not yet supported. Please set `copy=True`.
   :type copy: bool, default=True
   :param whiten: Not yet supported.
   :type whiten: bool, default=False
   :param batch_size: Currently not needed and only added for API consistency and possible future extensions.
   :type batch_size: int, optional

   :ivar components_: Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by `explained_variance_`.
   :vartype components_: DNDarray of shape (n_components, n_features)
   :ivar singular_values_: The singular values corresponding to each of the selected components.
   :vartype singular_values_: DNDarray of shape (n_components,)
   :ivar mean_: Per-feature empirical mean, estimated from the training set.
   :vartype mean_: DNDarray of shape (n_features,)
   :ivar n_components_: The estimated number of components.
   :vartype n_components_: int
   :ivar n_samples_seen_: Number of samples processed so far.
   :vartype n_samples_seen_: int

   .. attribute:: whiten
      :annotation: = False

   .. attribute:: n_components
      :annotation: = None

   .. attribute:: batch_size
      :annotation: = None

   .. attribute:: components_
      :annotation: = None

   .. attribute:: singular_values_
      :annotation: = None

   .. attribute:: mean_
      :annotation: = None

   .. attribute:: n_components_
      :annotation: = None

   .. attribute:: batch_size_
      :annotation: = None

   .. attribute:: n_samples_seen_
      :annotation: = 0

   .. role:: raw-html(raw)
       :format: html

   .. method:: fit(path: str, chunk_size: int, dataset: str = 'DATA') -> Self

      Fit the IncrementalPCA model using data loaded in chunks from an HDF5 file.

      This method processes data incrementally, loading chunks of data from a file and updating the PCA model iteratively. It is particularly useful for large datasets that cannot fit into memory.

      :param path: Path to the file containing the dataset. The file must be in HDF5 format.
      :type path: str
      :param chunk_size: Number of rows to load and process in each chunk. Must be smaller than or equal to the total number of rows in the dataset.
      :type chunk_size: int
      :param dataset: Name of the dataset within the file to load.
      :type dataset: str, default="DATA"

      :returns: The fitted IncrementalPCA instance.
      :rtype: Self

      :raises ValueError: If the file format is not HDF5.
          If `chunk_size` is larger than the number of rows in the dataset.
          If the number of columns is smaller than the number of processes.

   .. method:: partial_fit(X: heat.DNDarray, y=None)

      One single step of incrementally building up the PCA.
      Input X is the current batch of data that needs to be added to the existing PCA.

   .. method:: transform(X: heat.DNDarray) -> heat.DNDarray

      Apply dimensionality reduction based on PCA to X.

      :param X: Data set to be transformed.
      :type X: DNDarray of shape (n_samples, n_features)

   .. method:: inverse_transform(X: heat.DNDarray) -> heat.DNDarray

      Transform data back to its original space.

      :param X: Data set to be transformed back.
      :type X: DNDarray of shape (n_samples, n_components)


.. function:: _torch_matrix_diag(diagonal)


.. py:class:: DMD(svd_solver: Optional[str] = 'full', svd_rank: Optional[int] = None, svd_tol: Optional[float] = None)

   Bases: :class:`heat.RegressionMixin`, :class:`heat.BaseEstimator`

   Dynamic Mode Decomposition (DMD), plain vanilla version with SVD-based implementation.

   The time series of which DMD shall be computed must be provided as a 2-D DNDarray of shape (n_features, n_timesteps).
   Please note that this deviates from Heat's convention that data sets are handled as 2-D arrays with the feature axis being the second axis.

   :param svd_solver: Specifies the algorithm to use for the singular value decomposition (SVD). Options are 'full' (default), 'hierarchical', and 'randomized'.
   :type svd_solver: str, optional
   :param svd_rank: The rank to which SVD shall be truncated. For `'full'` SVD, `svd_rank = None` together with `svd_tol = None` (default) will result in no truncation.
                    For `svd_solver='full'`, at most one of `svd_rank` or `svd_tol` may be specified.
                    For `svd_solver='hierarchical'`, either `svd_rank` (rank to truncate to) or `svd_tol` (tolerance to truncate to) must be specified.
                    For `svd_solver='randomized'`, `svd_rank` must be specified and determines the rank to truncate to.
   :type svd_rank: int, optional
   :param svd_tol: The tolerance to which SVD shall be truncated. For `'full'` SVD, `svd_tol = None` together with `svd_rank = None` (default) will result in no truncation.
                   For `svd_solver='hierarchical'`, either `svd_tol` (accuracy to truncate to) or `svd_rank` (rank to truncate to) must be specified.
                   For `svd_solver='randomized'`, `svd_tol` is meaningless and must be None.
   :type svd_tol: float, optional

   :ivar svd_solver: The algorithm used for the singular value decomposition (SVD).
   :vartype svd_solver: str
   :ivar svd_rank: The rank to which SVD shall be truncated.
   :vartype svd_rank: int
   :ivar svd_tol: The tolerance to which SVD shall be truncated.
   :vartype svd_tol: float
   :ivar rom_basis_: The reduced order model basis.
   :vartype rom_basis_: DNDarray
   :ivar rom_transfer_matrix_: The reduced order model transfer matrix.
   :vartype rom_transfer_matrix_: DNDarray
   :ivar rom_eigenvalues_: The reduced order model eigenvalues.
   :vartype rom_eigenvalues_: DNDarray
   :ivar rom_eigenmodes_: The reduced order model eigenmodes ("DMD modes").
   :vartype rom_eigenmodes_: DNDarray

   .. rubric:: Notes

   We follow the "exact DMD" method as described in [1], Sect. 2.2.

   Please note that "rank" in the context of SVD always refers to the number of singular values/vectors to compute (i.e., "rank" refers to the mathematical rank of a matrix).
   This is completely different from the notion of "(MPI-)rank", i.e., the ID given to a process, in a parallel MPI application.

   .. rubric:: References

   [1] J. L. Proctor, S. L. Brunton, and J. N. Kutz, "Dynamic Mode Decomposition with Control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142-161, 2016.

   .. attribute:: svd_solver
      :annotation: = 'full'

   .. attribute:: svd_rank
      :annotation: = None

   .. attribute:: svd_tol
      :annotation: = None

   .. attribute:: rom_basis_
      :annotation: = None

   .. attribute:: rom_transfer_matrix_
      :annotation: = None

   .. attribute:: rom_eigenvalues_
      :annotation: = None

   .. attribute:: rom_eigenmodes_
      :annotation: = None

   .. attribute:: dmdmodes_
      :annotation: = None

   .. attribute:: n_modes_
      :annotation: = None

   .. role:: raw-html(raw)
       :format: html

   .. method:: fit(X: heat.DNDarray) -> Self

      Fits the DMD model to the given data.

      :param X: The time series data to fit the DMD model to. Must be of shape (n_features, n_timesteps).
      :type X: DNDarray

   .. method:: predict_next(X: heat.DNDarray, n_steps: int = 1) -> heat.DNDarray

      Predicts and returns the state(s) after n_steps-many time steps for the given current state(s).

      :param X: The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states,
                i.e., X can be of shape (n_features,) or (n_features, n_current_states).
                The output will have the same shape as the input.
      :type X: DNDarray
      :param n_steps: The number of steps to predict into the future. Default is 1, i.e., the next time step is predicted.
      :type n_steps: int, optional

   .. method:: predict(X: heat.DNDarray, steps: Union[int, List[int]]) -> heat.DNDarray

      Predicts future states given the current state(s) and returns them all as an array of size (n_steps, n_features).

      This function avoids a time-stepping loop (i.e., repeated calls to 'predict_next') and computes the future states in one go.
      To do so, the number of future times to predict must be of moderate size, as an array of shape (n_steps, self.n_modes_, self.n_modes_) must fit into memory.
      Moreover, it must be ensured that:

      - the array of initial states is not split or split along the batch axis (axis 1) and the feature axis is small (i.e., self.rom_basis_ is not split)

      :param X: The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states,
                i.e., X can be of shape (n_features,) or (n_current_states, n_features).
      :type X: DNDarray
      :param steps: If int, predictions at time steps 0, 1, ..., steps-1 are computed.
                    If List[int], predictions at the time steps given in the list are computed.
      :type steps: int or List[int]

   .. method:: __str__()


.. py:class:: DMDc(svd_solver: Optional[str] = 'full', svd_rank: Optional[int] = None, svd_tol: Optional[float] = None)

   Bases: :class:`heat.RegressionMixin`, :class:`heat.BaseEstimator`

   Dynamic Mode Decomposition with Control (DMDc), plain vanilla version with SVD-based implementation.

   The time series of states and controls must be provided as 2-D DNDarrays of shapes (n_state_features, n_timesteps) and (n_control_features, n_timesteps), respectively.
   Please note that this deviates from Heat's convention that data sets are handled as 2-D arrays with the feature axis being the second axis.

   :param svd_solver: Specifies the algorithm to use for the singular value decomposition (SVD). Options are 'full' (default), 'hierarchical', and 'randomized'.
   :type svd_solver: str, optional
   :param svd_rank: The rank to which SVD of the states shall be truncated. For `'full'` SVD, `svd_rank = None` together with `svd_tol = None` (default) will result in no truncation.
                    For `svd_solver='full'`, at most one of `svd_rank` or `svd_tol` may be specified.
                    For `svd_solver='hierarchical'`, either `svd_rank` (rank to truncate to) or `svd_tol` (tolerance to truncate to) must be specified.
                    For `svd_solver='randomized'`, `svd_rank` must be specified and determines the rank to truncate to.
   :type svd_rank: int, optional
   :param svd_tol: The tolerance to which SVD of the states shall be truncated. For `'full'` SVD, `svd_tol = None` together with `svd_rank = None` (default) will result in no truncation.
                   For `svd_solver='hierarchical'`, either `svd_tol` (accuracy to truncate to) or `svd_rank` (rank to truncate to) must be specified.
                   For `svd_solver='randomized'`, `svd_tol` is meaningless and must be None.
   :type svd_tol: float, optional

   :ivar svd_solver: The algorithm used for the singular value decomposition (SVD).
   :vartype svd_solver: str
   :ivar svd_rank: The rank to which SVD shall be truncated.
   :vartype svd_rank: int
   :ivar svd_tol: The tolerance to which SVD shall be truncated.
   :vartype svd_tol: float
   :ivar rom_basis_: The reduced order model basis.
   :vartype rom_basis_: DNDarray
   :ivar rom_transfer_matrix_: The reduced order model transfer matrix.
   :vartype rom_transfer_matrix_: DNDarray
   :ivar rom_control_matrix_: The reduced order model control matrix.
   :vartype rom_control_matrix_: DNDarray
   :ivar rom_eigenvalues_: The reduced order model eigenvalues.
   :vartype rom_eigenvalues_: DNDarray
   :ivar rom_eigenmodes_: The reduced order model eigenmodes ("DMD modes").
   :vartype rom_eigenmodes_: DNDarray

   .. rubric:: Notes

   We follow the approach described in [1], Sects. 3.3 and 3.4.
   If svd_rank is prescribed, the rank of the SVD of the full system matrix is set to svd_rank + n_control_features; cf. https://github.com/dynamicslab/pykoopman for the same approach.

   Please note that "rank" in the context of SVD always refers to the number of singular values/vectors to compute (i.e., "rank" refers to the mathematical rank of a matrix).
   This is completely different from the notion of "(MPI-)rank", i.e., the ID given to a process, in a parallel MPI application.

   .. rubric:: References

   [1] J. L. Proctor, S. L. Brunton, and J. N. Kutz, "Dynamic Mode Decomposition with Control," SIAM Journal on Applied Dynamical Systems, vol. 15, no. 1, pp. 142-161, 2016.

   .. attribute:: svd_solver
      :annotation: = 'full'

   .. attribute:: svd_rank
      :annotation: = None

   .. attribute:: svd_tol
      :annotation: = None

   .. attribute:: rom_basis_
      :annotation: = None

   .. attribute:: rom_transfer_matrix_
      :annotation: = None

   .. attribute:: rom_control_matrix_
      :annotation: = None

   .. attribute:: rom_eigenvalues_
      :annotation: = None

   .. attribute:: rom_eigenmodes_
      :annotation: = None

   .. attribute:: dmdmodes_
      :annotation: = None

   .. attribute:: n_modes_
      :annotation: = None

   .. attribute:: n_modes_system_
      :annotation: = None

   .. role:: raw-html(raw)
       :format: html

   .. method:: fit(X: heat.DNDarray, C: heat.DNDarray) -> Self

      Fits the DMDc model to the given data.

      :param X: The time series data of states to fit the DMDc model to. Must be of shape (n_state_features, n_timesteps).
      :type X: DNDarray
      :param C: The time series of control inputs to fit the DMDc model to. Must be of shape (n_control_features, n_timesteps).
      :type C: DNDarray

   .. method:: predict(X: heat.DNDarray, C: heat.DNDarray) -> heat.DNDarray

      Predicts and returns future states given the current state(s) ``X`` and control trajectory ``C``.

      :param X: The current state(s) for the prediction. Must have the same number of features as the training data, but can be batched for multiple current states,
                i.e., X can be of shape (n_state_features,) or (n_batch, n_state_features).
      :type X: DNDarray
      :param C: The control trajectory for the prediction. Must have the same number of control features as the training data,
                i.e., C must be of shape (n_control_features,) for a single time step, or (n_control_features, n_timesteps).
      :type C: DNDarray

   .. method:: __str__()
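
The roles of the state and the control trajectory in ``DMDc.predict`` can be illustrated with the linear model with control from [1], x_{k+1} = A x_k + B u_k. The following is a minimal plain-Python sketch of that recurrence, not Heat's implementation: the names ``matvec``, ``dmdc_rollout``, ``A`` and ``B`` are hypothetical, the matrices are made-up illustrative values, and returning the initial state as the first entry of the trajectory is a choice of this sketch only.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply a dense matrix (list of rows) with a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def dmdc_rollout(A: Matrix, B: Matrix, x0: Vector, controls: List[Vector]) -> List[Vector]:
    """Iterate x_{k+1} = A x_k + B u_k, one step per control vector u_k.

    Hypothetical helper for illustration only; it is not part of Heat.
    The returned list starts with the initial state x0.
    """
    states = [list(x0)]
    x = list(x0)
    for u in controls:
        Ax = matvec(A, x)  # contribution of the current state
        Bu = matvec(B, u)  # contribution of the control input
        x = [a + b for a, b in zip(Ax, Bu)]
        states.append(x)
    return states


# Two state features, one control feature, two time steps (illustrative values).
A = [[1.0, 0.0], [0.0, 0.5]]
B = [[1.0], [0.0]]
x0 = [1.0, 2.0]
controls = [[1.0], [0.0]]

trajectory = dmdc_rollout(A, B, x0, controls)
print(trajectory)
```

In this sketch each control vector advances the state by one time step, so a control trajectory with ``n_timesteps`` entries yields ``n_timesteps`` predicted states after the initial one.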