hfs.HierarchicalPreprocessor
- class hfs.HierarchicalPreprocessor(hierarchy: Optional[ndarray] = None)
Estimator for preprocessing hierarchical data for feature selection.
The hierarchical feature selectors expect the input data and the hierarchy graph to satisfy certain preconditions. This preprocessor prepares the data and the graph so that they can be used for feature selection.
- __init__(hierarchy: Optional[ndarray] = None)
Initializes a HierarchicalPreprocessor.
- Parameters
- hierarchy : np.ndarray, optional
The hierarchy graph as an adjacency matrix.
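As a minimal sketch of what an adjacency matrix for a small hierarchy can look like, the snippet below builds one from an edge list using plain Python lists. The helper `adjacency_matrix`, the example edges, and the convention that entry `[parent][child]` marks an edge from parent to child are assumptions made for illustration; they are not taken from the library. Wrapping the result with `numpy.array` would give an `ndarray` of the kind the constructor accepts.

```python
def adjacency_matrix(n_nodes, edges):
    """Build a dense adjacency matrix for a directed graph.

    Entry [parent][child] is 1 when there is an edge parent -> child.
    The edge direction is an assumption made for this illustration.
    """
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for parent, child in edges:
        matrix[parent][child] = 1
    return matrix


# Hypothetical hierarchy: node 0 is the root with children 1 and 2;
# node 1 has a single child, node 3.
edges = [(0, 1), (0, 2), (1, 3)]
hierarchy = adjacency_matrix(4, edges)

for row in hierarchy:
    print(row)
```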
- fit(X, y=None, columns=None)
Sets the parameters for the data transformation and prepares the hierarchy.
The following conditions must be fulfilled for the feature selection algorithms:
- Every node in the hierarchy graph can be mapped to one column in the dataset, and every column in the dataset has a corresponding node in the hierarchy.
- For binary data, if a feature has the value 1, all of its descendants in the hierarchy also have the value 1.
To achieve these conditions, missing columns are added to the hierarchy and unnecessary nodes are removed. The self._columns parameter is adjusted so that it can be used to add additional columns to the dataset in the transform method. After fitting, the dataset can be transformed with the transform method, and the updated hierarchy and columns mapping can be retrieved with get_hierarchy and get_columns.
- Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
The training input samples.
- y : None
A transformer does not need a target, but the pipeline API requires this parameter.
- columns : list or None, length n_features
The mapping from the hierarchy graph’s nodes to the columns in X, given as a list of ints. If this parameter is None, the columns in X and the corresponding nodes in the hierarchy are expected to be in the same order.
- Returns
- self : object
Returns self.
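The first condition, a one-to-one correspondence between hierarchy nodes and dataset columns, can be checked with a short sketch like the one below. It is not the library's implementation. The helper `compare_nodes_and_columns` is hypothetical, and it assumes that `columns[j]` holds the node index belonging to column `j`, which is one possible reading of the `columns` parameter.

```python
def compare_nodes_and_columns(n_nodes, columns):
    """Compare the nodes of a hierarchy with the nodes referenced by columns.

    Assumption: columns[j] is the index of the node that belongs to
    column j of the dataset.

    Returns two sorted lists:
    - nodes referenced by a column but absent from the hierarchy
    - hierarchy nodes that no column refers to
    """
    referenced = set(columns)
    hierarchy_nodes = set(range(n_nodes))
    missing_from_hierarchy = sorted(referenced - hierarchy_nodes)
    without_column = sorted(hierarchy_nodes - referenced)
    return missing_from_hierarchy, without_column


# Hypothetical case: a hierarchy with nodes 0..3 and a dataset with
# three columns that refer to nodes 0, 1 and 4.
missing, unused = compare_nodes_and_columns(4, [0, 1, 4])
print(missing)  # [4]
print(unused)   # [2, 3]
```

Both lists are empty exactly when the nodes and columns already correspond one to one under this assumed mapping.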
- get_hierarchy()
Get the transformed hierarchy graph.
- Raises
- RuntimeError
If the method is called before fit, in which case the hierarchy graph has not been updated yet.
- transform(X)
Transforms the dataset to fulfill the conditions for feature selection.
After the transformation, if a feature has the value 1, all of its descendants also have the value 1. Missing columns are added to the dataset.
- Parameters
- X : {array-like, sparse matrix}, shape (n_samples, n_features)
The input samples.
- Returns
- X_ : array of shape (n_samples, n_selected_features)
The transformed dataset.
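The propagation rule described for transform (a feature with the value 1 implies the value 1 for all of its descendants) can be illustrated with a small pure-Python sketch. This is not the library's code. The helpers `descendants` and `propagate_ones` are hypothetical, and the sketch assumes that column i corresponds to node i and that entry `[parent][child]` of the adjacency matrix marks an edge from parent to child.

```python
def descendants(hierarchy, node):
    """Return all nodes reachable from `node` via parent -> child edges."""
    found = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child, has_edge in enumerate(hierarchy[current]):
            if has_edge and child not in found:
                found.add(child)
                stack.append(child)
    return found


def propagate_ones(hierarchy, row):
    """Return a copy of one sample in which every descendant of a
    1-valued feature is also set to 1."""
    result = list(row)
    for node, value in enumerate(row):
        if value == 1:
            for d in descendants(hierarchy, node):
                result[d] = 1
    return result


# Hypothetical hierarchy: 0 -> 1, 0 -> 2, 1 -> 3
hierarchy = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Feature 1 is set, so its only descendant, feature 3, is set as well.
print(propagate_ones(hierarchy, [0, 1, 0, 0]))  # [0, 1, 0, 1]
```

Applying `propagate_ones` to each row of a binary dataset yields data that satisfies the second condition under the stated assumptions.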