MR and HIP¶

2015 Wan and Freitas extended their previous paper with three further feature selection methods, from which two are implemented in this library (the third one shows high similarity to HNB-s) []. The Hierarchical Information-Preserving (HIP) algorithm eliminates redundancy by selecting the deepest positive or the highest negative feature. So for each instance and each feature $f_i$ , if the value of all $f_i$ equals 1,all ancestors $f_a$ are removed while if it 0 all descendants $f_d$ will be removed. In contrast to their paper of 2013 this algorithm does not include the relevance score. They close this gap with Most Relevant (MR) algorithm, which relies on the previously defined relevance measure. MR selects the most relevant features on each path of the sets with a 1 as value and the sets with a 0. After processing each path only the most relevant terms are left and any redundancy is eliminated.

As processing each path is hardly feasible for big hierarchical structures, the implementation traverses the nodes in topological order saving the most relevant nodes (mr’s) seen up to this point $f_i$ . In the case of $f_i=1$ , for each predecessor $f_{i+1}$ and each saved mr, if this has a higher relevance than $f_i$ , the status of $f_i$ is set to removal and the mr is saved to $f_i$ . If there is one mr, which relevance is lower, meaning on this path ending in $f_i$ , $f_i$ has the highest value, $f_i$ itself is saved in the mr’s of its own node. For $f_i=0$ the process is similar performed in the reverse topological order. More information can be found in the commented code itself. In contrast to the implementation of HIP is self-explaining and rather straight-forward.