Many applications require testing a large number of statistical hypotheses organized in a hierarchical tree structure, where child hypotheses are more specific than their parent hypotheses. In such cases, the rejection of a child hypothesis makes the rejection of all its ancestor hypotheses redundant; it is therefore natural to focus on the highest-resolution discoveries, i.e., the outer nodes, defined as the discovered nodes that are not ancestors of other discoveries.
We propose a hierarchical method for testing trees of hypotheses with false discovery rate (FDR) control over the outer nodes, which exploits the logical relationships between the hypotheses in the tree. Our theoretical and numerical results focus on testing trees of hypotheses induced by hierarchical clustering of the explanatory variables in a linear regression model, where the clustering is based on the correlations between the variables. In this setting, the method identifies the smallest clusters with evidence of containing important variables, while controlling the expected proportion of discovered clusters that contain no important variables.
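As an illustration of the tree construction described above (the clustering step only, not the testing procedure), the following is a minimal sketch, assuming SciPy's hierarchical clustering with a distance derived from the absolute correlations; the data here is synthetic and purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical example: cluster explanatory variables of a linear model
# by their pairwise correlations, inducing a tree of hypotheses in which
# each cluster corresponds to "this cluster contains no important variable".
rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)  # two strongly correlated variables

corr = np.corrcoef(X, rowvar=False)
dist = 1.0 - np.abs(corr)                 # small distance = strong correlation
condensed = squareform(dist, checks=False)  # condensed form required by linkage
tree = linkage(condensed, method="average")
# Each row of `tree` merges two clusters; reading the rows bottom-up
# recovers the hierarchy of nested clusters (the tree of hypotheses).
```

The first merge in `tree` joins the two most strongly correlated variables, so the finest clusters sit at the leaves and the root corresponds to the global hypothesis on all variables.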
Our method is compared to several competitors in a simulation study and is shown to be more powerful in a range of settings. We illustrate the application of the method to hierarchical variable selection on real data and show that in some cases it leads to more specific discoveries than its competitors.