Variable importance methods

Alex Zwanenburg

2026-04-22

Variable importance methods score the importance of each feature. This information is then used by the learning algorithm to determine which and how many features should be included in the model.

Overview of variable importance methods. a This is a general method where an appropriate specific method will be chosen, or multiple distributions or linking families are tested in an attempt to find the best option. b This method requires hyperparameter optimisation.
method tag binomial multinomial continuous survival
correlation
Pearson’s r pearson × ×
Spearman’s ρ spearman × ×
Kendall’s τ kendall × ×
concordance
concordancea concordance × × × ×
CORElearn
information gain ratio gain_ratio × ×
gini-index gini × ×
minimum description length mdl × ×
ReliefF with exponential weighting of distance ranks relieff_exp_rank × × ×
mutual information
mutual information maximisation mim × × × ×
mutual information features selection mifs × × × ×
minimum redundancy maximum relevance mrmr × × × ×
univariate regression
univariate regression univariate_regression × × × ×
multivariate regression
multivariate regression multivariate_regression × × × ×
lasso regression
generala lasso × × × ×
logistic lasso_binomial ×
multi-logistic lasso_multinomial ×
normal (gaussian) lasso_gaussian ×
cox lasso_cox ×
ridge regression
generala ridge × × × ×
logistic ridge_binomial ×
multi-logistic ridge_multinomial ×
normal (gaussian) ridge_gaussian ×
cox ridge_cox ×
elastic net regression
generala,b elastic_net × × × ×
logisticb elastic_net_binomial ×
multi-logisticb elastic_net_multinomial ×
normal (gaussian)b elastic_net_gaussian ×
coxb elastic_net_cox ×
random forest (RFSRC) variable importance
permutationb random_forest_permutation × × × ×
permutation (unoptimised) random_forest_permutation_default × × × ×
hold-out b random_forest_holdout × × × ×
hold-out (unoptimised) random_forest_holdout_default × × × ×
random forest (ranger) variable importance
permutationb random_forest_ranger_permutation × × × ×
permutation (unoptimised) random_forest_ranger_permutation_default × × × ×
hold-out permutationb random_forest_ranger_holdout_permutation × × × ×
hold-out perm. (unoptimised) random_forest_ranger_holdout_permutation_default × × × ×
impurityb random_forest_ranger_impurity × × × x
impurity (unoptimised) random_forest_ranger_impurity_default × × × x
special methods
no selection none × × × ×
random selection random × × × ×
signature only signature_only × × × ×

Configuration options

Variable importance methods and related options can be set using the variable_importance parameter in the xml file or as function argument.

tag / argument description default
vimp_method The desired variable importance method. Multiple methods may be provided at the same time. This setting has no default and must be provided. – (required)
vimp_method_parameter Several variable importance methods have hyperparameters that can be set and/or optimised. – (optional)
vimp_aggregation_method The aggregation method used to aggregate feature ranks over different bootstraps. borda
vimp_aggregation_rank_threshold Several aggregation methods count features if they have a rank below the threshold, i.e. are among the most important features. If NULL, a dynamic threshold is decided through Otsu-thresholding. 5
parallel_vimp Enables parallel processing for variable importance. Ignored if parallel = FALSE. Overrides parallel = TRUE. TRUE

Providing parameters for variable importance computation

Some of the variable importance methods, notably those based on random forests and (penalised) regression, have parameters that can be configured. These parameters are mentioned under the respective entries in the Overview of variable importance methods section. Moreover, some of these parameters are model parameters. In this case, these parameters are optimised using hyperparameter optimisation, which is described in the learning algorithms and hyperparameter optimisation vignette.

The syntax for such parameters is the same as for hyperparameter optimisation. For the multivariate_regression variable importance method the alpha parameter (which determines feature elimination during forward selection) may be provided as follows using the configuration file:

<vimp_method_parameter>
  <multivariate_regression>
    <alpha>0.05</alpha>
  </multivariate_regression>
</vimp_method_parameter>

Or as a nested list passed as the vimp_method_parameter argument to summon_familiar:

vimp_method_parameter = list("multivariate_regression"=list("alpha"=0.05))

Overview of variable importance methods

The variable importance methods implemented in familiar are described in more detail in this section.

Correlation methods

Correlation methods determine variable importance by assessing the correlation between a feature and the outcome of interest. High (anti-)correlation indicates an important feature, whereas low (anti-)correlation indicates that a feature is not directly related to the outcome. Correlation-based variable importance is determined using the cor function of the stats package that is part of the R core distribution (R Core Team 2024).

Three correlation coefficients can be computed:

To compute correlation of features with survival outcomes, only samples with an event are considered.

Concordance methods

Concordance methods assess how well the ordering of feature values corresponds to the ordering of the outcome. The method internally refers to the kendall method for continuous outcomes. For categorical outcomes, i.e. binomial and multinomial, concordance is assessed using the area under the receiver operating characteristic curve. For survival outcomes, concordance is measured using the concordance_index.

CORElearn methods

Familiar provides an interface to several variable importance methods implemented in the CORElearn package. These methods are the Information Gain Ratio (gain_ratio), the Gini-index (gini), Minimum Description Length (mdl) and ReliefF and rReliefF with exponential distance rank weighting (relieff_exp_rank).

Mutual information-based methods

Mutual information \(I\) is a measure of interdependency between two variables \(x\) and \(y\). In the context of variable importance, \(x\) is a feature vector and \(y\) is the outcome vector.

Computing mutual information requires that probability distributions of \(x\) and \(y\) are known. In practice, we don’t know either one. For categorical \(x\) and \(y\) we can use the sample estimates instead. For continuous or mixed data, the situation is more complex, and is often resolved through binning.

In familiar we therefore use the following two approaches to compute mutual information:

  1. For binomial, multinomial and continuous outcomes mutual information is computed using the praznik package.

  2. For survival outcomes we adapted the approximation proposed by De Jay et al. after Gel’fand and Yaglom (Gel′fand and Yaglom 1959; De Jay et al. 2013) for use with a concordance index: \(I = -0.5 \log(1 - (2 * (C-0.5))^2 + \epsilon)\), with \(C\) the concordance index.

Mutual information maximisation

The mim method is a univariate method that ranks each feature by its mutual information with the outcome.

Mutual information variable importance

Mutual information variable importance (MIFS) finds a feature set that maximises mutual information (Battiti 1994). This is done using forward selection. As in mutual information maximisation, mutual information \(I_{y,j}\) between each feature and the outcome is computed. Starting from a potential pool of all features, the feature with the highest mutual information is selected and removed from the pool.

The rest proceeds iteratively. The mutual information \(I_{s,1j}\) between the previously selected feature and the remaining features is computed. This mutual information is also called redundancy. The feature with the highest mutual information with the outcome and least redundancy (i.e. maximum \(I_{y,j} - I_{s,1j}\)) is selected next, and removed from the pool of remaining features. Then the mutual information \(I_{s,2j}\) between this feature and remaining features is computed, and the feature that maximises \(I_{y,j} - I{s,1j} - I_{s,2j}\) is selected, and so forth.

The iterative process stops if there is no feature \(j\) for which \(I_{y,j} - \sum_{i\in S} I_{s,ij} > 0\), with \(S\) being the subset of selected features, or all features have been exhausted.

To reduce the number of required computations, the implementation in familiar actively eliminates any feature \(j\) for which \(I_{y,j} - \sum_{i\in S} I_{s,ij} \leq 0\) at the earliest instance, as the \(\sum_{i\in S} I_{s,ij}\) term will monotonously increase.

Minimum redundancy maximum relevance

Minimum redundancy maximum relevance (mRMR) is similar to MIFS but differs in the way redundancy is used during optimisation (Peng et al. 2005). Whereas for MIFS the optimisation criterion is \(I_{y,j} - \sum_{i\in S} I_{s,ij}\), in mRMR the optimisation criterion is \(I_{y,j} - \frac{1} {\left| S \right|} \sum_{i\in S} I_{s,ij}\), with \(\left| S \right|\) the number of features already selected.

To limit computational complexity, features for which \(I_{y,j} - \frac{1} {\left| S \right|} \sum_{i\in S} I_{s,ij} \leq 0\), are eliminated.

Univariate and multivariate regression methods

Univariate and multivariate regression use a single feature or set of features as predictors for regression models, respectively. The performance of the regression model is then measured using a metric. Training and testing of regression models are repeated multiple times using bootstraps. For each bootstrap, the in-bag samples are used for training and the out-of-bag samples are using for testing.

This also defines the parameters of both methods, which are shown in the table below.

parameter tag values optimised comments
regression learner learner dependent on outcome no Any generalised linear regression model from the learning algorithms and hyperparameter optimisation vignette can be selected. Default values are glm_logistic for binomial, glm_multinomial for multinomial, glm_gaussian for continuous, and cox for survival outcomes.
performance metric metric dependent on outcome no Any metric from the performance metrics vignette can be selected. Default values are auc_roc for binomial and multinomial, mse for continuous, and concordance_index for survival outcomes
number of bootstraps n_bootstrap \(\mathbb{Z} \in \left[1, \infty\right)\) no The default value is \(10\).
drop-out alpha level alpha \(\mathbb{R} \in \left[0, 1\right]\) no The default value is \(0.33\). Only used in multivariate regression.

Univariate regression

In the univariate regression method, a regression model is built with each feature separately using the in-bag data of the bootstrap. Then this model is evaluated using the metric, expressed as an objective score with standardised optimum (see computing the objective score in the learning algorithms and hyperparameter optimisation vignette).

Then, the median of objective scores on out-of-bag data is used as variable importance for each feature.

Multivariate regression

The procedure described for univariate regression forms the first step in multivariate regression. The rest follows forward selection. The most important feature is assigned to the subset of selected features and removed from the set of available features. Separate regression models are then built with each remaining feature and all the feature(s) in the selected feature subset as predictors. Thus, the subset of selected features iteratively increases in size until no features are remaining or the objective score no longer increases.

To limit mostly redundant computation, features that are unlikely to be selected are actively removed. To do so, the standard deviation of the objective score over the bootstraps is computed for each feature. The fraction alpha of the lowest scoring features is eliminated each iteration.

Lasso, ridge and elastic net regression

Penalised regression is also a form of feature selection, as it selects an ‘optimal’ set of features to create a regression model. As features are usually normalised as part of pre-processing, the magnitude of each coefficient can be interpreted as its importance. All three shrinkage methods are implemented using the glmnet package (Hastie et al. 2009; Simon et al. 2011).

Only elastic net regression has a model hyperparameter that requires optimisation, but other parameters may be set as well, as shown in the table below:

parameter tag values optimised comments
family family gaussian, binomial, poisson, multinomial, cox continuous outcomes For continuous outcomes gaussian and poisson may be tested. The family is not optimised when it is specified, e.g. lasso_gaussian. For other outcomes only one applicable family exists.
elastic net penalty alpha \(\mathbb{R} \in \left[0,1\right]\) elastic net This penalty is fixed for ridge regression (alpha = 0) and lasso (alpha = 1).
optimal lambda lambda_min lambda.1se, lambda.min no Default is lambda.min.
number of CV folds n_folds \(\mathbb{Z} \in \left[3,n\right]\) no Default is \(3\) if \(n<30\), \(\lfloor n/10\rfloor\) if \(30\leq n \leq 200\) and \(20\) if \(n>200\).
normalisation normalise FALSE, TRUE no Default is FALSE, as normalisation is part of pre-processing in familiar.

Random forest-based methods

Several feature selection methods are based on random forests. All these methods require training a random forest. Hence, familiar will train a random forest based on the training data prior to determining variable importance. Random forest learners have a set of hyperparameters that can be optimised prior to training. These parameters, which are slightly different for ranger-based and randomForestSRC-based methods, are shown below.

parameter tag values optimised comments
number of trees n_tree \(\mathbb{Z} \in \left[0,\infty\right)\) yes This parameter is expressed on the \(\log_{2}\) scale, i.e. the actual input value will be \(2^\texttt{n_tree}\) (Oshiro et al. 2012). The default range is \(\left[4, 10\right]\).
subsampling fraction sample_size \(\mathbb{R} \in \left(0, 1.0\right]\) yes Fraction of available data that is used for to create a single tree. The default range is \(\left[2 / m, 1.0\right]\), with \(m\) the number of samples.
number of features at each node m_try \(\mathbb{R} \in \left[0.0, 1.0\right]\) yes Familiar ensures that there is always at least one candidate feature.
node size node_size \(\mathbb{Z} \in \left[1, \infty\right)\) yes Minimum number of unique samples in terminal nodes. The default range is \(\left[5, \lfloor m / 3\rfloor\right]\), with \(m\) the number of samples.
maximum tree depth tree_depth \(\mathbb{Z} \in \left[1,\infty\right)\) yes Maximum depth to which trees are allowed to grow. The default range is \(\left[1, 10\right]\).
number of split points n_split \(\mathbb{Z} \in \left[0, \infty\right)\) no By default, splitting is deterministic and has one split point (\(0\)).
splitting rule (randomForestSRC only) split_rule gini, auc, entropy, mse, quantile.regr, la.quantile.regr, logrank, logrankscore, bs.gradient no Default splitting rules are gini for binomial and multinonial outcomes, mse for continuous outcomes and logrank for survival outcomes.
splitting rule (ranger only) split_rule gini, hellinger, extratrees, beta, variance, logrank, C, maxstat no Default splitting rules are gini for binomial and multinomial outcomes and maxstat for continuous and survival outcomes.
significance split threshold (ranger only) alpha \(\mathbb{R} \in \left(0.0, 1.0\right]\) maxstat Minimum significance level for further splitting. The default range is \(\left[10^{-6}, 1.0\right]\)

The unoptimised methods do not require hyperparameter optimisation, and use default values from the ranger and randomForestSRC.

Permutation importance

The permutation importance method is implemented by random_forest_permutation and random_forest_permutation_default (randomForestSRC package) and random_forest_ranger_permutation and random_forest_ranger_permutation_default (ranger package). In short, this method functions as follows [Ishwaran2007-va]. As usual, each tree in the random forest is constructed using the in-bag samples of a bootstrap of the data. The predictive performance of each model is first measured using the out-of-bag data. Subsequently, the out-of-bag instances for each feature are randomly permuted, and predictive performance is assessed again. The difference between the normal performance and the permuted performance is used as a measure of the variable importance. For important features, this difference is large, whereas for irrelevant features the difference is negligible or even negative.

Holdout permutation importance

This variant on permutation importance (random_forest_ranger_holdout_permutation and random_forest_ranger_holdout_permutation_default) is implemented using ranger::holdoutRF. Instead of using out-of-bag to compute feature importance, two cross-validation folds are used. A random forest is trained on either fold, and variable importance determined on the other (Janitza et al. 2018).

The hold-out variable importance method implemented in the randomForestSRC package (random_forest_holdout and random_forest_holdout_default) is implemented using randomForestSRC::holdout.vimp. It is similar to the previous variant, but does not use cross-validation folds. Instead, out-of-bag prediction errors for models trained with and without each feature are compared.

Minimum depth variable selection

Important features tend to appear closer to the root of trees in random forests. Therefore, the position of each feature within a tree is assessed in minimum depth variable selection (Ishwaran et al. 2010).

Impurity importance

At each node, the data is split into (two) subsets, which connects to two branches. After splitting, each single subset is purer than the parent dataset. As a concrete example, in regression problems the variance of each of the subsets is lower than that of the data prior to splitting. The decrease in variance specifically, or the decrease of impurity generally, is then used to assess feature importance.

familiar uses the impurity_corrected importance measure, which is unbiased to the number of split points of a feature and its distribution (Nembrini et al. 2018).

Special methods

Familiar offers several methods that are special in that they are not feature selection methods in the sense that they determine a variable importance that can be used for establishing feature rankings.

No variable importance

As the name suggests, the none method avoids computing variable importances altogether. All features are passed into a model. Feature order is randomly shuffled prior to building a model to avoid influence of the order of provided features.

Random variable importance

The random method randomly draws features prior to model building. It does not assign a random variable importance to a feature. New features are drawn each time a model is built. All features are available for the draw, but only \(m\) features are drawn. Here \(m\) is the signature size that is usually optimised by hyperparameter optimisation.

Signature only

When configuring familiar, any number of features can be set as a model signature using the signature configuration parameter. However, more features may be added to this signature after computing variable importances. To make sure that only the provided features enter a model, the signature_only method may be used.

Aggregating variable importance

In case of variable importance or modelling in the presence of subsampling (e.g. bootstraps), the ranks of features may need to be aggregated across the different instances (Wald et al. 2012). The rank aggregation methods shown in the table below can be used for this purpose. Several methods require a threshold to indicate the size of the set of most highly ranked features, which can be set by specifying the vimp_aggregation_rank_threshold configuration parameter.

aggregation method tag comments
none none
mean rank mean
median rank median
best rank best
worst rank worst
stability selection stability uses threshold
exponential selection exponential uses threshold
borda ranking borda
enhanced borda ranking enhanced_borda uses threshold
truncated borda ranking truncated_borda uses threshold
enhanced truncated borda ranking enhanced_truncated_borda uses threshold

Notation

Let \(N\) be the number of ranking experiments that should be aggregated. Feature \(i\) for experiment \(j\) of \(N\) then has rank \(r_{ij}\). A lower rank indicates a more important feature. Some features may not receive a score during a ranking experiment, for example for multivariate variable importance methods such as lasso regression, or through use of a threshold \(\tau\). This is designated by \(\delta_{ij}\), which is \(0\) if the feature is absent, and \(1\) if it is present.

In case a threshold is used, \(\delta_{ij} = 1\) if \(r_{ij} \leq \tau\), and \(0\) otherwise.

Thus, for each experiment \(m_j = \sum^M_{i=1} \delta_{ij}\) features are ranked, out of \(M\) features. \(m_j\) is then also the maximum rank found in experiment \(j\).

Aggregating ranks for each feature results in an aggregate rank score \(s_i\). Features are subsequently ranked according to this method-specific score to arrive at an aggregate feature rank \(r_i\).

No rank aggregation

The none option does not aggregate ranks. Rather, scores are aggregated by computing the average score of a feature over all experiments that contain it. Ranks are then computed from the aggregated scores.

Mean rank aggregation

The mean rank aggregation method ranks features by computing the mean rank of a feature across all experiments that contain it.

\[s_i = \frac{\sum^{N}_{j=1} \delta_{ij} r_{ij}}{\sum^{N}_{j=1} \delta_{ij}}\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in ascending order.

Median rank aggregation

The median rank aggregation method ranks features by computing the median rank of a feature across all experiments that contain it.

\[s_i = \underset{j \in N, \, \delta_{ij}=1}{\textrm{median}}(r_{ij})\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in ascending order.

Best rank aggregation

The best rank aggregation method ranks features by the best rank that a feature has across all experiments that contain it.

\[s_i = \underset{j \in N, \, \delta_{ij}=1}{\textrm{min}} (r_{ij})\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in ascending order.

Worst rank aggregation

The worst rank aggregation method ranks features by the worst rank that a feature has across all instances that contain it.

\[s_i = \underset{j \in N, \, \delta_{ij}=1}{\textrm{max}} (r_{ij})\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in ascending order.

Stability rank aggregation

The stability aggregation method ranks features by their occurrence within the set of highly ranked features across all experiments. Our implementation generalises the method originally proposed by Meinshausen and Bühlmann (Meinshausen and Bühlmann 2010).

This method uses threshold \(\tau\) to designate the highly ranked features. Thus \(\delta_{ij} = 1\) if \(r_{ij} \leq \tau\), and \(0\) otherwise.

The aggregate rank score is computed as:

\[s_i = \frac{1}{N} \sum^N_{j=1} \delta_{ij}\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in descending order, as more commonly occurring features are considered more important.

Exponential rank aggregation

The exponential aggregation method ranks features by the sum of the negative exponentials of their normalised ranks in instances where they occur within the set of highly ranked features. This method was originally suggested by Haury et al. (Haury et al. 2011).

This method uses threshold \(\tau\) to designate the highly ranked features. Thus \(\delta_{ij} = 1\) if \(r_{ij} \leq \tau\), and \(0\) otherwise.

\[s_i = \sum^N_{j=1} \delta_{ij} \exp({-r_{ij} / \tau)}\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in descending order.

Borda rank aggregation

Borda rank aggregation ranks a feature by the sum of normalised ranks (the borda score) across all experiments that contain it. In case every experiment contains all features, the result is equivalent to the mean aggregation method (Wald et al. 2012).

\[s_i = \sum^N_{j=1} \frac{m_j - r_{ij} + 1}{m_j}\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in descending order.

Enhanced borda rank aggregation

Enhanced borda rank aggregation combines borda rank aggregation with stability rank aggregation. The borda score is multiplied by the occurrence of the feature within the set of highly ranked features across all experiments (Wald et al. 2012).

This method uses threshold \(\tau\) to designate the highly ranked features for the purpose of computing the occurrence. Thus \(\delta_{ij} = 1\) if \(r_{ij} \leq \tau\), and \(0\) otherwise.

\[s_i = \left( \frac{1}{N} \sum^N_{j=1} \delta_{ij} \right) \left( \sum^N_{j=1} \frac{m_j - r_{ij} + 1}{m_j} \right)\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in descending order.

Truncated borda rank aggregation

Truncated borda rank aggregation is borda rank aggregation performed with only the set of most highly ranked features in each instance.

This method uses threshold \(\tau\) to designate the highly ranked features. Thus \(\delta_{ij} = 1\) if \(r_{ij} \leq \tau\), and \(0\) otherwise.

\[s_i = \sum^N_{j=1} \delta_{ij} \frac{\tau - r_{ij} + 1}{\tau}\]

Note that compared to the borda method, the number of ranked features in an experiment \(m_j\) is replaced by threshold \(\tau\).

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in descending order.

Truncated enhanced borda rank aggregation

Truncated enhanced borda rank aggregation is enhanced borda aggregation performed with only the set of most highly ranked features in each experiment.

This method uses threshold \(\tau\) to designate the highly ranked features. Thus \(\delta_{ij} = 1\) if \(r_{ij} \leq \tau\), and \(0\) otherwise.

\[s_i = \left( \frac{1}{N} \sum^N_{j=1} \delta_{ij} \right) \left( \sum^N_{j=1} \delta_{ij} \frac{\tau - r_{ij} + 1}{\tau} \right)\]

The aggregate rank of features is then determined by sorting aggregate scores \(s_i\) in descending order.

References

Battiti, R. 1994. “Using Mutual Information for Selecting Features in Supervised Neural Net Learning.” IEEE Trans. Neural Netw. 5 (4): 537–50.
De Jay, Nicolas, Simon Papillon-Cavanagh, Catharina Olsen, Nehme El-Hachem, Gianluca Bontempi, and Benjamin Haibe-Kains. 2013. mRMRe: An R Package for Parallelized mRMR Ensemble Feature Selection.” Bioinformatics 29 (18): 2365–68.
Gel′fand, I M, and A M Yaglom. 1959. “Calculation of the Amount of Information about a Random Function Contained in Another Such Function.” In Eleven Papers on Analysis, Probability and Topology, edited by E B Dynkin, I M Gel’fand, A O Gel’fond, and M A Krasnosel’skii, vol. 12. American Mathematical Society Translations: Series 2. American Mathematical Society.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer Series in Statistics. Springer Science+Business Media, LLC.
Haury, Anne-Claire, Pierre Gestraud, and Jean-Philippe Vert. 2011. “The Influence of Feature Selection Methods on Accuracy, Stability and Interpretability of Molecular Signatures.” PLoS One 6 (12): e28210.
Ishwaran, Hemant, Udaya B Kogalur, Eiran Z Gorodeski, Andy J Minn, and Michael S Lauer. 2010. High-Dimensional Variable Selection for Survival Data.” J. Am. Stat. Assoc. 105 (489): 205–17.
Janitza, Silke, Ender Celik, and Anne-Laure Boulesteix. 2018. “A Computationally Fast Variable Importance Test for Random Forests for High-Dimensional Data.” Adv. Data Anal. Classif. 12 (4): 885–915.
Meinshausen, Nicolai, and Peter Bühlmann. 2010. “Stability Selection.” J. R. Stat. Soc. Series B Stat. Methodol. 72 (4): 417–73.
Nembrini, Stefano, Inke R König, and Marvin N Wright. 2018. “The Revival of the Gini Importance?” Bioinformatics 34 (21): 3711–18.
Oshiro, Thais Mayumi, Pedro Santoro Perez, and José Augusto Baranauskas. 2012. “How Many Trees in a Random Forest?” Machine Learning and Data Mining in Pattern Recognition, 154–68.
Peng, Hanchuan, Fuhui Long, and Chris Ding. 2005. “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy.” IEEE Trans. Pattern Anal. Mach. Intell. 27 (8): 1226–38.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. https://www.R-project.org/.
Simon, Noah, Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2011. “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” J. Stat. Softw. 39 (5): 1–13.
Wald, R, T M Khoshgoftaar, D Dittman, W Awada, and A Napolitano. 2012. “An Extensive Comparison of Feature Ranking Aggregation Techniques in Bioinformatics.” 2012 IEEE 13th International Conference on Information Reuse Integration (IRI), August, 377–84.