Package {ebrahim.gof}


Type: Package
Title: Ebrahim-Farrington Goodness-of-Fit Test for Logistic Regression
Version: 2.1.0
Date: 2026-06-17
Maintainer: Ebrahim Khaled Ebrahim <ebrahimkhaled@alexu.edu.eg>
Description: Implements the Ebrahim-Farrington goodness-of-fit test for logistic regression models, particularly effective for sparse data and binary outcomes. This test provides an improved alternative to the traditional Hosmer-Lemeshow test by using a modified Pearson chi-square statistic with data-dependent grouping. The test is based on Farrington's (1996) theoretical framework but simplified for practical implementation with binary data. Includes functions for the original Farrington test (for grouped data), the new Ebrahim-Farrington test (for binary data with automatic grouping), the Directed Ebrahim-Farrington (DEF) test that targets calibration-shape departures, and an ensemble that combines the DEF bases via the Cauchy combination test. Also provides 'run.all.gof()', which runs a battery of classical and modern goodness-of-fit and calibration tests (including McCullagh, Osius-Rojek, le Cessie-van Houwelingen, Stute-Zhu, and the GiViTI calibration test) in one call. For more details see Hosmer and Lemeshow (1980) <doi:10.1080/03610928008827941> and Farrington (1996) <doi:10.1111/j.2517-6161.1996.tb02086.x>.
License: GPL-3
URL: https://github.com/ebrahimkhaled/ebrahim.gof
BugReports: https://github.com/ebrahimkhaled/ebrahim.gof/issues
Depends: R (≥ 3.5.0)
Imports: stats
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown, ResourceSelection, ggplot2, CompQuadForm, statmod, mgcv, BAGofT, givitiR, callr
Encoding: UTF-8
RoxygenNote: 7.3.2
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-06-17 15:22:55 UTC; ebrah
Author: Ebrahim Khaled Ebrahim [aut, cre]
Repository: CRAN
Date/Publication: 2026-06-17 21:40:02 UTC

Covariate-Space Directed Ebrahim-Farrington (CDEF) Goodness-of-Fit Test

Description

A directed goodness-of-fit test for binary logistic regression whose direction lives in covariate space (functions of the predictors) rather than in fitted-probability space, as in def.gof. It projects the standardized residuals onto a covariate-space basis (polynomials and pairwise products, natural splines, or a combination that also includes fitted-probability bends) and calibrates the quadratic form with the Farrington estimation-adjusted projection, exactly as def.gof does. This makes it sensitive to omitted interactions and to local / oscillatory departures that fitted-probability grouping can miss.

Usage

cdef.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  basis = c("poly", "spline", "combined"),
  method = c("satterthwaite", "imhof")
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs and X).

predicted_probs

Numeric predicted probabilities; required when object is a y vector.

X

Design/covariate matrix (with or without an intercept column); required when object is a y vector. Ignored when object is a glm.

basis

One of "poly" (squares, cubes, pairwise products), "spline" (natural cubic splines per covariate plus a pairwise term; requires the splines package), or "combined" (covariate polynomials plus fitted-probability bends).

method

One of "satterthwaite" (default) or "imhof".

Details

Let \tilde r_i=(y_i-\hat p_i)/\sqrt{\hat p_i(1-\hat p_i)} be the standardized residuals and Z a covariate-space basis matrix. The statistic is S=(Z'\tilde r)'(Z'Z)^{-1}(Z'\tilde r), whose null distribution is a weighted sum of \chi^2_1 variables with weights the eigenvalues of (Z'Z)^{-1}Z'\Omega Z, where \Omega=I-V^{1/2}X(X'VX)^{-1}X'V^{1/2} adjusts for estimating \hat\beta. The p-value uses a Satterthwaite scaled-\chi^2 approximation (default) or Imhof's method (CompQuadForm). Rank-deficient bases are reduced automatically.
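
The quantities above can be traced in a few lines of base R. This is a minimal sketch, not the package's internal code: the basis Z below (two squares and one pairwise product) is a hypothetical stand-in for the "poly" basis, and the two-moment Satterthwaite formulas are the standard ones, assumed here to match the default method.

```r
# Sketch only: base R, illustrative basis; not the code behind cdef.gof().
set.seed(1)
n  <- 400
x1 <- runif(n, -3, 3)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 0.8 * x1 - 0.5 * x2))
fit <- glm(y ~ x1 + x2, family = binomial())

p <- fitted(fit)
v <- p * (1 - p)                 # diagonal of V
r <- (y - p) / sqrt(v)           # standardized residuals
X <- model.matrix(fit)

# Hypothetical covariate-space basis (assumption): squares and the product
Z <- cbind(x1^2, x2^2, x1 * x2)

# S = (Z'r)' (Z'Z)^{-1} (Z'r)
Zr <- crossprod(Z, r)
S  <- drop(t(Zr) %*% solve(crossprod(Z), Zr))

# Omega = I - V^{1/2} X (X'VX)^{-1} X' V^{1/2}
A     <- sqrt(v) * X             # V^{1/2} X (row i scaled by sqrt(v_i))
Omega <- diag(n) - A %*% solve(crossprod(A), t(A))

# Weights: eigenvalues of (Z'Z)^{-1} Z' Omega Z
M      <- solve(crossprod(Z), t(Z) %*% Omega %*% Z)
lambda <- Re(eigen(M, only.values = TRUE)$values)

# Satterthwaite: match the mean and variance of sum(lambda_j * chisq_1)
scale_c <- sum(lambda^2) / sum(lambda)
df_nu   <- sum(lambda)^2 / sum(lambda^2)
p_value <- pchisq(S / scale_c, df = df_nu, lower.tail = FALSE)
```

Because \Omega is a projection, every weight lies between 0 and 1, so the effective degrees of freedom never exceed the number of basis columns.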

Value

A one-row data.frame with Test, Basis, Test_Statistic, df, Method, and p_value.

References

Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. JRSS-B 58(2), 349-360.

See Also

def.gof, ef.gof, run.all.gof.

Examples

set.seed(1)
n <- 600; x1 <- runif(n, -3, 3); x2 <- rnorm(n)
# truth has an omitted interaction; fit the additive model
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x1 - 0.5 * x2 + 0.4 * x1 * x2))
fit <- glm(y ~ x1 + x2, family = binomial())
cdef.gof(fit)                    # covariate-space directed test (poly basis)
cdef.gof(fit, basis = "spline")  # for local / oscillatory misfit


Combine Directed GOF Tests into One Decision (Ensemble)

Description

Combines the three Directed Ebrahim-Farrington (DEF) basis tests ("poly2", "poly3", "stukel") into a single goodness-of-fit decision, so the user does not have to choose a basis. By default the p-values are combined with the Cauchy Combination Test (CCT), which controls the error rate under the strong dependence between tests computed on the same fitted model. The omnibus EF test can optionally be added to the vote.

Usage

def.ensemble.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  components = c("poly2", "poly3", "stukel"),
  add_ef = FALSE,
  combine = c("cct", "minp", "fisher"),
  G = 10,
  extra_pvalues = NULL
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector.

X

Optional design matrix, passed through to def.gof for the exact calibration (only used with the y/predicted_probs form).

components

Character vector, a subset of c("poly2","poly3","stukel"). Default is all three.

add_ef

Logical; if TRUE, the omnibus EF p-value (ef.gof) is appended to the components. Default FALSE.

combine

One of "cct" (default), "minp", "fisher".

G

Integer number of groups passed to def.gof/ef.gof (default 10).

extra_pvalues

Optional named numeric vector of additional p-values to include (e.g. a Tsiatis test computed elsewhere). Default NULL.

Details

Because the component tests are computed on the same fit, their p-values are strongly dependent. The CCT (combine = "cct") has an asymptotic standard-Cauchy null whose tail is robust to this dependence, so it needs no calibration. The "minp" (Sidak) and "fisher" rules assume independence and are offered for comparison only; under positive dependence "minp" is conservative and "fisher" is anti-conservative, so they should be calibrated by simulation before use (not done here).
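
The three combining rules can be written directly from their textbook definitions. The sketch below is base R with equal weights; the function names cct, minp_sidak, and fisher_comb are hypothetical and are not exported by the package.

```r
# Sketch only: textbook forms of the three combiners, equal weights assumed.

# Cauchy combination test (Liu and Xie, 2020)
cct <- function(p, w = rep(1 / length(p), length(p))) {
  stopifnot(all(p > 0 & p < 1), abs(sum(w) - 1) < 1e-12)
  stat <- sum(w * tan((0.5 - p) * pi))   # weighted sum of Cauchy variates
  pcauchy(stat, lower.tail = FALSE)      # standard-Cauchy upper tail
}

# Sidak-adjusted minimum p-value (assumes independence)
minp_sidak <- function(p) 1 - (1 - min(p))^length(p)

# Fisher's method (assumes independence)
fisher_comb <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

p <- c(poly2 = 0.04, poly3 = 0.02, stukel = 0.30)   # made-up p-values
cct(p)
minp_sidak(p)
fisher_comb(p)
```

A useful sanity check: when all component p-values are equal, cct returns that common value (up to floating-point error), which reflects the fully dependent case.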

Value

A one-row data.frame with columns Test, Combiner, Components, k, and p_value.

Author(s)

Ebrahim Khaled Ebrahim <ebrahimkhaled@alexu.edu.eg>

References

Liu, Y. and Xie, J. (2020). Cauchy combination test. JASA, 115(529), 393-402.

See Also

def.gof, ef.gof.

Examples

set.seed(1)
n <- 500
x <- runif(n, -3, 3)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
fit <- glm(y ~ x, family = binomial())
def.ensemble.gof(fit)                 # CCT of the three DEF bases
def.ensemble.gof(fit, add_ef = TRUE)  # add the omnibus EF


Directed Ebrahim-Farrington (DEF) Goodness-of-Fit Test

Description

Performs the Directed Ebrahim-Farrington (DEF) goodness-of-fit test for a fitted binary logistic regression model. DEF concentrates its power on a small set of calibration-curve "shape" directions by projecting the grouped standardized residuals onto a low-dimensional basis and testing the squared length of that projection.

Usage

def.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  G = 10,
  basis = c("poly3", "poly2", "stukel", "ensemble"),
  method = c("satterthwaite", "imhof")
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector, ignored when it is a glm.

X

Optional design matrix, used only with the y/predicted_probs form: it enables the exact estimation-adjusted (\Omega) calibration (logit working weights assumed). Without it the conservative \chi^2_k reference is used and a warning is issued. Ignored when object is a glm.

G

Integer number of equal-frequency groups (default 10; must be >= 3).

basis

One of "poly3" (default), "poly2", "stukel", or "ensemble".

method

One of "satterthwaite" (default) or "imhof".

Details

The observations are sorted by predicted probability and split into G equal-frequency groups; the standardized grouped residual vector r is projected onto a basis matrix Z of smooth shapes, giving S = (Z'r)'(Z'Z)^{-1}(Z'r). Its null distribution is a weighted sum of \chi^2_1 variables with weights equal to the eigenvalues of (Z'Z)^{-1}Z'\Omega Z, where \Omega = I - U(X'WX)^{-1}U' is the estimation-adjusted covariance of the grouped residuals. The p-value uses a Satterthwaite scaled-\chi^2 approximation (default) or Imhof's method (if the CompQuadForm package is installed). Bases: "poly2", "poly3" (default), "stukel"; "ensemble" runs all three and combines them via def.ensemble.gof.
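
For reference, the Satterthwaite step can be written out as a two-moment match. This is the standard construction, given here under the assumption that the package's "satterthwaite" method follows it; \lambda_1, \dots, \lambda_k denote the eigenvalues described above.

```latex
% Null distribution as a weighted sum of independent chi-square(1) variables
S \;\overset{H_0}{\approx}\; \sum_{j=1}^{k} \lambda_j \,\chi^2_{1,j},
\qquad
\mathrm{E}[S] = \sum_{j=1}^{k} \lambda_j,
\qquad
\mathrm{Var}[S] = 2 \sum_{j=1}^{k} \lambda_j^{2}.

% Approximate S by c \chi^2_\nu with the same mean and variance
c = \frac{\sum_j \lambda_j^{2}}{\sum_j \lambda_j},
\qquad
\nu = \frac{\bigl(\sum_j \lambda_j\bigr)^{2}}{\sum_j \lambda_j^{2}},
\qquad
p \approx \Pr\!\left(\chi^2_{\nu} > S / c\right).
```

When every \lambda_j equals 1 this gives c = 1 and \nu = k, which is the \chi^2_k reference used when X is not supplied.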

Value

A one-row data.frame with columns Test, Basis, Test_Statistic (the statistic S), df, Method, and p_value. When basis = "ensemble", the return is that of def.ensemble.gof.

Author(s)

Ebrahim Khaled Ebrahim <ebrahimkhaled@alexu.edu.eg>

References

Ebrahim, K. E. and El-Kotory, A. Omnibus versus Directed Goodness-of-Fit Tests for Sparse Data in Binary Logistic Regression (companion paper).

See Also

ef.gof, def.ensemble.gof.

Examples

set.seed(1)
n <- 500
x <- runif(n, -3, 3)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
fit <- glm(y ~ x, family = binomial())
def.gof(fit)                       # default poly3 basis
def.gof(fit, basis = "stukel")     # tail-shape basis
def.gof(fit, basis = "ensemble")   # combine all three (CCT)


Deployable learned-ensemble GOF test via parametric bootstrap

Description

Turns a pre-trained ensemble scorer (meta) into a deployable goodness-of-fit test for any fitted model: it scores the model, then calibrates the p-value by a per-dataset parametric bootstrap from the fitted model (so no knowledge of the truth or the data-generating design is required). Validity comes from the bootstrap, independent of how meta was trained.

Usage

deploy.gof(object, meta, B = 99, feature_fn = gof.features)

Arguments

object

A fitted binary logistic glm.

meta

A pre-trained scorer: either a function f(features) returning a scalar misfit score, or an object with a predict method consuming a one-row feature matrix.

B

Number of parametric-bootstrap resamples (default 99).

feature_fn

Function mapping a fitted glm to its feature vector (default gof.features).

Value

A one-row data.frame with the score, B, and the bootstrap p_value.
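
The bootstrap calibration can be illustrated without a trained scorer. In the sketch below, pearson_score is a hypothetical stand-in for meta and feature_fn (any scalar misfit score would do), boot_gof is not a package function, and the (1 + count) / (B + 1) p-value is the usual Monte Carlo convention, assumed rather than confirmed for deploy.gof.

```r
# Sketch only: parametric-bootstrap calibration of an arbitrary misfit score.

# Hypothetical score (assumption): Pearson chi-square on the ungrouped data
pearson_score <- function(y, p) sum((y - p)^2 / (p * (1 - p)))

boot_gof <- function(fit, B = 99, score = pearson_score) {
  X        <- model.matrix(fit)
  p_hat    <- fitted(fit)
  observed <- score(fit$y, p_hat)
  boot <- replicate(B, {
    y_star <- rbinom(length(p_hat), 1, p_hat)   # simulate from the fitted model
    refit  <- suppressWarnings(glm.fit(X, y_star, family = binomial()))
    score(y_star, refit$fitted.values)          # rescore the refitted model
  })
  data.frame(score = observed, B = B,
             p_value = (1 + sum(boot >= observed)) / (B + 1))
}

set.seed(1)
x   <- runif(300, -3, 3)
y   <- rbinom(300, 1, plogis(0.6 * x))
fit <- glm(y ~ x, family = binomial())
res <- boot_gof(fit, B = 99)
res
```

With B = 99 the smallest attainable p-value is 1/100, so a larger B is needed to resolve smaller significance levels.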

See Also

gof.features, cdef.gof.


Ebrahim-Farrington Goodness-of-Fit Test for Logistic Regression

Description

Performs the Ebrahim-Farrington goodness-of-fit test for logistic regression models. This test is particularly effective for binary data and sparse datasets, providing an improved alternative to the traditional Hosmer-Lemeshow test.

Usage

ef.gof(
  y,
  predicted_probs = NULL,
  model = NULL,
  m = NULL,
  G = 10,
  method = c("chisq", "normal")
)

Arguments

y

A fitted binary logistic glm (then predicted_probs is taken from it automatically), or a numeric vector of binary responses (0/1) for binary data / counts of successes for grouped data.

predicted_probs

Numeric vector of predicted probabilities from the logistic regression model. Must be the same length as y.

model

Optional glm object. Required only for the original Farrington test with grouped data (when m is provided and G is NULL).

m

Optional numeric vector of trial counts for each observation (for grouped data). If NULL, data is assumed to be binary.

G

Optional integer specifying the number of groups for binary data grouping. Default is 10. If NULL, no grouping is performed and m must be provided.

method

Reference distribution for the grouped EF statistic: "chisq" (default) refers T_{EF} to a \chi^2_{G-2} distribution; "normal" uses the standardized Z_{EF} (the behaviour of package versions <= 1.0.0).

Details

The Ebrahim-Farrington test is based on Farrington's (1996) theoretical framework but simplified for practical implementation with binary data. The test uses a modified Pearson chi-square statistic with data-dependent grouping, where observations are grouped by their predicted probabilities.

For binary data (when G is specified), the test automatically groups observations into G groups based on predicted probabilities and applies the simplified Ebrahim-Farrington statistic:

Z_{EF} = \frac{T_{EF} - (G - 2)}{\sqrt{2(G-2)}}

where T_{EF} is the modified Pearson chi-square statistic, and G is the number of groups.

For grouped data (when m is provided), the test applies the original Farrington test with full variance calculations.
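
The two reference distributions for the binary-data case differ only in how T_{EF} is converted to a p-value. The sketch below assumes T_{EF} has already been computed (the value 12 is arbitrary) and assumes an upper-tail p-value for the normal form; ef_reference is a hypothetical helper, not a package function.

```r
# Sketch only: convert a given T_EF into Z_EF and the two candidate p-values.
ef_reference <- function(T_EF, G) {
  df   <- G - 2
  Z_EF <- (T_EF - df) / sqrt(2 * df)    # standardization shown above
  c(Z_EF     = Z_EF,
    p_chisq  = pchisq(T_EF, df = df, lower.tail = FALSE),  # method = "chisq"
    p_normal = pnorm(Z_EF, lower.tail = FALSE))            # upper tail assumed
}

out <- ef_reference(T_EF = 12, G = 10)   # Z_EF = (12 - 8) / sqrt(16) = 1
out
```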

Value

A data frame with the following columns:

Test

Character string identifying the test performed

Test_Statistic

Numeric value of the standardized test statistic

p_value

Numeric p-value for the test

Author(s)

Ebrahim Khaled Ebrahim <ebrahimkhaled@alexu.edu.eg>

References

Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. Journal of the Royal Statistical Society. Series B (Methodological), 58(2), 349-360.

Ebrahim, K. E. (2025). Goodness-of-Fits Tests and Calibration Machine Learning Algorithms for Logistic Regression Model with Sparse Data. Master's Thesis, Alexandria University.

Hosmer, D. W., & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics - Theory and Methods, 9(10), 1043–1069. https://doi.org/10.1080/03610928008827941

See Also

hoslem.test (in the ResourceSelection package) for the Hosmer-Lemeshow test.

Examples

# Example 1: Binary data with automatic grouping (Ebrahim-Farrington test)
set.seed(123)
n <- 500
x <- rnorm(n)
linpred <- 0.5 + 1.2 * x
prob <- 1 / (1 + exp(-linpred))
y <- rbinom(n, 1, prob)

# Fit logistic regression
model <- glm(y ~ x, family = binomial())
predicted_probs <- fitted(model)

# Perform Ebrahim-Farrington test with 10 groups
result <- ef.gof(y, predicted_probs, G = 10)
print(result)

# Example 2: Compare with different number of groups
result_4 <- ef.gof(y, predicted_probs, G = 4)
result_20 <- ef.gof(y, predicted_probs, G = 20)

# Example 3: Grouped data (original Farrington test)
# Note: This requires actual grouped data with trials > 1
## Not run: 
# Simulated grouped data
n_groups <- 50
m_trials <- sample(5:20, n_groups, replace = TRUE)
x_grouped <- rnorm(n_groups)
linpred_grouped <- -0.5 + 1.0 * x_grouped
prob_grouped <- 1 / (1 + exp(-linpred_grouped))
y_grouped <- rbinom(n_groups, m_trials, prob_grouped)

# Fit model for grouped data
data_grouped <- data.frame(successes = y_grouped, trials = m_trials, x = x_grouped)
model_grouped <- glm(cbind(successes, trials - successes) ~ x, 
                     data = data_grouped, family = binomial())
predicted_probs_grouped <- fitted(model_grouped)

# Original Farrington test
result_grouped <- ef.gof(y_grouped, predicted_probs_grouped, 
                         model = model_grouped, m = m_trials)
print(result_grouped)

## End(Not run)


Goodness-of-fit evidence features for a fitted model

Description

Builds the evidence vector used by the learned-ensemble goodness-of-fit test: one-sided z-scores \Phi^{-1}(1-p) from a panel of GOF tests plus the covariate-space directed tests. Larger values mean stronger evidence of misfit.

Usage

gof.features(
  object,
  tests = c("HL", "HL-equalwidth", "Pigeon-Heyse", "Tsiatis", "Xie", "EF", "DEF.poly2",
    "DEF.poly3", "DEF.stukel")
)

Arguments

object

A fitted binary logistic glm.

tests

Character vector of run.all.gof test names to use as panel features (default: a fast partition + DEF-family panel).

Value

A named numeric vector of evidence features.
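
The transformation from p-values to evidence features is a single quantile call. The p-values below are made up for illustration, and how the package treats p-values of exactly 0 or 1 (which map to infinite z-scores) is not stated here.

```r
# Sketch only: one-sided z-scores Phi^{-1}(1 - p) from made-up p-values.
p_values <- c(HL = 0.50, EF = 0.03, DEF.poly3 = 0.001)

# lower.tail = FALSE evaluates Phi^{-1}(1 - p) without forming 1 - p,
# which keeps precision for very small p
evidence <- qnorm(p_values, lower.tail = FALSE)
evidence   # p = 0.5 maps to 0; smaller p maps to a larger z-score
```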

See Also

deploy.gof, cdef.gof, run.all.gof.


Plot the GiViTI calibration belt from a goodness-of-fit battery

Description

Draws the GiViTI calibration belt stored on a run.all.gof result that was produced with calibration_plot = TRUE. The belt shows the fitted calibration curve with a confidence region against the 45-degree line.

Usage

## S3 method for class 'gof_battery'
plot(x, ...)

Arguments

x

A gof_battery object from run.all.gof.

...

Passed to the givitiR plot method.

Value

x, invisibly.


Print a goodness-of-fit battery

Description

Formats the run.all.gof result as a compact, readable table: rows grouped by test family, p-values shown to four decimals (or scientific for very small values, "-" when not available), and a significance flag. The object is still a plain data.frame underneath, so all the raw columns remain available for programmatic use.

Usage

## S3 method for class 'gof_battery'
print(x, ...)

Arguments

x

A gof_battery object returned by run.all.gof.

...

Ignored.

Value

x, invisibly.


Run a Battery of Goodness-of-Fit Tests at Once

Description

Runs several goodness-of-fit tests for a binary logistic regression in one call and returns one tidy data.frame, one row per test. Pass a fitted glm to run the whole battery; pass (y, predicted_probs) to run the tests that need only predictions. Each test is wrapped so that a failure of one test never aborts the whole run.

Usage

run.all.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  tests = "all",
  G = 10,
  include_slow = TRUE,
  calibration_plot = FALSE,
  control = list()
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector.

X

Optional design matrix; lets the directed (DEF) tests run from the (y, predicted_probs) form.

tests

Either "all" (default) or a character vector of test names to run (e.g. c("EF","DEF.poly3","HL")).

G

Integer number of groups passed to the grouping tests (default 10).

include_slow

Logical; when TRUE (the default) the full battery runs, including the slow tests: le Cessie-van Houwelingen smoothing (O(n^2)-O(n^3)), the GAM tests, Stute-Zhu, eHL, BAGofT, and GiViTI. Set FALSE for a quick run with the fast tests only. A one-time message notes this whenever slow tests are included.

calibration_plot

Logical; when TRUE and GiViTI is among the tests, also compute and draw the GiViTI calibration belt and store it on the result (retrievable with plot()). Default FALSE.

control

Optional named list of per-test options (e.g. list(BAGofT = list(nsim = 100), GiViTI = list(devel = "internal"))).

Details

The currently bundled tests, by family:

Global / standardized: Pearson, Deviance, Osius-Rojek, McCullagh, Copas-RSS, and Information-Matrix (the White/Orme test). McCullagh standardizes the Pearson statistic by its exact conditional moments (Kuss 2002 algorithm).

Partition: HL (Hosmer-Lemeshow deciles), HL-equalwidth, Pigeon-Heyse, and F-test (the modified Hosmer-Lemeshow F-test: deviance residuals ANOVA-F-tested across deciles).

EF and EF-normal: the omnibus Ebrahim-Farrington test with the chi-square and normal references; the normal form reproduces the thesis simulation.

Directed: DEF.poly2, DEF.poly3, DEF.stukel, and Stukel.

Covariate-space: Tsiatis, Xie, and Pulkstenis-Robinson.

Ensemble: the two rows Ensemble.Vote(3DEF) and Ensemble.Univ(3DEF+EF), from the Cauchy combination test.

The following slow tests run when include_slow = TRUE:

le Cessie-van Houwelingen smoothing.

HL-GAM, PR-GAM, and Xie-GAM: the GAM-based tests (need mgcv; fit an overfit GAM for grouping).

Stute-Zhu: a cumulative-residual parametric-bootstrap test; set the number of reps with control = list("Stute-Zhu" = list(B = ...)).

eHL: the e-value Hosmer-Lemeshow test, reported as p = min(1, 1/e).

BAGofT: the binary-adaptive GOF test; needs the BAGofT package; set control = list(BAGofT = list(nsim = ...)).

Lai-Liu-HL: Lai & Liu's standardized-power procedure for the Hosmer-Lemeshow test, which has no p-value. It reports the standardized power as the statistic and a randomized accept/reject decision in the Note; set the target size via control = list("Lai-Liu-HL" = list(n0 = ..., k = ...)).

GiViTI and GiViTI-external: the GiViTI polynomial calibration test with the internal and external development assumptions. It wraps givitiR and runs in an isolated callr subprocess, so a failure in its compiled dependencies returns NA rather than aborting the session; set control = list(GiViTI = list(devel = "internal")).

Notes: Tsiatis and Xie cluster the covariate space with k-means (a fixed internal seed, so results are reproducible and the caller's RNG is left untouched). Xie uses the corrected degrees of freedom G - k/2 - 1 with k the number of predictors. Pulkstenis-Robinson auto-detects the categorical covariate (any factor/character/logical, or a numeric with at most getOption("ebrahim.gof.pr.maxlev", 6) distinct values); it returns NA with a note when none is present.

Every bundled test reproduces the implementation used in the original thesis simulation: Osius-Rojek and Stukel follow LogisticDx's gof.glm (Stukel via statmod::glm.scoretest when statmod is installed), Copas-RSS follows rms's gof residual, HL follows ResourceSelection::hoslem.test, and the others match their standalone reference functions; all were checked to agree numerically.

Value

A data.frame (of class gof_battery) with columns Test, Family, Statistic, df, p_value, and Note, one row per test. A dedicated print method shows the rows grouped by family with formatted p-values and significance flags; the underlying columns remain available for programmatic use.

Author(s)

Ebrahim Khaled Ebrahim <ebrahimkhaled@alexu.edu.eg>

See Also

ef.gof, def.gof, def.ensemble.gof.

Examples

set.seed(1)
n <- 500
x <- runif(n, -3, 3)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
fit <- glm(y ~ x, family = binomial())

## quick run: the fast tests only
run.all.gof(fit, include_slow = FALSE)

## pick specific tests
run.all.gof(fit, tests = c("McCullagh", "Osius-Rojek", "HL"))


## the full battery (default include_slow = TRUE); the slow tests may need the
## suggested packages mgcv, BAGofT, givitiR and callr
run.all.gof(fit, control = list("Stute-Zhu" = list(B = 50)))

## draw the GiViTI calibration belt (needs givitiR + callr)
res <- run.all.gof(fit, tests = c("McCullagh", "GiViTI"),
                   calibration_plot = TRUE)
plot(res)   # redraw the stored belt