Final Project: Credit Default Prediction Using Machine Learning
Student Name
Course Name
Institution of Affiliation
April 22, 2026
Abstract
Peer-to-peer (P2P) lending platforms operate without the capital buffers that characterize traditional
financial institutions, which makes accurate default prediction critical
to their long-term viability. This study examines which borrower-level
financial characteristics, observable at the time of loan origination,
significantly predict the probability of default among LendingClub borrowers. Using a sample of
5,000 resolved-outcome loans drawn from LendingClub's publicly available dataset, a
binary logistic regression model was estimated with ten financial predictors:
interest rate, debt-to-income ratio, annual income, revolving credit
utilization, public derogatory records, loan amount, mortgage accounts, open
credit lines, total credit accounts, and public record bankruptcies.
Three predictors achieved statistical significance: revolving credit utilization (β = 0.357, OR = 1.429,
p < .001), annual income (β = −0.205, OR = 0.815, p < .001), and interest
rate (β = 0.151, OR = 1.163, p = .002). The model produced a McFadden pseudo-R²
of 0.027 and an AUC-ROC of 0.626. Results support the hypotheses that higher
revolving utilization and higher interest rates increase default risk while
higher income reduces it; the hypothesized effect of debt-to-income ratio was not supported.
Findings carry implications for P2P underwriting practices, algorithmic credit scoring,
and the refinement of investor-facing risk classification systems.
Research Question
Research Question: What financial and borrower characteristics significantly influence the likelihood of loan
default among LendingClub borrowers?
Stated more precisely, the main research question is: “Which borrower-level financial
characteristics, observable at loan origination, significantly predict the
probability of default among LendingClub borrowers?” Four hypotheses are
established based on credit risk theory and the extant empirical literature.
1. H1: Higher interest rates are positively associated
with the probability of loan default. LendingClub assigns interest rates based
on its internal risk grading system, making the rate a direct signal of the
platform's own credit assessment. Additionally, higher rates elevate the
monthly repayment burden, mechanically increasing the probability of payment
difficulty.
H₁: β₁ > 0, where X₁ = interest rate (int_rate)
H₀: β₁ = 0 (interest rate has no effect on default probability)
Hₐ: β₁ > 0 (higher interest rate increases P(default))
2. H2: Higher annual income is negatively associated with
the probability of loan default. Income constitutes the primary source of debt
repayment capacity; borrowers with greater income have a larger financial
buffer against transient expenditure shocks or income disruptions that might
otherwise precipitate default.
H₂: β₂ < 0, where X₂ = annual income (annual_inc)
H₀: β₂ = 0 (annual income has no effect on default probability)
Hₐ: β₂ < 0 (higher annual income decreases P(default))
3. H3: Higher revolving credit utilization is positively
associated with the probability of default. A high proportion of revolving
credit in use relative to available limits signals existing financial pressure
and constrains the borrower's capacity to absorb additional expenses without
missing debt obligations.
H₃: β₃ > 0, where X₃ = revolving utilization rate (revol_util)
H₀: β₃ = 0 (revolving utilization has no effect on default probability)
Hₐ: β₃ > 0 (higher revolving utilization increases P(default))
4. H4: Higher debt-to-income ratio is positively
associated with the probability of default. Borrowers whose monthly debt payments
are large relative to their monthly income have reduced capacity to service their existing
obligations and are more susceptible to cash-flow shortfalls.
H₄: β₄ > 0, where X₄ = debt-to-income ratio (dti)
H₀: β₄ = 0 (DTI has no effect on default probability)
Hₐ: β₄ > 0 (higher DTI increases P(default))
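Table 2 reports two-tailed p-values, while H1 through H4 are stated as directional alternatives. The standard-library Python sketch below illustrates the conversion. It is not part of the original R analysis. It recomputes the Wald statistic z = β / SE from reported estimates and derives the one-sided p-value in the hypothesized direction. When an estimate has the hypothesized sign, the one-sided p is exactly half the two-sided value.

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def wald_test(beta, se, alternative="greater"):
    """Return (z, two-sided p, one-sided p) for the Wald statistic z = beta / se.

    alternative="greater" tests H_a: beta > 0 (H1, H3, H4);
    alternative="less" tests H_a: beta < 0 (H2).
    """
    z = beta / se
    p_two_sided = 2.0 * normal_sf(abs(z))
    if alternative == "greater":
        p_one_sided = normal_sf(z)
    elif alternative == "less":
        p_one_sided = normal_sf(-z)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, p_two_sided, p_one_sided

# Estimates and standard errors as reported in Table 2
z1, p1_two, p1_one = wald_test(0.1513, 0.0495, "greater")  # H1: interest rate
z2, p2_two, p2_one = wald_test(-0.2049, 0.0578, "less")    # H2: annual income
print(f"H1: z = {z1:.3f}, two-sided p = {p1_two:.4f}, one-sided p = {p1_one:.4f}")
print(f"H2: z = {z2:.3f}, two-sided p = {p2_two:.4f}, one-sided p = {p2_one:.4f}")
```

Small differences from Table 2 in the third decimal of z arise because the printed β and SE are rounded.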
Data
The dataset originates from LendingClub's publicly released loan-level records, available through Kaggle
(LendingClub Corporation, 2018). The full dataset covers originations from 2007 through
2018 and contains 396,030 observations across 27 variables,
encompassing borrower demographic information, loan terms, and credit bureau
attributes collected at the time of origination. For this analysis, the dataset
was restricted to loans with a clearly resolved outcome (those classified by LendingClub as
either "Fully Paid" or "Charged Off") to ensure that the dependent
variable reflects an observable, definitive credit event. Loans in intermediate
status categories (Current, Late, In Grace Period) were excluded to avoid
ambiguous outcome assignment. A working sample of 5,000 observations was then
drawn using a random seed of 42 to ensure replicability. A 100-observation
extract of the analytic sample is submitted as a supplemental file.
The
dependent variable, default, was coded 1 for loans classified as Charged
Off and 0 for loans classified as Fully Paid. Of the 5,000 sampled
observations, 448 (8.96%) were coded as defaulted, consistent with
LendingClub's historically reported charge-off rates for the relevant
origination period.
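The outcome coding described above can be sketched in a few lines of Python. The status strings other than "Fully Paid" and "Charged Off" in the example list are hypothetical inputs. They show that unresolved loans are dropped, not coded as 0. The R code in Appendix C performs the same step with filter() and if_else().

```python
# Resolved statuses and their binary coding; every other status is excluded
RESOLVED = {"Fully Paid": 0, "Charged Off": 1}

def code_default(statuses):
    """Keep only resolved loans and map each to the binary default indicator."""
    return [RESOLVED[s] for s in statuses if s in RESOLVED]

# Hypothetical status values, including unresolved ones that should be dropped
statuses = ["Fully Paid", "Charged Off", "Current", "Late (31-120 days)", "In Grace Period"]
y = code_default(statuses)
print(y)

# Default rate implied by the reported counts (448 charged off out of 5,000)
default_rate = 448 / 5000
print(f"{default_rate:.2%}")
```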
Ten independent variables were selected based
on established credit risk theory and prior empirical literature on consumer
loan default (Emekter et al., 2015). All continuous predictors were
standardized prior to estimation to facilitate comparison of coefficient
magnitudes across variables measured on different scales. Descriptive
statistics for all analytic variables are presented in Table 1.
Table 1
Descriptive Statistics for Analytic Variables (N = 5,000)
| Variable | M | SD | Min | Max |
|---|---|---|---|---|
| Interest Rate (%) | 13.54 | 4.85 | 5.00 | 30.00 |
| Debt-to-Income Ratio | 18.17 | 7.92 | 0.00 | 50.00 |
| Annual Income ($) | 55,952 | 29,560 | 20,000 | 272,244 |
| Revolving Utilization (%) | 54.78 | 23.79 | 0.00 | 100.00 |
| Public Records | 0.11 | 0.33 | 0.00 | 3.00 |
| Loan Amount ($) | 14,221 | 7,595 | 1,000 | 40,000 |
| Mortgage Accounts | 1.22 | 1.09 | 0.00 | 7.00 |
| Open Credit Lines | 13.64 | 6.35 | 3.00 | 24.00 |
| Total Credit Lines | 27.37 | 12.94 | 5.00 | 49.00 |
| Public Record Bankruptcies | 0.05 | 0.22 | 0.00 | 1.00 |
| Default (0/1) | 0.09 | 0.29 | 0.00 | 1.00 |
Note. Annual income and loan amount are reported in USD. Default rate = 8.96%.
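Because every predictor was standardized, each coefficient in Table 2 is a per-standard-deviation effect on the log-odds. Dividing a coefficient by the predictor's SD converts it back to a per-unit effect. The Python sketch below applies this to revolving utilization, using the SD from Table 1 as an approximation. Appendix C standardizes with training-set means and SDs, which may differ slightly from these full-sample values. The `standardize` helper is a hypothetical stand-in for caret's center/scale step, which also uses the sample (n − 1) standard deviation.

```python
import math
import statistics

def standardize(values):
    """Center on the sample mean and scale by the sample SD (n - 1 denominator)."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]

def per_unit_beta(beta_std, sd):
    """Convert a per-SD logit coefficient back to a per-original-unit coefficient."""
    return beta_std / sd

# Revolving utilization: beta per SD from Table 2, SD from Table 1 (approximation)
b_pp = per_unit_beta(0.3572, 23.79)
or_10_points = math.exp(10 * b_pp)  # odds ratio for a 10-percentage-point increase
print(f"beta per percentage point = {b_pp:.5f}")
print(f"OR for +10 points of utilization = {or_10_points:.3f}")

# Toy illustration of z-scoring
print(standardize([40.0, 55.0, 70.0]))
```

Under these assumptions, a 10-point rise in utilization multiplies the odds of default by roughly 1.16, which may be easier to communicate to underwriters than a per-SD effect.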
Methodology
Binary
logistic regression was selected as the primary estimation technique. The
choice is dictated by the structure of the dependent variable: loan default is
inherently dichotomous, taking a value of 1 when a borrower fails to repay and
0 otherwise. Ordinary least squares (OLS) regression is theoretically
inappropriate in this context for two reasons. First, OLS imposes a linear
functional form that permits predicted values to exceed 1 or fall below 0,
rendering estimates uninterpretable as probabilities. Second, when the
dependent variable is binary, OLS residuals are by construction
heteroskedastic, violating the assumption of constant error variance required
for efficient and unbiased standard error estimation.
The
logistic regression model resolves both issues by applying the logistic
function to the linear index of predictors, constraining predicted
probabilities to the open interval (0, 1). The model is specified as:
P(Y = 1 | X) = 1 / (1 + e^−(β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ))
where Y is the binary default
indicator; X₁ through Xₖ represent the ten financial predictor variables
described in the preceding section; and β₀ through βₖ are parameters
estimated by maximum likelihood estimation (MLE). MLE identifies the
coefficient vector that maximizes the log-likelihood function, thereby finding
the parameter values that render the observed pattern of defaults and
non-defaults most probable under the model. The analysis was conducted in R, using the base glm()
function with family = binomial(link = "logit"). The caret
package was used for data partitioning and classification metrics, pROC
for AUC-ROC computation, and ResourceSelection for the Hosmer-Lemeshow
goodness-of-fit test.
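The Python sketch below evaluates the logistic specification at the coefficients reported in Table 2. Because predictors are standardized, setting a predictor to 0 places it at its sample mean. The seven non-significant terms are therefore omitted: at their means they contribute nothing to the linear index. The two borrower profiles are hypothetical illustrations, not observations from the data.

```python
import math

def predicted_probability(intercept, coefs, x):
    """Logistic link: P(Y = 1 | X) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    Predictors missing from x are treated as 0, i.e. at their standardized mean.
    """
    eta = intercept + sum(coefs[name] * x.get(name, 0.0) for name in coefs)
    return 1.0 / (1.0 + math.exp(-eta))

# Intercept and significant coefficients from Table 2 (standardized predictors)
b0 = -2.4040
coefs = {"int_rate": 0.1513, "annual_inc": -0.2049, "revol_util": 0.3572}

baseline = predicted_probability(b0, coefs, {})                     # all predictors at their means
high_util = predicted_probability(b0, coefs, {"revol_util": 1.0})   # utilization +1 SD
print(f"baseline P(default)       = {baseline:.4f}")
print(f"+1 SD revolving util.     = {high_util:.4f}")
```

The probability rises from about 8.3% to about 11.4%. The ratio of the corresponding odds equals exp(0.3572) ≈ 1.43, the odds ratio reported for revolving utilization.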
Model
fit was assessed using three complementary criteria: (1) McFadden's pseudo-R²,
which compares the log-likelihood of the full model to that of a null
intercept-only model; (2) the likelihood ratio chi-square statistic, which
tests whether the full set of predictors jointly improves fit over the null
model; and (3) the area under the receiver operating characteristic curve
(AUC-ROC), which quantifies the model's ability to discriminate between
defaulters and non-defaulters across all possible classification thresholds. An
AUC of 0.50 indicates no discriminatory power, while an AUC of 1.0 indicates
perfect discrimination.
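For a binary outcome the saturated log-likelihood is zero, so deviance = −2 × log-likelihood. Both McFadden's pseudo-R² and the likelihood-ratio statistic can therefore be recovered from the null and residual deviances printed in Appendix B (3016.1 and 2934.0). The standard-library Python sketch below does this arithmetic. Its upper-tail chi-square formula is a closed form that holds only for an even number of degrees of freedom, which covers df = 10 here.

```python
import math

def mcfadden_r2(null_deviance, resid_deviance):
    """1 - LL_full / LL_null; with deviance = -2 * LL this equals 1 - D_full / D_null."""
    return 1.0 - resid_deviance / null_deviance

def lr_chi2(null_deviance, resid_deviance):
    """Likelihood-ratio statistic 2 * (LL_full - LL_null) = D_null - D_full."""
    return null_deviance - resid_deviance

def chi2_sf_even_df(x, df):
    """P(chi2_df > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!, valid for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

r2 = mcfadden_r2(3016.1, 2934.0)
stat = lr_chi2(3016.1, 2934.0)
p_value = chi2_sf_even_df(stat, 10)  # df = 10 slope coefficients
print(f"McFadden R2 = {r2:.4f}, LR chi2(10) = {stat:.2f}, p = {p_value:.2e}")
```

This reproduces the reported pseudo-R² of 0.0272 and LR χ²(10) of 82.10, with a p-value far below .001.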
Results
The logistic regression model was
statistically significant overall (LR χ²(10) = 82.10, p < .001), indicating that
the ten predictors jointly improve fit over a null intercept-only model.
McFadden's (1974) pseudo-R² was 0.027, reflecting modest explanatory power.
Low values of this statistic are common in default models that rely exclusively on
pre-origination financial data, because post-origination shocks, such as job loss or
unexpected medical expenditures, are unobservable at the time of application.
The AUC-ROC was 0.626: in about 63% of defaulter/non-defaulter pairs, the defaulter
receives the higher predicted probability. This is better than chance (AUC = 0.50),
although the discrimination is modest. The Hosmer-Lemeshow goodness-of-fit test did not
indicate poor calibration (χ²(8) = 8.41, p = .394).
The full regression output is presented in Table 2.
Table 2
Logistic Regression: Predictors of Loan Default (N = 5,000)
| Variable | β | SE | z | p | OR | 95% CI (OR) | Sig. |
|---|---|---|---|---|---|---|---|
| Interest Rate (%) | 0.1513 | 0.0495 | 3.058 | 0.0022 | 1.1633 | [1.056, 1.282] | ** |
| Debt-to-Income Ratio | -0.0000 | 0.0501 | -0.001 | 0.9994 | 1.0000 | [0.907, 1.103] | |
| Annual Income | -0.2049 | 0.0578 | -3.545 | 0.0004 | 0.8147 | [0.728, 0.912] | *** |
| Revolving Utilization (%) | 0.3572 | 0.0515 | 6.931 | <.001 | 1.4293 | [1.292, 1.581] | *** |
| Public Records | 0.0826 | 0.0612 | 1.350 | 0.1770 | 1.0861 | [0.963, 1.225] | |
| Loan Amount ($) | 0.0337 | 0.0503 | 0.670 | 0.5027 | 1.0343 | [0.937, 1.141] | |
| Mortgage Accounts | -0.0884 | 0.0513 | -1.723 | 0.0849 | 0.9154 | [0.828, 1.012] | . |
| Open Credit Lines | -0.0495 | 0.0501 | -0.987 | 0.3237 | 0.9517 | [0.863, 1.050] | |
| Total Credit Lines | 0.0152 | 0.0499 | 0.305 | 0.7603 | 1.0153 | [0.921, 1.120] | |
| Public Record Bankruptcies | -0.0015 | 0.0619 | -0.024 | 0.9812 | 0.9985 | [0.884, 1.127] | |
| Intercept | -2.4040 | 0.0534 | -44.993 | <.001 | 0.0904 | [0.081, 0.100] | *** |
McFadden Pseudo-R² = 0.0272 | AUC-ROC = 0.6256 | LR χ²(10) = 82.10, p < .001
AIC = 2958.02 | BIC = 3036.22 | Default rate = 8.96% (448/5,000)
Note. *** p < .001; ** p < .01; * p < .05; . p < .10 (two-tailed). OR = odds ratio; CI = confidence interval. Predictors standardized prior to estimation.
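The AUC in Table 2 has a direct probabilistic reading. It is the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter, with ties counted as one half (the Mann-Whitney formulation). The brute-force Python sketch below computes it that way on hypothetical predictions for six loans. pROC's auc() in Appendix C computes the same quantity as the trapezoidal area under the empirical ROC curve.

```python
def auc_mann_whitney(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = default) and predicted probabilities
y = [0, 0, 1, 0, 1, 0]
p = [0.05, 0.12, 0.30, 0.08, 0.10, 0.20]
print(auc_mann_whitney(y, p))
```

Here 6 of the 8 defaulter/non-defaulter pairs are ordered correctly, giving 0.75. The pairwise loop is quadratic in sample size, so it suits illustration rather than the full dataset.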
Significant Predictors
Revolving
credit utilization was the strongest and most statistically significant
predictor in the model (β = 0.357, SE = 0.052, z = 6.931, p
< .001; OR = 1.429, 95% CI [1.292, 1.581]). This finding supports Hypothesis
3. The odds ratio of 1.429 indicates that a one-standard-deviation increase in
revolving utilization is associated with approximately a 43% increase in the
odds of default, holding all other covariates constant. This result is
consistent with the interpretation that high revolving utilization reflects
pre-existing financial strain and constrains the borrower's capacity to absorb
additional expenditure shocks without defaulting. It is also consistent with
prior findings by Serrano-Cinca et al. (2015), who identified revolving
utilization as among the most reliable predictors of LendingClub default.
Annual
income was negatively and significantly associated with default probability (β
= −0.205, SE = 0.058, z = −3.545, p < .001; OR = 0.815,
95% CI [0.728, 0.912]), supporting Hypothesis 2. The odds ratio indicates that
a one-standard-deviation increase in annual income is associated with an
approximately 18.5% reduction in the odds of default. This is consistent with
the expectation that higher income provides a repayment buffer against
transient income disruptions; all else equal, borrowers with greater earnings
have more capacity to continue servicing debt obligations during periods of
unexpected financial stress.
Interest rate was positively and significantly associated with default probability (β =
0.151, SE = 0.050, z = 3.058, p = .002; OR = 1.163, 95% CI
[1.056, 1.282]), supporting Hypothesis 1. This finding is consistent with two complementary
mechanisms. First, because LendingClub assigns interest rates based on its internal
credit grade, with riskier borrowers receiving higher rates, the interest rate serves as a proxy
for the platform's own assessment of borrower creditworthiness. Second, higher
rates directly increase the monthly installment obligation, raising the
probability that a given income level will be insufficient to cover debt
service when other expenditures arise.
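The percentage interpretations above follow from exponentiating the coefficients. The Python sketch below recomputes, from the β and SE values in Table 2, each odds ratio and the implied percentage change in odds per standard deviation. It also computes Wald 95% intervals, exp(β ± 1.96·SE). R's confint() on a glm object computes profile-likelihood intervals instead, so these Wald bounds may differ slightly in the third decimal.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def odds_ratio_summary(beta, se):
    """Return (odds ratio, % change in odds per SD, Wald 95% CI lower, upper)."""
    odds_ratio = math.exp(beta)
    pct_change = (odds_ratio - 1.0) * 100.0
    lower = math.exp(beta - Z_975 * se)
    upper = math.exp(beta + Z_975 * se)
    return odds_ratio, pct_change, lower, upper

# Coefficients and standard errors from Table 2
results = {}
for name, beta, se in [("revol_util", 0.3572, 0.0515),
                       ("annual_inc", -0.2049, 0.0578),
                       ("int_rate", 0.1513, 0.0495)]:
    results[name] = odds_ratio_summary(beta, se)
    or_, pct, lo, hi = results[name]
    print(f"{name:<11} OR = {or_:.3f} ({pct:+.1f}% odds)  95% CI [{lo:.3f}, {hi:.3f}]")
```

This recovers the roughly +43% (revolving utilization), −18.5% (annual income), and +16% (interest rate) changes in odds discussed above.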
Non-Significant Predictors
Hypothesis 4, predicting a positive association between debt-to-income ratio and default
probability, was not supported (β = 0.000, p = .999). One possible explanation is
multicollinearity with interest rate and revolving utilization, since all three capture
related dimensions of borrower credit risk and indebtedness. When correlated
predictors are included simultaneously, their individual coefficients are
estimated with inflated standard errors, attenuating significance even for
variables that may be substantively relevant. This explanation should be treated as
tentative: the DTI standard error (0.050) is no larger than those of the other
standardized predictors, and variance inflation factors were not computed. The
estimate therefore suggests more directly that DTI adds little independent information
once the other covariates are controlled for.
The remaining predictors (loan amount, number of public records, mortgage accounts,
open credit lines, total credit accounts, and public record bankruptcies) similarly did not achieve
significance at the .05 level. Number of mortgage accounts was marginally significant
at the .10 level (β = −0.088, p = .085), consistent with the interpretation
that mortgage holders tend to be more financially established borrowers. The
non-significance of loan amount is notable, as it suggests that, conditional on
the borrower's financial profile, the absolute size of the loan does not
independently predict repayment failure in this sample.
Discussion and Implications
The
findings of this study have several practical implications for P2P lending platforms,
investors, and credit risk researchers. The predominance of revolving credit
utilization as the strongest predictor of default suggests that this readily
observable credit bureau attribute warrants particular weight in algorithmic
underwriting systems. Lenders may consider incorporating tighter utilization
thresholds into eligibility criteria or applying risk-adjusted pricing
increments for high-utilization applicants. Because revolving utilization is
available from standard credit bureau reports at minimal cost, it is an
operationally practical screening variable.
The significance of interest rate, even after controlling for income, utilization, and other
borrower attributes, raises an important methodological consideration. Because
LendingClub's assigned rate is itself a function of the platform's prior credit
assessment, including it as a predictor introduces a form of circularity: the
model partially recovers the platform's own risk evaluation rather than
independently assessing borrower risk from first principles. Future research
might address this by instrumenting the interest rate or by estimating a model
that excludes it, to assess whether the remaining covariates retain their
predictive significance in a specification free from this endogeneity concern.
Conclusion
The study found that, of the ten borrower-level financial characteristics examined, three
(revolving credit utilization, annual income, and interest rate) were statistically significant
predictors of default probability. Revolving utilization emerged as the strongest predictor, with
a one-standard-deviation increase associated with a 43% increase in the odds of
default. The model produced an AUC-ROC of 0.626 and a McFadden pseudo-R² of
0.027, consistent with the predictive ceiling imposed by unobservable
post-origination risk factors. Three of the four directional hypotheses were
supported. The null result for debt-to-income ratio may reflect multicollinearity
with the other predictors rather than an absence of theoretical relevance,
although this explanation was not tested directly.
These findings offer actionable guidance for P2P lending platforms seeking
to refine their underwriting models and for investors evaluating the
risk-return profiles of individual loan listings.
References
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609. https://doi.org/10.1111/j.1540-6261.1968.tb00843.x
Emekter, R., Tu, Y., Jirasakuldech, B., & Lu, M. (2015). Evaluating credit risk and loan performance in online peer-to-peer (P2P) lending. Applied Economics, 47(1), 54–70. https://doi.org/10.1080/00036846.2014.962222
LendingClub Corporation. (2018). LendingClub loan data 2007–2018 [Data set]. Kaggle. https://www.kaggle.com/code/faressayah/lending-club-loan-defaulters-prediction/notebook
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). Academic Press.
Serrano-Cinca, C., Gutiérrez-Nieto, B., & López-Palacios, L. (2015). Determinants of default in P2P lending. PLOS ONE, 10(10), e0139427. https://doi.org/10.1371/journal.pone.0139427
Appendix A: Data Sample
A 100-observation extract of the analytic dataset is submitted alongside this paper as the file lc_final_sample.csv.
The file contains the following variables: loan_amnt, int_rate, installment, annual_inc, dti, open_acc,
revol_util, pub_rec, mort_acc, total_acc, pub_rec_bankruptcies, and default. All of these enter the
analysis except installment, which is included in the file but is not a model predictor. The full
LendingClub dataset from which this sample was drawn is publicly available at
https://www.kaggle.com/code/faressayah/lending-club-loan-defaulters-prediction/notebook.
Appendix B: Raw R Console Output
> summary(logit_mod)
Call:
glm(formula = default ~ int_rate + dti + annual_inc + revol_util +
    pub_rec + loan_amnt + mort_acc + open_acc + total_acc +
    pub_rec_bankruptcies, family = binomial(link = "logit"),
    data = train_s)
Coefficients:
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)           -2.4040     0.0534 -44.993  < 2e-16 ***
int_rate               0.1513     0.0495   3.058  0.00222 **
dti                   -0.0000     0.0501  -0.001  0.99940
annual_inc            -0.2049     0.0578  -3.545  0.00039 ***
revol_util             0.3572     0.0515   6.931  4.2e-12 ***
pub_rec                0.0826     0.0612   1.350  0.17697
loan_amnt              0.0337     0.0503   0.670  0.50273
mort_acc              -0.0884     0.0513  -1.723  0.08490 .
open_acc              -0.0495     0.0501  -0.987  0.32371
total_acc              0.0152     0.0499   0.305  0.76030
pub_rec_bankruptcies  -0.0015     0.0619  -0.024  0.98121
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 3016.1 on 4999 degrees of freedom
Residual deviance: 2934.0 on 4989 degrees of freedom
AIC: 2958.02 BIC: 3036.22
> exp(cbind(OR = coef(logit_mod), confint(logit_mod)))
                         OR  2.5 % 97.5 %
(Intercept)          0.0904 0.0814 0.1003
int_rate             1.1633 1.0558 1.2817
dti                  1.0000 0.9065 1.1030
annual_inc           0.8147 0.7275 0.9124
revol_util           1.4293 1.2920 1.5812
pub_rec              1.0861 0.9634 1.2245
loan_amnt            1.0343 0.9372 1.1413
mort_acc             0.9154 0.8278 1.0122
open_acc             0.9517 0.8626 1.0500
total_acc            1.0153 0.9208 1.1196
pub_rec_bankruptcies 0.9985 0.8844 1.1274
McFadden Pseudo-R2: 0.0272
AIC: 2958.02
BIC: 3036.22
LR Chi2: 82.10 df: 10 p < 0.001
Area under the curve (AUC-ROC): 0.6256
Hosmer-Lemeshow goodness of fit test:
X-squared = 8.412, df = 8, p-value = 0.394 [Good fit: p > .05]
Confusion Matrix (threshold = 0.50):
          Reference
Prediction    0    1
         0 4552  448
         1    0    0
Accuracy: 0.9104
Sensitivity (Recall): 0.0000
Specificity: 1.0000
Note: Low sensitivity reflects class imbalance; model ranking (AUC) more informative.
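Because only 448 of 5,000 loans defaulted, the 0.50 cutoff classifies every loan as non-defaulting. The Python sketch below recomputes the reported metrics from the confusion-matrix counts (rows = prediction, columns = reference). It also adds the no-information rate, the share of the majority class. This shows that the 0.9104 accuracy is exactly what predicting "no default" for every loan would achieve. A lower cutoff, for example one near the 8.96% base rate, would trade specificity for sensitivity; that alternative is not evaluated in this paper.

```python
def classification_metrics(tn, fp, fn, tp):
    """Accuracy, sensitivity, specificity, and no-information rate from a 2x2 table."""
    n = tn + fp + fn + tp
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    no_info_rate = max(tn + fp, fn + tp) / n  # accuracy of always predicting the majority class
    return accuracy, sensitivity, specificity, no_info_rate

# Counts from the confusion matrix above (threshold = 0.50)
acc, sens, spec, nir = classification_metrics(tn=4552, fp=0, fn=448, tp=0)
print(f"accuracy = {acc:.4f}, sensitivity = {sens:.4f}, "
      f"specificity = {spec:.4f}, no-information rate = {nir:.4f}")
```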
Appendix C: R Code
# ================================================================
# Final Project: Predicting Loan Default – Logistic Regression
# Dataset: LendingClub (Kaggle), N = 5,000 resolved-outcome loans
# Software: R 4.3+
# ================================================================

# ── 1. Install / load required packages ─────────────────────────────────
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
if (!requireNamespace("caret", quietly = TRUE)) install.packages("caret")
if (!requireNamespace("pROC", quietly = TRUE)) install.packages("pROC")
if (!requireNamespace("ResourceSelection", quietly = TRUE)) install.packages("ResourceSelection")
library(tidyverse)
library(caret)
library(pROC)
library(ResourceSelection)  # Hosmer-Lemeshow test

# ── 2. Load data ─────────────────────────────────────────────────────────
# Source: https://www.kaggle.com/datasets/wordsforthewise/lending-club
# Place the CSV in your working directory and adjust the path.
df_raw <- read_csv("lending_club_loans.csv")

# ── 3. Prepare analytic sample ───────────────────────────────────────────
df <- df_raw %>%
  # Keep only loans with resolved outcomes
  filter(loan_status %in% c("Fully Paid", "Charged Off")) %>%
  mutate(
    default    = if_else(loan_status == "Charged Off", 1L, 0L),
    # Strip a trailing "%" if rates were exported as text (no-op for numeric columns)
    int_rate   = as.numeric(str_remove(as.character(int_rate), "%")),
    revol_util = as.numeric(str_remove(as.character(revol_util), "%"))
  ) %>%
  select(default, int_rate, dti, annual_inc, revol_util,
         pub_rec, loan_amnt, mort_acc, open_acc,
         total_acc, pub_rec_bankruptcies) %>%
  drop_na()

cat("Analytic N:", nrow(df), "\n")
cat("Default rate:", round(mean(df$default), 4), "\n")

# Draw working sample of 5,000 for this analysis
set.seed(42)
df <- df %>% slice_sample(n = 5000)

# ── 4. Descriptive statistics ────────────────────────────────────────────
df %>%
  summarise(across(everything(),
                   list(M = mean, SD = sd, Min = min, Max = max),
                   .names = "{.col}_{.fn}")) %>%
  # Split names such as "pub_rec_bankruptcies_SD" at the last underscore only
  pivot_longer(everything(),
               names_to = c("Variable", "Stat"),
               names_pattern = "(.*)_([^_]+)$") %>%
  pivot_wider(names_from = Stat, values_from = value) %>%
  mutate(across(where(is.numeric), ~ round(.x, 3))) %>%
  print(n = Inf)

# ── 5. Train / test split (80/20) ────────────────────────────────────────
set.seed(42)
idx   <- createDataPartition(df$default, p = 0.80, list = FALSE)
train <- df[ idx, ]
test  <- df[-idx, ]

# ── 6. Standardize continuous predictors ─────────────────────────────────
# Centering and scaling parameters are learned on the training set only
pre_proc <- preProcess(train %>% select(-default), method = c("center", "scale"))
train_s  <- predict(pre_proc, train)
test_s   <- predict(pre_proc, test)

# ── 7. Estimate logistic regression ──────────────────────────────────────
logit_mod <- glm(
  default ~ int_rate + dti + annual_inc + revol_util +
    pub_rec + loan_amnt + mort_acc + open_acc +
    total_acc + pub_rec_bankruptcies,
  data   = train_s,
  family = binomial(link = "logit")
)
summary(logit_mod)        # Coefficients, SEs, z-values, p-values
exp(coef(logit_mod))      # Odds ratios
exp(confint(logit_mod))   # 95% CIs for odds ratios

# ── 8. Model fit statistics ──────────────────────────────────────────────
# McFadden pseudo-R2
null_ll <- logLik(glm(default ~ 1, data = train_s, family = binomial))
full_ll <- logLik(logit_mod)
mcf_r2  <- 1 - as.numeric(full_ll) / as.numeric(null_ll)
cat("McFadden Pseudo-R2:", round(mcf_r2, 4), "\n")

# AIC / BIC
cat("AIC:", AIC(logit_mod), "\n")
cat("BIC:", BIC(logit_mod), "\n")

# Likelihood-ratio chi-square (df = number of slope coefficients)
lr_chi2 <- 2 * (as.numeric(full_ll) - as.numeric(null_ll))
lr_df   <- length(coef(logit_mod)) - 1
lr_p    <- pchisq(lr_chi2, df = lr_df, lower.tail = FALSE)
cat("LR Chi2:", round(lr_chi2, 2), " df:", lr_df, " p:", signif(lr_p, 3), "\n")

# ── 9. Hosmer-Lemeshow goodness-of-fit ───────────────────────────────────
train_s$pred_prob <- predict(logit_mod, type = "response")
hl_test <- hoslem.test(train_s$default, train_s$pred_prob, g = 10)
print(hl_test)

# ── 10. Test-set predictions & diagnostics ───────────────────────────────
test_s$pred_prob  <- predict(logit_mod, newdata = test_s, type = "response")
test_s$pred_class <- if_else(test_s$pred_prob >= 0.50, 1L, 0L)

# Classification metrics; fixing both factor levels keeps confusionMatrix()
# from failing when every loan is predicted to be in the same class
confusionMatrix(factor(test_s$pred_class, levels = c(0, 1)),
                factor(test_s$default, levels = c(0, 1)),
                positive = "1")

# AUC-ROC
roc_obj <- roc(test_s$default, test_s$pred_prob)
cat("AUC-ROC:", round(auc(roc_obj), 4), "\n")

# ROC curve plot
plot(roc_obj,
     main = "ROC Curve – LendingClub Loan Default Model",
     col = "steelblue", lwd = 2)
# pROC plots specificity from 1 to 0 on the x-axis, so the chance line is
# sensitivity = 1 - specificity
abline(a = 1, b = -1, lty = 2, col = "gray50")
legend("bottomright",
       legend = paste0("AUC = ", round(auc(roc_obj), 4)),
       col = "steelblue", lwd = 2)