Logistic Regression Tool

Explore binary classification: the sigmoid function, log loss, log odds, and decision thresholds

Controls

Data Generation

Model Configuration

Log Loss is used as the cost function. It heavily penalises confident wrong predictions.

Training

Decision Threshold

Classify as Class 1 when P ≥ threshold. Move the slider to see how predictions, confusion matrix, and metrics change in real time!

Model State

Epoch 0
w1 0.0000
w2 0.0000
Bias (b) 0.0000
Train Log Loss
Test Log Loss
Scatter Plot & Decision Boundary
Class 0 (train) Class 1 (train) Class 0 (test) Class 1 (test) Boundary
Total: 0
Train: 0
Test: 0
Sigmoid Function & Predicted Probabilities
Actual Class 0 Actual Class 1 Threshold
Log Loss Curve
Train Loss Test Loss
ROC Curve & AUC
ROC Curve Random (AUC=0.5) Current Threshold
AUC Score
Area Under Curve
At threshold = 0.50
TPR:
FPR:
Confusion Matrix & Metrics (Train Set)
            Pred: 0    Pred: 1
Actual: 0   0 (TN)     0 (FP)
Actual: 1   0 (FN)     0 (TP)
Accuracy
Precision
Recall
F1 Score
Understanding Log Loss (Binary Cross-Entropy)

Core Concept Log Loss (also called Binary Cross-Entropy) is the standard loss function for logistic regression. It measures how well predicted probabilities match the actual binary labels.

L = −(1/N) · ∑ᵢ [ yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ) ]

Where y is the true label (0 or 1) and p is the model's predicted probability of class 1.

How It Works

  • When y = 1: Loss = −log(p). If the model predicts p ≈ 1, loss is near 0. If p ≈ 0, loss → ∞. The model is heavily punished for being confidently wrong.
  • When y = 0: Loss = −log(1−p). If the model predicts p ≈ 0, loss is near 0. If p ≈ 1, loss → ∞.
  • The logarithmic penalty means the cost grows exponentially as the model becomes more confidently wrong.
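The two per-label cases above can be checked directly (a minimal Python sketch, separate from the interactive tool):

```python
import math

def log_loss_single(y, p):
    """Per-sample binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Confidently right vs confidently wrong, for a true label of 1
print(round(log_loss_single(1, 0.9), 3))  # -log(0.9), near 0
print(round(log_loss_single(1, 0.1), 3))  # -log(0.1), much larger
```

Note the symmetry: predicting p = 0.1 for a true 0 costs exactly as much as predicting p = 0.9 for a true 1.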

Why Use Log Loss?

  • Convexity: Unlike accuracy, log loss produces a smooth, convex cost surface with a single global minimum — perfect for gradient descent optimisation.
  • Probabilistic interpretation: Minimising log loss is equivalent to maximum likelihood estimation (MLE) under a Bernoulli distribution. The model learns the most likely parameters given the data.
  • Gradient quality: The gradient of log loss w.r.t. the weights is simply (p − y) · x. Unlike squared error paired with a sigmoid, the σ′(z) factor cancels, so the gradient stays large when the model is confidently wrong rather than vanishing.
  • Calibration: Models trained with log loss tend to produce well-calibrated probabilities, meaning a prediction of 0.8 really does correspond to an 80% chance of class 1.
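The (p − y) · x gradient makes a gradient-descent update very compact. A minimal sketch of one single-sample step (the learning rate and data here are illustrative, not the tool's actual values):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def grad_step(w, b, x, y, lr=0.1):
    """One gradient-descent step on a single sample, using dL/dw = (p - y) * x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    b_new = b - lr * (p - y)
    return w_new, b_new

# From a zero-initialised model, p = 0.5, so the error term (p - y) is -0.5
w, b = grad_step([0.0, 0.0], 0.0, [1.0, 1.0], y=1)
print(w, b)  # weights and bias each move up by lr * 0.5 = 0.05
```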
Example: A patient has cancer (y=1). Model A predicts P=0.9 → loss = −log(0.9) = 0.105. Model B predicts P=0.1 → loss = −log(0.1) = 2.303. Model B is penalised 22× more for being confidently wrong.
Understanding Log Odds (Logit Function)

Core Concept Log Odds (also called the logit) is the raw output of the logistic regression model before applying the sigmoid function. It is the natural link between a linear model and probabilities.

z = w₁x₁ + w₂x₂ + b    (linear combination — this is the log odds)
odds = p / (1 − p)      log odds = log(p / (1 − p)) = z
p = σ(z) = 1 / (1 + e⁻ᶻ)    (sigmoid converts log odds → probability)

Intuition

  • Odds express probability as a ratio: "3 to 1 odds" means 3 times more likely to happen than not (p = 0.75).
  • Log odds put that ratio on a symmetric, unbounded scale (−∞ to +∞), centred at 0 when the odds are even (p = 0.5).
  • The sigmoid function is simply the inverse of the logit — it squashes any real number back into [0, 1].
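That inverse relationship is easy to verify in a few lines (a minimal sketch):

```python
import math

def sigmoid(z):
    """Log odds -> probability."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Probability -> log odds (the inverse of sigmoid)."""
    return math.log(p / (1 - p))

# "3 to 1 odds" means p = 0.75, i.e. z = log(3); round-tripping recovers p
z = logit(0.75)
print(round(z, 3), sigmoid(z))  # log(3) ≈ 1.099, and sigmoid maps it back to 0.75
```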

Key Reference Points

Log Odds (z)   Odds     Probability (p)
−4.6           1:100    0.01
−2.2           1:9      0.10
−0.41          2:3      0.40
0              1:1      0.50
+0.41          3:2      0.60
+2.2           9:1      0.90
+4.6           100:1    0.99

Interpreting Model Weights

Each weight wᵢ tells you: "For every 1-unit increase in feature xᵢ, the log odds of class 1 increase by wᵢ."

Example: If w_age = 0.05 in a disease prediction model, then each additional year of age increases the log odds of disease by 0.05 — equivalently, the odds are multiplied by e^0.05 ≈ 1.051 (a 5.1% increase in odds per year).
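The odds-multiplier arithmetic extends naturally over multiple units (the age coefficient here is the hypothetical value from the example, not a fitted weight):

```python
import math

w_age = 0.05  # hypothetical coefficient for an "age" feature

# exp() converts an additive log-odds change into a multiplicative odds change
odds_multiplier_per_year = math.exp(w_age)
odds_multiplier_per_decade = math.exp(10 * w_age)

print(round(odds_multiplier_per_year, 3))    # ≈ 1.051 (a 5.1% increase)
print(round(odds_multiplier_per_decade, 3))  # ≈ 1.649 (≈ 65% increase per decade)
```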

Look at the sigmoid plot above to see how each data point's z-score (log odds) maps to a predicted probability through the sigmoid curve.

Uses of Log Odds

  • Model interpretability: Coefficients have a direct meaning in terms of log odds change per unit feature change.
  • Generalised Linear Models (GLMs): The logit is the canonical link function for binomial regression, connecting linear predictors to binary outcomes.
  • Statistical testing: Wald tests and likelihood-ratio tests on coefficients are performed in the log-odds space.
  • Feature importance: Larger absolute weights indicate features that shift the log odds more, i.e. more influential features.
How Threshold Changes Predictions

Key Insight Logistic regression outputs a probability, not a class label. The decision threshold is a separate, tuneable parameter that converts probability into a binary prediction: classify as Class 1 if p ≥ threshold, else Class 0.

ŷ = 1   if   P(class=1|x) ≥ threshold,    else   ŷ = 0
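The decision rule is a single comparison (a minimal sketch with made-up probabilities):

```python
def predict(p, threshold=0.5):
    """Convert a predicted probability into a binary class label."""
    return 1 if p >= threshold else 0

probs = [0.2, 0.4, 0.6, 0.8]
print([predict(p) for p in probs])       # default threshold 0.5
print([predict(p, 0.3) for p in probs])  # lower threshold -> more positives
```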

Lowering the Threshold

When you lower the threshold (e.g., from 0.5 to 0.3):

  • More samples are classified as positive (Class 1)
  • Recall increases — fewer actual positives are missed (fewer FN)
  • Precision decreases — more negatives are incorrectly labelled positive (more FP)
  • The decision boundary on the scatter plot shifts, capturing more of the Class 1 region
Use case: Medical screening for a deadly disease. You want to catch every sick patient (high recall), even if it means more false alarms. Set threshold low, e.g. 0.2.

Raising the Threshold

When you raise the threshold (e.g., from 0.5 to 0.7):

  • Fewer samples are classified as positive (Class 1)
  • Precision increases — positive predictions are more reliable (fewer FP)
  • Recall decreases — some actual positives are missed (more FN)
  • The decision boundary shifts, making it harder for a sample to qualify as Class 1
Use case: Email spam filter. You want to be very sure before sending an email to spam (high precision), to avoid blocking important messages. Set threshold high, e.g. 0.8.
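Both effects can be seen in one sweep over a tiny hypothetical dataset (labels and probabilities invented for illustration):

```python
def metrics(y_true, probs, threshold):
    """Precision and recall at a given decision threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yh in zip(y_true, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, preds) if y == 1 and yh == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [0, 0, 0, 1, 1, 1]
probs  = [0.1, 0.4, 0.6, 0.35, 0.7, 0.9]

p_low, r_low = metrics(y_true, probs, 0.3)    # low threshold: perfect recall, weak precision
p_high, r_high = metrics(y_true, probs, 0.7)  # high threshold: perfect precision, weak recall
print(p_low, r_low)
print(p_high, r_high)
```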

How the Threshold Relates to the Decision Boundary

At threshold = 0.5, the decision boundary lies where z = 0 (log odds = 0, meaning equal odds). Moving the threshold changes the effective z-cutoff:

z_cutoff = log(threshold / (1 − threshold))
  • Threshold 0.5 → z_cutoff = 0
  • Threshold 0.3 → z_cutoff ≈ −0.85 (boundary shifts toward class 0)
  • Threshold 0.7 → z_cutoff ≈ +0.85 (boundary shifts toward class 1)
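The cutoff formula is easy to check numerically (a minimal sketch):

```python
import math

def z_cutoff(threshold):
    """Effective log-odds cutoff implied by a probability threshold."""
    return math.log(threshold / (1 - threshold))

for t in (0.3, 0.5, 0.7):
    print(t, round(z_cutoff(t), 2))
```

Note the symmetry: thresholds equidistant from 0.5 in probability give equal and opposite log-odds cutoffs.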

Choosing the Right Threshold

  • Default (0.5): Suitable when classes are balanced and false positives and false negatives are equally costly.
  • Imbalanced classes: When one class is rare, 0.5 often performs poorly. Tune the threshold using the F1 score or PR curve.
  • ROC curve: Plots True Positive Rate vs False Positive Rate across all thresholds — see the interactive ROC plot above! The red dot moves as you change the threshold. The area under the ROC curve (AUC) measures overall model quality regardless of threshold.
  • Business cost: The optimal threshold minimises the total expected cost, which depends on the relative cost of false positives vs false negatives in your specific domain.
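AUC itself can be computed without plotting, via its ranking interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as ½. A minimal sketch on invented data:

```python
def roc_auc(y_true, probs):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 0, 1, 1, 1]
probs  = [0.1, 0.4, 0.6, 0.35, 0.7, 0.9]
print(roc_auc(y_true, probs))  # 7 of the 9 positive/negative pairs are ranked correctly
```

Because this depends only on the ordering of the scores, AUC is unchanged by moving the threshold — which is exactly why it measures overall model quality rather than any one operating point.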

Try it now! Use the Decision Threshold slider in the controls panel and watch how the scatter plot, sigmoid chart, confusion matrix, and all metrics update simultaneously.