Logistic Regression Tool

Explore binary classification: the sigmoid function, log loss, log odds, and decision thresholds

Controls

Data Generation

Model Configuration

Log Loss is used as the cost function. It heavily penalises confident wrong predictions.

Training

Decision Threshold

Classify as Class 1 when P ≥ threshold. Move the slider to see how predictions, confusion matrix, and metrics change in real time!

Model State

Epoch 0
w1 0.0000
w2 0.0000
Bias (b) 0.0000
Train Log Loss
Test Log Loss
Scatter Plot & Decision Boundary
Class 0 (train) Class 1 (train) Class 0 (test) Class 1 (test) Boundary
Total: 0
Train: 0
Test: 0
Sigmoid Function & Predicted Probabilities
Actual Class 0 Actual Class 1 Threshold
Log Loss Curve
Train Loss Test Loss
ROC Curve & AUC
ROC Curve Random (AUC=0.5) Current Threshold
AUC Score
Area Under Curve
At threshold = 0.50
TPR:
FPR:
Confusion Matrix & Metrics (Train Set)
            Pred: 0    Pred: 1
Actual: 0   0 (TN)     0 (FP)
Actual: 1   0 (FN)     0 (TP)
Accuracy
Precision
Recall
F1 Score
Understanding Log Loss (Binary Cross-Entropy)

Core Concept Log Loss (also called Binary Cross-Entropy) is the standard loss function for logistic regression. It measures how well predicted probabilities match the actual binary labels.

L = −(1/N) · ∑ᵢ [ yᵢ · log(pᵢ) + (1 − yᵢ) · log(1 − pᵢ) ]

Where y is the true label (0 or 1) and p is the model's predicted probability of class 1.

How It Works

  • When y = 1: Loss = −log(p). If the model predicts p ≈ 1, loss is near 0. If p ≈ 0, loss → ∞. The model is heavily punished for being confidently wrong.
  • When y = 0: Loss = −log(1−p). If the model predicts p ≈ 0, loss is near 0. If p ≈ 1, loss → ∞.
  • The logarithmic penalty means the cost grows exponentially as the model becomes more confidently wrong.
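The two per-label cases above can be checked directly (a minimal Python sketch, separate from the interactive tool):

```python
import math

def log_loss_single(y, p):
    """Per-sample binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)]."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Confidently right vs confidently wrong, for a true label of 1
print(round(log_loss_single(1, 0.9), 3))  # -log(0.9), near 0
print(round(log_loss_single(1, 0.1), 3))  # -log(0.1), much larger
```

Note the symmetry: predicting p = 0.1 for a true 0 costs exactly as much as predicting p = 0.9 for a true 1.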

Why Use Log Loss?

  • Convexity: Unlike accuracy, log loss produces a smooth, convex cost surface with a single global minimum — perfect for gradient descent optimisation.
  • Probabilistic interpretation: Minimising log loss is equivalent to maximum likelihood estimation (MLE) under a Bernoulli distribution. The model learns the most likely parameters given the data.
  • Gradient quality: The gradient of log loss w.r.t. the weights is simply (p − y) · x. Unlike squared error paired with a sigmoid, the σ′(z) factor cancels, so the gradient stays large when the model is confidently wrong rather than vanishing.
  • Calibration: Models trained with log loss tend to produce well-calibrated probabilities, meaning a prediction of 0.8 really does correspond to an 80% chance of class 1.
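The (p − y) · x gradient makes a gradient-descent update very compact. A minimal sketch of one single-sample step (the learning rate and data here are illustrative, not the tool's actual values):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def grad_step(w, b, x, y, lr=0.1):
    """One gradient-descent step on a single sample, using dL/dw = (p - y) * x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    w_new = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    b_new = b - lr * (p - y)
    return w_new, b_new

# From a zero-initialised model, p = 0.5, so the error term (p - y) is -0.5
w, b = grad_step([0.0, 0.0], 0.0, [1.0, 1.0], y=1)
print(w, b)  # weights and bias each move up by lr * 0.5 = 0.05
```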
Example: A patient has cancer (y=1). Model A predicts P=0.9 → loss = −log(0.9) = 0.105. Model B predicts P=0.1 → loss = −log(0.1) = 2.303. Model B is penalised 22× more for being confidently wrong.
Understanding Log Odds (Logit Function)

Core Concept Log Odds (also called the logit) is the raw output of the logistic regression model before applying the sigmoid function. It is the natural link between a linear model and probabilities.

z = w₁x₁ + w₂x₂ + b    (linear combination — this is the log odds)
odds = p / (1 − p)      log odds = log(p / (1 − p)) = z
p = σ(z) = 1 / (1 + e⁻ᶻ)    (sigmoid converts log odds → probability)

Intuition

  • Odds express probability as a ratio: "3 to 1 odds" means 3 times more likely to happen than not (p = 0.75).
  • Log odds put that ratio on a symmetric, unbounded scale (−∞ to +∞), centred at 0 when the odds are even (p = 0.5).
  • The sigmoid function is simply the inverse of the logit — it squashes any real number back into [0, 1].
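That inverse relationship is easy to verify in a few lines (a minimal sketch):

```python
import math

def sigmoid(z):
    """Log odds -> probability."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Probability -> log odds (the inverse of sigmoid)."""
    return math.log(p / (1 - p))

# "3 to 1 odds" means p = 0.75, i.e. z = log(3); round-tripping recovers p
z = logit(0.75)
print(round(z, 3), sigmoid(z))  # log(3) ≈ 1.099, and sigmoid maps it back to 0.75
```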

Key Reference Points

Log Odds (z)   Odds     Probability (p)
−4.6           1:100    0.01
−2.2           1:9      0.10
−0.41          2:3      0.40
0              1:1      0.50
+0.41          3:2      0.60
+2.2           9:1      0.90
+4.6           100:1    0.99

Interpreting Model Weights

Each weight wᵢ tells you: "For every 1-unit increase in feature xᵢ, the log odds of class 1 increase by wᵢ."

Example: If w_age = 0.05 in a disease prediction model, then each additional year of age increases the log odds of disease by 0.05 — equivalently, the odds are multiplied by e^0.05 ≈ 1.051 (a 5.1% increase in odds per year).
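The odds-multiplier arithmetic extends naturally over multiple units (the age coefficient here is the hypothetical value from the example, not a fitted weight):

```python
import math

w_age = 0.05  # hypothetical coefficient for an "age" feature

# exp() converts an additive log-odds change into a multiplicative odds change
odds_multiplier_per_year = math.exp(w_age)
odds_multiplier_per_decade = math.exp(10 * w_age)

print(round(odds_multiplier_per_year, 3))    # ≈ 1.051 (a 5.1% increase)
print(round(odds_multiplier_per_decade, 3))  # ≈ 1.649 (≈ 65% increase per decade)
```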

Look at the sigmoid plot above to see how each data point's z-score (log odds) maps to a predicted probability through the sigmoid curve.

Uses of Log Odds

  • Model interpretability: Coefficients have a direct meaning in terms of log odds change per unit feature change.
  • Generalised Linear Models (GLMs): The logit is the canonical link function for binomial regression, connecting linear predictors to binary outcomes.
  • Statistical testing: Wald tests and likelihood-ratio tests on coefficients are performed in the log-odds space.
  • Feature importance: Larger absolute weights indicate features that shift the log odds more, i.e. more influential features.
How Threshold Changes Predictions

Key Insight Logistic regression outputs a probability, not a class label. The decision threshold is a separate, tuneable parameter that converts probability into a binary prediction: classify as Class 1 if p ≥ threshold, else Class 0.

ŷ = 1   if   P(class=1|x) ≥ threshold,    else   ŷ = 0
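The decision rule is a single comparison (a minimal sketch with made-up probabilities):

```python
def predict(p, threshold=0.5):
    """Convert a predicted probability into a binary class label."""
    return 1 if p >= threshold else 0

probs = [0.2, 0.4, 0.6, 0.8]
print([predict(p) for p in probs])       # default threshold 0.5
print([predict(p, 0.3) for p in probs])  # lower threshold -> more positives
```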

Lowering the Threshold

When you lower the threshold (e.g., from 0.5 to 0.3):

  • More samples are classified as positive (Class 1)
  • Recall increases — fewer actual positives are missed (fewer FN)
  • Precision decreases — more negatives are incorrectly labelled positive (more FP)
  • The decision boundary on the scatter plot shifts, capturing more of the Class 1 region
Use case: Medical screening for a deadly disease. You want to catch every sick patient (high recall), even if it means more false alarms. Set threshold low, e.g. 0.2.

Raising the Threshold

When you raise the threshold (e.g., from 0.5 to 0.7):

  • Fewer samples are classified as positive (Class 1)
  • Precision increases — positive predictions are more reliable (fewer FP)
  • Recall decreases — some actual positives are missed (more FN)
  • The decision boundary shifts, making it harder for a sample to qualify as Class 1
Use case: Email spam filter. You want to be very sure before sending an email to spam (high precision), to avoid blocking important messages. Set threshold high, e.g. 0.8.
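Both effects can be seen in one sweep over a tiny hypothetical dataset (labels and probabilities invented for illustration):

```python
def metrics(y_true, probs, threshold):
    """Precision and recall at a given decision threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, yh in zip(y_true, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, preds) if y == 1 and yh == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [0, 0, 0, 1, 1, 1]
probs  = [0.1, 0.4, 0.6, 0.35, 0.7, 0.9]

p_low, r_low = metrics(y_true, probs, 0.3)    # low threshold: perfect recall, weak precision
p_high, r_high = metrics(y_true, probs, 0.7)  # high threshold: perfect precision, weak recall
print(p_low, r_low)
print(p_high, r_high)
```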

How the Threshold Relates to the Decision Boundary

At threshold = 0.5, the decision boundary lies where z = 0 (log odds = 0, meaning equal odds). Moving the threshold changes the effective z-cutoff:

z_cutoff = log(threshold / (1 − threshold))
  • Threshold 0.5 → z_cutoff = 0
  • Threshold 0.3 → z_cutoff ≈ −0.85 (boundary shifts toward class 0)
  • Threshold 0.7 → z_cutoff ≈ +0.85 (boundary shifts toward class 1)
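The cutoff formula is easy to check numerically (a minimal sketch):

```python
import math

def z_cutoff(threshold):
    """Effective log-odds cutoff implied by a probability threshold."""
    return math.log(threshold / (1 - threshold))

for t in (0.3, 0.5, 0.7):
    print(t, round(z_cutoff(t), 2))
```

Note the symmetry: thresholds equidistant from 0.5 in probability give equal and opposite log-odds cutoffs.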

Choosing the Right Threshold

  • Default (0.5): Suitable when classes are balanced and false positives and false negatives are equally costly.
  • Imbalanced classes: When one class is rare, 0.5 often performs poorly. Tune the threshold using the F1 score or PR curve.
  • ROC curve: Plots True Positive Rate vs False Positive Rate across all thresholds — see the interactive ROC plot above! The red dot moves as you change the threshold. The area under the ROC curve (AUC) measures overall model quality regardless of threshold.
  • Business cost: The optimal threshold minimises the total expected cost, which depends on the relative cost of false positives vs false negatives in your specific domain.
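AUC itself can be computed without plotting, via its ranking interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as ½. A minimal sketch on invented data:

```python
def roc_auc(y_true, probs):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 0, 1, 1, 1]
probs  = [0.1, 0.4, 0.6, 0.35, 0.7, 0.9]
print(roc_auc(y_true, probs))  # 7 of the 9 positive/negative pairs are ranked correctly
```

Because this depends only on the ordering of the scores, AUC is unchanged by moving the threshold — which is exactly why it measures overall model quality rather than any one operating point.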

Try it now! Use the Decision Threshold slider in the controls panel and watch how the scatter plot, sigmoid chart, confusion matrix, and all metrics update simultaneously.