
Lecture 8 - (05/03/2026)

Today’s Topics:

  • Probability

  • Confidence Intervals

  • Bias-Variance Tradeoff

Probability

Probability Density

If $X$ is a continuous random variable, then there exist unique nonnegative functions $f(x)$ and $F(x)$, where:

  • $f(x)$ is the probability density function (PDF)

  • $F(x)$ is the cumulative distribution function (CDF)

For example, the normal (Gaussian) distribution has the PDF:

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where:

  • $\mu$ = mean of $x$

  • $\sigma$ = standard deviation of $x$

  • $\pi \approx 3.14159...$

  • $e \approx 2.71828...$

$$\mathbb{P}(a \leq X \leq b) = \int_a^b f(x)\,dx = F(b) - F(a)$$
import numpy as np
import plotly.graph_objects as go
from scipy.stats import norm

mu, sigma = 0, 1
x = np.linspace(-4, 4, 400)
pdf = norm.pdf(x, mu, sigma)
cdf = norm.cdf(x, mu, sigma)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=x,
    y=pdf,
    mode="lines",
    name="PDF",
    line=dict(color="black")
))

# Empty placeholder trace for the shaded area; the slider steps below fill it in
fig.add_trace(go.Scatter(
    x=[],
    y=[],
    fill="tozeroy",
    mode="lines",
    fillcolor="rgba(0,100,255,0.4)",
    line=dict(width=0),
    name="Integrated Area"
))

# Empty placeholder for the CDF, also filled in by the slider
fig.add_trace(go.Scatter(
    x=[],
    y=[],
    mode="lines",
    name="CDF",
    line=dict(color="red", width=3)
))

# Build one slider step per x value, redrawing all three traces up to that point
steps = []

for i in range(len(x)):
    xs = x[:i+1]
    ys_pdf = pdf[:i+1]
    ys_cdf = cdf[:i+1]

    step = dict(
        method="update",
        args=[{
            "x": [x, xs, xs],
            "y": [pdf, ys_pdf, ys_cdf]
        }],
        label=f"{x[i]:.2f}"
    )
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Integrate up to x = "},
    pad={"t": 50},
    steps=steps
)]

fig.update_layout(
    sliders=sliders,
    xaxis_title="x",
    yaxis_title="Value",
)

fig.show()
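As a quick numerical check (a minimal sketch; the test point $x_0 = 0.5$ and interval endpoints $a = -1$, $b = 1$ are arbitrary choices for illustration), evaluating the density formula by hand should match scipy, and integrating the PDF should match $F(b) - F(a)$:

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0, 1
a, b = -1, 1  # arbitrary interval, for illustration only

# The density formula evaluated by hand agrees with scipy's norm.pdf
x0 = 0.5
by_hand = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-((x0 - mu) ** 2) / (2 * sigma ** 2))
print(by_hand, norm.pdf(x0, mu, sigma))

# P(a <= X <= b) two ways: numerical integration vs. F(b) - F(a)
area, _ = quad(lambda t: norm.pdf(t, mu, sigma), a, b)
print(area, norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma))  # both ~0.6827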

But why do we care about this very specific distribution?

The Central Limit Theorem (CLT) states that the sample mean of a sufficiently large number of i.i.d. random variables (with finite variance) is approximately normally distributed, regardless of the variables’ own distribution. The larger the sample, the closer the distribution of the sample mean is to normal.

Here is a good visualization showing how this comes about.
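As a minimal simulation of the same idea (the die, the sample sizes, and the 10,000 repetitions are all assumptions chosen for illustration): average $n$ die rolls many times and watch the histogram of sample means turn bell-shaped as $n$ grows.

import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(0)

fig = go.Figure()
for n in [1, 2, 10, 50]:
    # 10,000 sample means, each averaging n rolls of a fair die
    means = rng.integers(1, 7, size=(10_000, n)).mean(axis=1)
    fig.add_trace(go.Histogram(x=means, name=f"n = {n}", opacity=0.5))

fig.update_layout(
    barmode="overlay",
    xaxis_title="Sample mean",
    yaxis_title="Count",
)
fig.show()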

But not everything we observe is continuous; some values are discrete.

Probability Mass

The probability mass function (PMF), or the distribution, of a discrete random variable $X$ provides the probability that $X$ takes on each of its possible values.

Let $\mathbb{X}$ be the set of values that $X$ can take on. The PMF of $X$ must satisfy the following rules:

  • $\sum_{x \in \mathbb{X}} P(X=x) = 1$

  • $\forall x \in \mathbb{X},\; 0 \leq P(X=x) \leq 1$

Given the following table:

Name      Age
Alice     50
Bob       52
Charlie   51
Diana     50

What is the probability that someone chosen uniformly at random from this set is 50?

Let $Z$ be a random variable representing the age of the first person chosen uniformly at random from our set. What is:

  • $P(Z=50)$

  • $P(Z=51)$

  • $P(Z=52)$

  • $P(Z=53)$

  • $P(0 \leq Z \leq 51)$

We want the fraction of time each value occurs.

Let’s use the accumulator design pattern to automate this (a sketch follows the list):

  • Initialize counts to 0 (not a single count, but one counter per possible value).

  • Count each value.

  • Divide each count by the overall total.
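Here is a minimal sketch of that pattern applied to the ages table above (it also checks the two PMF rules):

ages = [50, 52, 51, 50]  # Alice, Bob, Charlie, Diana

# Accumulator: one counter per distinct value, initialized to 0
counts = {}
for age in ages:
    counts[age] = counts.get(age, 0) + 1

# Divide through by the overall total to get the PMF
pmf = {age: c / len(ages) for age, c in counts.items()}
print(pmf)  # {50: 0.5, 52: 0.25, 51: 0.25}

# Both PMF rules hold: the values sum to 1 and each lies in [0, 1]
assert abs(sum(pmf.values()) - 1) < 1e-12
assert all(0 <= p <= 1 for p in pmf.values())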

Let $X$ represent the result of one roll of a fair six-sided die.

We know that $x \in \{1,2,3,4,5,6\}$ and $P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = \frac{1}{6}$.

We can use this to plot the empirical PMF of $X$ over $N$ trials:

  • Roll a 6-sided die

  • Add the outcome to its counter and repeat $N$ times

  • Divide the counts through by $N$

import plotly.graph_objects as go
import numpy as np

# N = 1000 rolls of a fair die (np.random.randint excludes the upper bound)
rolls = np.random.randint(1, 7, 1000)

counts = [0, 0, 0, 0, 0, 0]  # accumulator: one counter per face

for r in rolls:
    counts[r-1] += 1

total = sum(counts)
pmf = [c/total for c in counts]

x = [1,2,3,4,5,6]

fig = go.Figure()

fig.add_trace(go.Bar(
    x=x,
    y=pmf,
    text=[f"{p:.3f}" for p in pmf],
    textposition="outside"
))

fig.update_layout(
    xaxis_title="Dice Value",
    yaxis_title="Probability",
    yaxis=dict(range=[0,1])
)

fig.show()

This specific distribution is called a uniform distribution because each outcome is equally likely.

Right now we can only guess that these estimates are reasonably close to $\frac{1}{6}$; let’s formalise a way to say so with confidence.

Confidence Intervals

Assuming that the data is normally distributed, a confidence interval for the mean can be computed as:

  • Lower bound: $\bar{x} - z \frac{\sigma}{\sqrt{n}}$

  • Upper bound: $\bar{x} + z \frac{\sigma}{\sqrt{n}}$

where:

  • $\bar{x}$ = sample mean

  • $z$ = confidence level value

  • $\sigma$ = standard deviation

  • $n$ = sample size

Confidence level is inversely related to the statistical significance level ($\alpha$); here are some of the common values used:

Confidence Level    90%     95%     99%
$\alpha$            0.10    0.05    0.01
$z$                 1.645   1.96    2.576

Let’s take a look at confidence intervals in action:
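For example (a sketch on simulated data, not a dataset from the lecture), here is a 95% interval for the mean of 1,000 rolls of a fair die, using the formulas above:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1000)

n = len(rolls)
x_bar = rolls.mean()
sigma = rolls.std()  # standard deviation of the sample
z = 1.96             # z for a 95% confidence level

margin = z * sigma / np.sqrt(n)
print(f"95% CI for the mean: [{x_bar - margin:.3f}, {x_bar + margin:.3f}]")
# The true mean of a fair die is 3.5; about 95% of such intervals contain it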

Bias-Variance Tradeoff

Tradeoff in choosing models:

  • Model bias: the model is too simple to represent the underlying data-generating process

  • Model variance: the model is so complex that it fits the noise in the data rather than the data’s overall pattern

To reduce bias:

  • Choose a more complicated model

  • Add more features

To reduce variance:

  • Simplify the model

  • Add more data

Writing the observations as $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has mean 0 and variance $\sigma^2$ and is independent of $\hat{f}$, the expected squared error decomposes as:

$$
\begin{aligned}
MSE &= \mathbb{E}\big[(y - \hat{f}(x))^2\big] \\
&= \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)] + \mathbb{E}[\hat{f}(x)] - f(x))^2\big] + \sigma^2 \\
&= \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + (\mathbb{E}[\hat{f}(x)] - f(x))^2 \\
&\quad + 2\,\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)\underbrace{\mathbb{E}\big[\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big]}_{=\,0} + \sigma^2 \\
&= \underbrace{\mathrm{Var}[\hat{f}(x)]}_{\text{variance}} + \underbrace{(\mathrm{Bias}[\hat{f}(x)])^2}_{\text{bias}^2} + \underbrace{\sigma^2}_{\text{irreducible error}}
\end{aligned}
$$

The cross term vanishes because $\hat{f}(x) - \mathbb{E}[\hat{f}(x)]$ has expectation zero.

Even with a perfect modelling procedure, we still need to balance bias against variance to find the minimum of the total error.
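We can watch the decomposition empirically (a sketch reusing the sin(x)-plus-noise setup of the demo below; the degrees, the test point, and the 500 repetitions are arbitrary choices): refit a polynomial of each degree to many freshly noised datasets and estimate the squared bias and variance of the prediction at one test point.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
f = np.sin(x)              # true function
x0, f0 = 5.0, np.sin(5.0)  # test point and its true value

for d in [1, 3, 9]:
    # Fit the same model class to 500 independently noised datasets
    preds = []
    for _ in range(500):
        y = f + rng.normal(0, 0.3, size=x.shape)
        coeffs = np.polyfit(x, y, d)
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias_sq = (preds.mean() - f0) ** 2  # squared bias at x0
    variance = preds.var()              # variance at x0
    print(f"degree {d}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")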

import numpy as np
import plotly.graph_objects as go

np.random.seed(0)
x = np.linspace(0, 10, 20)
y_true = np.sin(x)
y = y_true + np.random.normal(0, 0.3, size=x.shape)

degrees = list(range(1, 15))

fits = []
x_fit = np.linspace(0, 10, 200)
# Fit one polynomial of each degree to the noisy samples
for d in degrees:
    coeffs = np.polyfit(x, y, d)
    y_fit = np.polyval(coeffs, x_fit)
    fits.append(y_fit)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=x, y=y,
    mode='markers',
    name='Data',
    marker=dict(color='black', size=8)
))

fig.add_trace(go.Scatter(
    x=x_fit, y=fits[0],
    mode='lines',
    name='Polynomial fit',
    line=dict(color='blue')
))

steps = []
for i, d in enumerate(degrees):
    step = dict(
        method='update',
        args=[{
            'y':[y, fits[i]],
        },
        {'annotations':[dict(
            x=5, y=1.5, text=f"Polynomial Degree: {d}", showarrow=False,
            font=dict(size=16)
        )]}],
        label=str(d)
    )
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Polynomial Degree: "},
    pad={"t": 50},
    steps=steps
)]

fig.update_layout(
    sliders=sliders,
    annotations=[dict(
        x=5, y=1.5, text=f"Polynomial Degree: {degrees[0]}", showarrow=False,
        font=dict(size=16)
    )]
)

fig.show()