
Lecture 8 - (05/03/2026)

Today’s Topics:

  • Probability

  • Confidence Intervals

  • Bias-Variance Tradeoff

Probability

Probability Density

If $X$ is a continuous random variable, then there exist unique nonnegative functions $f(x)$ and $F(x)$, where:

  • $f(x)$ is the probability density function (PDF)

  • $F(x)$ is the cumulative distribution function (CDF)

For example, the normal (Gaussian) distribution has the PDF:

$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where:

  • $\mu$ = mean of $x$

  • $\sigma$ = standard deviation of $x$

  • $\pi \approx 3.14159...$

  • $e \approx 2.71828...$

$$\mathbb{P}(a \leq X \leq b) = \int_a^b f(x)\,dx = F(b) - F(a)$$
import numpy as np
import plotly.graph_objects as go
from scipy.stats import norm

mu, sigma = 0, 1
x = np.linspace(-4, 4, 400)
pdf = norm.pdf(x, mu, sigma)
cdf = norm.cdf(x, mu, sigma)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=x,
    y=pdf,
    mode="lines",
    name="PDF",
    line=dict(color="black")
))

# Empty placeholder trace for the shaded area; the slider steps below fill it in
fig.add_trace(go.Scatter(
    x=[],
    y=[],
    fill="tozeroy",
    mode="lines",
    fillcolor="rgba(0,100,255,0.4)",
    line=dict(width=0),
    name="Integrated Area"
))

# Empty placeholder for the CDF, also filled in by the slider
fig.add_trace(go.Scatter(
    x=[],
    y=[],
    mode="lines",
    name="CDF",
    line=dict(color="red", width=3)
))

# Build one slider step per x value, redrawing all three traces up to that point
steps = []

for i in range(len(x)):
    xs = x[:i+1]
    ys_pdf = pdf[:i+1]
    ys_cdf = cdf[:i+1]

    step = dict(
        method="update",
        args=[{
            "x": [x, xs, xs],
            "y": [pdf, ys_pdf, ys_cdf]
        }],
        label=f"{x[i]:.2f}"
    )
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Integrate up to x = "},
    pad={"t": 50},
    steps=steps
)]

fig.update_layout(
    sliders=sliders,
    xaxis_title="x",
    yaxis_title="Value",
)

fig.show()
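As a quick numerical check (a minimal sketch; the test point $x_0 = 0.5$ and interval endpoints $a = -1$, $b = 1$ are arbitrary choices for illustration), evaluating the density formula by hand should match scipy, and integrating the PDF should match $F(b) - F(a)$:

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0, 1
a, b = -1, 1  # arbitrary interval, for illustration only

# The density formula evaluated by hand agrees with scipy's norm.pdf
x0 = 0.5
by_hand = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-((x0 - mu) ** 2) / (2 * sigma ** 2))
print(by_hand, norm.pdf(x0, mu, sigma))

# P(a <= X <= b) two ways: numerical integration vs. F(b) - F(a)
area, _ = quad(lambda t: norm.pdf(t, mu, sigma), a, b)
print(area, norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma))  # both ~0.6827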

But why do we care about this very specific distribution?

The Central Limit Theorem (CLT) states that the sample mean of a sufficiently large number of i.i.d. random variables (with finite variance) is approximately normally distributed, regardless of the variables’ own distribution. The larger the sample, the closer the distribution of the sample mean is to normal.

Here is a good visualization showing how this comes about.
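As a minimal simulation of the same idea (the die, the sample sizes, and the 10,000 repetitions are all assumptions chosen for illustration): average $n$ die rolls many times and watch the histogram of sample means turn bell-shaped as $n$ grows.

import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(0)

fig = go.Figure()
for n in [1, 2, 10, 50]:
    # 10,000 sample means, each averaging n rolls of a fair die
    means = rng.integers(1, 7, size=(10_000, n)).mean(axis=1)
    fig.add_trace(go.Histogram(x=means, name=f"n = {n}", opacity=0.5))

fig.update_layout(
    barmode="overlay",
    xaxis_title="Sample mean",
    yaxis_title="Count",
)
fig.show()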

But not everything we observe is continuous; some values are discrete.

Probability Mass

The probability mass function (PMF), or the distribution, of a discrete random variable $X$ provides the probability that $X$ takes on each of its possible values.

Let $\mathbb{X}$ be the set of values that $X$ can take on. The PMF of $X$ must satisfy the following rules:

  • $\sum_{x \in \mathbb{X}} P(X=x) = 1$

  • $\forall x \in \mathbb{X},\; 0 \leq P(X=x) \leq 1$

Given the following table:

Name      Age
Alice     50
Bob       52
Charlie   51
Diana     50

What is the probability that someone chosen uniformly at random from this set is 50?

Let $Z$ be a random variable representing the age of the first person chosen uniformly at random from our set. What is:

  • $P(Z=50)$

  • $P(Z=51)$

  • $P(Z=52)$

  • $P(Z=53)$

  • $P(0 \leq Z \leq 51)$

We want the fraction of time each value occurs.

Let’s use the accumulator design pattern to automate this (a sketch follows the list):

  • Initialize counts to 0 (not a single count, but one counter per possible value).

  • Count each value.

  • Divide each count by the overall total.
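Here is a minimal sketch of that pattern applied to the ages table above (it also checks the two PMF rules):

ages = [50, 52, 51, 50]  # Alice, Bob, Charlie, Diana

# Accumulator: one counter per distinct value, initialized to 0
counts = {}
for age in ages:
    counts[age] = counts.get(age, 0) + 1

# Divide through by the overall total to get the PMF
pmf = {age: c / len(ages) for age, c in counts.items()}
print(pmf)  # {50: 0.5, 52: 0.25, 51: 0.25}

# Both PMF rules hold: the values sum to 1 and each lies in [0, 1]
assert abs(sum(pmf.values()) - 1) < 1e-12
assert all(0 <= p <= 1 for p in pmf.values())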

Let $X$ represent the result of one roll of a fair six-sided die.

We know that $x \in \{1,2,3,4,5,6\}$ and $P(X=1) = P(X=2) = P(X=3) = P(X=4) = P(X=5) = P(X=6) = \frac{1}{6}$.

We can use this to plot the empirical PMF of $X$ over $N$ trials:

  • Roll a 6-sided die

  • Add the outcome to its counter and repeat $N$ times

  • Divide the counts through by $N$

import plotly.graph_objects as go
import numpy as np

# N = 1000 rolls of a fair die (np.random.randint excludes the upper bound)
rolls = np.random.randint(1, 7, 1000)

counts = [0, 0, 0, 0, 0, 0]  # accumulator: one counter per face

for r in rolls:
    counts[r-1] += 1

total = sum(counts)
pmf = [c/total for c in counts]

x = [1,2,3,4,5,6]

fig = go.Figure()

fig.add_trace(go.Bar(
    x=x,
    y=pmf,
    text=[f"{p:.3f}" for p in pmf],
    textposition="outside"
))

fig.update_layout(
    xaxis_title="Dice Value",
    yaxis_title="Probability",
    yaxis=dict(range=[0,1])
)

fig.show()

This specific distribution is called a uniform distribution because each outcome is equally likely.

Right now we can only guess that these estimates are reasonably close to $\frac{1}{6}$; let’s formalise a way to say so with confidence.

Confidence Intervals

Assuming that the data is normally distributed, a confidence interval for the mean can be computed as:

  • Lower bound: $\bar{x} - z \frac{\sigma}{\sqrt{n}}$

  • Upper bound: $\bar{x} + z \frac{\sigma}{\sqrt{n}}$

where:

  • $\bar{x}$ = sample mean

  • $z$ = confidence level value

  • $\sigma$ = standard deviation

  • $n$ = sample size

Confidence level is inversely related to the statistical significance level ($\alpha$); here are some of the common values used:

Confidence Level    90%     95%     99%
$\alpha$            0.10    0.05    0.01
$z$                 1.645   1.96    2.576

Let’s take a look at confidence intervals in action:
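For example (a sketch on simulated data, not a dataset from the lecture), here is a 95% interval for the mean of 1,000 rolls of a fair die, using the formulas above:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1000)

n = len(rolls)
x_bar = rolls.mean()
sigma = rolls.std()  # standard deviation of the sample
z = 1.96             # z for a 95% confidence level

margin = z * sigma / np.sqrt(n)
print(f"95% CI for the mean: [{x_bar - margin:.3f}, {x_bar + margin:.3f}]")
# The true mean of a fair die is 3.5; about 95% of such intervals contain it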

Bias-Variance Tradeoff

Tradeoff in choosing models:

  • Model bias: the model is too simple to represent the underlying data-generating process

  • Model variance: the model is so complex that it fits the noise in the data rather than the data’s overall pattern

To reduce bias:

  • Choose a more complicated model

  • Add more features

To reduce variance:

  • Simplify the model

  • Add more data

Writing the observations as $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has mean 0 and variance $\sigma^2$ and is independent of $\hat{f}$, the expected squared error decomposes as:

$$
\begin{aligned}
MSE &= \mathbb{E}\big[(y - \hat{f}(x))^2\big] \\
&= \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)] + \mathbb{E}[\hat{f}(x)] - f(x))^2\big] + \sigma^2 \\
&= \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + (\mathbb{E}[\hat{f}(x)] - f(x))^2 \\
&\quad + 2\,\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)\underbrace{\mathbb{E}\big[\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big]}_{=\,0} + \sigma^2 \\
&= \underbrace{\mathrm{Var}[\hat{f}(x)]}_{\text{variance}} + \underbrace{(\mathrm{Bias}[\hat{f}(x)])^2}_{\text{bias}^2} + \underbrace{\sigma^2}_{\text{irreducible error}}
\end{aligned}
$$

The cross term vanishes because $\hat{f}(x) - \mathbb{E}[\hat{f}(x)]$ has expectation zero.

Even with a perfect modelling procedure, we still need to balance bias against variance to find the minimum of the total error.
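We can watch the decomposition empirically (a sketch reusing the sin(x)-plus-noise setup of the demo below; the degrees, the test point, and the 500 repetitions are arbitrary choices): refit a polynomial of each degree to many freshly noised datasets and estimate the squared bias and variance of the prediction at one test point.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
f = np.sin(x)              # true function
x0, f0 = 5.0, np.sin(5.0)  # test point and its true value

for d in [1, 3, 9]:
    # Fit the same model class to 500 independently noised datasets
    preds = []
    for _ in range(500):
        y = f + rng.normal(0, 0.3, size=x.shape)
        coeffs = np.polyfit(x, y, d)
        preds.append(np.polyval(coeffs, x0))
    preds = np.array(preds)
    bias_sq = (preds.mean() - f0) ** 2  # squared bias at x0
    variance = preds.var()              # variance at x0
    print(f"degree {d}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")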

import numpy as np
import plotly.graph_objects as go

np.random.seed(0)
x = np.linspace(0, 10, 20)
y_true = np.sin(x)
y = y_true + np.random.normal(0, 0.3, size=x.shape)

degrees = list(range(1, 15))

fits = []
x_fit = np.linspace(0, 10, 200)
# Fit one polynomial of each degree to the noisy samples
for d in degrees:
    coeffs = np.polyfit(x, y, d)
    y_fit = np.polyval(coeffs, x_fit)
    fits.append(y_fit)

fig = go.Figure()

fig.add_trace(go.Scatter(
    x=x, y=y,
    mode='markers',
    name='Data',
    marker=dict(color='black', size=8)
))

fig.add_trace(go.Scatter(
    x=x_fit, y=fits[0],
    mode='lines',
    name='Polynomial fit',
    line=dict(color='blue')
))

steps = []
for i, d in enumerate(degrees):
    step = dict(
        method='update',
        args=[{
            'y':[y, fits[i]],
        },
        {'annotations':[dict(
            x=5, y=1.5, text=f"Polynomial Degree: {d}", showarrow=False,
            font=dict(size=16)
        )]}],
        label=str(d)
    )
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Polynomial Degree: "},
    pad={"t": 50},
    steps=steps
)]

fig.update_layout(
    sliders=sliders,
    annotations=[dict(
        x=5, y=1.5, text=f"Polynomial Degree: {degrees[0]}", showarrow=False,
        font=dict(size=16)
    )]
)

fig.show()