# [R] Inferential Statistics IV: Uniform and (Standard) Normal Probability (Density) Functions

# Introduction:

# 1 Uniform PMFs, PDFs and CDFs ─ Uniform Probability Mass Functions (Discrete), Probability Density Functions (Continuous) and the Cumulative Distribution Function (Continuous)

## 1.1 Uniform Probability Mass Function (Discrete)

## 1.2 Uniform Probability Density Function (continuous)

## 1.3 The Cumulative Distribution Function of a Uniform PDF

# 2 The (Standard) Normal Probability Distribution ─ Gaussian PDFs and their corresponding CDFs

## 2.1 The Central Limit Theorem

## 2.2 The Basic Gaussian Function

### 2.2.1 Optional Facts on exp(x)

## 2.3 The Parametrized Gaussian Function

## 2.4 The Parametrized Gaussian Probability Density Function (Normal PDF)

## 2.5 Integrating Non-Param. and Param. Gaussian Functions (Optional)

## 2.6 Integrating the (Standard) Normal PDF (including CDF)

# 3 Standardization ─ Z-Scores, Z-Tables (Standard Normal Tables)

# 4 Brief Outlook into Inferential Statistics V

Intermediate Stat-o-Sphere

Published on Aug 18, 2023


**Review:** This is a pre-released article and we are currently looking for reviewers. Contact us via [email protected]. You can also directly send us your review, or use our open peer-review functions via PubPub (create an account here). In order to comment, mark some text and click on “Start discussion”, or just comment at the end of the article.

*Corresponding R script: *

*Summary:*

Welcome to the fourth part of our series! In this part we are going to focus on several kinds of probability functions that we need to understand in order to be able to finally perform a z- and t-test (we will actually perform most of the steps in this tutorial already). Note that we will cover t-distributions in the next part of this series, along with the z- and t-test.

**We will start with simple uniform distributions**, in order to completely demystify what a probability *density* function actually is. Plotting density functions is usually recommended to gain a visual/descriptive perspective on whether a normal distribution (or some other distribution) is present in the data. However, the **focus of this tutorial will lie on Gaussian bell-curved normal distributions**. We will also take a look at the semantics of the term “normal” and how its use in probability theory differs from the everyday cultural use of the word “normal”.

Apart from that, we will learn how to integrate a Gaussian function and what z- and t-tables are. These tables can be used to skip the process of actually performing such an integration of a normal distribution, which was a common procedure when performing a z- or t-test by hand before computers became sufficiently powerful. Since we are living in the 21st century, we will also provide R code to create a z-table for yourself.

We again recommend making use of the **contents function (upper right corner)** to reorient yourself within the script. Since we are approaching higher spheres, we will not provide summaries at the end of every chapter, but we still (as always!) provide a downloadable summary above.

**Note that we are not going to talk about linear modelling in this tutorial.** All examples will only concern regular distributions of data, which is equivalent to dropping the independent variable by only focussing on the (mean of the) dependent variable. **We will still occasionally use Lady Meow as an example, but if we do so, we will only look at the average amount of tea Lady Meow drank (mean of a sample) and will** ***not*** **consider an independent relation to time!** In the next part of this series, we will discuss the z-test and t-test for both cases: regular distributions and hypothesis testing for linear models!

Until now we haven’t talked much about probability functions. So, at first we have to disentangle some terminology. We already know what a **discrete and a continuous distribution** is: in the first part of this series, we worked with discrete and mostly binary (categorical) probability distributions (e.g., theta and its complement, or ‘heads or tails’ as variables; we didn’t represent it as a function of several trials though). Our linear model in the second part of this series, however, was a continuous function, but did not consist of probabilistic values on either the x- or the y-axis (but of estimates of measurements).

**Next we are going to go through three special types of probability functions: the probability mass function (PMF), the probability density function (PDF), as well as the cumulative distribution function (CDF) of a PDF (***in nuce***, a CDF is a function of possible p-values; no worries, we will get there and it is far simpler than it sounds).**

There are also simple probability functions, such as a likelihood function, binomial functions and others. However, we will stick with the above three for now. You may also have heard of the **maximum likelihood method** and estimate. **We will get to it in further parts of this tutorial series and then explicitly discuss regular probability functions.**

**In case the term likelihood appears confusing upfront:** note that the classic p-value obtained via integrating a probability *density* function (PDF) *also represents a likelihood*, but the PDF itself is not a simple probability/likelihood *function*; it is a probability *density* function. Again, no worries, it is all much easier than it sounds, promised. Especially the concept of a density instantly makes sense when looking at a uniform distribution.

**So, for the sake of simplicity and to really get hold of some basic differences between PMFs, PDFs and CDFs, we decided to start with a uniform distribution**, before approaching the standard normal probability density function, commonly used to evaluate a *z-value* (only slightly different to a t-value) and subsequently a *p-value*. **This will take a few steps, but we promise that it is totally worth it. As mentioned, especially the meaning of the concept “density” (y-axis) can be confusing and is easier to understand given a uniform distribution first (e.g., via a six-sided dice with a uniform probability of 1/6 for each dice value as example).**

**However, you can of course skip this part and get right into Gaussian functions, if you wish!**

For our discrete uniform distribution we can operate similarly as we did in the third part of this series, where we discussed the weighted arithmetic mean: we can obtain the probability of each dice value occurring by dividing 1 by the number of possible dice values. **This will result in a uniform or constant discrete set of probabilities that sum to 1.**

```
# Probability mass function for p(x_i) = 1/6:
dice = c(1:6)
px = 1/length(dice) # 1/n == for uniform probability distr.
# Written as vector of length(dice_value_x):
px = c(rep(1/length(dice),length(dice)))
# Plot discrete probabilities:
plot(x=dice,y=px, ylim= c(0,1), xlim = c(0,7))
```

Our plot will look like this and is just one small step away from representing a probability mass function!

Mathematically, a **probability mass function can now be defined as follows:** **if** the dice value lies below 1 or above 6, **then** the probability of such a dice value occurring is 0; **if** the dice value lies within the (discrete) range from 1 to 6, **then** the probability will always be 1/6. In short: p(x) = 1/6 **if** 1 ≤ x ≤ 6, **else** 0.

Using R, we can translate the above into a function using the conditional operators **if()** and **else if()**, as well as a **for-loop** to check every possible value and calculate its probability! This means that our function will also be able to output a probability of 0: **the input values of x for the function below can therefore be a sequence of values below and beyond the number of possible dice values, without returning an error**.

**The below function is also written to work with any length of dice values, treating them all as uniform or fair dice.** This may not always correctly correspond to real circumstances (thinking of, e.g., a supposedly uniform 19-sided dice), but expanding the number of sides via the argument n_sides will still treat all of them as equally likely.

**CAVE:** Being able to understand logic operators, such as **if** and **else**, is a universal ability that can be applied to any kind of programming language (mostly only slightly different “syntax” is used, i.e., a different use of signs for programming, so to speak).

```
# Probability mass function for fair dice with n_sides:
dice_px = function(x, n_sides){ # x = sequence / range of possible values
dice_val = c(1:n_sides)
px = x # initialize output vector of length(x)
for(i in 1:length(x)){ # := for every object of the vector x
if(x[i] < min(dice_val)){
px[i] = 0
}
else if(x[i] >= min(dice_val) & x[i] <= max(dice_val)){
px[i] = 1/n_sides
}
else if(x[i] > max(dice_val)){
px[i] = 0
}
} # End for i
# Plot as probability mass function:
plot(x = x, y = px, ylim = c(0,1))
for(i in 1:n_sides){
segments(x0=dice_val[i],y0=0,x1=dice_val[i],y1=1/n_sides, col = "blue")
} # End for i
# Return px in console:
return(px)
} # End of function
# Let us test our function:
seq = c(0:7)
px = dice_px(x=seq,n_sides=6)
# 0.1666667 == 1/6
# [1] 0.0000000 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
# [8] 0.0000000
# Plot discrete probabilities only:
plot(x=seq,y=px, ylim = c(0,1))
```

**To wrap this up:** In the case of a PMF the y-axis refers to the probability of the event occurring and the x-axis to the corresponding event — in our case an event is cast as a single dice value of a six-sided dice. However, we will see that, different to “regular” probability functions and different to our probability mass function, the y-axis (density) of a density function (continuous) does **not** refer to the corresponding probabilities, due to rather simple matters of integration (a PDF sort of blurs the meaning of the y-axis).

**You might have also come across examples regarding functions where the y-axis refers to a certain number of occurrences of a certain value of x,** ***not*** **a probability by itself (e.g., the number of occurrences of each dice value, often plotted in the form of a histogram). However, dividing the number of individual occurrences by the total number n of occurrences will scale the y-axis to a percentage or probabilistic value without changing the appearance of the graph (it just normalizes the values of the y-axis).**

Let us say, we rolled our dice 600 times and ended up with the below (idealized) uniform distribution. Below you will find the formula that lets us translate frequencies into probabilities!
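As a minimal sketch of that translation in R (our own addition; the counts below are the idealized values of 100 rolls per side, not real data):

```
# Idealized counts: each of the 6 dice values occurred 100 times:
counts = rep(100, 6)
n_total = sum(counts) # 600 rolls in total
# Dividing each count by the total number n of occurrences
# turns frequencies into probabilities:
px = counts / n_total
px
# [1] 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667
sum(px) # probabilities sum to 1
# [1] 1
```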

**So now, what can we do with a PMF, apart from calculating the probability of one certain dice value occurring?** Say, we want to roll a 3, a 4 or a 5 for some reason during a game where one six-sided dice is involved. What is the probability that we roll either a 3, a 4 or a 5? A PMF can help us evaluate the probability of that “range of discrete values”. **Note that the term “area”, and sometimes also “range”, is commonly associated with** ***continuous*** **functions: the integral of a function as the “area under the curve”**. Just keep that in mind!

In order to evaluate the probability of rolling a 3, 4 or a 5, we can simply sum the respective probabilities:

```
# Probability of rolling a 3, 4 or 5, i.e., a range so to speak:
px_3to5 = px[3]+px[4]+px[5]
px_3to5 = sum(px[3:5])
# [1] 0.5
```

We could also add the mean and the standard deviation of our uniform PMF, but it would not add much to the story.

**Let us now directly move to the topic of a uniform probability density function (continuous).**

Defining a PMF was rather easy. As mentioned, the relation between the values of the y-axis and x-axis totally makes sense there, since the y-axis just refers to the probability of each individual x. For a PDF this is *not* the case: the y-axis no longer refers to the probability of a discrete event occurring, and there is a *simple reason* why: **the constraint of a PDF is that the integral of a PDF has to sum to 1!**

**Let us first use basic R functions to plot a uniform probability density function via** **dunif()**. Mathematically, we will partially follow a video on uniform PDFs from the channel “The Organic Chemistry Tutor”.

```
# Plot Uniform PDF using basic R functions:
x = seq(0,7, length = 10000) # try length 50 and plot again!
# Uniform density for each x:
y = dunif(x, min = 1, max = 6)
# Plot uniform PDF:
plot(x,y, type = "l", ylim = c(0,1))
```
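As a quick sanity check (our own addition, not part of the original script), we can integrate dunif() numerically via R’s integrate() and confirm that the total area is indeed 1:

```
# Total area under the uniform PDF on [1, 6] must be 1:
total_area = integrate(dunif, lower = 1, upper = 6, min = 1, max = 6)$value
total_area
# [1] 1
```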

The definition of a uniform PDF goes as follows and is not that far away from the definition of our PMF, as we will see:

To understand why the density takes the value 1/(b − a), recall that the total area under the curve has to be 1. Since we are working with a simple rectangle, we can obtain the area of that rectangle simply by multiplying its width by its height, and that area is the *complete* integral of our uniform curve. See plot below:

In case you wonder: It may sound weird, but yes, a continuous uniform function can be referred to as *curve* in mathematics.

So, now we know that the width of our rectangle is (b − a) = 6 − 1 = 5; for the area to equal 1, the height (density) therefore has to be 1/(b − a) = 1/5.

**Moving forward to PDFs, we also already know that the area under our uniform curve (total integral) has to sum to 1 ─ and this will turn out to be the reason why the density is not related to the probability, while the integrated area under the curve indeed is.** In other words, the fact that the integral has to sum to 1 dictates what the values of the y-axis will look like, in a rather abstract way only. **Still not clear? Let’s go through it step-by-step:**

**Let’s write down what we know about a uniform PDF, this time using a and b as the minimum and maximum of *possible* values (1 to 6 for a six-sided dice)!**

**Doing some very, very simple rearrangements** will get us to the already mentioned definition of a uniform probability density function. Below you will also find some code to play around with:

```
# Uniform PDF
unif_PDF = function(a,b,x){
fx = x # initialize output vector of length(x), so we just use x
for(i in 1:length(x)){ # := for every object of the vector x
if(x[i] < a){
fx[i] = 0
}
else if(x[i] >= a & x[i] <= b){
fx[i] = 1/(b-a)
}
else if(x[i] > b){
fx[i] = 0
}
} # End for i
return(fx)
} # End of function
# Test:
test = unif_PDF(a = 1,b = 6,x = x)
plot(y = test, x = x, type = "l", ylim = c(0,1))
```

**Different to our PMF, we can now determine a certain area under the curve concerning infinitely small values in-between discrete values.** **NOTE:** We are now leaving the dice example as a graspable “real example” for the purpose of mathematical abstraction, since a “continuous but bounded dice” would mean that our dice ranges from values between 1 to 6 *and at the same time* actually has infinitely many sides in-between the mentioned range of values. **Of course, we are leaving the realm of simple facts here.**

However, this problem is actually somewhat addressed in the math, as we will see: the integral over a specific *point* of a PDF is always 0 (more on this below). **This is again different to a simple continuous probability function: the integral of such simple probability functions doesn’t necessarily sum to 1 (since such a regular probability function is not normalized to do so). However, in contrast to a PDF, we can read off the probability of a** ***certain singular event*** **from the y-axis of a** ***regular*** **probability function right away, without the need of any integration. So both serve their purpose, so to speak.**

However, regardless of how realistic rolling a dice with infinite sides may be, let us again determine the area between 3 and 5. Our chosen values 3 and 5 can also be understood as a lower and upper bound of our *desired* integral (not the same as min./max. of our whole curve). In the plot below we will just call the variables of our desired area x_1 and x_2.

In order to calculate the area of interest, we can simply use the formulas we already know, again treating our uniform curve as a rectangle. Doing some rearrangements will result in a simple ratio between areas under the curve (see last line below):

```
# Area under the curve via multiplication:
upper_bo = 5 # set upper bound
low_bo = 3 # set lower bound
a_min = 1 # set minimum
b_max = 6 # set maximum, else f(x) = 0 anyways...
unif_area = (upper_bo-low_bo)*(1/(b_max-a_min))
# [1] 0.4
# Integration of a uniform prob. density function via ratio
# area of interest divided by total area:
# P(x_1<=X<=x_2):
P_area = (5-3)/(6-1)
# [1] 0.4
```
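The same result follows from R’s built-in uniform CDF punif() (a cross-check we added; punif() returns the integral from the minimum up to a given point):

```
# P(3 <= X <= 5) via the built-in uniform CDF:
punif(5, min = 1, max = 6) - punif(3, min = 1, max = 6)
# [1] 0.4
```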

**You may wonder why the PMF of a six-sided dice gives a probability of 0.5 in the same area, while with the PDF we get 0.4 as a result?** The reason is again rather abstract, as we have to recall that, in contrast to our PMF, our “continuous dice” would need to have infinite sides to relate to a continuous density function, while at the same time only leading to values between 1 to 6 in total. Due to the characteristic of infinity, even the area between 3 and 5 itself would conceptually already entail infinite sides of a dice(!!). **Without getting too fuzzy about it:** our PDF above just does not represent a discrete six-sided dice anymore, keep in mind. :D

**Let us now check for the special case where the area of interest is equal to the total area of our uniform PDF, which should result in 1:**

```
# P(x_1=a<=X<=x_2=b) := area of interest == total area:
P_area = (b_max-a_min)/(b_max-a_min)
P_area = (6-1)/(6-1)
# [1] 1
```

As mentioned already, when asking for the probability of a specific individual value of a *density* function, the answer will always be: a probability of 0, different to a regular probability function (or our PMF). Looking at the above formula and recalling some math from the second part of this tutorial series, **you may instantly spot why this is the case:** we are again working with a relation (division) of differences (similar to the slope), and a 0 in the numerator will always result in a probability of 0:

```
# P(c<=X<=c) := probability of only one point of, e.g., x = 5:
P_area = (5-5)/(6-1)
# Results in 0/5 = 0
# [1] 0
```

**For completion we will now quickly calculate the standard deviation and the mean as well.** The formula for the sd of a uniform distribution deviates a little from the formula we used in part three of this series, but we will not go into the details of why this is the case here. As said, we do this just for completion, since we will not need uniform distributions any longer (we will get back to them when discussing Bayesian regression models some time in the future):

```
# Mean for uniform distribution:
a = 1
b = 6
# For uniform distribution you may come across this formula:
mean = (a+b)/2
# .. but we can also use our regular old mean:
# mean = sum(dice)/length(dice):
mean = sum(dice)/length(dice)
mean = mean(dice)
# The standard deviation for uniform distributions:
# sd() does not deliver equivalent results:
sd = sqrt(((b-a)^2)/12)
# Add mean and sd to plot:
plot(y = test, x = x, type = "l", ylim = c(0,1))
abline(v = mean)
abline(v=mean+sd)
abline(v=mean-sd)
```
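We can also verify both formulas numerically (our own cross-check) by taking expectations under dunif(): the mean is E[X] and the sd is sqrt(E[X²] − E[X]²):

```
# Check mean and sd of the continuous uniform numerically:
a = 1; b = 6
E_x  = integrate(function(x) x   * dunif(x, a, b), a, b)$value # mean
E_x2 = integrate(function(x) x^2 * dunif(x, a, b), a, b)$value
E_x                  # (a + b)/2 = 3.5
sqrt(E_x2 - E_x^2)   # sqrt((b - a)^2/12) = 1.443376
```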

The **cumulative distribution function of a uniform PDF, as well as of PDFs in general, is** simply explained, as it **just reflects the integral of the PDF**. We have therefore already looked at the values of a CDF, so to speak (also check out this tutorial, which I partially followed). **The CDF is called cumulative, as it accumulates (sums up) the probabilities the further it integrates the PDF.**

We could also integrate from -Inf to +Inf, but for uniform distributions we can simply integrate from min. to max. The CDF can be formally defined as follows and is often denoted F(x):

The above formula says: if x < a, then F(x) = 0; if a ≤ x ≤ b, then F(x) = (x − a)/(b − a); and if x > b, then F(x) = 1. We can again translate this into an R function:

```
unif_CDF = function(a,b,x){
Fx = x # initialize output vector of length(x): just use x
for(i in 1:length(x)){ # := for every object of the vector x
if(x[i] < a){
Fx[i] = 0
}
else if(x[i] >= a & x[i] <= b){
Fx[i] = (x[i]-a)/(b-a)
}
else if(x[i] > b){
Fx[i] = 1
}
} # End for i
return(Fx)
} # End of function
# Initialize a sequence of possible x:
x = seq(0,7,by = 1)
# Fx:
Fx = unif_CDF(a = 1, b = 6, x = x)
# Plot CDF:
plot(x = x, y = Fx)
```
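R’s built-in punif() implements exactly this piecewise formula, so we can cross-check our closed form (x − a)/(b − a) against it (our own addition):

```
# F(3.5) for a uniform PDF on [1, 6]:
punif(3.5, min = 1, max = 6)
# [1] 0.5
(3.5 - 1)/(6 - 1) # closed form (x - a)/(b - a)
# [1] 0.5
# Outside the bounds the CDF is 0 (below a) or 1 (above b):
punif(0, min = 1, max = 6)
# [1] 0
punif(7, min = 1, max = 6)
# [1] 1
```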

As you may have expected, the probability rises linearly the more we integrate, until we have integrated the area of the whole density function, numerically summing to 1. The y-axis now elegantly does not refer to the density anymore, but to the accumulated probability of our variable X taking a value of at most x.

Now that we have an idea what a PMF, PDF and CDF is, given a simple uniform distribution, we can proceed to probably the most commonly used form of function: **the bell-curved Gaussian function**, i.e., the (standard) normal probability (density) distribution.

Next, we are going through the same steps as above, just not with a uniform, but a (standard) normal function.

**You may wonder why bell curved standard normal distributions are so popular and important in the first place?**

**You may even feel conflicted hearing the word “normal”?**

**It may even appear to you as a dark, mystical power, influenced or even under the control of a secret political agenda? :O**

To get settled and leave some uncertainties behind, we will first go through some **conceptual basics** around the famous Gaussian bell function, in order to understand how this special function is related to the world and the way we perform statistical inference.

One way to approach the concept of “normal distributions” is by **treating it as an evident phenomenon**, namely: **the regression to the middle**. And yes, this is why it is called a regression model! Linguistically, “regression” comes from Latin and stands for “going back” or “backwards”. So, in general, the concept of regression to the middle first of all refers to a process of “going back to the middle”. Mathematically, the middle refers to the mean, the mean of a distribution. **A classic example is the (population) mean of the height of humans.** Have you ever wondered why there aren’t any people just randomly growing up to a height of 4, 6 or 9 meters? One way to answer this question is regression to the middle: the further away a measurement is from the average, the more unlikely such a measurement will be (until its probability approaches zero).

**Note that the Gaussian function depicted below refers to a regular probability function, not a probability density function yet!**

Below you will find an actual example from real data via the website “Our World in Data”:

It is hard to answer why exactly such principal restrictions concerning, e.g., the height of an organism, are the case. In general, most brief answers will probably refer to some concept of environmental adaptation, and even to boundaries dictated by physics as such, to some degree. However, no matter whether you know the exact reason, the work of such effects/boundaries *still* often represents itself in the form of a bell-curved function.

**Historically, the concept of a regression (to the middle) was initially introduced in** the context of the **genetic phenotype of peas** by Francis Galton, a cousin of Charles Darwin, in the 19th century, **as well as in the context of heredity and height** (our example above). However, Galton did not discover the mathematical concept behind such an effect, i.e., the normal distribution, but he figured out that it can be used to empirically describe and predict nature’s behavior to some degree.

**The take-home message here is:** the mathematical method of regression analysis is, interestingly, partially rooted in nature itself. Whether you like it or not, normal distributions are not just a concept, but also a phenomenon to some degree (e.g., humans don’t randomly grow to a height of 10 m, and even if one did, it would be something happening very rarely).

**Still, keep in mind that: **

**a) Not every phenomenon can be described by a normal distribution, as it is not the only principle at work in statistical inference and nature, so to speak. In other words: there are also other kinds of distributions (logistic, binomial etc.). Below we will also see that normal distributions are a common effect of multiple sampling (a mathematical effect). In other words: a statistical model does not represent reality by itself (e.g., due to bias, confounders etc.). This is something that Galton (and also Pearson) completely overestimated, and by doing so they influenced the grim idea of eugenics. There is a great article by the science magazine “Nautilus” discussing how eugenics shaped statistics in its early stages.**

**Also keep in mind that: **

**b) The normal distribution is distinct from the concept of a “norm” in the psycho-social (or cultural) sense. The deviation of height, for example, did not result from a conceptual (!) intention set by human minds/cognition (such as cultural norms). Being normal is not even necessarily a goal in that respect, such that some random cultural ideal might even be the rather improbable case (e.g., very tall people, if considered an ideal).**

So keep in mind: facts like the average height, or any other value of some arbitrary measurement, can still be instrumentalized to, e.g., opportunistically argue against the worth of others for some self-interest. It is therefore important to distinguish between the descriptive norm, e.g., of people’s height, and the “normative norm”, which not only involves a description of how things are, but arbitrarily sets how they are supposed to be in the first place (mere opinion, which from a cultural perspective often represents itself as, e.g., tradition).

Apart from examples in biology or certain phenomena in general: another mathematical reason why Gaussian functions play a great role in (frequentist) statistics is the so-called **central limit theorem**. The central limit theorem states that if we take **several samples of a population with any kind of function as form and take the mean of each sample**, we will get a **normally distributed function of means (!!)** over time. It may sound crazy, but it’s true and can be demonstrated with R! See the next chapter for details and code.

Again, the central limit theorem states that if we take **several samples of a population with any kind of function as form and take the mean of each sample**, we will get a **normally distributed distribution of means (!!)** over time. In other words: the larger the number of samples, the closer the distribution of the *means of the samples* will get to a normal distribution. This can be thought of as approaching, in the *limit*, a normally distributed population of sample means (hence the name of the theorem).

Note that in this chapter we will mostly follow this R tutorial on statology.org. I can also recommend this StatQuest tutorial by Josh Starmer on that topic (can always recommend StatQuest!) and this video by 3blue1brown (Grant Sanderson), which was just recently published and which covers a lot of other topics discussed in this tutorial as well!

Here is some code to demonstrate the effect of the central limit theorem in R given a uniform distribution. **In the downloadable R script of this tutorial we did the same with a logarithmic function, to prove this doesn’t only hold for sampling from uniform distributions (which is, e.g., also symmetric).**

```
# Initialize data via runif, which will
# give us a random but uniformly distributed
# set of data. Each time you run this line, the result will
# be another set of data points.
# Use set.seed(0) or other values to make results reproducible
# (we will not get further into the topic here):
data = runif(n = 1000, min = 1, max = 10)
# We can also plot our data as histogram
hist(data, col = "lightblue")
# Use a for loop to take 5000 random samples of size n = 5 each:
n_samp = 5000
sample_n_5 = c() # create empty vector
# The below function will also take the mean of the sample and
# will store the output in our vector sample_n_5:
for (i in 1:n_samp){
# Store mean of each sample with length 5:
sample_n_5[i] = mean(sample(data, 5, replace=TRUE))
} # End for i
# Look at the histogram of the sample_n_5:
hist(sample_n_5, col = "lightblue")
```
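A small, self-contained follow-up sketch (our own addition): the spread of the sample means shrinks with the sample size n, approaching sd(data)/sqrt(n), i.e., the standard error of the mean:

```
set.seed(0) # make results reproducible
data = runif(n = 1000, min = 1, max = 10)
# 5000 sample means, now with sample size n = 30:
means_n_30 = replicate(5000, mean(sample(data, 30, replace = TRUE)))
# Empirical sd of the means vs. theoretical standard error:
sd(means_n_30)
sd(data)/sqrt(30)
```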

**As a consequence of the central limit theorem,** because the distribution of a bunch of sample means tends towards a normal distribution over time, we can assume a normal distribution right away in a lot of cases when performing (frequentist) hypothesis testing (integrating a standard normal PDF).

**In the next part of this series the central limit theorem will become important when thinking of sampling from a normal distribution right away:** when we sample from a normal distribution, the higher the sample size, the closer our sample mean will approach the actual mean of the population. In consequence, when we have a high sample size and a very high difference between the population mean and the sample mean, we can assume that such an event is very unlikely to be explained by actually sampling from the population (null hypothesis) and more likely the result of sampling from another distribution (the alternative hypothesis), due to the central limit theorem (again, assuming that we sampled from the normally distributed population right away, i.e., from the perspective of the null hypothesis). This will be the basic idea behind a z-test and, very similarly, a t-test in the next part of this series.

**In order for the central limit theorem to kick in, such that we can work with a classic normal rather than a t-distribution, the rule of thumb is a sample size of at least 30.** Otherwise, it is recommended to work with a t-distribution, which is very similar to a regular Gaussian and is also bell-curved. We will get there in the next part of this series, no worries.
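To visualize this rule of thumb, we can plot dt() against dnorm() (a preview sketch we added; t-distributions are covered in the next part): with 30 degrees of freedom the t-distribution is already hard to distinguish from the standard normal.

```
x = seq(-4, 4, by = .1)
plot(x, dnorm(x), type = "l") # standard normal
lines(x, dt(x, df = 5), col = "blue")  # heavier tails
lines(x, dt(x, df = 30), col = "red")  # already very close to dnorm
```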

**However, the above only works in the case of independent random sampling and finite variance.** Note that there can be cases where the independence breaks apart, e.g., due to some global event: let us say the tea market breaks down (a kind of bias is introduced by that) and Lady Meow is not able to get hold of any fresh tea. Other cats will then also have the problem of getting hold of some tea, so the overall mean of tea drunk during a specific time will suddenly change and we will obtain mean values that were previously thought to be rather seldom…

**We hope chapters 2 and 2.1 have given you a good overview of the mathematical and conceptual importance and meaning of working with normal distributions, and of how to distinguish the use of the term “normal” from the term/concept of a “norm”.**

Next we will **move on to the mathematical description of normally distributed functions** as such, which are also referred to as Gaussian functions.

The formula for a basic Gaussian function is actually fairly simple and we will look into why the formula below results in the respective plot (we will partially follow the well-written intro of the Wikipedia article on Gaussian functions for this part (as of December 2022)):

```
# Gaussian base form:
gaussian_fun = function(x){
fx = exp(-x^2)
return(fx)
} # End of function
# Plot of basic Gaussian:
seq = seq(-4,4,by = .1)
fx = gaussian_fun(seq)
plot(x = seq, y = fx)
```
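Note that this basic Gaussian is not yet a PDF: its total area under the curve is sqrt(pi), not 1 (the normalization is what the parametrized normal PDF adds later). A quick numerical check via integrate() (our own addition):

```
# Total area under exp(-x^2) equals sqrt(pi):
integrate(function(x) exp(-x^2), lower = -Inf, upper = Inf)$value
# [1] 1.772454
sqrt(pi)
# [1] 1.772454
```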

**You can probably identify the above function as bell curved, but how did this actually result from the equation?**

In general, the number e (Euler’s number, approximately 2.71828) serves as the base of the exponential function exp(x) = e^x used above; as we will see below, the exact base is not even crucial for obtaining the bell shape.

**However, before we get into the details of the exponent, let us first recall what squaring did to our centralized values of x before summing (!) them (discussed in part three of this series). Remember? Look again at the plot below and try to make out the pattern:**

```
# Recall a similar effect when squaring the deviation of
# x_i-mean(x) and looking at the plot:
x = c(0:10)
plot(x = x, y = (x-mean(x))) # deviation x_i from mean(x)
abline(h=0)
plot(x = x, y = abs(x-mean(x))) # absolute value of x_i-mean(x)
plot(x = x, y = (x-mean(x))^2) # plot of squared deviation
# Recall: summing the squared deviation results in
# the total sum of squares...
```

**Again: squaring turned a linear function with negative values into a smooth quadratic parabola with positive values only. Recall that centralizing the values of x was the first step to obtain the total sum of squares / variance / sd (relatable to a** ***parametric*** **Gaussian, see further below).** We know that a parabola can have a maximum, similar to a Gaussian function. We also already know that a parabola like x^2 has a minimum, while its negative, -x^2, has a maximum.

Think of squaring the exponent of a regular exponential function exp(x) as performing a similar process as above with our initially linear centralized values: squaring the exponent turns a steady exponential into a — at least in its middle part — rather parabolic relation of values, with either a minimum in exp(x^2) or a maximum in exp(-x^2).

We can also plot exp(-abs(x)) for comparison, which shows how a proper Gaussian function smoothly incorporates a parabolic exponent, whereas the absolute value produces a sharp kink at the peak:

```
# Plot of exp(-abs(x)):
test_abs = function(x){
  fx = exp(-abs(x))
  return(fx)
} # End of function
# Recall that we will use the absolute value of a sequence
# consisting of values ranging from -4 to 4 (by a step of .1):
plot(x = seq, y = test_abs(seq), type = "l")
```
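As a quick numerical sketch of how the sign of the squared exponent decides between a minimum and a maximum: exp(x^2) has its minimum at x = 0, while exp(-x^2) (our Gaussian base form) has its maximum there.

```r
# Quick numerical sketch: squaring the exponent produces an extremum
# at x = 0 -- a minimum for exp(x^2) and a maximum for exp(-x^2):
seq_x = seq(-4, 4, by = .1)
min_case = exp(seq_x^2)   # parabolic exponent, opens upwards
max_case = exp(-seq_x^2)  # negated parabolic exponent, opens downwards
# Both extrema sit at (approximately) x = 0:
seq_x[which.min(min_case)]
seq_x[which.max(max_case)]
# Plot both for comparison:
plot(x = seq_x, y = min_case, type = "l")
plot(x = seq_x, y = max_case, type = "l")
```
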

As mentioned, in the case of a regular (non-parametric) Gaussian, we could even use a different base, and we will *still* get a Gaussian form. Below we will use base 2.

```
# It does not even matter that we use the number e as base:
test_gen = function(x){
  fx = 2^(-x^2)
  return(fx)
} # End of function
plot(x = seq, y = test_gen(seq), type = "l")
```

Even though the number e is not strictly necessary to obtain a bell shape, it is the conventional base for Gaussian functions.

**For completion, here are some *optional facts* on why exp(x) plays an important role in mathematics and physics. Up front: it is essentially convenience, as exp(x) has some mathematical characteristics that make a lot of calculations easier:**

a) The *derivative* of exp(x) is exp(x) itself.

b) The *slope* of the tangent at a certain point x therefore equals the function value exp(x) at that point.

c) The *integral* from -Inf up to a certain point x is also exp(x).

**Proof of all of the above in R:**

```
# f(x) = e^x over our sequence from above:
fx_pos = exp(seq)
plot(x = seq, y = fx_pos, type = "l")
# forming coordinate system (+x,-x,+y)
abline(v = 0)
# Point at x = 0, so f(0) = y = e^0 = 1
exp(0) # Now add point to plot:
points(y = 1, x = 0) # P(0|1)
# Point at x = 1:
# y = f(x) = e^x; f(1) = e^1 = e
exp(1)
points(x = 1, y = exp(1)) # P(1|e)
# Slope of the tangent at a point P(x|f(x)) := f'(x) = e^x,
# since f(x) = e^x = f'(x)
# Evaluate y coordinate of P(x|f(x)) at x = 1:
y_coordinate = exp(1)
# [1] 2.718282 = Euler's number e.
# We can add the respective tangent at that point with
# slope e^1 (the tangent of e^x at x = 1 is y = e*x):
curve(exp(1)*x, -3, 3, add = TRUE)
# Integral from -Inf to x = 1:
expo_fun = function(x) {exp(x)}
integrate(expo_fun, lower = -Inf, upper = 1)
# 2.718282 with absolute error < 0.00015
# = e again.
```

In the next chapters, we will see that the use of e as a base makes it convenient to introduce **parameters** into an equation. We will not focus on the number e itself much further. **In general, using e as base gives us the ability to parametrize and therefore generalize our Gaussian function.**
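A minimal sketch of why the base is interchangeable: any base-b Gaussian can be rewritten with base e by absorbing log(b) into the exponent, which is exactly the kind of constant the parametrized form can swallow.

```r
# Any base can be rewritten in terms of e, since b^u == exp(log(b) * u).
# E.g., the base-2 Gaussian from above equals an e-based Gaussian with
# a rescaled exponent:
x = seq(-4, 4, by = .1)
base2 = 2^(-x^2)
base_e = exp(-log(2) * x^2)
all.equal(base2, base_e)
# [1] TRUE
```
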

Next, we are going to look into the parametrized form of a Gaussian function. The formula may look complicated, but it is actually really simple:

The parameter a controls the **height of the peak**; b is the **value of x at the peak of the Gaussian** (the *mean*); and c controls the **width of the curve** (the *standard deviation!*), which refers to the turning points of the tails of a Gaussian (see plot further below). This is how the sd indirectly controls the width of the bell curve.

**So here we finally get to a point where we can clearly relate the mean and the sd to a normal / Gaussian distribution. Whoop!!**

Looking at **the mean b in the numerator** of the fraction in the exponent, we see that x **is again centralized!** **With some slight adjustments, this whole exponent will later be referred to as the z-score for standard normal PDFs, as we will see!**

**As mentioned, the height of the peak a** can be understood as a parameter for **scaling the y-axis**. However, keep in mind that the above formula does **not yet** refer to a density, since the area under the curve does not necessarily sum to 1.
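To see that the parametrized Gaussian is not yet a density, we can integrate it for an arbitrary (made-up) choice of a, b and c: the area comes out as a*sqrt(2*pi*c^2), not 1, unless a is set to exactly 1/sqrt(2*pi*c^2) — a sketch:

```r
# The parametrized Gaussian does not integrate to 1 in general:
gaussian_fun_para = function(x, a, b, c){
  fx = a*exp(-(((x-b)^2)/(2*c^2)))
  return(fx)
} # End of function
# Made-up parameter values, just for illustration:
area = integrate(gaussian_fun_para, a = .5, b = 2, c = 1.5,
                 lower = -Inf, upper = Inf)$value
area # != 1, but a*sqrt(2*pi*c^2)
# Setting a = 1/sqrt(2*pi*c^2) normalizes the area to 1:
area_norm = integrate(gaussian_fun_para, a = 1/sqrt(2*pi*1.5^2), b = 2, c = 1.5,
                      lower = -Inf, upper = Inf)$value
area_norm # ~ 1
```
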

For the numeric example below, we used the above data from the height example (“Our World in Data”) and tried to figure out a distribution for humans in general.

```
# Gaussian with parametric extension:
seq = seq(100,220,by=.1) # height in cm.
a = .5 # height of curve peak
b = (178.4+164.7)/2 # x of peak = mean = 50%
c = ((178.4-170.8)+(164.7-157.6))/2
# width of the bell = sd
# + turning point of function!
gaussian_fun_para = function(x, a, b, c){
  fx = a*exp(-(((x-b)^2)/(2*c^2)))
  return(fx)
} # End of function
fx = gaussian_fun_para(seq,a,b,c)
plot(x = seq, y = fx, xlim=c(135,205), type ="l")
abline(v=b, col = "lightblue")
abline(v=b+c, lty = 2)
abline(v=b-c, lty = 2)
```

To understand the exponent of the above parametrized form of the Gaussian, **we will further play around a bit with the formula to understand what each value actually does.** R offers a lot of possibilities to use, e.g., plotting in order to heuristically explore functions via visualizing data:

Here is some code where we manipulated the formula, including the respective plots. Below you will find a corresponding figure including the formulas that were used for each plot. Note up front that the first and the second test in the R code below (**test_1** and **test_2**) resulted in equivalent values, since we set c = 1, and therefore 2*c^2 == 2*c:

```
# Parameters
a=1
b=0
c=1
# Sequence
seq = seq(-4,4,by=.1)
# Test 1:5
test_1=exp(-(((seq-b)^2)/(2*c^2)))
test_2=exp(-(((seq-b)^2)/(2*c)))
test_3=exp(-(((seq-b)^2)))
test_4=exp(-(((seq-b))))
test_5=exp(-(((seq-b))/(2*c)))
# Plot of Test 1:5
plot(x=seq,y=test_1)
plot(x=seq,y=test_2)
plot(x=seq,y=test_3)
plot(x=seq,y=test_4)
plot(x=seq,y=test_5)
```
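We can verify the claimed equivalence of the first two tests directly; a quick self-contained check (with the parameter values from above):

```r
# With c = 1, 2*c^2 == 2*c, so test_1 and test_2 coincide:
b = 0; c = 1
seq_x = seq(-4, 4, by = .1)
test_1 = exp(-(((seq_x - b)^2)/(2*c^2)))
test_2 = exp(-(((seq_x - b)^2)/(2*c)))
identical(test_1, test_2)
# [1] TRUE
```
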

You may have figured it out for yourself already, but the **parametrized Gaussian probability function** is structurally the **same as** the **Gaussian or normal probability *density* function**. The only difference is again that the area will now sum to 1 (we will soon learn why). The param. normal PDF is also referred to as the *standard* normal PDF when a mean of 0 and a sd (or variance) of 1 is given (note that sqrt(1) == 1, so sd and variance coincide here).

**What can we do with it?** The *standard* normal PDF with a mean of 0 and sd of 1 represents the population (or another sample) and can essentially be used to compare a population with a standardized sample distribution, i.e., with a sample distribution that is transformed into a standard normal form. More precisely, the sample mean will be translated into units of population standard deviation: **if the mean of the population is 10, the standard deviation of the population is 2 and the *standardized sample mean* equals +1, then we can reclaim the sample mean to be 12, since it deviates +1 population-sd from the population mean of 10 (recall, we set the pop-sd to be 2 in this example).**
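The back-and-forth between the raw and the standardized scale in the example above can be sketched in a few lines (numbers taken from the text):

```r
# Population mean 10, population sd 2, standardized sample mean +1:
pop_mean = 10; pop_sd = 2; z = 1
# Reclaim the raw sample mean from the z-scale:
sample_mean = pop_mean + z*pop_sd
sample_mean
# [1] 12
# ... and back to the z-scale:
(sample_mean - pop_mean)/pop_sd
# [1] 1
```
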

**To give an example of what can be achieved by comparing the means of two distributions in a standardized way:** say we want to test if a new drug changes the blood pressure of a patient. We could compare the sample mean of those patients that received the new drug with the population mean, i.e., a group that did *not* receive the new drug (or “all other people”). **However, looking at the difference of means by itself does not tell us much.** Say we end up with a difference in systolic blood pressure of 5 mmHg — is that much? Looking back at part three of this series, you may ask: couldn’t 5 mmHg just be the standard deviation of the mean? Could it be that such a difference is quite common when sampling patients’ systolic blood pressure? In other words: the difference in means by itself does not tell us how to assess such a difference. Standardization will help us to evaluate such individual differences by transforming individual values of measurements into *a general relation*: the deviation of the sample mean from the population mean will be expressed in units of standard deviation *of the population mean*.

**This is essentially what the formula of a z-test does:** translating the sample mean such that it becomes relatable to a Gaussian PDF. However, translating the sample mean into a standardized scheme by itself does not tell us enough either, since it remains unclear how likely such an obtained difference may turn out to be. We also need to integrate our PDF up to our standardized sample mean in order to obtain a probabilistic perspective, as we will see further below. So again, different to a regular probability function, we cannot just read out the probability value by looking at the y-axis, since we are handling a (standard normal) probability *density* function. Note that the process of **standardization** is only slightly different to what is called **studentization**, used in a Student’s t-test, i.e., with a **t-distribution** (similar to a standard normal PDF, see further below). The standard normal PDF is also referred to as the **z-distribution**. You may have already come across the terms **t-score or z-score** before.

**Before we get all crazy with hypothesis testing, and since integrating a Gaussian is not that trivial, let us first go through the formula of a Gaussian PDF step-by-step, especially in order to understand how a parametric Gaussian function becomes a density function.** German readers, depending on their age, may have come across one of these in their past:

The **general parametrized formula for the normal (non-standardized) probability density function**, which follows the mean and the variance of a distribution, is denoted as follows (recall that sigma-squared is essentially the variance; the formula intentionally deviates slightly from the one on the 10 Mark bill):

**The left part of the formula is spoken out:** “f of x, given the parameters mu and sigma-squared” (our *set* parameters). We could also use the sd; however, as we know, it is just a matter of squaring and some minor adjustments of the above formula. When doing research, you might encounter slightly different, rearranged formalisms (as on the 10 Mark bill above), apart from using the standard deviation instead of the variance. Don’t worry, as long as they all result in the same. The formula for the standard normal PDF in particular can be simplified further, since it has a mean of 0 and a sd of 1 (purple formula):

**However, we will stick with the ***non-simplified*** formula, in order to fully understand what it actually says and for what it can be used.**
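Still, as a quick sketch, the simplified standard normal formula (mean 0 and sd 1 plugged in) indeed matches R’s built-in dnorm():

```r
# Simplified standard normal PDF: f(x) = exp(-x^2/2)/sqrt(2*pi)
std_norm_simple = function(x){
  exp(-(x^2)/2) / sqrt(2*pi)
} # End of function
x = seq(-4, 4, by = .1)
all.equal(std_norm_simple(x), dnorm(x))
# [1] TRUE
```
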

To understand the basic structure of the formula, recall that our *uniform density function* was denoted as f(x) = 1/(b-a), **which resulted from the constraint to sum to 1.** In the normal PDF above, we see a similar pattern for our parameter a, which is now fixed to 1/sqrt(2*pi*sigma^2), a **normalization factor (it normalizes the y-axis to form a density).**

The reason that the number pi is involved here will become clear further below, when we integrate the Gaussian. **Let us first do some descriptive work on our function:**

**Again, the so called standard normal distribution is denoted as having a mean of 0 and a variance of 1.** Below you will find some code for the above. You can also plug in a different mean and variance below, however, recall that it will not result in a *standardized* PDF anymore:

```
# General formula for a probability density function (PDF).
# Insert mean = 0 and var = 1 to obtain the values for input x
# given a so-called standard normal distribution:
prob_dens = function(x,mean,var){
  fx = (1/sqrt(2*pi*var))*exp((-((x-mean)^2))/(2*var))
  return(fx)
} # End of function
# In case placing () in the above becomes an issue when
# writing a function, you can also decompose the formula:
prob_dens_decomp = function(x,mean,var){
  fx1 = 1/sqrt(2*pi*var)
  fx2 = (-((x-mean)^2))/(2*var)
  fx = fx1*exp(fx2)
  return(fx)
} # End of function
# Initialize seq:
x = seq(-4,4,by=.01)
# Given standard normal distribution:
norm_PDF = prob_dens(x, mean = 0, var = 1)
# Using basic R function dnorm():
norm_PDF_alt = dnorm(x, mean = 0, sd = 1)
# Check for equality:
all.equal(norm_PDF,norm_PDF_alt)
# [1] TRUE
# Plot
plot(x=x,y=norm_PDF)
plot(x=x,y=norm_PDF_alt)
```

Note that the x-axis of a standard normal PDF can be read as deviation from the mean in units of standard deviation (we will formalize this as the z-score below).

**We noticed previously, when discussing the parametric Gaussian function, that changing a had an effect on the height of our function, but not on its general form.** We also noticed that the sd had an effect on where the turning points are located, and therefore on how narrow our function will be, but the sd did not affect the height. Looking at our formula above, you might have noticed that things have changed by now, since the sd is now also entailed in the normalization factor 1/sqrt(2*pi*sigma^2) that replaced a.

Let us plot the output of our prob_dens() function, using a variance of 1 and a variance of 3 (see code below). **Here we can see why a low variance is essentially considered a good thing**, since a high variance leads to a wider spread of values and subsequently to a higher variability when taking samples from the respective distribution (e.g., it becomes more likely to get a deviation of +/-4*sd when taking a sample from a distribution with higher variance). Our assumptions and predictions will therefore be messier, given a higher variance. In other words: a higher variance means that, e.g., trying to predict new events via the mean of a sample may come with higher errors. The reverse holds for a *lower* variance / sd.

As we figured out in part three of this series, the value of the variance itself does not tell us much about the quality of a function, but the comparison with other functions does (see below). The sd, on the other hand, can directly be related to the mean and delivers a value that we can interpret descriptively right away (turning points of the function).

```
# Plot with var = 1 and var = 3
plot(x=x,y=prob_dens(x,mean = 0,var = 1), type="l", col = "blue")
# The lines() function adds another curve to an existing plot:
lines(x=x,y=prob_dens(x,mean = 0,var = 3), col = "red")
```
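The claim that the turning points of a standard normal PDF sit at ±1 sd can also be checked numerically; a small sketch using a discrete second derivative of dnorm() (looking for its sign change on the positive half of the x-axis):

```r
# Discrete second derivative of dnorm() and its sign change:
x = seq(0, 3, by = .001)
d2 = diff(dnorm(x), differences = 2)  # approximates f''(x)*h^2
# Index of the first sign change (negative -> positive),
# i.e., the inflection/turning point:
i = which(diff(sign(d2)) != 0)[1]
x[i]  # ~ 1, i.e., one standard deviation from the mean
```
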

Again: the formal reason for the above deformations of the Gaussian by the sd is of course the fact that the sd/variance now also appears in the normalization factor 1/sqrt(2*pi*sigma^2), not only in the exponent.

Note that most of the time we will operate with **equal variances** in this tutorial, so the standardized *sample* distribution will have the same form as the standard normal PDF, only the mean will deviate. We did not yet compare a sample distribution with a population distribution, so no worries if this note does not ring a bell yet.

Next up, we will integrate Gaussian functions and finally understand a little better why the number pi is involved in the formula.

**This chapter is rather detailed and entails some aspects that could be considered optional, so in case you have trouble understanding all of it, it does not matter too much, as long as you get the general idea how we can turn a param. Gaussian into a param. Gaussian PDF.** We will also recommend some video tutorials to follow in parallel.

The formula for the integral of a *non*-param. Gaussian goes as follows:

We already noticed that the number pi shows up in the normalization factor that turns a Gaussian into a *density* function summing to one (see decomposition below). In a standard normal PDF this is done in advance, so to speak. **So, one way to answer the question why pi is involved: it has to do with making the complete integral sum to a value of 1 (normalization), as the integral of a regular Gaussian is equal to sqrt(pi) and not a value from 0 to 1 by itself.**

**This does not yet answer why pi appears in the integral in the first place.** Though I hope that, so far, the formula of our standard normal PDF makes sense to you again. The decomposition below derives the formula of the **basic (non-parametric!) Gaussian PDF:**

Below you will find some code to replicate the integral of the basic Gaussian function using the **integrate()** function in R. By using our **gaussian_fun()** function together with the basic R function **integrate()**, we can determine a specific area we want to integrate, e.g., the whole curve via **lower = -Inf**, **upper = +Inf**.

```
# Gaussian base form:
gaussian_fun = function(x){
  fx = exp(-x^2)
  return(fx)
} # End of function
# Using integrate function:
gaussian_integral = integrate(gaussian_fun, lower = -Inf, upper = +Inf)
# Use gaussian_integral$value to get value only, as integrate()
# returns a message and a list:
all.equal(gaussian_integral$value,sqrt(pi))
# [1] TRUE
```

**To finally understand why the number pi is involved in the first place, we have to dig a little deeper into the math:**

The first thing I thought of, when I saw that the integral of a non-parametric Gaussian is the square root of pi, was: what do circles have to do with a bell curve? As it turns out, quite a lot, once we square the integral and look at it in two dimensions.

We are only going to touch the surface here, but there are a lot of videos on that matter on the internet. Here are two videos I watched to write this part of the tutorial: one in German by Weitz / HAW Hamburg, another English video by vcubingx. This article by Ryan Brideau also discusses the same topic in a similar way.

**Calculating the volume can also be written as taking the integral of an integral (a double integral), where importantly x and y actually refer to the same set(!!).**

In general, there are some difficulties when integrating a Gaussian (especially by hand), and there are several ways to do so. **We will only dip into the topic to get some intuition and will not replicate the whole math.** One way to integrate a Gaussian is by using *polar coordinates* instead of *cartesian coordinates,* **eventually rotating the 2D integral around its peak, which essentially results in the volume of a 3D Gaussian** (this is easy due to the symmetric structure of the 3D Gaussian, see plot later below).

The formula below may appear complicated, but think of it as the squared integral: the product of two identical integrals, written as a double integral over x and y, which evaluates to pi (the volume under the 3D Gaussian, *not* 1).

The part where the integral over y appears may be irritating, but note that y does *not* refer to a dependent variable! It is just the same set multiplied by itself in order to add a third dimension to the 2D Gaussian.
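A brute-force numerical sketch of this idea: summing exp(-(x^2+y^2)) over a fine grid approximates the volume under the 3D Gaussian, which comes out as pi, i.e., the square of the 1D integral sqrt(pi) from before.

```r
# Volume under the 3D Gaussian via a simple Riemann sum:
h = .01
x = seq(-5, 5, by = h)
y = x  # the "second" variable is the same set!
z = outer(x, y, function(x, y) exp(-(x^2 + y^2)))
volume = sum(z) * h^2
volume        # ~ pi
sqrt(volume)  # ~ sqrt(pi), the 1D integral from before
```
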

**To wrap this up: what we get by essentially adding a redundant dimension is a perfectly symmetric 3D Gaussian (see plot below).** There are a lot of videos on the internet on that topic, so we will leave it at that, and we will also not replicate the various other existing mathematical methods to integrate a Gaussian via R here (we will stick with the built-in **integrate()** function).

The plot below is supposed to show that rotation is easily doable, since the 3D Gaussian is perfectly symmetric.

Here is a more detailed plot. It is not the best, since it was hard to integrate the information into the above plot, but I hope you can make sense of it:

Grant Sanderson (3blue1brown) just recently published a video on all of the above, in a series that also entails the video on the central limit theorem that we mentioned before. Also note that Grant Sanderson provides all the Python code for the visualizations in his videos — in case you are interested in Python as well!

**However, below you will find the code to replicate the above 3D plot in R:**

```
# 3D Plot of a Gaussian:
x = seq(-4,4,by=.1)
y = x
# We will not go into full detail why we need the outer product for z,
# but the result is a matrix and z needs to be related to two
# coordinates: x and y.
# However, outer() allows to write the function of our 3D Gaussian
# within the outer() function:
z = outer(x, y, function(x,y)(exp(-(x^2+y^2))))
# The persp() function lets us plot in 3D; adjust theta and phi
# in order to change perspective:
persp(x = x,y = y,z = z, theta =20,phi=20)
# From the top:
persp(x = x,y = y,z = z, theta = 0, phi = 90)
```

**The integral of the parametric form looks a little messier, but it is fairly simple when decomposed.**

The next part is a short replication of parts of the Wikipedia article on Gaussian functions. It shows how our parameters a, b and c behave under integration. **Note that moving from line 3 to 4 may appear bumpy, but it is again just serving normalization, such that the last line results in a*sqrt(2*pi*c^2).** Later we will also further look into the specific purpose of the substitution used in the derivation.

Up front: the substituted value, **in a slightly altered form**, is also referred to as the **z-score**, which is **important for converting any set into a standard normal form**, as we will see later below.

```
# Integration using our parametrized Gaussian function:
gaussian_fun_para = function(x, a, b, c){
  fx = a*exp(-(((x-b)^2)/(2*c^2)))
  return(fx)
} # End of function
# Using a, b and c as defined further above:
gauss_para_integ = integrate(gaussian_fun_para,a=a,b=b,c=c,lower=-Inf,upper=Inf)
all.equal(gauss_para_integ$value,(a*sqrt(2*pi*c^2)))
# [1] TRUE
```

**So, now we know why we use the normalization factor 1/sqrt(2*pi*sigma^2) and why pi is involved in the first place.**

**You may wonder where the parameter a went in the normal PDF: when normalizing the integral to 1, the parameter a cancels itself out! On the other hand, the term sqrt(2*pi*sigma^2) remains, now as the denominator of the normalization factor.**

We now have all the knowledge to finally calculate the whole integral of a normal PDF, and therefore nearly all the knowledge we need to calculate the infamous p-value, which only concerns a specific integrated area under the curve. Note that below we will just integrate from -Inf to +Inf, as well as over half of a Gaussian function, i.e., we will not perform an actual z-/t-test in this tutorial.

Integrating is essential to obtain probabilistic values from our PDF. However, we were **not testing for anything yet**. Testing would again involve integrating a specific value (a z- or t-value). We will get there soon.

A short way of integrating our standard normal PDF in R is again by using our **prob_dens()** function or the built-in **dnorm()** function, together with the R function **integrate()**. In case you wonder about the code below: the **integrate()** function has a special structural feature, as it takes a function such as **prob_dens** as input, and the parameters of that function (**mean =** and **var =**) can be passed as parameters within **integrate()**, see below:

```
# Integrate stand. norm. probability density function:
integrate(prob_dens, mean = 0, var = 1, -Inf, +Inf)
# Result:
# 1 with absolute error < 9.4e-05
# ... or via using dnorm(), now with the parameters "mean =" and "sd =":
integrate(dnorm, mean = 0, sd = 1, -Inf, +Inf)
# Result:
# 1 with absolute error < 9.4e-05
```

However, the shortest way may be by using the built-in **pnorm()** function, which evaluates the p-value given a normal distribution:

```
# We could also use the function pnorm():
p_value_Inf = pnorm(Inf,mean = 0,sd = 1)
# [1] 1
# p-value at the mean = 0,
# given a mean of 0 and sd of 1:
p_value_pnorm = pnorm(0,mean = 0,sd = 1)
p_value = integrate(prob_dens, mean = 0, var =1, -Inf, 0)$value
# [1] 0.5
# Test for equality:
all.equal(p_value_pnorm,p_value)
# [1] TRUE
```

We can now go on and calculate and **plot the respective normal CDF of our normal PDF that cumulatively adds up the p-values of the respective z-scores (the values on the x-axis).** We will discuss the z-score and its formula in the next chapter, but so far we know already that it is just a way to translate any values of the x-axis into a standardized form of values representing a standard deviation from the mean (which itself is 0). The function entails a for-loop to calculate the p-value for every possible value of our sequence (-4 to 4, by .1) via the integrate function:

```
# CDF of a normal PDF:
norm_CDF = function(seq, mean, var){
  # Initialize vector of length(seq):
  Fx = seq
  for(i in 1:length(seq)){
    # Pass mean and var on to prob_dens() so the arguments are actually used:
    Fx[i] = integrate(prob_dens, mean = mean, var = var, lower = -Inf, upper = seq[i])$value
  } # End for i
  # Plot result:
  plot(x=seq, y = Fx, type = "l")
  return(Fx)
} # End of function
# Test our function and plot our CDF (code for plotting is
# already entailed in the function above):
seq = seq(-4,4,by=.1)
norm_CDF(seq,0,1)
```

**We added the possibility to adjust the mean and variance in the prob_dens() function**, but note that **this is not the same as evaluating** the previously mentioned **z-score** of our sample in order to perform the complete standardization. However, evaluating the z-score of a given set of values is fairly easy, since it has nearly the same structure as the exponent of the normal PDF. **In case you are confused:** above we actually did not standardize a set of values, but looked at the standardized PDF itself.
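Standardizing an actual set of values is then just applying the z-score formula element-wise; a minimal sketch with made-up numbers (using the sample mean and sd for simplicity):

```r
# Standardize a (made-up) set of measurements:
x = c(2, 4, 4, 4, 5, 5, 7, 9)
z = (x - mean(x)) / sd(x)
# The standardized set now has mean 0 and sd 1:
round(mean(z), 10)
# [1] 0
round(sd(z), 10)
# [1] 1
```
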

Obtaining a t-score is again very, very similar to obtaining a z-score, as we will see, **so** **we will start with the z-score; perform a z-test and will then move on to the t-test**.

So far we have looked at the (standard) normal PDF as such and did not perform any actual standardization of data, nor any hypothesis testing. **Up front:** testing in a z- and t-test will just mean comparing two means in order to evaluate whether they deviate enough that we can argue that they likely stem from two different distributions, and are not just samples from one and the same distribution. Recall the blood pressure example from further above: comparing the sample mean of patients that received a new drug with the population mean of those who did *not* receive it only becomes informative once we can assess the difference in means in a standardized way.

So far, it was rather easy to integrate a uniform distribution, and doing so helped us to understand the formula of our normal PDF, which, as we figured out, needs some deeper mathematical insight in order to be integrated. However, the fact that integration here involves a complicated process is also important for error control. **A computer scientific solution therefore also always indirectly aims to keep (human) errors in calculations low.** Computers give us the opportunity to operate with complex math without doing any calculations “by hand” or via cognition at all. This is of course also dangerous, since we do not need to actually understand a z-test in order to execute some lines of code — of course scientists should be able to understand and explain more than just some lines of code.

This is one of the reasons we chose to create *thorough* tutorials for you before coming up with heuristic quick guides of some kind, as it is unfortunately rather common to use statistical methods without actually understanding them properly. People with a merely heuristic approach also get easily agitated when one asks basic questions, which is sad, since it means you may have to justify having knowledge instead of justifying a lack of knowledge. Note that this indeed has the character of a classic opportunist gate-keeping tactic and especially explains why statistical analysis is often not well performed in papers. Simply talking about p-hacking etc. is not enough to tackle attitude-driven science.

**However**, by using modern computer software such as R, the focus can lie on the concept and the formula, and a little less on the representational realization of an algorithm (calculating by head and hand). As you have hopefully figured out by now: this makes the mathematics intellectually a lot more graspable for us and also allows for smooth experimentation and various forms of visualization of mathematics/statistics…

**Apart from a computer scientific solution for our integral problem**, you may already be familiar with **t-tables** or a **standard normal table, also called a z-score table,** in that respect (we will replicate a z-score table in R further below!!). These tables deliver the integral up to a certain bound value (z-/t-score), so it doesn’t have to be calculated by hand (which was rather daunting most of the time in the 20th century)! In other words: t-/z-tables were the error control of the 20th century, when computers were not as common and powerful as they are today. **You might have guessed it already, but yes, a z-table is essentially just a CDF in tabular form (see further below).**
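To make the “a z-table is just a tabular CDF” point concrete, here is a minimal sketch of such a table built from pnorm() (rows give z to the first decimal, columns add the second decimal):

```r
# A small standard normal table (CDF values) via pnorm():
rows = seq(0, 1.9, by = .1)   # z to the first decimal
cols = seq(0, .09, by = .01)  # second decimal of z
z_table = outer(rows, cols, function(r, c) pnorm(r + c))
rownames(z_table) = sprintf("%.1f", rows)
colnames(z_table) = sprintf("%.2f", cols)
# Classic example: Phi(1.96), i.e., row 1.9, column 0.06:
z_table["1.9", "0.06"]
# [1] 0.9750021
```
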

**Let us now look at the formula of the z-score**, which is used to standardize any value of x. When performing an actual *test* (testing for differences in means), we will essentially use the same z-score formula, just using the mean of our sample as input. **The result of a z-test is a special kind of z that will be called a z-value instead of a z-score.**

**So, keep in mind, we are still not at a stage of actual testing, but will at least be able to convert any numerical value of a measurement into a standard normal PDF. This is again done for convenience and comparability of our data under a generalized (standardized) perspective:**

**IMPORTANT NOTE:** With this formula, not only the x-axis **of our sample** gets properly scaled (as our **prob_dens()** function did), but also the density, in a way that it fully corresponds to a *standard* normal PDF with mean 0 and sd of 1, and can therefore be properly compared with a population or another sample. **Now the integral up to a point z will correspond to the p-values of z-score tables, such that neither we nor our computer has to perform an actual integration!**

**The figure below shows what we can do using the z-score formula. IMPORTANTLY, apart from that, you may wonder how it comes that we are comparing two distributions.** The sample by itself is also considered a distribution. In fact, performing a z-test requires that the sample itself is normally distributed. We mentioned the Shapiro-Wilk test in part three of this series as a quick side quest before.

**Let us do some calculations in order to figure out what the above actually means.** At first, we will look at an **example where the sample mean of a sample of values is equivalent to the population mean**. This can be looked at as a *special case of the z-test.* In this case, we can represent the difference between sample mean and population mean just by the formula of the z-score, since the difference is zero anyway. In this special case, we just end up with the standard normal distribution itself, representing our baseline by itself. When integrating, we can represent and calculate the probability of *all* possible differences of the (standardized) sample mean from the (standardized) population mean. This baseline approach is essentially what a z-score table consists of, as we will see.

**Numeric example:** So, let us say we have knowledge on the *population* mean of the number of cups of tea Lady Meow drank in a whole day (say pop. mean = 5). Say, we also took a sample recently, i.e., measured the tea drinking behavior of Lady Meow again and want to compare the sample with the population and the sample turns out to also have a mean of 5. We will therefore end up with a difference of 0.

**The integral up to a z-score of 0**, representing the standardized baseline population mean, will be .5, which is essentially just half of the integral. By doing so, we are essentially asking how likely it is that our sample mean is significantly *lower or equal* to the population mean (later discussed as lower tail z-test of a 0 difference in means). **Upfront:** when calculating a p-value, asking for both being either lower or higher in difference from the population mean (equality/inequality as such), *then we would have to double the above p-value* (later discussed again as two-tailed z-test). In the case of sample mean == population mean, this double sided approach will result in a 100% chance (probability of 1) that the sample is equal to the population with a non-standardized mean of 5 (again, this is equivalent to a z-test at least for the case where the sample mean == population mean).

**How would we in general interpret the real life case where the result of a clinical trial turns out to be showing a 0 difference in means?** It depends on the question we ask when performing a hypothesis test, but most of the time we essentially want this value to be very low, aiming for a difference in means, so that we can argue that the sample is actually different from the population. **To give a medical example**: when comparing sample participants that were given medication with a population that obtained *no* medication, then we mostly hope for a difference when comparing these two groups, since we probably hope for an effective new medication that makes a significant difference to doing nothing at all **(or placebo effect, which is essentially no conditional effect of a medication — claiming anything else is considered manipulative product/procedure binding)!** In other words, we hope to find medication that has an evident effect and deserves the name medication in the first place.

**As a side note:** Mathematically we already know from the first chapters when looking at uniform distributions, that a PDF also allows us in general to integrate any region, e.g., when asking how likely it is to obtain a sample with a mean that is between -1*sd and +1*sd. . .

Below you will find some code for the special case of a 0 difference between population and sample mean, as well as a plot of the standardization of the dependent variable of our second synthetic data set observing Lady Meow ^°^!

```
###### Special case of pop. mean == sample mean:
pop_cups = c(0:10)
# Population mean, variance and standard deviation:
# Recall that var() and sd() calculate the sample var/sd
# including Bessel's correction, therefore we use
# the formulas below:
pop_mean = mean(pop_cups)
# [1] 5
pop_var = sum((pop_cups-mean(pop_cups))^2)/length(pop_cups)
# [1] 10
pop_sd = sqrt(pop_var)
# [1] 3.162278
# Function for evaluating the z-score of a
# non-standardized data set:
z_score = function(x,pop_mean,pop_sd){
z = (x-pop_mean)/pop_sd
return(z)
} # End of function
# Standardized Z-Score for a particular value of x:
# Say we want to standardize a value of x that is equal
# to our mean itself, as a special case where a z-test formula
# is not needed yet:
z_score(pop_mean, pop_mean, pop_sd)
# [1] 0
# Integrating will reasonably result in
# 0.5, i.e., half of the area under the curve:
pnorm(0)
# [1] 0.5
# For a two-tailed test, checking on equality/inequality:
2*pnorm(0)
# [1] 1
# Let us now look at the z-score for our
# value of mean+/-sd in pop_cups:
z_score((pop_mean-pop_sd), pop_mean, pop_sd)
# [1] -1
z_score((pop_mean+pop_sd), pop_mean, pop_sd)
# [1] 1
z_score((pop_mean+2*pop_sd), pop_mean, pop_sd)
# [1] 2
# Worked!
#### Plot standardization of the dependent variable of
#### second synth. observing Lady Meow
# Second synthetic data set:
# Cups of Tea:
cups = c(0, 1, 2, 3.5, 4.8, 5.2, 6, 6.9, 8.5, 9.1, 9.9, # run 1
0, .6, 2.6, 3.1, 4.8, 6.9, 7.1, 7.5, 8.5, 9.9, 9.9, # 2
0, .2, 2.9, 2.9, 4.9, 5.2, 6.9, 7.1, 7.3, 9, 9.8) # 3
# Time passed for each run of measurements (cups), abbreviated:
time = c(0:10,0:10,0:10)
# Plot corresponding normal and standard norm. PDF:
# Normal PDF (non-standardized) of our data:
seq = seq(-10,20,by=.1)
plot(x=seq,y=prob_dens(seq,mean(cups),pop_var),type ="l")
# Translating all possible values on the x-axis of our
# normal PDF into the format of a standard(!) normal PDF
# (note that z_score() expects the sd, not the variance):
z_seq = z_score(seq,mean(cups),pop_sd)
plot(x=z_seq,y=prob_dens(seq,mean(cups),pop_var),type ="l")
```
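As a side note (not part of the script above): base R also ships a scale() function for standardization, but by default it divides by the *sample* sd (including Bessel's correction), so its output differs slightly from our population-sd z_score() function:

```
# scale() standardizes using the sample sd by default:
x = c(0:10)
z_sample = as.numeric(scale(x))
# This is equivalent to dividing the centered values by sd(),
# i.e., the sample sd, not the population sd:
all.equal(z_sample, (x-mean(x))/sd(x))
# [1] TRUE
```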

**We can also create and use a z-table via R, which is essentially again just our CDF in tabular form:** Here is some code to create a standard normal distribution table / z-table yourself (partially following this R tutorial).

```
# Standard normal distribution table / z-table:
# Initialize a sequence of positive z-scores
# from 0 to 9.99 in steps of .01:
z = seq(0,9.99,by = 0.01)
# Calculate the CDF value (p-value) for each z-score:
p_z = pnorm(z)
# Plotting the above shows that we calculated the
# upper half of the normal CDF:
plot(x=z,y=p_z)
# Set as matrix/table:
z_table = matrix(p_z,ncol = 10,byrow = TRUE)
# Add row and column names to matrix:
rownames(z_table) = seq(0,9.9,by = .1)
colnames(z_table) = seq(0,.09,by = .01)
# Evaluate z and search the table for the corresponding p-value:
# [["row","col"]]
z_table[["3.2","0.03"]]
# [1] 0.999381
# ...or use the following way without having to separate the digits:
# Change class to data_frame:
table = as.data.frame(z_table)
# Define a z-score to look up (using a new variable name, so
# that we do not overwrite our z_score() function from above):
z_val = 3.23 # use values with two decimal places (1 <= z <= 9.99)
# The following lines separate the digits into the
# [["row","col"]] structure, so you don't have to do it yourself:
# substr(x,start,stop):
rowDigit = as.numeric(substr(z_val*100, 1,2))/10
# [1] 3.2
colDigit = as.numeric(substr(z_val*100, 3,3))/100
# [1] 0.03
# Execute the next line to find the respective value
# in the table:
table[which(rownames(table) == rowDigit),which(colnames(table) == colDigit)]
# [1] 0.999381
pnorm(3.23,0,1)
# [1] 0.999381
```

Only the upper part of a normal CDF is used for the table.

**1-pnorm(z)** delivers the corresponding values for negative z-scores, i.e., for the left half of our Gaussian, so to speak (recall the symmetry of Gaussian functions, which entails that pnorm(-z) == 1-pnorm(z)).
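A quick numerical check of this symmetry (a minimal sketch; the value 1.96 is just a common example z-score):

```
# Due to the symmetry of the Gaussian around 0, the lower-tail
# probability at -z equals 1 minus the probability at +z:
z = 1.96
pnorm(-z)
# [1] 0.0249979
1-pnorm(z)
# [1] 0.0249979
```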

We can also **invert our CDF into a so-called quantile function**. We will not reproduce the whole math for that here, but a quantile function delivers an important perspective, and we can simply use the **qnorm()** function in R. ***In nuce:*** with the quantile function we can evaluate a specific z-score given a specific p-value or percentage (e.g., the z-score for a confidence interval, see further tutorials). In other words: it is just a “tilted” CDF, where the z-score is now located on the y-axis and the probability on the x-axis (not vice versa):

```
# Use the qnorm function for the inverse CDF:
# Note that we feed in the rounded p-value from
# our z-table example above:
qnorm(0.999381,0,1)
# [1] 3.229977
# Avoid p = 0 and p = 1, since qnorm(0) = -Inf and
# qnorm(1) = Inf, which cannot be plotted:
seq = seq(0.0001,0.9999,by=0.0001)
quant_func = c() # empty vector
for(i in 1:length(seq)){
quant_func[i] = qnorm(seq[i],mean=0,sd=1)
} # End for i
# Plot Quantile function:
plot(x=seq,y=quant_func, type = "l")
```
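Since qnorm() is the inverse of pnorm(), a round trip recovers the original value (a small sanity check, using the common 95% quantile as an example):

```
# qnorm() maps probabilities to z-scores, pnorm() maps back:
z = qnorm(0.95) # z-score at the 95% quantile
# [1] 1.644854
pnorm(z) # round trip back to the probability
# [1] 0.95
```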

**Apart from z-score tables there are also so-called t-tables. We will get to the t-test in the next part of this series, but we want to explain the main difference upfront: a t-score uses the sample variance/sd in its calculation, whereas a z-score uses the population variance/sd.**
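As a hedged numerical preview (details follow in the next part): R's pt() is the CDF of the t-distribution, and for large degrees of freedom (large samples) it approaches the normal CDF pnorm():

```
# CDF values at 1.96 for the t-distribution with small and
# large degrees of freedom, compared to the standard normal:
pt(1.96, df = 5)    # noticeably below the normal value
pt(1.96, df = 1000) # nearly identical to the normal value
pnorm(1.96)
# [1] 0.9750021
```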

**The decision of which test to choose also involves the size of the sample:**

A special case of the z-score is the effect size (Cohen’s d), which does not standardize a whole set of values, but takes only the mean of the sample as input:
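As a minimal sketch of the one-sample case with a known population sd (the function name cohens_d is ours, not part of the original script):

```
# One-sample Cohen's d: the standardized difference between
# a sample mean and the population mean:
cohens_d = function(sample, pop_mean, pop_sd){
d = (mean(sample)-pop_mean)/pop_sd
return(d)
} # End of function
# Example with arbitrary values:
cohens_d(sample = c(4, 5, 6, 7), pop_mean = 5, pop_sd = 2)
# [1] 0.25
```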

In the next part of this series we will further explore the difference and the relation between the z-score, the z-statistic and Cohen’s d effect size.

In the next part of this series we will go fully into discussing z- and t-tests, both for regular sets of values and for linear regression models. In the last chapter we essentially already touched the topic of one-sample z-tests, i.e., comparing a sample with a population (as opposed to two-sample tests, where two samples are compared with each other). We will also thoroughly discuss what one-tailed and two-tailed tests are and when to use them. Plenty of furry examples will await you, in order to get an overview of when and how to use certain configurations of z- and t-tests. ^^