10. Calculating p Values — R Tutorial (2024)

Contents

  • Calculating a Single p Value From a Normal Distribution
  • Calculating a Single p Value From a t Distribution
  • Calculating Many p Values From a t Distribution
  • The Easy Way

Here we look at some examples of calculating p values. The examples are for both normal and t distributions. We assume that you can enter data and know the commands associated with basic probability. We first show how to do the calculations the hard way, working directly from the formulas, and then show how to do the calculations for many tests at once. The last method makes use of the t.test command and demonstrates an easier way to calculate a p value.

10.1. Calculating a Single p Value From a Normal Distribution

We look at the steps necessary to calculate the p value for a particular test. In the interest of simplicity we only look at a two sided test, and we focus on one example. Here we want to show that the mean differs from a fixed value, a.

\[ \begin{aligned} H_o: \mu_x &= a, \\ H_a: \mu_x &\neq a. \end{aligned} \]

The p value is calculated for a particular sample mean. Here we assume that we obtained a sample mean, x, and want to find its p value. It is the probability, assuming the null hypothesis is true, that a sample mean would have a Z-score greater than the absolute value of the observed Z-score or less than the negative of that absolute value.

For the special case of a normal distribution we also need the standard deviation. We will assume that we are given the standard deviation and call it s. The calculation for the p value can be done in several ways. We will look at two ways here. The first way is to convert the sample mean to its associated Z-score. The other way is to simply specify the mean and standard deviation and let the computer do the conversion. At first glance it may seem like a no-brainer that we should just use the second method. Unfortunately, when using the t distribution we need to convert to the t-score, so it is a good idea to know both ways.

We first look at how to calculate the p value using the Z-score. The Z-score is found by assuming that the null hypothesis is true, subtracting the assumed mean from the sample mean, and dividing by the standard deviation of the sample mean, s/sqrt(n), where n is the sample size. Once the Z-score is found, the probability that a value could be less than the Z-score is found using the pnorm command.

This is not enough to get the p value. If the Z-score that is found is positive then we need to take one minus the associated probability. Also, for a two sided test we need to multiply the result by two. Here we avoid the sign issue by taking the negative of the absolute value, which ensures that the Z-score is negative, and then multiply the result by two.

We now look at a specific example. In the example below we will use a value of a of 5, a standard deviation of 2, and a sample size of 20. We then find the p value for a sample mean of 7:

> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> z <- (xbar-a)/(s/sqrt(n))
> z
[1] 4.472136
> 2*pnorm(-abs(z))
[1] 7.744216e-06
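
The steps above can be collected into a small helper. The following is a minimal sketch rather than part of the original tutorial; the function name z_p_value and its argument names are assumptions chosen for illustration.

```r
# Two-sided p value for a sample mean when the standard deviation is known.
# z_p_value is a hypothetical helper name used only for this illustration.
z_p_value <- function(xbar, a, s, n) {
  z <- (xbar - a) / (s / sqrt(n))  # Z-score under the null hypothesis
  2 * pnorm(-abs(z))               # left tail at -|z|, doubled for two sides
}

p <- z_p_value(xbar = 7, a = 5, s = 2, n = 20)
print(p)
```

With the values used in this section, the helper should reproduce the p value shown in the transcript above. Because the Z-score enters only through its absolute value, a sample mean of 3 gives the same result as a sample mean of 7.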

We now look at the same problem, this time specifying the mean and standard deviation within the pnorm command. Note that for this case we cannot so easily force the use of the left tail. Since the sample mean is more than the assumed mean we have to take two times one minus the probability:

> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> 2*(1-pnorm(xbar,mean=a,sd=s/sqrt(n)))
[1] 7.744216e-06
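
As an aside, pnorm also accepts a lower.tail argument, which returns the upper tail directly so that the subtraction from one is not needed. The sketch below is an illustration under the same values as above; the variable names se, p_upper and p_z are assumptions introduced here.

```r
a <- 5; s <- 2; n <- 20; xbar <- 7
se <- s / sqrt(n)  # standard deviation of the sample mean

# Upper tail requested directly, instead of computing 1 - pnorm(...)
p_upper <- 2 * pnorm(xbar, mean = a, sd = se, lower.tail = FALSE)

# The same p value from the Z-score approach
p_z <- 2 * pnorm(-abs((xbar - a) / se))

print(c(p_upper, p_z))
print(isTRUE(all.equal(p_upper, p_z)))
```

Note that the upper tail is only the right choice when the sample mean is above the assumed mean, which is the case here.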

10.2. Calculating a Single p Value From a t Distribution

Finding the p value using a t distribution is very similar to using the Z-score as demonstrated above. The only difference is that you have to specify the number of degrees of freedom, which is one less than the sample size. Here we look at the same example as above but use the t distribution instead:

> a <- 5
> s <- 2
> n <- 20
> xbar <- 7
> t <- (xbar-a)/(s/sqrt(n))
> t
[1] 4.472136
> 2*pt(-abs(t),df=n-1)
[1] 0.0002611934
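
To see how much the degrees of freedom matter, the same score can be evaluated for several values of df and compared with the normal result from the previous section. This is only an illustrative sketch; the particular df values and the names t_score, dfs, p_t and p_norm are assumptions, not part of the original example.

```r
t_score <- (7 - 5) / (2 / sqrt(20))     # same score as in the example above

dfs <- c(5, 19, 100, 1000)              # df = 19 corresponds to n = 20
p_t <- 2 * pt(-abs(t_score), df = dfs)  # pt is vectorised over df
p_norm <- 2 * pnorm(-abs(t_score))      # normal-distribution p value

print(data.frame(df = dfs, p_value = p_t))
print(p_norm)
```

The t distribution has heavier tails than the normal distribution, so the p values are larger for small df and move toward the normal value as df grows.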

We now look at an example where we have a univariate data set and want to find the p value. In this example we use one of the data sets given in the data input chapter. We use the w1.dat data set:

> w1 <- read.csv(file="w1.dat",sep=",",head=TRUE)
> summary(w1)
      vals
 Min.   :0.130
 1st Qu.:0.480
 Median :0.720
 Mean   :0.765
 3rd Qu.:1.008
 Max.   :1.760
> length(w1$vals)
[1] 54

Here we use a two sided hypothesis test,

\[ \begin{aligned} H_o: \mu_x &= 0.7, \\ H_a: \mu_x &\neq 0.7. \end{aligned} \]

So we calculate the sample mean and sample standard deviation in order to calculate the p value:

> t <- (mean(w1$vals)-0.7)/(sd(w1$vals)/sqrt(length(w1$vals)))
> t
[1] 1.263217
> 2*pt(-abs(t),df=length(w1$vals)-1)
[1] 0.21204
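
As a cross-check, a calculation of this kind can be compared with the t.test command covered in the last section. The sketch below uses a short made-up vector, since w1.dat may not be at hand; the values in vals are illustrative only and are not the contents of w1.dat, and the variable names are assumptions.

```r
# Illustrative values only; these are NOT the contents of w1.dat.
vals <- c(0.61, 0.93, 0.83, 0.35, 0.54, 0.16, 0.91, 0.62, 0.62)
mu0 <- 0.7  # assumed mean under the null hypothesis

n <- length(vals)
t_score <- (mean(vals) - mu0) / (sd(vals) / sqrt(n))
p_manual <- 2 * pt(-abs(t_score), df = n - 1)

# t.test returns a list-like object; p.value holds the two-sided p value
p_builtin <- t.test(vals, mu = mu0)$p.value

print(c(manual = p_manual, t.test = p_builtin))
```

The two numbers should agree, which is a quick way to confirm that the manual formula was typed correctly.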

10.3. Calculating Many p Values From a t Distribution

Suppose that you want to find the p values for many tests. This is a common task and most software packages will allow you to do this. Here we see how it can be done in R.

Here we assume that we want to do a one-sided hypothesis test for a number of comparisons. In particular we will look at three hypothesis tests. All are of the following form:

\[ \begin{aligned} H_o: \mu_1 - \mu_2 &= 0, \\ H_a: \mu_1 - \mu_2 &\neq 0. \end{aligned} \]

We have three different sets of comparisons to make:

Comparison 1
            Mean   Std. Dev.   Number (pop.)
Group I     10     3           300
Group II    10.5   2.5         230

Comparison 2
            Mean   Std. Dev.   Number (pop.)
Group I     12     4           210
Group II    13     5.3         340

Comparison 3
            Mean   Std. Dev.   Number (pop.)
Group I     30     4.5         420
Group II    28.5   3           400

For each of these comparisons we want to calculate a p value. For each comparison there are two groups. We will refer to group one as the group whose results are in the first row of each comparison above. We will refer to group two as the group whose results are in the second row of each comparison above. Before we can calculate the p values we must first compute a standard error and a t-score. We will find general formulae, which are necessary in order to do all three calculations at once.

We assume that the means for the first group are defined in a variable called m1. The means for the second group are defined in a variable called m2. The standard deviations for the first group are in a variable called sd1. The standard deviations for the second group are in a variable called sd2. The numbers of samples for the first group are in a variable called num1. Finally, the numbers of samples for the second group are in a variable called num2.

With these definitions the standard error is the square root of (sd1^2)/num1+(sd2^2)/num2. The associated t-score is m1 minus m2 all divided by the standard error. The R commands to do this can be found below:

> m1 <- c(10,12,30)
> m2 <- c(10.5,13,28.5)
> sd1 <- c(3,4,4.5)
> sd2 <- c(2.5,5.3,3)
> num1 <- c(300,210,420)
> num2 <- c(230,340,400)
> se <- sqrt(sd1*sd1/num1+sd2*sd2/num2)
> t <- (m1-m2)/se

To see the values just type in the variable name on a line alone:

> m1
[1] 10 12 30
> m2
[1] 10.5 13.0 28.5
> sd1
[1] 3.0 4.0 4.5
> sd2
[1] 2.5 5.3 3.0
> num1
[1] 300 210 420
> num2
[1] 230 340 400
> se
[1] 0.2391107 0.3985074 0.2659216
> t
[1] -2.091082 -2.509364  5.640761

To use the pt command we need to specify the number of degrees of freedom. This can be done using the pmin command. Note that there is also a command called min, but it does not work the same way: min returns the single smallest value among all of its arguments, while pmin returns the smaller value for each pair of entries. You need to use pmin to get the correct results. The numbers of degrees of freedom are pmin(num1,num2)-1. So the p values can be found using the following R command:

> pt(t,df=pmin(num1,num2)-1)
[1] 0.01881168 0.00642689 0.99999998
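
The difference between min and pmin can be checked directly with the sample sizes used in this section. This is a small illustrative sketch that only repeats the definitions of num1 and num2 from above.

```r
num1 <- c(300, 210, 420)
num2 <- c(230, 340, 400)

print(min(num1, num2))       # one number: the smallest value overall
print(pmin(num1, num2))      # element-wise minimum, one value per comparison
print(pmin(num1, num2) - 1)  # degrees of freedom for each comparison
```

Here min gives the single value 210, while pmin gives 230, 210 and 400, one for each comparison. Using min would therefore apply the same degrees of freedom to all three tests.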

If you enter all of these commands into R you should notice that the last p value is not correct. The pt command gives the probability that a score is less than the specified t. The t-score for the last entry is positive, and we want the probability that a t-score is bigger. One way around this is to make sure that all of the t-scores are negative. You can do this by taking the negative of the absolute value of the t-scores:

> pt(-abs(t),df=pmin(num1,num2)-1)
[1] 1.881168e-02 6.426890e-03 1.605968e-08

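The results from the command above give the p values for a one-sided test in which the alternative points in the direction of the observed difference. The hypotheses written above, with a not-equal alternative, are two-sided; it is left as an exercise how to find the p values for a two-sided test.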

10.4. The Easy Way

The methods above demonstrate how to calculate the p values directly making use of the standard formulae. There is another, more direct way to do this using the t.test command. The t.test command takes a data set for an argument, and the default operation is to perform a two sided hypothesis test against an assumed mean of zero:

> x = c(9.0,9.5,9.6,10.2,11.6)
> t.test(x)
        One Sample t-test
data:  x
t = 22.2937, df = 4, p-value = 2.397e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
  8.737095 11.222905
sample estimates:
mean of x
     9.98
> help(t.test)

That was an obvious result, since every value in x is far from the default assumed mean of zero. If you want to test against a different assumed mean then you can use the mu argument:

> x = c(9.0,9.5,9.6,10.2,11.6)
> t.test(x,mu=10)
        One Sample t-test
data:  x
t = -0.0447, df = 4, p-value = 0.9665
alternative hypothesis: true mean is not equal to 10
95 percent confidence interval:
  8.737095 11.222905
sample estimates:
mean of x
     9.98

If you are interested in a one sided test then you can specify which test to employ using the alternative option:

> x = c(9.0,9.5,9.6,10.2,11.6)
> t.test(x,mu=10,alternative="less")
        One Sample t-test
data:  x
t = -0.0447, df = 4, p-value = 0.4833
alternative hypothesis: true mean is less than 10
95 percent confidence interval:
     -Inf 10.93434
sample estimates:
mean of x
     9.98
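
The printed report is convenient to read, but the individual numbers can also be pulled out of the object that t.test returns. The sketch below is an illustration, with variable names chosen here as assumptions; it relates the reported one sided p value to the pt calculation from the earlier sections.

```r
x <- c(9.0, 9.5, 9.6, 10.2, 11.6)
res <- t.test(x, mu = 10, alternative = "less")

t_score <- unname(res$statistic)  # the t value from the report
df <- unname(res$parameter)       # the degrees of freedom

p_less <- pt(t_score, df = df)            # lower tail, for alternative = "less"
p_two <- 2 * pt(-abs(t_score), df = df)   # two sided value for comparison

print(c(from_t.test = res$p.value, by_hand = p_less, two_sided = p_two))
```

For the "less" alternative the p value is simply the lower tail at the observed t-score, with no absolute value taken, which is why it comes out as half of the two sided value here.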

The t.test() command also accepts a second data set to compare two sets of samples. The default is to treat them as independent sets, but there is an option to treat them as dependent data sets. (Enter help(t.test) for more information.) To test two different samples, the first two arguments should be the data sets to compare:

> x = c(9.0,9.5,9.6,10.2,11.6)
> y=c(9.9,8.7,9.8,10.5,8.9,8.3,9.8,9.0)
> t.test(x,y)
        Welch Two Sample t-test
data:  x and y
t = 1.1891, df = 6.78, p-value = 0.2744
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6185513  1.8535513
sample estimates:
mean of x mean of y
   9.9800    9.3625
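
The report above is labelled a Welch test and shows a non-integer number of degrees of freedom, whereas the hand calculation in the section on many p values used pmin(num1,num2)-1. The sketch below is an illustration of how the two relate, assuming the usual Welch-Satterthwaite approximation for the degrees of freedom; the variable names are assumptions introduced here.

```r
x <- c(9.0, 9.5, 9.6, 10.2, 11.6)
y <- c(9.9, 8.7, 9.8, 10.5, 8.9, 8.3, 9.8, 9.0)

vx <- var(x) / length(x)  # contribution of x to the squared standard error
vy <- var(y) / length(y)  # contribution of y to the squared standard error

se <- sqrt(vx + vy)
t_score <- (mean(x) - mean(y)) / se

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_welch <- 2 * pt(-abs(t_score), df = df_welch)

# Simpler choice used in the hand calculation: smaller sample size minus one
df_min <- min(length(x), length(y)) - 1
p_min <- 2 * pt(-abs(t_score), df = df_min)

print(c(t = t_score, df = df_welch, p = p_welch))
print(c(df = df_min, p = p_min))
```

The t-score and standard error are the same in both approaches; only the degrees of freedom differ. The smaller-sample rule gives fewer degrees of freedom and therefore a somewhat larger p value than the Welch approximation.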

Article information

Author: Arielle Torp
