## Finding out how much cookie rankings overlap uncovers whether there seems to be a real difference between recipes

This article is one of a series of Experiments meant to teach students about how science is done, from generating a hypothesis to designing an experiment to analyzing the results with statistics. You can repeat the steps here and compare your results — or use this as inspiration to design your own experiment.

In my quest to bake the perfect gluten-free cookie, I obtained a list of numbers representing how taste testers ranked the appeal of my chocolate-chip varieties. In the previous post, I showed how to find the mean, the central value of each set of responses. Now I have to find out whether the apparent differences that I see in the mean scores will hold up. To do that, I have to apply a second mathematical test.

To recap: I had started by organizing my taste testers’ responses on a scale of one to five. (A score of 5 indicated they really liked the cookie; a score of 1 indicated that they hated it.) For each cookie type — the control with normal flour, the gluten-free-flour blend and the rice-flour cookie — I calculated a mean, or average, ranking from the tasters’ responses. The controls had a mean ranking of 3.83; the two gluten-free types ranked just 2.85 each.

Those numbers look different. But I can’t tell from those numbers alone if people thought there was a meaningful difference in the taste of my cookies. To do that I have to compare the groups of data using a second statistical test.

You might think of it this way: Cookies spread as they bake. When I collect data, they too can show a spread. Each person will rank a cookie slightly differently. If nearly everybody rated the cookies’ likeability as a 2, 3 or 4, then I have a spread in their rankings that spans from 2 to 4.

The more that the spread of rankings for one cookie overlaps with that of another, the less likely it will be that any apparent differences are due to one cookie being gluten-free. So when I analyze the data I have to compare not only the means, but also how much each set of data spreads out on either side of the mean.

To compare the spread of data in my three groups, I need to use a statistical test called an analysis of variance, or ANOVA. Researchers use it to compare more than two test conditions. In my case, I have three — a control and two types of gluten-free cookies. The ANOVA will measure how likely it is that any ranking difference is due to something other than the type of flour used to make that cookie.

Calculating an ANOVA requires knowing the means (which I now have). I also need to know how much my data spread out on either side of that mean. This is represented by a value called the standard error of the mean. It extrapolates the likely spread of responses for the whole population, based on the spread seen in the smaller population that I sampled.

To calculate this standard error, I need to start with a standard deviation. This is the amount that a set of data varies, or spreads, around its mean (for example, the rankings of cookies made from the original, or control, recipe). To get this spread, I need to take a square root. Squaring a number means multiplying it by itself. The square root is the number that was multiplied by itself to get the squared result. As an example, 5 multiplied by 5 is 25, so squaring 5 gives 25. That means the square root of 25, written as √25, is 5.

The standard deviation is the square root of the variance. The variance is the average of the squared distances of each number from the mean. In other words, each distance from the mean is multiplied by itself, and then those squares are averaged.

Here’s an example: Let’s assume that only three people ate my control cookie. One gave its taste a score of 3, another gave it a 4 and the last rated it a 5. The mean of this set of numbers is 4. To get the variance, I have to find out how different each number in the set is from the mean (or the number 4).

For each data point, I took the distance from the mean and multiplied it by itself to square it. I then added those squared numbers together and divided the sum by the number of data points. This gives me the variance. In the table below, you can see how I did this.

| Ranking | Mean | Distance from mean | Distance from mean, squared |
|---|---|---|---|
| 3 | 4 | 1 | 1² = 1 |
| 4 | 4 | 0 | 0² = 0 |
| 5 | 4 | 1 | 1² = 1 |

Variance = average of the squared distances = (1 + 0 + 1) ÷ 3 = 0.67

The variance of this data set is 0.67. To find the standard deviation, I take the square root of the variance, which is written √0.67. The result, which you can find using a calculator (or using the “SQRT” function in Microsoft Excel), is 0.82. That is my standard deviation, the number representing the spread of numbers in my data set.
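The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch and not part of the original experiment. The variable names are made up for illustration, and it divides by the number of data points, as the text does.

```python
import math

# The three-taster example from the text
rankings = [3, 4, 5]

# Mean: sum of the rankings divided by the number of rankings
mean = sum(rankings) / len(rankings)

# Squared distance of each ranking from the mean
squared_distances = [(r - mean) ** 2 for r in rankings]

# Variance: average of the squared distances (dividing by n, as in the text)
variance = sum(squared_distances) / len(rankings)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 4.0 0.67 0.82
```

Python's built-in `statistics.pvariance` and `statistics.pstdev` use this same divide-by-n formula. The related functions `statistics.variance` and `statistics.stdev` divide by n − 1 instead, so they would return a slightly larger spread for the same three numbers.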

Now I can use that number to get my standard error of the mean. To do that, I take the standard deviation (0.8165 before rounding to 0.82), then divide it by the square root of the number of samples (3). The result is 0.47. This number is the probable spread of cookie rankings outside the mean that I could expect for anyone eating my cookie.
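The same step can be sketched in Python. The helper name `standard_error` is hypothetical, chosen here for illustration. It simply divides a standard deviation by the square root of the number of samples, as described above.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return std_dev / math.sqrt(n)

# Values from the three-taster example in the text
sem = standard_error(0.8165, 3)
print(round(sem, 2))  # 0.47
```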

Below I have calculated the means, standard deviations, standard errors and the number of samples in each data set for my real cookie experiment.

| | Control | Gluten Free Blend | Rice Flour |
|---|---|---|---|
| Mean (sum of rankings ÷ total number of subjects) | 3.83 | 2.85 | 2.85 |
| Standard deviation | 0.95 | 1.26 | 1.24 |
| Standard error of the mean | 0.15 | 0.20 | 0.19 |
| Number of subjects | 41 | 41 | 41 |
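As a quick check, the standard errors in this table can be recomputed from the standard deviations and the number of subjects. The sketch below uses only the rounded values printed in the table, so it assumes that rounding does not change the result. The dictionary layout is just one illustrative way to hold the numbers.

```python
import math

# (standard deviation, number of subjects) for each cookie type, from the table
cookies = {
    "Control": (0.95, 41),
    "Gluten Free Blend": (1.26, 41),
    "Rice Flour": (1.24, 41),
}

# Standard error of the mean = standard deviation / sqrt(number of subjects)
standard_errors = {
    name: round(sd / math.sqrt(n), 2) for name, (sd, n) in cookies.items()
}

for name, sem in standard_errors.items():
    print(f"{name}: {sem:.2f}")
```

Each recomputed value should agree with the "Standard error of the mean" row of the table.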

You’ve now seen a lot of calculations that I needed to perform to understand the meaning of my cookie-tasting tests. Stay with it. There are only a few more, because this last table provides all the numbers I need to calculate my ANOVA, which I will do in the next post.

### Power Words

ANOVA     The acronym for analysis of variance, a statistical test to probe for differences between more than two test conditions.

average  (in science) A word for the arithmetic mean, or the sum of a group of numbers divided by the size of the group.

control     A part of an experiment where nothing changes. The control is essential to scientific experiments. It shows that any new effect must be due to only the part of the test that a researcher has altered. For example, if scientists were testing different types of fertilizer in a garden, they would want one section of it to remain unfertilized, as the control. Its area would show how plants in this garden grow under normal conditions. And that gives scientists something against which they can compare their experimental data.

gluten  A pair of proteins — gliadin and glutenin — joined together and found in wheat, rye, spelt and barley. The bound proteins give bread, cake and cookie doughs their elasticity and chewiness. Some people may not be able to comfortably tolerate gluten, however, because of a gluten allergy or celiac disease.

hypothesis  A proposed explanation for a phenomenon. In science, a hypothesis is an idea that hasn’t yet been rigorously tested. Once a hypothesis has been extensively tested and is generally accepted to be the accurate explanation for an observation, it becomes a scientific theory.

Likert scale  One of the most commonly used ways for ranking opinions or statements in surveys involving people. A survey issues a series of statements, such as “I like X,” “the test was easy,” or “it was too loud.” Participants then rate how well they agree by choosing from a range of options that might run from “strongly agree” to “strongly disagree.”

mean  One of several measures of the “average size” of a data set. Most commonly used is the arithmetic mean, obtained by adding the data and dividing by the number of data points.

square (In geometry) a rectangle with four sides of equal length. (In mathematics) A number multiplied by itself, or the verb meaning to multiply a number by itself. The square of 2 is 4; the square of 10 is 100.

square root  A number that, when multiplied by itself, gives the original number. As an example, 5 multiplied by 5 = 25. So 5 is the square root of 25. Similarly 1 is the square root of 1, and 10 is the square root of 100.

statistics  The practice or science of collecting and analyzing numerical data in large quantities and interpreting their meaning. Much of this work involves reducing errors that might be attributable to random variation. A professional who works in this field is called a statistician.

statistical analysis  Mathematical processes that allow scientists to make conclusions from a set of data.

statistical significance  In research, a result is significant (from a statistical point of view) if an observed difference between two or more conditions is unlikely to be due to chance. Obtaining a result that is statistically significant means there is a very high likelihood that any difference that is measured was not the result of random accidents.

standard deviation    (in statistics) The amount that a set of data varies from the mean.

standard error of the mean    (in statistics) The likely distribution of numbers, in a data set, based on a random sample.

variable  (in mathematics) A letter used in a mathematical expression that may take on more than one different value. (in experiments) A factor that can be changed, especially one allowed to change in a scientific experiment. For instance, when measuring how much insecticide it might take to kill a fly, researchers might change the dose or the age at which the insect is exposed. Both the dose and age would be variables in this experiment.

variance  (in mathematics) The average of the squared distances of each number from the mean of the number list.