Cookie Science 15: Results aren’t always sweet

Confronting your data often can mean facing complicated results

Fresh baked cookies in my second experiment. Those with the most xanthan gum are in purple (far left). The control cookies are in blue (far right).

B. Brookshire/SSP

This article is one of a series of Experiments meant to teach students about how science is done, from generating a hypothesis to designing an experiment to analyzing the results with statistics. You can repeat the steps here and compare your results — or use this as inspiration to design your own experiment. 

In research, it can be hard not to get your hopes up. Especially when, like me, you baked 400 cookies to answer a scientific question. I wanted to find out how to make a cookie that my friend Natalie could safely eat. But sometimes, research doesn’t give us a clear answer. And that’s what happened in my latest experiment.

I did end up making delicious cookies. And I now know how to tweak my original recipe to bring the chewiness back to my gluten-free cookies. But as with many experiments, new results can raise more questions than they answer.

My first set of tests, seven months ago, showed that the cookies I had baked without gluten were flat, dry and decidedly un-chewy. To put the chew back in my cookies, I decided to add something to make my cookie dough more elastic. Research I read about suggested xanthan (ZAN-thun) gum would work. It’s a polymer — or long chain of repeating bunches of atoms.

But how much xanthan gum to add? I decided to experiment, using three different amounts. I baked up one control batch with wheat flour and another control batch with gluten-free flour. Then I baked three more gluten-free batches of cookies, each a little different. I added 1/2 a teaspoon of xanthan gum to one batch, 1 teaspoon to another and 5 teaspoons to the last batch.

I invited people to eat my cookies and tell me what they thought of them. I asked these taste testers to rank different qualities of each cookie, such as how sweet, chewy or dry they were.

I needed at least 39 people to taste and rank five cookies apiece. This time, I ended up bringing my cookies to my choir, the Capitol Hill Chorale.  It turns out that singers are pretty hungry after rehearsal. Within 15 minutes, I had almost 50 people munching away.

With my cookie rankings in hand, I used statistics — a way to analyze and interpret the cookie rankings.

Holding the cookies together

Without gluten to hold my dough together, my earlier gluten-free cookies baked up flat and dry.  So one measure in my new tests was how much the dough spread during baking. From the photo at the top of this post, it certainly looks like the gluten-free cookies (second from right) spread more than the one with a lot of xanthan gum (in purple, far left). But I can’t say if the results are significant without a little math.

So I calculated the arithmetic mean, or average — as I described in Cookie Science 8. Then I figured out the likely spread of all the cookies in the batch on either side of the mean — a measurement known as the standard error, which I showed in Cookie Science 9.
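For readers who want to try these two calculations on a computer, here is a minimal sketch in Python. The cookie widths are made-up numbers used only for illustration, not measurements from this experiment, and the variable names are my own.

```python
import math
import statistics

# Hypothetical cookie widths in centimeters for one batch
# (illustrative values only, not data from the experiment).
widths_cm = [7.0, 7.5, 8.0, 7.5, 7.0, 8.0]

# Arithmetic mean: add up the widths and divide by how many there are.
mean_width = statistics.mean(widths_cm)

# Sample standard deviation: how much individual cookies vary around the mean.
std_dev = statistics.stdev(widths_cm)

# Standard error of the mean: standard deviation divided by the
# square root of the number of cookies measured.
std_error = std_dev / math.sqrt(len(widths_cm))

print(f"mean = {mean_width:.2f} cm, standard error = {std_error:.2f} cm")
```

Measuring more cookies per batch makes the standard error smaller, because the square root of the sample size sits in the denominator.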

Then I performed an analysis of variance, or ANOVA, to find differences between groups of data. This test will give me a p value. There is more about the p value in Cookie Science 10 and an explainer on statistics. In general, a p value of less than five percent (shown as p < 0.05) is considered statistically significant.

In the table at right you can see the ANOVA for how much the cookies from my five batches spread. My p value is 0.0001, which is less than 0.1 percent. That means that if my recipes truly made no difference, I would see differences this large by chance less than 0.1 percent of the time. This indicates that the xanthan gum I added probably did affect how much my cookies spread.
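The core of a one-way ANOVA is a number called the F statistic, which compares how much the batch averages differ from each other with how much cookies vary within a batch. Here is a rough sketch of that calculation in Python. The widths are invented for illustration, and the function name `one_way_anova_f` is hypothetical. Turning F into a p value requires the F distribution, which a statistics package (for example, SciPy's `scipy.stats.f_oneway`) can handle; this sketch stops at F.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Variation between batches: how far each batch mean sits from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation within batches: how far each cookie sits from its own batch mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical widths in centimeters (illustrative only, not the real data).
batches = {
    "wheat control": [7.0, 7.5, 8.0],
    "gluten-free control": [9.0, 9.5, 10.0],
    "5 tsp xanthan gum": [5.0, 5.5, 6.0],
}

f_stat, df_b, df_w = one_way_anova_f(list(batches.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F means the batch averages are far apart compared with the scatter inside each batch, which is what leads to a small p value.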

But this is an overall difference. It doesn’t tell me how one gluten-free batch without xanthan gum compares to another batch with the additive. For that I need to do a post-hoc test. I performed a Tukey’s range test (which I explained back in Cookie Science 10).
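As a rough illustration of what a Tukey's range test compares, the Python sketch below computes the test's q statistic for every pair of batches. It assumes every batch has the same number of cookies. The widths are invented, and the function name `tukey_q_statistics` is hypothetical. To decide whether a pair differs significantly, each q would still have to be compared with a critical value from a studentized range table (or a statistics package), which this sketch does not do.

```python
import math
from itertools import combinations
from statistics import mean

# Hypothetical widths in centimeters (illustrative only, not the real data).
batches = {
    "wheat control": [7.0, 7.5, 8.0],
    "gluten-free control": [9.0, 9.5, 10.0],
    "5 tsp xanthan gum": [5.0, 5.5, 6.0],
}


def tukey_q_statistics(batches):
    """Return {(batch_a, batch_b): q} for all pairs, assuming equal batch sizes."""
    sizes = {len(values) for values in batches.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes every batch has the same size")
    n = sizes.pop()

    means = {name: mean(values) for name, values in batches.items()}

    # Pooled within-batch variance (the "mean square within" from the ANOVA).
    ss_within = sum(
        (x - means[name]) ** 2 for name, values in batches.items() for x in values
    )
    df_within = n * len(batches) - len(batches)
    ms_within = ss_within / df_within

    # q = difference between two batch means, scaled by the shared standard error.
    shared_se = math.sqrt(ms_within / n)
    return {
        (a, b): abs(means[a] - means[b]) / shared_se
        for a, b in combinations(batches, 2)
    }


q_values = tukey_q_statistics(batches)
for (a, b), q in q_values.items():
    print(f"{a} vs. {b}: q = {q:.2f}")
```

Because every pair shares the same standard error, the pairs whose averages are farthest apart get the largest q values.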

I graphed my data so that it’s easy to see the differences. The x axis (across the bottom) shows how much xanthan gum each cookie batch contained. The y axis (left side) shows how much each batch spread on average (in centimeters).

My Tukey’s test confirms what my eyes told me: The gluten-free control cookies (in yellow, second from left in my graph) were significantly wider than my wheat cookie controls (in blue, p = 0.001).  With ½ teaspoon of xanthan gum, my red batch was also wider than my control (p = 0.001). My results also show that the batch with 5 teaspoons of xanthan gum (in purple) is narrower than my control (p = 0.001). 

Here is a graph of how much my cookies spread with each amount of xanthan gum. You can see the yellow batch spread the most, and the purple batch the least. The asterisks indicate p values. Three asterisks show that a batch has a p value of 0.001 (less than a 0.1 percent chance my results are due to chance) compared to the control wheat cookie. The plus signs indicate a batch has a p value of 0.001 compared to the yellow gluten-free cookie. The red circle shows that my green cookies were not wider than my controls. B. Brookshire/SSP

But 1 teaspoon of xanthan gum was just right. This green batch (second from right in my graph) was not statistically different from my original-recipe chocolate chip cookie.

Then again, looks aren’t everything. What about taste?

This graph represents how people responded to “this cookie is sweet.” The y axis shows how tasters rated each cookie. B. Brookshire/SSP

My tasters ranked the five cookies according to how sweet they judged them. You can see in the graph above that there are no asterisks to indicate that one bar is significantly different from another. Neither gluten-free flour nor xanthan gum affected how sweet people judged the cookies.

Dry or chewy

In my earlier experiment, months ago, people found the gluten-free cookies drier and less chewy. So they didn’t like the texture of these cookies as much as that of the regular chocolate-chip cookies (made with wheat). So in the new trial, I again asked people to rate the cookies’ texture. Their responses are in the slideshow below.

This graph represents how people responded to “I like this cookie’s texture.” The gluten-free cookie with 1 teaspoon of xanthan gum received the best texture rating. B. Brookshire/SSP
Here people responded to “this cookie is chewy.” Gluten-free cookies with 1/2 teaspoon and 1 teaspoon of xanthan gum were each chewier than my control wheat cookie. B. Brookshire/SSP
Here people responded to “this cookie is dry.” They identified no real difference between the cookies. B. Brookshire/SSP

And these graphs make the data complicated. The gluten-free cookies containing 1 teaspoon of xanthan gum were chewier and less dry. So I didn’t end up with a cookie that was as good as my control. It was better. My taste testers preferred it to the regular chocolate chip cookie.

And while people scored gluten-free cookies (yellow, second from left) much lower in that first experiment several months ago, my tasters now detected no difference at all.  

This graph represents how people responded to “overall, I like this cookie.” The graphs show no real difference between the different recipes. B. Brookshire/SSP

What can I conclude?

So adding 1 teaspoon of xanthan gum per batch to a gluten-free recipe makes a cookie that spreads as much as a normal chocolate-chip cookie. It also makes a chewier cookie and gives it a texture that people actually prefer to the original cookie. So I know if I send cookies to my friend Natalie, I’d better use 1 teaspoon of xanthan gum in each batch.

But with my wheat cookie and gluten-free control cookie, I ran into a problem. I could not replicate the results of my previous experiment. Next time, I will discuss the limitations of my experiment — and why it did not go the way I hoped.

Power Words

ANOVA     The acronym for analysis of variance, a statistical test to probe for differences between more than two test conditions.

average     (in science) A term for the arithmetic mean, which is the sum of a group of numbers that is then divided by the size of the group.

atom   The basic unit of a chemical element. Atoms are made up of a dense nucleus that contains positively charged protons and neutrally charged neutrons. The nucleus is orbited by a cloud of negatively charged electrons.

control     A part of an experiment where there is no change from normal conditions. The control is essential to scientific experiments. It shows that any new effect is likely due only to the part of the test that a researcher has altered. For example, if scientists were testing different types of fertilizer in a garden, they would want one section of it to remain unfertilized, as the control. Its area would show how plants in this garden grow under normal conditions. And that gives scientists something against which they can compare their experimental data.

gluten  A pair of proteins — gliadin and glutenin — joined together and found in wheat, rye, spelt and barley. The bound proteins give bread, cake and cookie doughs their elasticity and chewiness. Some people may not be able to comfortably tolerate gluten, however, because of a gluten allergy or celiac disease.

hypothesis  A proposed explanation for a phenomenon. In science, a hypothesis is an idea that hasn’t yet been rigorously tested. Once a hypothesis has been extensively tested and is generally accepted to be the accurate explanation for an observation, it becomes a scientific theory.

polymer  Substances whose molecules are made of long chains of repeating groups of atoms. Manufactured polymers include nylon, polyvinyl chloride (better known as PVC) and many types of plastics. Natural polymers include rubber, silk and cellulose (found in plants and used to make paper, for example).

p value  (in research and statistics) This is the probability of seeing a difference as big or bigger than the one observed if there is no effect of the variable being tested. Scientists generally conclude that a p value of less than five percent (written 0.05) is statistically significant, or unlikely to occur due to some factor other than the one tested.

post-hoc test  (In statistics) An analysis of a data set after an experiment has concluded. A post-hoc test looks for patterns that were not necessarily predicted when the scientists began the experiment.

statistics  The practice or science of collecting and analyzing numerical data in large quantities and interpreting their meaning. Much of this work involves reducing errors that might be attributable to random variation. A professional who works in this field is called a statistician.

statistical analysis   A mathematical process that allows scientists to draw conclusions from a set of data.

statistical significance   In research, a result is significant (from a statistical point of view) if there is a high likelihood that an observed difference between two or more conditions is not due to chance. Obtaining a result that is statistically significant means there is a very high likelihood that any difference that is measured was not the result of random accidents.

standard deviation (in statistics) The amount that each set of data varies from the mean.

standard error of the mean (in statistics) The probable distribution of numbers based on a random sample.

Tukey’s range test  (in statistics) A test that compares all possible pairs of means to determine if they are significantly different from each other.

variable (in experiments)  A factor that can be changed, especially one allowed to change in a scientific experiment. For instance, when measuring how much insecticide it might take to kill a fly, researchers might change the dose or the age at which the insect is exposed. Both the dose and age would be variables in this experiment.

x axis  (in mathematics) The horizontal line at the bottom of a graph, which can be labeled to give information about what the graph represents.

xanthan gum  A hydrocolloid made by the bacterium Xanthomonas campestris. It is a long-chained polymer often used in baking to make substances more elastic.

y axis (in mathematics) The vertical line to the left or right of a graph, which can be labeled to give information about what the graph represents.

Bethany Brookshire was a longtime staff writer at Science News Explores and is the author of the book Pests: How Humans Create Animal Villains. She has a Ph.D. in physiology and pharmacology and likes to write about neuroscience, biology, climate and more. She thinks Porgs are an invasive species.