Statistical Analysis with Excel® For Dummies®, 4^th Edition

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. Excel is a registered trademark of Microsoft Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2016943716

ISBN: 978-1-119-27115-4; 978-1-119-27116-1 (ebk); 978-1-119-27117-8 (ebk)

Statistical Analysis with Excel® For Dummies®

To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Statistical Analysis with Excel For Dummies Cheat Sheet” in the Search box.

Table of Contents
1. Cover
2. Introduction
  1. About This Book
  2. What You Can Safely Skip
  3. Foolish Assumptions
  4. How This Book Is Organized
  5. Icons Used in This Book
  6. Where to Go from Here
3. Part 1: Getting Started with Statistical Analysis with Excel: A Marriage Made in Heaven
  1. Chapter 1: Evaluating Data in the Real World
    1. The Statistical (and Related) Notions You Just Have to Know
    2. Inferential Statistics: Testing Hypotheses
    3. What’s New in Excel 2016?
    4. What’s Old in Excel 2016?
    5. Knowing the Fundamentals
    6. What’s New in This Edition?
  2. Chapter 2: Understanding Excel’s Statistical Capabilities
    1. Getting Started
    2. Setting Up for Statistics
    3. Accessing Commonly Used Functions
4. Part 2: Describing Data
  1. Chapter 3: Show and Tell: Graphing Data
    1. Why Use Graphs?
    2. Some Fundamentals
    3. Excel’s Graphics (Chartics?) Capabilities
    4. Becoming a Columnist
    5. Slicing the Pie
    6. Drawing the Line
    7. Adding a Spark
    8. Passing the Bar
    9. The Plot Thickens
    10. Finding Another Use for the Scatter Chart
    11. Tasting the Bubbly
    12. Taking Stock
    13. Scratching the Surface
    14. On the Radar
    15. Growing a Treemap and Bursting Some Sun
    16. Building a Histogram
    17. Ordering Columns: Pareto
    18. Of Boxes and Whiskers
    19. 3D Maps
  2. Chapter 4: Finding Your Center
    1. Means: The Lore of Averages
    2. Medians: Caught in the Middle
    3. Statistics à la Mode
  3. Chapter 5: Deviating from the Average
    1. Measuring Variation
    2. Back to the Roots: Standard Deviation
    3. Related Functions
  4. Chapter 6: Meeting Standards and Standings
    1. Catching Some Zs
    2. Where Do You Stand?
  5. Chapter 7: Summarizing It All
    1. Counting Out
    2. The Long and Short of It
    3. Getting Esoteric
    4. Tuning In the Frequency
    5. Can You Give Me a Description?
    6. Be Quick About It!
    7. Instant Statistics
  6. Chapter 8: What’s Normal?
    1. Hitting the Curve
    2. A Distinguished Member of the Family
    3. Graphing a Standard Normal Distribution
5. Part 3: Drawing Conclusions from Data
  1. Chapter 9: The Confidence Game: Estimation
    1. Understanding Sampling Distributions
    2. An EXTREMELY Important Idea: The Central Limit Theorem
    3. The Limits of Confidence
    4. Fit to a t
  2. Chapter 10: One-Sample Hypothesis Testing
    1. Hypotheses, Tests, and Errors
    2. Hypothesis Tests and Sampling Distributions
    3. Catching Some Z’s Again
    4. t for One
    5. Visualizing a t-Distribution
    6. Testing a Variance
    7. Visualizing a Chi-Square Distribution
  3. Chapter 11: Two-Sample Hypothesis Testing
    1. Hypotheses Built for Two
    2. Revisited
    3. t for Two
    4. A Matched Set: Hypothesis Testing for Paired Samples
    5. Testing Two Variances
    6. Visualizing the F-Distribution
  4. Chapter 12: Testing More Than Two Samples
    1. Testing More Than Two
    2. Another Kind of Hypothesis, Another Kind of Test
  5. Chapter 13: Slightly More Complicated Testing
    1. Cracking the Combinations
    2. Cracking the Combinations Again
    3. Two Kinds of Variables … at Once
    4. Using Excel with a Mixed Design
    5. Graphing the Results
    6. After the ANOVA
  6. Chapter 14: Regression: Linear and Multiple
    1. The Plot of Scatter
    2. Graphing Lines
    3. Regression: What a Line!
    4. Worksheet Functions for Regression
    5. Data Analysis Tool: Regression
    6. Juggling Many Relationships at Once: Multiple Regression
    7. Excel Tools for Multiple Regression
  7. Chapter 15: Correlation: The Rise and Fall of Relationships
    1. Scatterplots Again
    2. Understanding Correlation
    3. Correlation and Regression
    4. Testing Hypotheses About Correlation
    5. Worksheet Functions for Correlation
    6. Data Analysis Tool: Correlation
    7. Data Analysis Tool: Covariance
    8. Testing Hypotheses About Correlation
  8. Chapter 16: It’s About Time
    1. A Series and Its Components
    2. A Moving Experience
    3. How To Be a Smoothie, Exponentially
    4. One-Click Forecasting!
  9. Chapter 17: Non-Parametric Statistics
    1. Independent Samples
    2. Matched Samples
    3. Correlation: Spearman’s r_S
    4. A Heads-Up
6. Part 4: Probability
  1. Chapter 18: Introducing Probability
    1. What Is Probability?
    2. Compound Events
    3. Conditional Probability
    4. Large Sample Spaces
    5. Worksheet Functions
    6. Random Variables: Discrete and Continuous
    7. Probability Distributions and Density Functions
    8. The Binomial Distribution
    9. Worksheet Functions
    10. Hypothesis Testing with the Binomial Distribution
    11. The Hypergeometric Distribution
  2. Chapter 19: More on Probability
    1. Discovering Beta
    2. Poisson
    3. Working with Gamma
    4. Exponential
  3. Chapter 20: A Career in Modeling
    1. Modeling a Distribution
    2. A Simulating Discussion
7. Part 5: The Part of Tens
  1. Chapter 21: Ten Statistical and Graphical Tips and Traps
    1. Significant Doesn’t Always Mean Important
    2. Trying to Not Reject a Null Hypothesis Has a Number of Implications
    3. Regression Isn’t Always Linear
    4. Extrapolating Beyond a Sample Scatterplot Is a Bad Idea
    5. Examine the Variability Around a Regression Line
    6. A Sample Can Be Too Large
    7. Consumers: Know Your Axes
    8. Graphing a Categorical Variable as Though It’s a Quantitative Variable Is Just Wrong
    9. Whenever Appropriate, Include Variability in Your Graph
    10. Be Careful When Relating Statistics Textbook Concepts to Excel
  2. Chapter 22: Ten Things (Twelve, Actually) That Just Didn’t Fit in Any Other Chapter
    1. Graphing the Standard Error of the Mean
    2. Probabilities and Distributions
    3. Drawing Samples
    4. Testing Independence: The True Use of CHISQ.TEST
    5. Logarithmica Esoterica
    6. Sorting Data
  3. Appendix A: When Your Worksheet Is a Database
    1. Introducing Excel Databases
    2. Counting and Retrieving
    3. Arithmetic
    4. Statistics
    5. Pivot Tables
  4. Appendix B: The Analysis of Covariance
    1. Covariance: A Closer Look
    2. Why You Analyze Covariance
    3. How You Analyze Covariance
    4. ANCOVA in Excel
    5. And One More Thing
8. About the Author
9. Advertisement Page
10. Connect with Dummies
11. End User License Agreement

Introduction

What? Yet another statistics book? Well … this is a statistics book, all right, but in my humble (and thoroughly biased) opinion, it’s not just another statistics book.

What? Yet another Excel book? Same thoroughly biased opinion — it’s not just another Excel book. What? Yet another edition of a book that’s not just another statistics book and not just another Excel book? Well … yes. You got me there.

So here’s the story — for the previous three editions and for this one. Many statistics books teach you the concepts but don’t give you a way to apply them. That often leads to a lack of understanding. With Excel, you have a ready-made package for applying statistics concepts.

Looking at it from the opposite direction, many Excel books show you Excel’s capabilities but don’t tell you about the concepts behind them. Before I tell you about an Excel statistical tool, I give you the statistical foundation it’s based on. That way, you understand the tool when you use it — and you use it more effectively.

I didn’t want to write a book that’s just “select this menu” and “click this button.” Some of that is necessary, of course, in any book that shows you how to use a software package. My goal was to go way beyond that.

I also didn’t want to write a statistics “cookbook” — when-faced-with-problem-#310-use-statistical-procedure-#214. My goal was to go way beyond that, too.

Bottom line: This book isn’t just about statistics or just about Excel — it sits firmly at the intersection of the two. In the course of telling you about statistics, I cover every Excel statistical feature. (Well … almost. I left one out. I left it out of the first three editions, too. It’s called “Fourier Analysis.” All the necessary math to understand it would take a whole book, and you might never use this tool, anyway.)

About This Book

Although statistics involves a logical progression of concepts, I organized this book so you can open it up in any chapter and start reading. The idea is for you to find what you’re looking for in a hurry and use it immediately — whether it’s a statistical concept or an Excel tool.

On the other hand, cover to cover is okay if you’re so inclined. If you’re a statistics newbie and you have to use Excel for statistical analysis, I recommend you begin at the beginning — even if you know Excel pretty well.

What You Can Safely Skip

Any reference book throws a lot of information at you, and this one is no exception. I intend it all to be useful, but I don’t aim it all at the same level. So if you’re not deeply into the subject matter, you can avoid paragraphs marked with the Technical Stuff icon.

Every so often, you’ll run into sidebars. They provide information that elaborates on a topic, but they’re not part of the main path. If you’re in a hurry, you can breeze past them.

Because I wrote this book so you can open it up anywhere and start using it, step-by-step instructions appear throughout. Many of the procedures I describe have steps in common. After you go through some of the procedures, you can probably skip the first few steps when you come to a procedure you haven’t been through before.

Foolish Assumptions

This is not an introductory book on Excel or on Windows, so I’m assuming:

You know how to work with Windows. I don’t spell out the details of pointing, clicking, selecting, and so forth.
You have Excel 2016 installed on your Windows computer or on your Mac and you can work along with the examples. I don’t walk you through the steps of Excel installation.
You’ve worked with Excel, and you understand the essentials of worksheets and formulas.

If you don’t know much about Excel, consider looking into Greg Harvey’s excellent Excel books in the For Dummies series.

How This Book Is Organized

I’ve organized this book into five parts and four appendixes (including two that you can find on this book’s companion website at www.statisticalanalysiswexcel4e).

Part 1: Getting Started with Statistical Analysis with Excel: A Marriage Made In Heaven

In Part 1, I provide a general introduction to statistics and to Excel’s statistical capabilities. I discuss important statistical concepts and describe useful Excel techniques. If it’s a long time since your last course in statistics or if you’ve never had a statistics course at all, start here. If you haven’t worked with Excel’s built-in functions (of any kind), definitely start here.

Part 2: Describing Data

Part of statistics is to take sets of numbers and summarize them in meaningful ways. Here’s where you find out how to do that. We all know about averages and how to compute them. But that’s not the whole story. In this part, I tell you about additional statistics that fill in the gaps, and I show you how to use Excel to work with those statistics. I also introduce Excel graphics in this part.

Part 3: Drawing Conclusions from Data

Part 3 addresses the fundamental aim of statistical analysis: to go beyond the data and help decision-makers make decisions. Usually, the data are measurements of a sample taken from a large population. The goal is to use these data to figure out what’s going on in the population.

This opens a wide range of questions: What does an average mean? What does the difference between two averages mean? Are two things associated? These are only a few of the questions I address in Part 3, and I discuss the Excel functions and tools that help you answer them.

Part 4: Working with Probability

Probability is the basis for statistical analysis and decision-making. In Part 4, I tell you all about it. I show you how to apply probability, particularly in the area of modeling. Excel provides a rich set of built-in capabilities that help you understand and apply probability. Here’s where you find them.

Part 5: The Part of Tens

Part 5 meets two objectives. First, I get to stand on the soapbox and rant about statistical peeves and about helpful hints. The peeves and hints total up to ten. Also, I discuss ten (okay, 12) Excel things I couldn’t fit into any other chapter. They come from all over the world of statistics. If it’s Excel and statistical, and if you can’t find it anywhere else in the book, you’ll find it here.

As I said in the first three editions — pretty handy, this Part of Tens.

Appendix A: When Your Worksheet Is a Database

In addition to performing calculations, Excel serves another purpose: recordkeeping. Although it’s not a dedicated database, Excel does offer some database functions. Some of them are statistical in nature. I introduce Excel database functions in Appendix A, along with pivot tables that allow you to turn your database inside out and look at your data in different ways.

Appendix B: The Analysis of Covariance

The Analysis of Covariance (ANCOVA) is a statistical technique that combines two other techniques: analysis of variance and regression analysis. If you know how two variables are related, you can use that knowledge in some nifty ways, and this is one of the ways. The kicker is that Excel doesn’t have a built-in tool for ANCOVA — but I show you how to use what Excel does have so you can get the job done.

Bonus Appendix B1: When Your Data Live Elsewhere

This appendix is all about importing data into Excel — from the web, from databases, from text, and from PDF documents.

Bonus Appendix B2: Tips for Teachers (and Learners)

Excel is terrific for managing, manipulating, and analyzing data. It’s also a great tool for helping people understand statistical concepts. This appendix covers some ways for using Excel to do just that.

Icons Used in This Book

As is the case with all For Dummies books, icons appear all over the place. Each one is a little picture in the margin that lets you know something special about the paragraph it’s next to.

This icon points out a hint or a shortcut that can help you in your work and make you an all-around better human being.

This one points out timeless wisdom to take with you long after you finish this book, young Jedi.

Pay attention to this icon. It’s a reminder to avoid something that might gum up the works for you.

As I mention earlier, in the section “What You Can Safely Skip,” this icon indicates material you can blow right past if statistics and Excel aren’t your passion.

Where to Go from Here

You can start the book anywhere, but here are a few hints. Want to learn the foundations of statistics? Turn the page. Introduce yourself to Excel’s statistical features? That’s Chapter 2. Want to start with graphics? Hit Chapter 3. For anything else, find it in the table of contents or in the index and go for it.

In addition to what you’re reading right now, this book also comes with a free, access-anywhere Cheat Sheet that will help you quickly use the tools I discuss. To get this Cheat Sheet, visit www.dummies.com and search for “Statistical Analysis with Excel For Dummies Cheat Sheet” in the Search box. And don’t forget to check out the bonus content on this book’s companion website at www.dummies.com/go/statisticalanalysiswexcel4e.

Part 1

Getting Started with Statistical Analysis with Excel: A Marriage Made in Heaven

IN THIS PART …

Find out about Excel’s statistical capabilities

Explore how to work with populations and samples

Test your hypotheses

Understand errors in decision making

Determine independent and dependent variables

Chapter 1

Evaluating Data in the Real World

IN THIS CHAPTER

Introducing statistical concepts

Generalizing from samples to populations

Getting into probability

Making decisions

New and old features in Excel 2016

Understanding important Excel fundamentals

The field of statistics is all about decision-making — decision-making based on groups of numbers. Statisticians constantly ask questions: What do the numbers tell us? What are the trends? What predictions can we make? What conclusions can we draw?

To answer these questions, statisticians have developed an impressive array of analytical tools. These tools help us to make sense of the mountains of data that are out there waiting for us to delve into, and to understand the numbers we generate in the course of our own work.

The Statistical (and Related) Notions You Just Have to Know

Because intensive calculation is often part and parcel of the statistician’s tool set, many people have the misconception that statistics is about number crunching. Number crunching is just one small part of the path to sound decisions, however.

By shouldering the number-crunching load, software increases our speed of traveling down that path. Some software packages are specialized for statistical analysis and contain many of the tools that statisticians use. Although not marketed specifically as a statistical package, Excel provides a number of these tools, which is why I wrote this book.

I said that number crunching is a small part of the path to sound decisions. The most important part is the concepts statisticians work with, and that’s what I talk about for most of the rest of this chapter.

Samples and populations

On election night, TV commentators routinely predict the outcome of elections before the polls close. Most of the time they’re right. How do they do that?

The trick is to interview a sample of voters after they cast their ballots. Assuming the voters tell the truth about whom they voted for, and assuming the sample truly represents the population, network analysts use the sample data to generalize to the population of voters.

This is the job of a statistician — to use the findings from a sample to make a decision about the population from which the sample comes. But sometimes those decisions don’t turn out the way the numbers predicted. History buffs are probably familiar with the memorable picture of President Harry Truman holding up a copy of the Chicago Daily Tribune with the famous, but wrong, headline “Dewey Defeats Truman” after the 1948 election. Part of the statistician’s job is to express how much confidence he or she has in the decision.

Another election-related example speaks to the idea of the confidence in the decision. Pre-election polls (again, assuming a representative sample of voters) tell you the percentage of sampled voters who prefer each candidate. The polling organization adds how accurate it believes the polls are. When you hear a newscaster say something like “accurate to within 3 percent,” you’re hearing a judgment about confidence.

Here’s another example. Suppose you’ve been assigned to find the average reading speed of all fifth-grade children in the United States but you haven’t got the time or the money to test them all. What would you do?

Your best bet is to take a sample of fifth-graders, measure their reading speeds (in words per minute), and calculate the average of the reading speeds in the sample. You can then use the sample average as an estimate of the population average.

Estimating the population average is one kind of inference that statisticians make from sample data. I discuss inference in more detail in the upcoming section “Inferential Statistics: Testing Hypotheses.”

Here’s some terminology you have to know: Characteristics of a population (like the population average) are called parameters, and characteristics of a sample (like the sample average) are called statistics. When you confine your field of view to samples, your statistics are descriptive. When you broaden your horizons and concern yourself with populations, your statistics are inferential.

And here’s a notation convention you have to know: Statisticians use Greek letters (μ, σ, ρ) to stand for parameters, and English letters , s, r) to stand for statistics. Figure 1-1 summarizes the relationship between populations and samples, and parameters and statistics.

FIGURE 1-1: The relationship between populations, samples, parameters, and statistics.

Variables: Dependent and independent

Simply put, a variable is something that can take on more than one value. (Something that can have only one value is called a constant.) Some variables you might be familiar with are today’s temperature, the Dow Jones Industrial Average, your age, and the value of the dollar against the euro.

Statisticians care about two kinds of variables: independent and dependent. Each kind of variable crops up in any study or experiment, and statisticians assess the relationship between them.

For example, imagine a new way of teaching reading that’s intended to increase the reading speed of fifth-graders. Before putting this new method into schools, it would be a good idea to test it. To do that, a researcher would randomly assign a sample of fifth-grade students to one of two groups: One group receives instruction via the new method, and the other receives instruction via traditional methods. Before and after both groups receive instruction, the researcher measures the reading speeds of all the children in this study. What happens next? I get to that in the upcoming section “Inferential Statistics: Testing Hypotheses.”

For now, understand that the independent variable here is Method of Instruction. The two possible values of this variable are New and Traditional. The dependent variable is reading speed — which you might measure in words per minute.

In general, the idea is to find out if changes in the independent variable are associated with changes in the dependent variable.

In the examples that appear throughout the book, I show you how to use Excel to calculate various characteristics of groups of scores. Keep in mind that each time I show you a group of scores, I’m really talking about the values of a dependent variable.

Types of data

Data come in four kinds. When you work with a variable, the way you work with it depends on what kind of data it is.

The first variety is called nominal data. If a number is a piece of nominal data, it’s just a name. Its value doesn’t signify anything. A good example is the number on an athlete’s jersey. It’s just a way of identifying the athlete and distinguishing him or her from teammates. The number doesn’t indicate the athlete’s level of skill.

Next come ordinal data. Ordinal data are all about order, and numbers begin to take on meaning over and above just being identifiers. A higher number indicates the presence of more of a particular attribute than a lower number. One example is the Mohs scale: Used since 1822, it’s a scale whose values are 1 through 10; mineralogists use this scale to rate the hardness of substances. Diamond, rated at 10, is the hardest. Talc, rated at 1, is the softest. A substance that has a given rating can scratch any substance that has a lower rating.

What’s missing from the Mohs scale (and from all ordinal data) is the idea of equal intervals and equal differences. The difference between a hardness of 10 and a hardness of 8 is not the same as the difference between a hardness of 6 and a hardness of 4.

Interval data provide equal differences. Fahrenheit temperatures provide an example of interval data. The difference between 60 degrees and 70 degrees is the same as the difference between 80 degrees and 90 degrees.

Here’s something that might surprise you about Fahrenheit temperatures: A temperature of 100 degrees is not twice as hot as a temperature of 50 degrees. For ratio statements (twice as much as, half as much as) to be valid, zero has to mean the complete absence of the attribute you’re measuring. A temperature of 0 degrees F doesn’t mean the absence of heat — it’s just an arbitrary point on the Fahrenheit scale.

The last data type, ratio data, includes a meaningful zero point. For temperatures, the Kelvin scale gives ratio data. One hundred degrees Kelvin is twice as hot as 50 degrees Kelvin. This is because the Kelvin zero point is absolute zero, where all molecular motion (the basis of heat) stops. Another example is a ruler. Eight inches is twice as long as four inches. A length of zero means a complete absence of length.

Any of these data types can form the basis for an independent variable or a dependent variable. The analytical tools you use depend on the type of data you’re dealing with.

A little probability

When statisticians make decisions, they express their confidence about those decisions in terms of probability. They can never be certain about what they decide. They can only tell you how probable their conclusions are.

So what is probability? The best way to attack this is with a few examples. If you toss a coin, what’s the probability that it comes up heads? Intuitively, you know that if the coin is fair, you have a 50-50 chance of heads and a 50-50 chance of tails. In terms of the kinds of numbers associated with probability, that’s ½.

How about rolling a die? (That’s one member of a pair of dice.) What’s the probability that you roll a 3? Hmmm… . A die has six faces and one of them is 3, so that ought to be , right? Right.

Here’s one more. You have a standard deck of playing cards. You select one card at random. What’s the probability that it’s a club? Well … a deck of cards has four suits, so that answer is ¼.

I think you’re getting the picture. If you want to know the probability that an event occurs, figure out how many ways that event can happen and divide by the total number of events that can happen. In each of the three examples, the event we are interested in (head, 3, or club) only happens one way.

Things can get a bit more complicated. When you toss a die, what’s the probability you roll a 3 or a 4? Now you’re talking about two ways the event you’re interested in can occur, so that’s (1 + 1)/6 = = . What about the probability of rolling an even number? That has to be 2, 4, or 6, and the probability is (1 + 1 + 1)/6 = = .

On to another kind of probability question. Suppose you roll a die and toss a coin at the same time. What’s the probability you roll a 3 and the coin comes up heads? Consider all the possible events that could occur when you roll a die and toss a coin at the same time. Your outcome could be a head and 1-6 or a tail and 1-6. That’s a total of 12 possibilities. The head-and-3 combination can happen only one way, so the answer is .

In general, the formula for the probability that a particular event occurs is

I begin this section by saying that statisticians express their confidence about their decisions in terms of probability, which is really why I brought up this topic in the first place. This line of thinking leads me to conditional probability — the probability that an event occurs given that some other event occurs. For example, suppose I roll a die, take a look at it (so that you can’t see it), and tell you I’ve rolled an even number. What’s the probability that I’ve rolled a 2? Ordinarily, the probability of a 2 is , but I’ve narrowed the field. I’ve eliminated the three odd numbers (1, 3, and 5) as possibilities. In this case, only the three even numbers (2, 4, and 6) are possible, so now the probability of rolling a 2 is .

Exactly how does conditional probability play into statistical analysis? Read on.

Inferential Statistics: Testing Hypotheses

In advance of doing a study, a statistician draws up a tentative explanation — a hypothesis — as to why the data might come out a certain way. After the study is complete and the sample data are all tabulated, he or she faces the essential decision a statistician has to make: whether or not to reject the hypothesis.

That decision is wrapped in a conditional probability question — what’s the probability of obtaining the data, given that this hypothesis is correct? Statistical analysis provides tools to calculate the probability. If the probability turns out to be low, the statistician rejects the hypothesis.

Suppose you’re interested in whether or not a particular coin is fair — whether it has an equal chance of coming up heads or tails. To study this issue, you’d take the coin and toss it a number of times — say, 100. These 100 tosses make up your sample data. Starting from the hypothesis that the coin is fair, you’d expect that the data in your sample of 100 tosses would show around 50 heads and 50 tails.

If it turns out to be 99 heads and 1 tail, you’d undoubtedly reject the fair coin hypothesis. Why? The conditional probability of getting 99 heads and 1 tail given a fair coin is very low. Wait a second. The coin could still be fair and you just happened to get a 99-1 split, right? Absolutely. In fact, you never really know. You have to gather the sample data (the results from 100 tosses) and make a decision. Your decision might be right, or it might not.

Juries face this dilemma all the time. They have to decide among competing hypotheses that explain the evidence in a trial. (Think of the evidence as data.) One hypothesis is that the defendant is guilty. The other is that the defendant is not guilty. Jury members have to consider the evidence and, in effect, answer a conditional probability question: What’s the probability of the evidence given that the defendant is not guilty? The answer to this question determines the verdict.

Null and alternative hypotheses

Consider once again the coin-tossing study I mention in the preceding section. The sample data are the results from the 100 tosses. Before tossing the coin, you might start with the hypothesis that the coin is a fair one so that you expect an equal number of heads and tails. This starting point is called the null hypothesis. The statistical notation for the null hypothesis is H₀. According to this hypothesis, any heads-tails split in the data is consistent with a fair coin. Think of it as the idea that nothing in the results of the study is out of the ordinary.

An alternative hypothesis is possible — that the coin isn’t a fair one, and it’s loaded to produce an unequal number of heads and tails. This hypothesis says that any heads-tails split is consistent with an unfair coin. The alternative hypothesis is called, believe it or not, the alternative hypothesis. The statistical notation for the alternative hypothesis is H₁.

With the hypotheses in place, toss the coin 100 times and note the number of heads and tails. If the results are something like 90 heads and 10 tails, it’s a good idea to reject H₀. If the results are around 50 heads and 50 tails, don’t reject H₀.Similar ideas apply to the reading-speed example I give earlier, in the section “Samples and populations.” One sample of children receives reading instruction under a new method designed to increase reading speed, and the other learns via a traditional method. Measure the children’s reading speeds before and after instruction, and tabulate the improvement for each child. The null hypothesis, H₀, is that one method isn’t different from the other. If the improvements are greater with the new method than with the traditional method — so much greater that it’s unlikely that the methods aren’t different from one another — reject H₀. If they’re not greater, don’t reject H₀.

Notice that I didn’t say “accept H₀.” The way the logic works, you never accept a hypothesis. You either reject H₀ or don’t reject H₀.

Here’s a real-world example to help you understand this idea. When a defendant goes on trial, he or she is presumed innocent until proven guilty. Think of innocent as H₀. The prosecutor’s job is to convince the jury to reject H₀. If the jurors reject, the verdict is guilty. If they don’t reject, the verdict is not guilty. The verdict is never innocent. That would be like accepting H₀.

Back to the coin-tossing example. Remember I said “around 50 heads and 50 tails” is what you could expect from 100 tosses of a fair coin. What does around mean? Also, I said if it’s 90-10, reject H₀. What about 85-15? 80-20? 70-30? Exactly how much different from 50-50 does the split have to be for you reject H₀? In the reading-speed example, how much greater does the improvement have to be to reject H₀?

I won’t answer these questions now. Statisticians have formulated decision rules for situations like this, and you explore those rules throughout the book.

Two types of error

Whenever you evaluate the data from a study and decide to reject H₀ or to not reject H₀, you can never be absolutely sure. You never really know what the true state of the world is. In the context of the coin-tossing example, that means you never know for certain if the coin is fair or not. All you can do is make a decision based on the sample data you gather. If you want to be certain about the coin, you’d have to have the data for the entire population of tosses — which means you’d have to keep tossing the coin until the end of time.

Because you’re never certain about your decisions, it’s possible to make an error regardless of what you decide. As I mention earlier in this chapter, the coin could be fair and you just happen to get 99 heads in 100 tosses. That’s not likely, and that’s why you reject H₀. It’s also possible that the coin is biased, yet you just happen to toss 50 heads in 100 tosses. Again, that’s not likely and you don’t reject H₀ in that case.

Although not likely, those errors are possible. They lurk in every study that involves inferential statistics. Statisticians have named them Type I and Type II.

If you reject H₀ and you shouldn’t, that’s a Type I error. In the coin example, that’s rejecting the hypothesis that the coin is fair, when in reality it is a fair coin.

If you don’t reject H₀ and you should have, that’s a Type II error. That happens if you don’t reject the hypothesis that the coin is fair and in reality it’s biased.

How do you know if you’ve made either type of error? You don’t — at least not right after you make your decision to reject or not reject H₀. (If it’s possible to know, you wouldn’t make the error in the first place!) All you can do is gather more data and see if the additional data are consistent with your decision.

If you think of H₀ as a tendency to maintain the status quo and not interpret anything as being out of the ordinary (no matter how it looks), a Type II error means you missed out on something big. Looked at in that way, Type II errors form the basis of many historical ironies.

Here’s what I mean: In the 1950s, a particular TV show gave talented young entertainers a few minutes to perform on stage and a chance to compete for a prize. The audience voted to determine the winner. The producers held auditions around the country to find people for the show. Many years after the show went off the air, the producer was interviewed. The interviewer asked him if he had ever turned down anyone at an audition whom he shouldn’t have.

“Well,” said the producer, “once a young singer auditioned for us and he seemed really odd.”

“In what way?” asked the interviewer.

“In a couple of ways,” said the producer. “He sang really loud, gyrated his body and his legs when he played the guitar, and he had these long sideburns. We figured this kid would never make it in show business, so we thanked him for showing up, but we sent him on his way.”

“Wait a minute — are you telling me you turned down …?”

“That’s right. We actually said no … to Elvis Presley!”

Now that’s a Type II error.

What’s New in Excel 2016?

Microsoft has made a few changes to Excel’s Ribbon (the tabbed band across the top), reflecting changes in Excel. The most obvious addition is the light bulb, at the top to the right of Add-ins. It’s labeled “Tell me what you want to do.” This is called the Tell Me box, and it’s a new way to connect to Excel Help. Type a phrase like Insert a chart into the Tell Me box, and Excel opens a menu whose choices include icons that you click to insert charts and to find help with inserting charts. Figure 1-2 shows this capability.

FIGURE 1-2: The interface in Excel 2016, showing the Tell Me box.

Sadly, this feature is not part of Excel 2016 for the Mac. This is the case for a number of other capabilities, too (like a couple I mention in the next paragraph). Overall, however, Mac users will find greater consistency across platforms than in previous editions.

Figure 1-2 shows the Insert tab, which incorporates a couple of changes in the Charts area. One addition is a set of Statistical Charts (which are not in the Mac version). Another is 3D Map, the new and improved Power View (which first appeared in Excel 2013 and will not be appearing in a Mac near you). I discuss these features in Chapter 3.

What’s Old in Excel 2016?

Each tab on the Ribbon presents groups of icon-labeled command buttons separated into categories. When you’re trying to figure out the capability a particular button activates, you can move the cursor to the button (without clicking) and helpful information pops up.

Clicking a button typically opens a whole category of possibilities. Buttons that do this are called category buttons.

Microsoft has developed shorthand for describing a mouse-click on a command button on the Ribbon, and I use that shorthand throughout this book. The shorthand is

Tab | Command Button

To indicate clicking on the Insert tab’s Recommended Charts category button, for example, I write

Insert | Recommended Charts

When I click that button (with some data-containing cells selected), the Insert Chart dialog box, shown in Figure 1-3, appears.

FIGURE 1-3: Clicking Insert | Recommended Charts opens this box.

Notice that its Recommended Charts tab is open. Clicking the All Charts tab (which is not in the Mac version) changes the box to what you see in Figure 1-4, a gallery of all possible Excel charts.

FIGURE 1-4: The All Charts tab in the Insert Chart dialog box.

Chart is Excel’s name for graph.

Incidentally, the All Charts tab shows five of the six charts new in Excel 2016: Waterfall, Treemap, Sunburst, Histogram, and Box & Whisker. (Pareto, the sixth new chart, is buried a bit deeper.) The last three are called “statistical charts. I cover statistical charts (and others) in Chapter 3.

To find the bulk of Excel’s statistical functionality, select

Formulas | More Functions | Statistical

This is an extension of the shorthand. It means, “Select the Formulas tab, click the More Functions button, and then select the Statistical Functions choice from the pop-up menu that opens.” Figure 1-5 shows what I mean.

FIGURE 1-5: Accessing the Statistical Functions menu.

In Chapter 2, I show you how to make the Statistical Functions menu more accessible.

In the 2010 version, Microsoft changed the way Excel names its functions. The objective was to make a function’s purpose as obvious as possible from its name. Excel also changed some of the programming behind these functions to make them more accurate.

Excel 2016 continues this naming style, and maintains the older statistical functions (pre-2010 vintage, and one – FORECAST – from 2013) for compatibility with older versions of Excel. So if you’re creating a spreadsheet for users of older Excel versions, use the older functions.

You won’t find them on the Statistical Functions menu. They have their own menu. To find it, select Formulas | More Functions | Compatibility.

I provide Table 1-1 to help you transition from older Excel versions. The table lists the old functions, their replacements, and the chapter in which I discuss the new function.

TABLE 1-1 Older Excel Statistical Functions, Their Replacements, and the Chapter That Deals with the New Function

Old Function	New Function	Chapter
`BETADIST`	`BETA.DIST`	19
`BETAINV`	`BETA.INV`	19
`BINOMDIST`	`BINOM.DIST`	18
`CRITBINOM`	`BINOM.INV`	18
`CHIDIST`	`CHISQ.DIST.RT`	10
`CHIINV`	`CHISQ.INV.RT`	10
`CHITEST`	`CHISQ.TEST`	20
`CONFIDENCE`	`CONFIDENCE.NORM`	9
`COVAR`	`COVARIANCE.P`	15
`EXPONDIST`	`EXPON.DIST`	19
`FDIST`	`F.DIST.RT`	11
`FINV`	`F.INV.RT`	11
`FTEST`	`F.TEST`	11
`FORECAST`	`FORECAST.LINEAR, FORECAST.ETS, FORECAST.ETS.CONFINT, FORECAST.ETS.SEASONALITY, FORECAST.ETS.STAT`	16
`GAMMADIST`	`GAMMA.DIST`	19
`GAMMAINV`	`GAMMA.INV`	19
`HYPGEOMDIST`	`HYPGEOM.DIST`	18
`LOGNORMDIST`	`LOGNORM.DIST`	22
`LOGINV`	`LOGNORM.INV`	22
`MODE`	`MODE.SNGL, MODE.MULT`	4
`NEGBINOMDIST`	`NEGBINOM.DIST`	18
`NORMDIST`	`NORM.DIST`	8
`NORMINV`	`NORM.INV`	8
`NORMSDIST`	`NORM.S.DIST`	8
`NORMSINV`	`NORM.S.INV`	8
`PERCENTILE`	`PERCENTILE.INC`	6
`PERCENTRANK`	`PERCENTRANK.INC`	6
`POISSON`	`POISSON.DIST`	19
`QUARTILE`	`QUARTILE.INC`	6
`RANK`	`RANK.EQ`	6
`STDEVP`	`STDEV.P`	5
`STDEV`	`STDEV.S`	5
`TDIST`	`T.DIST.2T`	10
`TDIST`	`T.DIST.RT`	10
`TINV`	`T.INV.2T`	9
`TTEST`	`T.TEST`	11
`VARP`	`VAR.P`	5
`VAR`	`VAR.S`	5
`WEIBULL`	`WEIBULL.DIST`	22
`ZTEST`	`Z.TEST`	10

The table shows that the FORECAST function has morphed into five functions in Excel 2016: FORECAST.LINEAR, FORECAST.ETS, FORECAST.ETS.CONFINT, FORECAST.ETS.SEASONALITY, and FORECAST.ETS.STAT. Along with Excel’s new one-click forecasting capability, I cover these functions in Chapter 16.

The most important addition in Excel 2016 is on the Macintosh side: After a long absence, the Analysis ToolPak returns to Excel 2016 for the Mac. Available in all Windows versions of Excel, the Analysis ToolPak is a free add-in that supplies analytic tools often found in dedicated statistical software packages. In previous Mac versions, intrepid users accessed a similar set of tools by downloading a third-party application that did not integrate with Excel in the same way as the Analysis ToolPak.

Mac users are a hearty lot, however, and they’ll be happy with this change in Excel 2016. (Have I done enough … Apple-polishing? Sorry.)

I cover the Analysis ToolPak in Chapter 2.