Introduction

Perhaps you’re taking a statistics class, or you’re about to take one. You may understand some of the basic ideas, but you have questions and want a place to go for a little extra help to give you an edge. And you also want a heads-up as to what instructors really think about when they write their exams. Well, look no further; help has arrived in the form of Statistics Workbook For Dummies, 2nd Edition.

This workbook helps you become more comfortable with and confident about statistics. Through plenty of practical problems that take you from step one all the way to your final exam, you review the concepts you know, identify areas where you need to focus more work, and address the little things that can make the difference between a B and an A.

As a statistics professor who has taught tens of thousands of students over the years, I have noticed that certain problems keep cropping up and causing my pen to take points off exams over and over again. And believe me: I want nothing more than to put my red pen away. So I give you all my secrets about what professors really want you to know, the kinds of questions they ask, and the types of answers they love and hate to see (so you can avoid the latter). And I focus only on the topics that you absolutely need to know, with minimal background information.

About This Book

The major objectives of this workbook are for you to understand, calculate, and interpret the most common statistical formulas and techniques; get a handle on basic probability; gain confidence with difficult statistical topics such as the central limit theorem and p-values; know which statistical technique to use in different situations (for example, when to employ what kind of confidence interval); and evaluate and pinpoint problems with studies, polls, and experiments.

Although I wrote this workbook to serve as a companion to Statistics For Dummies, 2nd Edition (also published by Wiley and written by yours truly), this workbook works quite well with any introductory statistics textbook.

You may be asking how this workbook is different from other workbooks on the shelf. Well, here are a few ways, listed in order of importance:

Plenty of excellent practice problems to lead you down the path of examination success, chosen by me, a card-carrying member of the “million statistics exam question writing and grading” club. I provide all answers at the end of each chapter.
Workspace for you to work through the problems directly in the section you’re working on, so you can easily refer to your notes later when you need them.
Not only answers, but also clear, complete explanations to go with them. Explanations help you know exactly how to approach a problem, what information you need to solve it, and common problems you need to avoid.
A view inside a professor’s mind to help you determine the most popular questions, the answers we look for, and the answers that make us pull our hair out.
Tips, strategies, and warnings based on my vast experience with students of all backgrounds and learning styles (and my grading experience).
An example accompanying each section, directly followed by the solution. Use the example as a reference when you work the other problems.
A focus on problem-solving skills to help you develop a problem-solving strategy when you take exams. I don’t show you how I would do the problems; I help you see how you can do the problems. And believe me, there’s a big difference!
The nonlinear approach allows you to skip around in the workbook and still have easy access to and understanding of any given topic.
Understandable language to help you process, remember, and put into practice statistical definitions, techniques, and processes.
Clear and concise step-by-step procedures that intuitively explain how to work through statistics problems and remember the process.

I also used a few conventions while writing this book that you should be aware of:

The most important convention that you need to be aware of deals with my dual use of the word “statistics.” In some situations, I refer to statistics as a subject of study or as a field of research. For example, “Statistics is really quite an interesting subject!” (Note I said statistics “is” in this case.) In other situations, I refer to statistics as the plural of statistic, in a numerical sense. For example, “The most common statistics are the mean and the standard deviation.” (Notice my use of the word “are” in this case.)
I also use data in a plural form (“the data are” rather than “the data is”). The battle rages on between statisticians over which way is right, but I go with the plural form.
I use Ho (H with a subscript letter o) to represent the null hypothesis in a hypothesis test. Although this is a commonly used notation, others might use H0 (H with a subscript zero) to mean the same thing.
I use * to indicate a multiplication sign.

Foolish Assumptions

This book is for you if you have some exposure to statistics already and want more opportunities to enjoy success through additional practice of the skills and techniques. Or perhaps you’re taking a statistics class and could use some extra support (and insider information). Or maybe you just really want to understand p-values because they keep you awake at night (been there, done that).

Note: If you’re totally new to the subject of statistics, I suggest that you first read Statistics For Dummies, 2nd Edition (Wiley), because I cover the various concepts of statistics in much more detail in that book (but any introductory text will suffice). After you feel comfortable and confident with the material, you can try the problems in this workbook. Or, as an alternative, you can use this workbook to practice along with what you read in Statistics For Dummies, 2nd Edition.

Icons Used in This Book

Icons in this workbook draw your attention to certain features that occur on a regular basis. Think of them as road signs that you encounter on a trip. Here are the road signs you encounter on your journey through this workbook.

Each section of this workbook begins with a brief overview of the topic. After the intro, you see an example problem with a fully worked solution for use as a reference as you work the practice problems. You can quickly locate the example problems by looking for the Example icon.

I use the Remember icon for particular ideas that I hope you’ll remember long after you read this workbook.

The Tip icon points out helpful hints, ideas, or shortcuts that save you time or give you alternative ways to think about a particular concept. I also use this icon to “get down to the nitty-gritty,” discussing the types of questions your instructor may ask you and why, revealing what instructors really look for in your answers, and giving you a heads-up on the types of errors that really make them nuts (so you can avoid them at all costs).

The Warning icon refers to specific ways that you may get tripped up while working a certain kind of problem and how to avoid those problems. Commit these items to memory while it still doesn’t cost you any points (in other words, before the exam takes place).

Beyond the Book

Be sure to check out the free Cheat Sheet for a handy guide that covers tips and tricks for answering statistics questions. To get this Cheat Sheet, simply go to www.dummies.com and enter “Statistics Workbook For Dummies” in the Search box.

You also have the opportunity to complete online quizzes for Chapters 1 through 18 that test your knowledge of the concepts in each chapter. To gain access to the online practice, all you have to do is register by following these simple steps:

Register your book or ebook at Dummies.com to get your PIN. Go to www.dummies.com/go/getaccess.
Select your product from the dropdown list on that page.
Follow the prompts to validate your product, and then check your email for a confirmation message that includes your PIN and instructions for logging in.

If you do not receive this email within two hours, please check your spam folder before contacting us through our Technical Support website at http://support.wiley.com or by phone at 877-762-2974.

Now you’re ready to go! You can come back to the practice material as often as you want — simply log on with the username and password you created during your initial login. No need to enter the access code a second time.

Your registration is good for one year from the day you activate your PIN.

Where to Go from Here

I wrote this workbook in a nonlinear way, so you can start anywhere and still understand what’s happening. However, I can make some recommendations to readers who are interested in knowing where to start:

If you want to get right into the number-crunching aspects of statistics (finding the mean, median, standard deviation, and so on), I suggest starting with Part 1.
If you want to break down the normal distribution or the central limit theorem, go to Part 2.
If you’re most worried about confidence intervals and hypothesis tests, jump to Part 3.
If you want to develop your skills evaluating and making sense of the results of medical studies, polls, surveys, and experiments, start with Part 4.
If you want to nail down data collected on two variables (correlation and the like), head directly to Part 4.
If you want tips on math, common statistical formulas, or ways to spot statistical mistakes, head to Part 5.

Chapter 1

Summarizing Categorical Data: Counts and Percents

IN THIS CHAPTER

Making tables to summarize categorical data

Highlighting the difference between frequencies and relative frequencies

Interpreting and evaluating tables

Categorical data are data in which individuals are placed into groups or categories, for example, gender, region, or type of movie. Summarizing categorical data involves boiling down all the information into just a few numbers that tell its basic story. Because categorical data consist of individuals who belong in categories, you have to look at how many individuals fall into each group and summarize those numbers appropriately. In this chapter, you practice making, interpreting, and evaluating frequency and relative frequency tables for categorical data.

Counting On the Frequency

One way to summarize categorical data is to simply count, or tally up, the number of individuals that fall into each category. The number of individuals in any given category is called the frequency (or count) for that category. If you list all the possible categories along with the frequency for each, you create a frequency table. The total of all the frequencies should equal the size of the sample (because you place each individual in exactly one category).
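If you want to check a tally by computer, here’s a minimal sketch in Python (an illustration only; this workbook doesn’t require software, and the function name frequency_table is hypothetical). It counts the individuals in each category with the standard library’s collections.Counter and confirms that the frequencies add up to the sample size, using the cellphone responses from the example that follows.

```python
from collections import Counter

def frequency_table(responses):
    """Tally how many individuals fall into each category.

    Returns (dict of category -> frequency, total sample size).
    """
    responses = list(responses)  # allow any iterable of category labels
    counts = Counter(responses)
    total = sum(counts.values())
    # Each individual falls into exactly one category, so the
    # frequencies must add up to the sample size.
    assert total == len(responses)
    return dict(counts), total

# Cellphone answers for persons 1 through 10 (Y = owns one, N = doesn't)
cellphone = ["Y", "N", "Y", "N", "Y", "Y", "Y", "Y", "N", "Y"]
table, total = frequency_table(cellphone)
for category, count in table.items():
    print(category, count)   # Y 7, then N 3
print("Total", total)        # Total 10
```

Counter lists categories in the order it first meets them, which is why Y comes before N here; the counts themselves don’t depend on that order.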

See the following for an example of summarizing data by using a frequency table.

Q. Suppose that you take a sample of 10 people and ask them all whether they own a cellphone. Each person falls into one of two categories: yes or no. The data are shown in the following table.

Person #	Cellphone	Person #	Cellphone
1	Y	6	Y
2	N	7	Y
3	Y	8	Y
4	N	9	N
5	Y	10	Y

Summarize this data in a frequency table.
What’s an advantage of summarizing categorical data?

A. Data summaries boil down the data quickly and clearly.

The frequency table for this data is shown in the following table.

Own a Cellphone?	Frequency
Y	7
N	3
Total	10

A data summary allows you to see patterns in the data, which aren’t clear if you look only at the original data.

1 You survey 20 shoppers to see what type of soft drink they like best, Brand A or Brand B. The results are: A, A, B, B, B, B, B, B, A, A, A, B, A, A, A, A, B, B, A, A. Which brand do the shoppers prefer? Make a frequency table and explain your answer.

2 A local city government asks voters to vote on a tax levy for the local school district. A total of 18,726 citizens vote on the issue. The yes count comes in at 10,479, and the rest of the voters say no.

Show the results in a frequency table.
Why is it important to include the total number at the bottom of a frequency table?

3 A zoo asks 1,000 people whether they’ve been to the zoo in the last year. The surveyors count that 592 say yes, 198 say no, and 210 don’t respond.

Show the results in a frequency table.
Explain why you need to include the people who don’t respond.

4 Suppose that instead of showing the number in each group, you show just the percentage (called a relative frequency). What’s one advantage a relative frequency table has over a frequency table?

Relating with Percentages

Another way to summarize categorical data is to show the percentage of individuals who fall into each category, thereby creating a relative frequency. The relative frequency of a given category is the frequency (number of individuals in that category) divided by the total sample size, multiplied by 100 to get the percentage. For example, if you survey 50 people and 10 are in favor of a certain issue, the relative frequency of the “in-favor” category is (10 / 50) * 100, which gives you 20 percent. If you list all the possible categories along with their relative frequencies, you create a relative frequency table. The total of all the relative frequencies should equal 100 percent (subject to possible round-off error).
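The formula (frequency divided by total sample size, times 100) translates directly into code. Here’s a minimal Python sketch, assuming you already have the frequencies in a dictionary; the name relative_frequency_table and the choice to round to two decimal places are illustrative, not a standard. The label “Everyone else” for the 40 people who aren’t in favor is also just a placeholder.

```python
def relative_frequency_table(frequencies, decimals=2):
    """Convert category counts into percentages of the total sample size."""
    total = sum(frequencies.values())
    # relative frequency = (frequency / total) * 100, rounded for display
    return {category: round(count / total * 100, decimals)
            for category, count in frequencies.items()}

# The survey of 50 people, 10 of whom are in favor
print(relative_frequency_table({"In favor": 10, "Everyone else": 40}))
# The school-levy vote from Problem 6
print(relative_frequency_table({"Y": 10479, "N": 8247}))
```

Because each entry is rounded separately, the percentages can sum to slightly more or less than 100, which is the round-off error mentioned above.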

See the following for an example of summarizing data by using a relative frequency table.

Q. Using the cellphone data from the following table, make a relative frequency table and interpret the results.

Person #	Cellphone	Person #	Cellphone
1	Y	6	Y
2	N	7	Y
3	Y	8	Y
4	N	9	N
5	Y	10	Y

A. The following table shows a relative frequency table for the cellphone data. Seventy percent of the people sampled reported owning cellphones, and 30 percent admitted to being technologically behind the times.

Own a Cellphone?	Relative Frequency
Y	70%
N	30%

You get the 70 percent by taking (7 / 10) * 100, and you calculate the 30 percent by taking (3 / 10) * 100.

5 You survey 20 shoppers to see what type of soft drink they like best, Brand A or Brand B. The results are: A, A, B, B, B, B, B, B, A, A, A, B, A, A, A, A, B, B, A, A. Which brand do the shoppers prefer?

Use a relative frequency table to determine the preferred brand.
In general, if you had to choose, which is easier to interpret: frequencies or relative frequencies? Explain.

6 A local city government asked voters in the last election to vote on a tax levy for the local school district. A record 18,726 citizens voted on the issue. The yes count came in at 10,479, and the rest of the voters checked the no box. Show the results in a relative frequency table.

7 A zoo surveys 1,000 people to find out whether they’ve been to the zoo in the last year. The surveyors count that 592 say yes, 198 say no, and 210 don’t respond. Make a relative frequency table and use it to find the response rate (percentage of people who respond to the survey).

8 Name one disadvantage that comes with creating a relative frequency table compared to using a frequency table.

Interpreting Counts and Percents with Caution

Not all summaries of categorical data are fair and accurate. Knowing what to look for can help you keep your eyes open for misleading and incomplete information.

Instructors often ask you to “interpret the results.” In this case, your instructor wants you to use the statistics available to talk about how they relate to the given situation. In other words, what do the results mean to the person who collects the data?

With relative frequency tables, don’t forget to check whether all categories sum to 1 or 100 percent (subject to round-off error), and remember to look for some indicator as to total sample size.
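Here’s one way to turn those two checks into a quick routine, sketched in Python under a few assumptions: percentages are stored as numbers out of 100, and anything within half a percentage point of 100 counts as round-off error (that tolerance is an arbitrary choice for illustration). The function name check_relative_table is hypothetical. Running it on the two rows of the Nocold commercial, discussed in the example that follows, flags both problems.

```python
def check_relative_table(percents, sample_size=None, tolerance=0.5):
    """Flag common problems with a relative frequency table.

    percents:    dict of category -> percentage (out of 100)
    sample_size: total number of individuals, if the table reports it
    tolerance:   allowed round-off, in percentage points (illustrative choice)
    """
    problems = []
    total = sum(percents.values())
    # All categories together should account for (about) 100 percent
    if abs(total - 100) > tolerance:
        problems.append(f"percents sum to {total}%, not 100%; "
                        f"{100 - total}% is unaccounted for")
    # Without the sample size, you can't judge how precise the results are
    if sample_size is None:
        problems.append("total sample size is missing")
    return problems

# The Nocold commercial: two categories shown, no sample size reported
for problem in check_relative_table({"Much better": 47, "At least as good": 18}):
    print(problem)
# percents sum to 65%, not 100%; 35% is unaccounted for
# total sample size is missing
```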

See the following for an example of critiquing a data summary.

Q. You watch a commercial where the manufacturer of a new cold medicine (“Nocold”) compares it to the leading brand. The results are shown in the following table.

How Nocold Compares	Percentage
Much better	47%
At least as good	18%

What kind of table is this?
Interpret the results. (Did the new cold medicine beat out the leading brand?)
What important details are missing from this table?

A. Much like the cold medicines I always take, the table about “Nocold” does “Nogood.”

This table is an incomplete relative frequency table. The remaining category is “not as good” for the Nocold brand, and the advertiser doesn’t show it. But you can do the math and see that 100% - (47% + 18%) = 35% of the people say that the leading brand is better.
If you put the two groups together, 65% of the patients say that Nocold is at least as good as the leading brand, and almost half of the patients say Nocold is much better.
What’s missing? The remaining percentage (to keep all possible results in perspective). But more importantly, the total sample size is missing. You don’t know whether the surveyors sampled 10 people, 100 people, or 1,000 people. This means that the precision of the results is unknown. (Precision means how consistent the results will be from sample to sample; it’s related to sample size, as you see in Chapter 10.)

9 Suppose that you ask 1,000 people to identify from a list of five vacation spots which ones they’ve already visited. The frequencies you receive are Disney World: 216; New Orleans: 312; Las Vegas: 418; New York City: 359; and Washington, D.C.: 188.

Explain why creating a traditional relative frequency table doesn’t make sense here.
How can you summarize this data with percents in a way that makes sense?

10 If you have only a frequency table, can you find the corresponding relative frequency table? Conversely, if you have only a relative frequency table, can you find the corresponding frequency table? Explain.

Answers to Problems in Summarizing Categorical Data

1 Eleven shoppers prefer Brand A, and nine shoppers prefer Brand B. The frequency table is shown in the following table. Brand A got more votes, but the results are pretty close.

Brand Preferred	Frequency
A	11
B	9
Total	20

2 Frequencies are fine for summarizing data as long as you keep the total number in perspective.

The results are shown in the following table. Because the total is 18,726, and the yes count is 10,479, the no count is the difference between the two, which is 18,726 - 10,479 = 8,247.

Vote	Frequency
Y	10,479
N	8,247
Total	18,726

The total is important because it helps keep the frequencies in perspective when you compare them to each other.

3 This problem shows the importance of reporting not only the results of participants who respond but also what percentage of the total actually respond.

The results are shown in the following table.

Gone to the Zoo in the Last Year?	Frequency
Y	592
N	198
Nonrespondents	210
Total	1,000

If you don’t show the nonrespondents, the total doesn’t add up to 1,000 (the number surveyed). An alternative way to show the data is to base it on only the respondents, but the results would be biased. You can’t definitively say that the nonrespondents would respond the same way as the respondents.

4 Showing the percents rather than counts means making a relative frequency table rather than a frequency table. One advantage of a relative frequency table is that everything sums to 100 percent, making it easier to interpret the results, especially if you have a large number of categories.

5 Relative frequencies do just what they say: They help you relate the results to each other (by finding percentages).

Eleven shoppers out of the 20 prefer Brand A, and nine shoppers out of the 20 prefer Brand B. The relative frequency table is shown in the following table. Brand A got more votes, but the results are pretty close, with 55 percent of the shoppers preferring Brand A, and 45 percent preferring Brand B.

Brand Preferred	Relative Frequency
A	55%
B	45%

You often have an easier time interpreting percents, because when you need to interpret counts, you have to put them in perspective in terms of “out of how many?”

6 The results are shown in the following table. The yes percentage is (10,479 / 18,726) * 100 = 55.96%. Because the total is 100%, the no percentage is 100% - 55.96% = 44.04%.

Vote	Relative Frequency
Y	55.96%
N	44.04%

7 You can see the relative frequency table that follows this answer. Knowing the response rate is critical for interpreting the results of a survey. The higher the response rate, the better. The response rate is 59.2% + 19.8% = 79.0%, which is the total percentage of people who responded in any way (yes or no) to the survey. (Note that 21% is the nonresponse rate.)

Gone to the Zoo in the Last Year?	Relative Frequency
Y	59.2%
N	19.8%
Nonrespondents	21.0%
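To double-check the arithmetic, here’s a small Python sketch that computes the response rate from the raw counts in Problem 7. It assumes the nonrespondents are stored under the label "Nonrespondents" (a labeling choice made for this sketch) and that everyone else counts as having responded. The function name response_rate is hypothetical.

```python
def response_rate(counts, nonresponse_label="Nonrespondents"):
    """Percentage of people surveyed who answered in any way (yes or no)."""
    total = sum(counts.values())  # everyone surveyed, including nonrespondents
    responded = total - counts.get(nonresponse_label, 0)
    return responded / total * 100

zoo = {"Y": 592, "N": 198, "Nonrespondents": 210}
rate = response_rate(zoo)
print(f"Response rate: {rate:.1f}%")            # Response rate: 79.0%
print(f"Nonresponse rate: {100 - rate:.1f}%")   # Nonresponse rate: 21.0%
```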

8 One disadvantage of a relative frequency table is that if you see only the percents, you don’t know how many people participated in the study; therefore, you don’t know how precise the results are. You can get around this problem by putting the total sample size somewhere at the top or bottom of your relative frequency table.

When making a relative frequency table, include the total sample size somewhere on the table.

9 Be careful about how you interpret tables where an individual can be in more than one category at the same time.

The frequencies don’t sum to 1,000, because people have the option to choose multiple locations or none at all, so each person doesn’t end up in exactly one group. If you take the grand total of all the frequencies (1,493) and divide each frequency by 1,493 to get a relative frequency, the relative frequencies sum to 1 (or 100 percent). But what does that mean? It makes it hard to interpret these percents because they don’t account for the total number of people.
One way you can summarize this data is by showing the percentage of people who have been at each location separately (compared to the percentage who haven’t been there before). These percents add up to 1 for each location. The following table shows the results summarized with this method. Note: The table isn’t a relative frequency table; however, it uses relative frequencies.

Location	% Who Have Been There	% Who Haven’t Been There
Disney World	21.6%	78.4%
New Orleans	31.2%	68.8%
Las Vegas	41.8%	58.2%
New York City	35.9%	64.1%
Washington, D.C.	18.8%	81.2%

Not all tables involving percents should sum to 1. Don’t force tables to sum to 1 when they shouldn’t; do make sure you understand whether each individual can fall under more than one category. In those cases, a typical relative frequency table isn’t appropriate.
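When each person can fall into more than one category, the denominator for every percentage stays at the number of people surveyed. The following Python sketch (the function name percent_per_category is hypothetical) applies that idea to the vacation data from Problem 9; each location is handled on its own, so the percents for different locations aren’t expected to sum to 100.

```python
def percent_per_category(frequencies, n_people):
    """For 'check all that apply' data: percent of ALL people in each category.

    Returns category -> (% who are in it, % who aren't), each out of n_people.
    """
    table = {}
    for place, count in frequencies.items():
        been = count / n_people * 100
        table[place] = (round(been, 1), round(100 - been, 1))
    return table

visits = {"Disney World": 216, "New Orleans": 312, "Las Vegas": 418,
          "New York City": 359, "Washington, D.C.": 188}
for place, (been, not_been) in percent_per_category(visits, 1000).items():
    print(f"{place}: {been}% have been, {not_been}% haven't")
print("Sum of frequencies:", sum(visits.values()))  # Sum of frequencies: 1493
```

The sum of 1,493 (more than the 1,000 people surveyed) is the signal that a traditional relative frequency table doesn’t fit these data.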

10 You can always sum all the frequencies to get a total and then find each relative frequency by taking the frequency divided by the total. However, if you have only the percents, you can’t go back and find the original counts unless you know the total number of individuals. Suppose that you know that 80 percent of the people in a survey like ice cream. How many people in the survey like ice cream? If the total number of respondents is 100, 0.80 * 100 = 80 people like ice cream. If the total is 50, you’re looking at 0.80 * 50 = 40 positive answers. If the total is 5, you deal only with 0.80 * 5 = 4 people. This illustrates why relative frequency tables need to have the total sample size somewhere.
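The one-way street described above shows up clearly in code: going from percents back to counts needs the total as an extra input. This minimal Python sketch (the name counts_from_percents is hypothetical) multiplies each percent by the total and rounds to a whole person; that rounding is an assumption that’s only safe when the published percents weren’t heavily rounded themselves. The “Everyone else” category is simply the 20 percent who aren’t in the ice cream group.

```python
def counts_from_percents(percents, total):
    """Recover frequencies from a relative frequency table.

    Works only when the total sample size is known; the percents alone
    can't tell you how many individuals are in each category.
    """
    # frequency = (percent / 100) * total, rounded to a whole number of people
    return {category: round(pct / 100 * total)
            for category, pct in percents.items()}

ice_cream = {"Like ice cream": 80, "Everyone else": 20}
for total in (100, 50, 5):
    print(total, counts_from_percents(ice_cream, total))
```

The same 80 percent turns into 80, 40, or 4 people depending on the total, so the percents by themselves say nothing about how many people stand behind them.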

Watch for total sample sizes when given a relative frequency table. Don’t be misled by percentages alone, thinking they’re always based on large sample sizes, because many are not.
