Exploring and Comparing Data

Mean, Median, and Mode


This section deals with the examination of data sets. The following set of numbers represents exam grades from a statistics class.

85, 96, 75, 84, 65, 91, 78, 82, 80, 70, 80, 58, 71, 78, 98, 99, 75, 62, 75

Definitions:

A Frequency Table lists classes (or categories) of values along with counts of the number of values that fall within each class.

Lower class limits are the smallest numbers that CAN belong to different classes.

Upper class limits are the largest numbers that CAN belong to different classes.

Example:

The frequency table below was created from the following grades:
85, 96, 75, 84, 65, 91, 78, 82, 80, 70, 80, 58, 71, 78, 98, 99, 75, 62, 75
(Can you see why?)
Grades Frequency
50 - 59 1
60 - 69 2
70 - 79 7
80 - 89 5
90 - 100 4
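What are the lower class limits, the upper class limits, and the class midpoints?
Solution: The lower class limits are 50, 60, 70, 80, and 90. The upper class limits are 59, 69, 79, 89, and 100. Each class midpoint is the average of the lower and upper class limits of that class; for example, (50 + 59)/2 = 54.5. The midpoints are 54.5, 64.5, 74.5, 84.5, and 95.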
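The counting behind the frequency table can be sketched in Python. This is a minimal illustration, not part of the original notes; the function name `frequency_table` and the list-of-pairs layout for the classes are assumptions made for the example.

```python
grades = [85, 96, 75, 84, 65, 91, 78, 82, 80, 70,
          80, 58, 71, 78, 98, 99, 75, 62, 75]

# (lower class limit, upper class limit) for each class in the table
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 100)]

def frequency_table(values, classes):
    """Return one (lower, upper, midpoint, frequency) row per class."""
    rows = []
    for lower, upper in classes:
        # Count the values that fall inside this class, limits included
        count = sum(1 for v in values if lower <= v <= upper)
        # The midpoint is the average of the two class limits
        midpoint = (lower + upper) / 2
        rows.append((lower, upper, midpoint, count))
    return rows

table = frequency_table(grades, classes)
for lower, upper, midpoint, count in table:
    print(f"{lower:>3} - {upper:<3}  midpoint {midpoint:<5} frequency {count}")
```

Each grade falls into exactly one class, so the frequencies add up to the 19 grades in the list.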

Definition:

A histogram is a bar graph, in which the horizontal axis represents the class and the vertical axis represents the frequency.

Example:

Construct a histogram for the data in the table
Grades Frequency
50 - 59 1
60 - 69 2
70 - 79 7
80 - 89 5
90 - 100 4

Solution: The histogram has one bar for each class, with heights 1, 2, 7, 5, and 4 from left to right.
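A text-only approximation of the histogram can be sketched in Python. Note that the bars here run horizontally, so the classes appear down the side instead of along the horizontal axis; the function name `text_histogram` is an illustrative choice, not something from the original notes.

```python
# Class labels and frequencies copied from the table above
frequencies = {
    "50 - 59": 1,
    "60 - 69": 2,
    "70 - 79": 7,
    "80 - 89": 5,
    "90 - 100": 4,
}

def text_histogram(freqs, symbol="*"):
    """Return one line per class: the class label followed by a bar."""
    width = max(len(label) for label in freqs)
    return [f"{label:<{width}} | {symbol * count}"
            for label, count in freqs.items()]

for line in text_histogram(frequencies):
    print(line)
```

The length of each bar equals the frequency of its class, so the tallest bar belongs to the 70 - 79 class.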

Definition:

The mean, denoted by $\bar{x}$, is found by adding all the values and dividing by the number of values.

The median is the middle value when the values are listed in order. If the number of values is even, the median is the mean of the two middle values.

The mode is the value that appears most often. If two values occur with the same largest frequency, then the set of values is bimodal. If more than two values occur with the largest frequency, then the set of values is multimodal.

Example:

What are the mean, median, and mode of the numbers
85, 96, 75, 84, 65, 91, 78, 82, 80, 70, 80, 58, 71, 78, 98, 99, 75, 62, 75?

Solution: The sum of all the values is 1502 and there are 19 values. The mean is 1502/19 ≈ 79.1. To find the median, the values must be listed in order. The list written from lowest to highest:
58, 62, 65, 70, 71, 75, 75, 75, 78, 78, 80, 80, 82, 84, 85, 91, 96, 98, 99
Since there are 19 numbers, the middle value is in the tenth position. The median is 78.

The value 75 occurs the most frequently (3 times). The mode is 75.
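The three results can be checked with Python's standard `statistics` module. This is a small sketch offered as a cross-check, assuming Python 3.8 or later (needed for `multimode`).

```python
import statistics

grades = [85, 96, 75, 84, 65, 91, 78, 82, 80, 70,
          80, 58, 71, 78, 98, 99, 75, 62, 75]

mean = statistics.mean(grades)        # sum of the values / number of values
median = statistics.median(grades)    # middle value of the sorted list
modes = statistics.multimode(grades)  # every value tied for the highest count

print("mean  :", round(mean, 1))
print("median:", median)
print("modes :", modes)
```

`multimode` returns a list, so a bimodal or multimodal data set would show up as a list with two or more entries.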


Standard deviation


The standard deviation of a data set is a value that measures the "spread" of the data. The standard deviation is a very powerful value that is used in many areas of statistics. When teachers say they are going to grade on the "bell curve", they are talking about standard deviation. Here is what they do: they take the mean score and make that a grade of C. The letter grade for any other score is found by using a formula that involves how far above or below the mean the score is. The next example illustrates this.

Grading on the bell curve

A Statistics class takes an exam and receives the following grades:
85, 96, 75, 84, 65, 91, 78, 82, 80, 70, 80, 58, 71, 78, 98, 99, 75, 62, 75 .
The teacher writes on the board that the mean, $\bar{x}$, is 79.1 and the standard deviation, s, is 11.5. The teacher says that a letter grade will be determined as follows:
If your score is between:                      Letter Grade
$\bar{x} + s$ and 100                          A
$\bar{x} + s/2$ and $\bar{x} + s$              B
$\bar{x} - s/2$ and $\bar{x} + s/2$            C
$\bar{x} - s$ and $\bar{x} - s/2$              D
0 and $\bar{x} - s$                            F
How many students get a B?
Solution: Since $\bar{x} + s/2$ = 79.1 + 5.75 = 84.85 and $\bar{x} + s$ = 79.1 + 11.5 = 90.6, and only 1 grade (85) falls between 84.85 and 90.6, 1 student receives a B.
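The grading scale can be applied to every score with a short Python sketch. The function name `letter_grade` is hypothetical, and the handling of a score that lands exactly on a cutoff (it gets the higher letter here) is an assumption, since the scale above does not say which side a boundary score belongs to. The rounded values 79.1 and 11.5 from the example are used as given.

```python
from collections import Counter

grades = [85, 96, 75, 84, 65, 91, 78, 82, 80, 70,
          80, 58, 71, 78, 98, 99, 75, 62, 75]

def letter_grade(score, mean, s):
    """Map a score to a letter using the cutoffs of the grading scale.

    Assumption: a score exactly on a cutoff receives the higher letter.
    """
    if score >= mean + s:
        return "A"
    if score >= mean + s / 2:
        return "B"
    if score >= mean - s / 2:
        return "C"
    if score >= mean - s:
        return "D"
    return "F"

# Rounded mean and standard deviation written on the board in the example
counts = Counter(letter_grade(g, 79.1, 11.5) for g in grades)
for letter in "ABCDF":
    print(letter, counts[letter])
```

None of the 19 scores falls exactly on a cutoff, so the boundary assumption does not affect the count of B grades in this example.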
How was the value of 11.5 determined?

The 11.5 was determined by using the formula below.

Formula for calculating the standard deviation (of a sample)

$\displaystyle s=\sqrt{\dfrac{\sum(x-\bar{x})^2}{n-1}}$ (1.1)

This formula can be very lengthy to use if there are many data values. The TI-83 has a nice built-in feature that will do these calculations for us.

Computing the standard deviation using the TI-83

  1. First enter the data in list L1.
  2. Press STAT and select CALC.
  3. Select 1-Var Stats and press Enter.
  4. The value of Sx is the sample standard deviation.
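For readers without a TI-83, formula (1.1) can also be evaluated step by step in Python. This is a sketch using the exam grades from the earlier example; the function name `sample_std` is an illustrative choice, and the result is compared against the standard library's `statistics.stdev`, which also computes the sample standard deviation.

```python
import math
import statistics

grades = [85, 96, 75, 84, 65, 91, 78, 82, 80, 70,
          80, 58, 71, 78, 98, 99, 75, 62, 75]

def sample_std(values):
    """Sample standard deviation, following formula (1.1) term by term."""
    n = len(values)
    mean = sum(values) / n                              # x-bar
    squared_devs = sum((x - mean) ** 2 for x in values) # sum of (x - x-bar)^2
    return math.sqrt(squared_devs / (n - 1))            # divide by n - 1, then root

s = sample_std(grades)
print(round(s, 1))
print(math.isclose(s, statistics.stdev(grades)))
```

Dividing by n - 1 instead of n is what makes this the sample standard deviation, matching the Sx value on the calculator.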

The next example illustrates how to use the formula for standard deviation. In addition, it shows that two sets of values can have the same means, medians and modes, but different standard deviations.

Example:

Consider the sets of values:
Data1 = { 85, 96, 75, 84, 65, 91, 78, 82, 80, 70, 80, 58, 71, 78, 98, 99, 75, 62, 75 }

Data2 = { 61, 53, 54, 75, 99, 98, 98, 96, 78, 57, 90, 75, 93, 51, 75, 96, 99, 59, 95 }
Compute the mean, median, mode, and standard deviation for each set of values. Draw a histogram for each set of values. How are the histograms related to the standard deviations?
Solution: The mean, median, and mode are 79.1, 78, and 75 for both data sets. To compute the standard deviation we can use formula (1.1), but using the TI-83 makes quick work of the calculations. After entering Data1 into the TI-83 and running 1-Var Stats, you should get the following screen.

[Figure: TI-83 1-Var Stats screen for Data1]

Similarly, for Data2 the screen looks like:

[Figure: TI-83 1-Var Stats screen for Data2]

Notice in this example that the mean, median, and mode for Data1 and Data2 are exactly the same, while the standard deviations are different. The standard deviations for Data1 and Data2 are 11.5 and 18.3, respectively. This difference can be seen in the spread of the histograms for each data set. The histogram for Data1 is more concentrated around the mean than the histogram for Data2.
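The comparison of the two data sets can be reproduced with a short Python sketch. The helper name `summarize` is hypothetical; the values are rounded to one decimal place to match the figures quoted in the example.

```python
import statistics

data1 = [85, 96, 75, 84, 65, 91, 78, 82, 80, 70,
         80, 58, 71, 78, 98, 99, 75, 62, 75]
data2 = [61, 53, 54, 75, 99, 98, 98, 96, 78, 57,
         90, 75, 93, 51, 75, 96, 99, 59, 95]

def summarize(values):
    """Return the four summary statistics discussed in this section."""
    return {
        "mean": round(statistics.mean(values), 1),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": round(statistics.stdev(values), 1),  # sample standard deviation
    }

summary1 = summarize(data1)
summary2 = summarize(data2)
print("Data1:", summary1)
print("Data2:", summary2)
```

Only the `stdev` entry differs between the two summaries, which is the point of the example: the measures of center alone do not describe how spread out the values are.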

Grading on the bell curve (cont.)

Using the grading scale from the previous example with the Data2 grades, 61, 53, 54, 75, 99, 98, 98, 96, 78, 57, 90, 75, 93, 51, 75, 96, 99, 59, 95, how many people get a B in this class?
Solution: Using a mean of 79.1 and a standard deviation of 18.3, a grade of B will be given if the test score is between 88.25 (= $\bar{x} + s/2$) and 97.4 (= $\bar{x} + s$). Looking at the data set, we see that there are 5 scores (96, 90, 93, 96, 95) that get a B.


The standard deviation has a very useful meaning when the histogram for the data has a bell-shaped appearance. As you saw with the two sets of grades, the histogram for Data2 was not bell-shaped, so the "grading scale" was not appropriate to use in that case. (Would you want a B if your test grade was a 97?) Below is a rule that can be used if the data has a bell-shaped distribution.

Empirical Rule (68-95-99.7) - Some facts about data that has a Bell-Shaped Distribution

About 68% of data is within 1 standard deviation of the mean.

About 95% of data is within 2 standard deviations of the mean.

About 99.7% of data is within 3 standard deviations of the mean.
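The three percentages can be checked numerically under the assumption that "bell-shaped" means an exactly normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `fraction_within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * fraction_within(k):.2f}%")
```

Real data is only approximately normal, which is why the rule is stated with the word "about".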

What does the Empirical rule mean?

The heights of men have a bell-shaped distribution with a mean of 69.0 inches and a standard deviation of 2.8 inches. What percentage of men have heights between 63.4 inches and 74.6 inches?
Solution: Since 63.4 is 2 standard deviations below the mean (69 - 63.4 = 5.6 = 2*(2.8)) and 74.6 is 2 standard deviations above the mean (why?), it follows from the Empirical Rule that about 95% of men have heights between 63.4 and 74.6 inches.

What happens to the percentage if the heights were 65.4 and 72.3 instead of 63.4 and 74.6?

The method for dealing with intervals whose endpoints are not 1, 2, or 3 standard deviations above or below the mean involves the concept of z-scores. (See Normal Probability Distributions)
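As a preview, a z-score measures how many standard deviations a value lies above or below the mean: z = (x - mean) / (standard deviation). The sketch below applies this to the height example, assuming the heights follow an exactly normal distribution with mean 69.0 and standard deviation 2.8. The helper names `z_score` and `percent_between` are hypothetical.

```python
from statistics import NormalDist

MEAN, S = 69.0, 2.8  # mean and standard deviation of men's heights, in inches

def z_score(x, mean=MEAN, s=S):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s

# Assumption: heights are modeled as exactly normal
heights = NormalDist(mu=MEAN, sigma=S)

def percent_between(low, high):
    """Percentage of the modeled heights that fall between low and high."""
    return 100 * (heights.cdf(high) - heights.cdf(low))

for low, high in [(63.4, 74.6), (65.4, 72.3)]:
    print(f"{low} to {high}: z from {z_score(low):.2f} to {z_score(high):.2f}, "
          f"{percent_between(low, high):.1f}% of heights")
```

For the first interval the z-scores are -2 and 2, which is the case covered by the Empirical Rule; the second interval has endpoints that are not whole numbers of standard deviations, so the rule alone cannot give its percentage.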


Created by Jim Beuerle