// Numbas version: exam_results_page_options {"name": "Statistics - Frequency Tables, Measures of central tendency and Spread", "extensions": ["stats"], "custom_part_types": [], "resources": [], "navigation": {"allowregen": true, "showfrontpage": false, "preventleave": false}, "question_groups": [{"questions": [{"statement": "

Once we have gathered data, two questions naturally arise:

\n
\n
1. Where is most of the data centralised?
2. How spread out is the data?
\n

Frequency distribution tables let us organise data quickly and make it easy to determine the measures of central tendency and the measures of spread. These measures give us valuable insight into our data set.

\n

\n

\n

30 random students were asked about the number of siblings they have. These are their responses:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
 $\\var{a[0]}$ $\\var{a[1]}$ $\\var{a[2]}$ $\\var{a[3]}$ $\\var{a[4]}$ $\\var{a[5]}$ $\\var{a[6]}$ $\\var{a[7]}$ $\\var{a[8]}$ $\\var{a[9]}$ $\\var{a[10]}$ $\\var{a[11]}$ $\\var{a[12]}$ $\\var{a[13]}$ $\\var{a[14]}$ $\\var{a[15]}$ $\\var{a[16]}$ $\\var{a[17]}$ $\\var{a[18]}$ $\\var{a[19]}$ $\\var{a[20]}$ $\\var{a[21]}$ $\\var{a[22]}$ $\\var{a[23]}$ $\\var{a[24]}$ $\\var{a[25]}$ $\\var{a[26]}$ $\\var{a[27]}$ $\\var{a[28]}$ $\\var{a[29]}$
\n

", "functions": {}, "preamble": {"js": "", "css": ""}, "extensions": ["stats"], "ungrouped_variables": ["a1", "modea1", "a2", "modea2", "a3", "modea3"], "contributors": [{"name": "Christian Lawson-Perfect", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/7/"}, {"name": "Stanislav Duris", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/1590/"}, {"name": "Paul Hancock", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/1738/"}], "metadata": {"licence": "Creative Commons Attribution 4.0 International", "description": "

Given a table of data, complete a frequency distribution table and use it to calculate the mean, mode, median and range.

Complete the following frequency table:

\n

| Number of siblings | f | fx | cf |
|---|---|---|---|
| $0$ | [[0]] | [[7]] | [[15]] |
| $1$ | [[1]] | [[8]] | [[16]] |
| $2$ | [[2]] | [[9]] | [[17]] |
| $3$ | [[3]] | [[10]] | [[18]] |
| $4$ | [[4]] | [[11]] | [[19]] |
| $5$ | [[5]] | [[12]] | [[20]] |
| $6$ | [[6]] | [[13]] | [[21]] |
| Total | $30$ | [[14]] | $30$ |
\n

Find the mean, mode and median for this data.

\n

Mean = [[0]]

\n

Mode =  [[1]]

\n

Median =  [[2]]

", "showFeedbackIcon": true, "type": "gapfill", "showCorrectAnswer": true, "unitTests": []}, {"variableReplacementStrategy": "originalfirst", "gaps": [{"mustBeReduced": false, "variableReplacementStrategy": "originalfirst", "variableReplacements": [], "allowFractions": false, "marks": 1, "scripts": {}, "mustBeReducedPC": 0, "maxValue": "max(a)-min(a)", "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "minValue": "max(a)-min(a)", "correctAnswerFraction": false, "showFeedbackIcon": true, "type": "numberentry", "notationStyles": ["plain", "en", "si-en"], "showCorrectAnswer": true, "correctAnswerStyle": "plain", "unitTests": []}], "variableReplacements": [], "marks": 0, "sortAnswers": false, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "scripts": {}, "prompt": "

Range=[[0]]

If we add a score of 19 to the data set at the top, calculate the following:

\n

mean=[[0]]

\n

mode=[[1]]

\n

median=[[2]]

\n

range=[[3]]

", "showFeedbackIcon": true, "type": "gapfill", "showCorrectAnswer": true, "unitTests": []}], "rulesets": {}, "variablesTest": {"maxRuns": "1000", "condition": ""}, "advice": "

#### a)

\n

Organising the data in a frequency table summarises all the responses in one place with fewer numbers, which makes mistakes less likely when we calculate statistics from the data.

\n

Each row of the frequency column gives the number of students with the corresponding number of siblings.
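\n

As a rough sketch outside Numbas, the f, fx and cf columns can be computed from a list of raw responses. The `responses` list and the helper name `frequency_table` are made up for illustration; they are not the randomised data or any function used by this question.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(responses, scores):
    # Build one row (score, f, fx, cf) per score, in ascending order.
    counts = Counter(responses)
    freqs = [counts[s] for s in scores]    # f: how often each score occurs
    cum_freqs = list(accumulate(freqs))    # cf: running total of f
    return [(s, f, s * f, cf) for s, f, cf in zip(scores, freqs, cum_freqs)]


# Made-up sample of 30 responses (an assumption, for illustration only).
responses = [0, 1, 1, 2, 2, 2, 3, 1, 0, 4,
             2, 1, 3, 2, 1, 0, 5, 2, 1, 3,
             2, 6, 1, 2, 0, 3, 1, 2, 4, 2]
scores = list(range(7))

table = frequency_table(responses, scores)
total_f = sum(row[1] for row in table)     # should equal the sample size
total_fx = sum(row[2] for row in table)    # equals the sum of all responses

for row in table:
    print(row)
print('Total', total_f, total_fx)
```

The last cf value and the total of the f column should both equal the number of responses, which is the same check recommended for the hand-built table.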

\n

| Number of siblings | f | fx | cf |
|---|---|---|---|
| $0$ | $\\var{freq[0]}$ | $\\simplify{{freq[0]*scores[0]}}$ | $\\var{cf[0]}$ |
| $1$ | $\\var{freq[1]}$ | $\\simplify{{freq[1]*scores[1]}}$ | $\\var{cf[1]}$ |
| $2$ | $\\var{freq[2]}$ | $\\simplify{{freq[2]*scores[2]}}$ | $\\var{cf[2]}$ |
| $3$ | $\\var{freq[3]}$ | $\\simplify{{freq[3]*scores[3]}}$ | $\\var{cf[3]}$ |
| $4$ | $\\var{freq[4]}$ | $\\simplify{{freq[4]*scores[4]}}$ | $\\var{cf[4]}$ |
| $5$ | $\\var{freq[5]}$ | $\\simplify{{freq[5]*scores[5]}}$ | $\\var{cf[5]}$ |
| $6$ | $\\var{freq[6]}$ | $\\simplify{{freq[6]*scores[6]}}$ | $\\var{cf[6]}$ |
| Total | $30$ | $\\var{fx}$ | $30$ |

\n

Always remember to check whether your frequency column adds up to the total (here, it is 30) to make sure you have not left out any responses.

\n

#### b)

\n

#### Mean

\n

The mean number of siblings is the total number of siblings, $\\sum x$, divided by the number of students in the sample, $n$.

\n

\\begin{align} \\sum x &= 0 \\times \\var{freq[0]} + 1 \\times \\var{freq[1]} + 2 \\times \\var{freq[2]} + 3 \\times \\var{freq[3]} + 4 \\times \\var{freq[4]} + 5 \\times \\var{freq[5]} + 6 \\times \\var{freq[6]} \\\\ &= 0 + \\var{1*freq[1]} + \\var{2*freq[2]} + \\var{3*freq[3]} + \\var{4*freq[4]} + \\var{5*freq[5]} + \\var{6*freq[6]} \\\\ &= \\var{sum(a)} \\text{.} \\end{align}

\n

The total number of students $n$ is 30.

\n

Therefore the mean is

\n

\\begin{align} \\bar{x} &= \\frac{\\sum x}{n} \\\\ &= \\frac{\\var{sum(a)}}{30} \\\\ &= \\var{mean} \\text{.} \\end{align}

\n

Rounding the answer to 2 decimal places, we get $\\var{precround(mean, 2)}$.

\n

#### Mode

\n

The mode is the value with the highest frequency. Here, the mode is $\\var{mode}$ siblings, with frequency $\\var{freq[mode]}$.

\n

#### Median

\n

The median is the \"middle\" value in the sample, when arranged in numerical order.

\n

Since $n = 30$, we have two middle values in this data (the 15th and 16th values). We can count down from the top of the table until we locate the rows where these middle values lie, as the rows of the table are already sorted in order. The cf column helps here, because it gives the position of the last value in each row.

\n

If both middle values lie in the same row: here, both the 15th and 16th values lie in the row $\\var{as[14]}$. As 15th value = 16th value = $\\var{as[14]}$, the median is $\\var{as[14]}$.

\n

Otherwise: here, the 15th value lies in the row $\\var{as[14]}$ while the 16th value lies in the row $\\var{as[15]}$. As 15th value = $\\var{as[14]}$ and 16th value = $\\var{as[15]}$, we need to find their mean.

\n

\\begin{align} \\frac{\\var{as[14]} + \\var{as[15]}}{2} &= \\frac{\\var{as[14] + as[15]}}{2} \\\\ &= \\var{median} \\text{.} \\end{align}

\n

This is the median for this data.

\n

#### Range

\n

The range gives us an idea of the spread of the data and is simply the difference between the largest score and the smallest score. It can easily be found from our table by looking for the largest score with a non-zero frequency and subtracting the smallest score with a non-zero frequency.

\n

range = $\\var{max(a)} - \\var{min(a)} = \\simplify{{max(a)-min(a)}}$

\n

#### Adding an outlier

\n

An outlier is a score that is \"much\" smaller or \"much\" larger than the majority of the other scores in a data set. Exactly what we mean by \"much\" will be looked at in later years. We will now adjust our frequency distribution table to include the extra score of 19.

\n

| Number of siblings | f | fx | cf |
|---|---|---|---|
| $0$ | $\\var{freq[0]}$ | $\\simplify{{freq[0]*scores[0]}}$ | $\\var{cf[0]}$ |
| $1$ | $\\var{freq[1]}$ | $\\simplify{{freq[1]*scores[1]}}$ | $\\var{cf[1]}$ |
| $2$ | $\\var{freq[2]}$ | $\\simplify{{freq[2]*scores[2]}}$ | $\\var{cf[2]}$ |
| $3$ | $\\var{freq[3]}$ | $\\simplify{{freq[3]*scores[3]}}$ | $\\var{cf[3]}$ |
| $4$ | $\\var{freq[4]}$ | $\\simplify{{freq[4]*scores[4]}}$ | $\\var{cf[4]}$ |
| $5$ | $\\var{freq[5]}$ | $\\simplify{{freq[5]*scores[5]}}$ | $\\var{cf[5]}$ |
| $6$ | $\\var{freq[6]}$ | $\\simplify{{freq[6]*scores[6]}}$ | $\\var{cf[6]}$ |
| $19$ | $1$ | $19$ | $31$ |
| Total | $31$ | $\\simplify{{fx+19}}$ | $31$ |
\n

\n

mean = $\\frac{\\simplify{{fx+19}}}{31}$ = $\\var{precround((fx+19)/31,2)}$

\n

mode = $\\var{mode}$

\n

Since there are now 31 scores, the median is the 16th score, so:

\n

median = $\\var{median(a+19)}$

\n

range = $19 - \\var{min(a)} = \\simplify{{19-min(a)}}$

\n

\n

So we can see that, of the three measures of central tendency, the mean is the most sensitive to outliers: it changed the most. This is why we have more than one way to measure central tendency, since sometimes one measure is better than the others. Similarly, in future years we will introduce other measures of spread, as the range is also highly sensitive to outliers.
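\n

The effect of the outlier can also be checked with a short script. This is only a sketch using Python's standard `statistics` module. The `responses` list is made up for illustration (it is not the randomised data in this question), and `summary` is a hypothetical helper name.

```python
from statistics import mean, median, mode


def summary(data):
    # Three measures of central tendency plus the range, mean rounded to 2 d.p.
    return {
        'mean': round(mean(data), 2),
        'mode': mode(data),
        'median': median(data),
        'range': max(data) - min(data),
    }


# Made-up sample of 30 responses (an assumption, for illustration only).
responses = [0, 1, 1, 2, 2, 2, 3, 1, 0, 4,
             2, 1, 3, 2, 1, 0, 5, 2, 1, 3,
             2, 6, 1, 2, 0, 3, 1, 2, 4, 2]

before = summary(responses)
after = summary(responses + [19])          # add the outlier score of 19

for key in before:
    print(key, before[key], '->', after[key])
```

For this made-up sample the mode and median stay the same after the 19 is added, while the mean and the range both increase, which matches the conclusion above.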

"}], "pickingStrategy": "all-ordered"}], "contributors": [{"name": "Christian Lawson-Perfect", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/7/"}, {"name": "Stanislav Duris", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/1590/"}, {"name": "Paul Hancock", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/1738/"}]}