// Numbas version: exam_results_page_options {"name": "Bivariate categorical data and chi squared test", "extensions": [], "custom_part_types": [], "resources": [], "navigation": {"allowregen": true, "preventleave": false, "showfrontpage": false}, "question_groups": [{"questions": [{"metadata": {"licence": "All rights reserved", "description": "This question provides students with an example that requires them to fill in missing quantities in a two-way frequency table for bivariate categorical data, calculate percentages from that table, and to test for independence between the variables using a chi square test."}, "rulesets": {}, "tags": [], "type": "question", "extensions": [], "statement": "

In this question, you will complete two-way tables and calculate the associated percentages. You will then calculate the expected frequencies under the assumption that the two variables are independent, and find the chi squared statistic for the observed frequencies in the first table. Finally, you will use this statistic to test whether the two variables in the data table are independent.

\n

\n

As you progress through the question, you might like to check your working after each stage, because later stages rely on correct answers to earlier ones.

Calculate and fill in the missing quantities in the two-way table.

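<table>
<tr><th></th><th></th><th colspan=2>Preference</th><th></th></tr>
<tr><th></th><th></th><th>A</th><th>B</th><th>TOTAL</th></tr>
<tr><th rowspan=2>Group</th><th>C</th><td>[[0]]</td><td>[[1]]</td><td>{A11+a12}</td></tr>
<tr><th>D</th><td>{A21}</td><td>[[2]]</td><td>{A21+a22}</td></tr>
<tr><th></th><th>TOTAL</th><td>[[3]]</td><td>{A12+a22}</td><td>[[4]]</td></tr>
</table>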
\n

\n

Now complete the row percentage version of the two-way table. Round your answers to 1 decimal place.

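<table>
<tr><th></th><th></th><th colspan=2>Preference</th><th></th></tr>
<tr><th></th><th></th><th>A</th><th>B</th><th>TOTAL</th></tr>
<tr><th rowspan=2>Group</th><th>C</th><td>[[0]]</td><td>[[1]]</td><td>100%</td></tr>
<tr><th>D</th><td>{precround(A21/(a21+a22)*100,1)}</td><td>[[2]]</td><td>100%</td></tr>
<tr><th></th><th>TOTAL</th><td>[[3]]</td><td>{precround((A12+a22)/(a11+a21+a22+a12)*100,1)}</td><td>100%</td></tr>
</table>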
\n

\n

Under the assumption of independence (that is, no relationship between group membership and preference), what would the expected frequencies in the two-way table be? Calculate them and enter your answers in the table below.

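<table>
<tr><th></th><th></th><th colspan=2>Preference</th><th></th></tr>
<tr><th></th><th></th><th>A</th><th>B</th><th>TOTAL</th></tr>
<tr><th rowspan=2>Group</th><th>C</th><td>[[0]]</td><td>[[1]]</td><td>{a11+a12}</td></tr>
<tr><th>D</th><td>[[2]]</td><td>[[3]]</td><td>{a21+a22}</td></tr>
<tr><th></th><th>TOTAL</th><td>{a11+a21}</td><td>{a12+a22}</td><td>{total}</td></tr>
</table>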
\n

\n

", "unitTests": [], "extendBaseMarkingAlgorithm": true, "variableReplacementStrategy": "originalfirst", "variableReplacements": [], "marks": 0, "sortAnswers": false, "scripts": {}, "customName": "Expected frequencies", "useCustomName": true}, {"customMarkingAlgorithm": "", "correctAnswerStyle": "plain", "showCorrectAnswer": true, "type": "numberentry", "strictPrecision": false, "unitTests": [], "precisionPartialCredit": 0, "precisionType": "dp", "notationStyles": ["plain", "en", "si-en"], "extendBaseMarkingAlgorithm": true, "showPrecisionHint": false, "variableReplacements": [], "allowFractions": false, "mustBeReduced": false, "maxValue": "chisq", "precisionMessage": "You have not given your answer to the correct precision.", "customName": "chi-squared statistic", "minValue": "chisq", "precision": "2", "mustBeReducedPC": 0, "variableReplacementStrategy": "originalfirst", "marks": 1, "correctAnswerFraction": false, "scripts": {}, "showFeedbackIcon": true, "prompt": "

Now that you have the expected and observed frequencies, calculate the chi-squared statistic and enter its value (rounded to two decimal places) below.

\n

\n

", "useCustomName": true}, {"customMarkingAlgorithm": "", "shuffleChoices": false, "showCorrectAnswer": true, "type": "1_n_2", "prompt": "

Based on what you have calculated so far, use the chi squared table below to draw a conclusion about the independence of the two variables represented in the data.

\n

The probability of obtaining a chi squared value larger than the statistic calculated in the previous part of the question is

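<table>
<tr><th rowspan=2>Degrees of freedom</th><th colspan=4>Probability of a larger chi squared value</th></tr>
<tr><th>0.25</th><th>0.1</th><th>0.05</th><th>0.01</th></tr>
<tr><th>1</th><td>1.32</td><td>2.71</td><td>3.84</td><td>6.63</td></tr>
<tr><th>2</th><td>2.77</td><td>4.61</td><td>5.99</td><td>9.21</td></tr>
<tr><th>3</th><td>4.11</td><td>6.25</td><td>7.81</td><td>11.34</td></tr>
<tr><th>4</th><td>5.39</td><td>7.78</td><td>9.49</td><td>13.28</td></tr>
</table>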
", "unitTests": [], "extendBaseMarkingAlgorithm": true, "displayColumns": 0, "variableReplacements": [], "choices": ["greater than or equal to 0.25", "less than 0.25 but greater than or equal to 0.1", "less than 0.1 but greater than or equal to 0.05", "less than 0.05 but greater than or equal to 0.01", "less than 0.01"], "showCellAnswerState": true, "scripts": {}, "minMarks": 0, "showFeedbackIcon": true, "distractors": ["", "", "", "", ""], "displayType": "radiogroup", "variableReplacementStrategy": "originalfirst", "matrix": ["if(chisq<=1.32,1,0)", "if(chisq>1.32,if(chisq<=2.71,1,0),0)", "if(chisq>2.71,if(chisq<=3.84,1,0),0)", "if(chisq>3.84,if(chisq<=6.63,1,0),0)", "if(chisq>6.63,1,0)"], "marks": 0, "maxMarks": 0, "customName": "How likely is that?", "useCustomName": true}, {"customMarkingAlgorithm": "", "shuffleChoices": false, "showCorrectAnswer": true, "type": "1_n_2", "prompt": "

Given the calculated chi squared statistic, and how likely it is to obtain a value this large under the assumption that the variables are independent, which of the following is the correct conclusion? Assume we require a significance level of 0.05.

", "unitTests": [], "extendBaseMarkingAlgorithm": true, "displayColumns": 0, "variableReplacements": [], "choices": ["The assumption of independence should be rejected because it is so unlikely that this data would be observed if the variables were truly independent.", "The assumption of independence should be maintained as an assumption as it's not that unlikely to observe this data, given the variables are independent."], "showCellAnswerState": true, "scripts": {}, "minMarks": 0, "showFeedbackIcon": true, "distractors": ["", ""], "displayType": "radiogroup", "variableReplacementStrategy": "originalfirst", "matrix": ["if(chisq>3.84,1,0)", "if(chisq<=3.84,1,0)"], "marks": 0, "maxMarks": 0, "customName": "Conclusion", "useCustomName": true}], "name": "Bivariate categorical data and chi squared test", "variablesTest": {"condition": "", "maxRuns": 100}, "variable_groups": [], "functions": {}, "ungrouped_variables": ["Total", "A11", "A12", "A21", "A22", "e11", "e12", "e21", "e22", "chisq"], "contributors": [{"name": "Dann Mallet", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/800/"}], "variables": {"A11": {"name": "A11", "definition": "random(30 .. 120#1)", "group": "Ungrouped variables", "templateType": "randrange", "description": ""}, "e22": {"name": "e22", "definition": "(a21+a22)*(a12+a22)/total", "group": "Ungrouped variables", "templateType": "anything", "description": ""}, "Total": {"name": "Total", "definition": "a11+a12+a21+a22", "group": "Ungrouped variables", "templateType": "anything", "description": "

The total of the sample for the 2 way table.

"}, "A22": {"name": "A22", "definition": "random(40 .. 160#1)", "group": "Ungrouped variables", "templateType": "randrange", "description": ""}, "e12": {"name": "e12", "definition": "(a11+a12)*(a12+a22)/total", "group": "Ungrouped variables", "templateType": "anything", "description": ""}, "chisq": {"name": "chisq", "definition": "(a11-e11)^2/e11+(a12-e12)^2/e12+(a21-e21)^2/e21+(a22-e22)^2/e22", "group": "Ungrouped variables", "templateType": "anything", "description": ""}, "A21": {"name": "A21", "definition": "random(40 .. 130#1)", "group": "Ungrouped variables", "templateType": "randrange", "description": ""}, "e11": {"name": "e11", "definition": "(a11+a12)*(a11+a21)/total", "group": "Ungrouped variables", "templateType": "anything", "description": ""}, "A12": {"name": "A12", "definition": "random(30 .. 120#1)", "group": "Ungrouped variables", "templateType": "randrange", "description": ""}, "e21": {"name": "e21", "definition": "(a11+a21)*(a21+a22)/total", "group": "Ungrouped variables", "templateType": "anything", "description": ""}}, "preamble": {"css": "", "js": ""}}], "pickingStrategy": "all-ordered"}], "contributors": [{"name": "Dann Mallet", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/800/"}]}