// Numbas version: exam_results_page_options {"name": "Adaptive marking: Independent two sample t-test", "extensions": ["stats"], "custom_part_types": [], "resources": [], "navigation": {"allowregen": true, "showfrontpage": false, "preventleave": false, "typeendtoleave": false}, "question_groups": [{"pickingStrategy": "all-ordered", "questions": [{"name": "Adaptive marking: Independent two sample t-test", "tags": ["average", "data analysis", "differences", "elementary statistics", "hypothesis testing", "mean", "standard deviation", "statistics", "stats", "t-test", "two sample t-test", "variance"], "metadata": {"description": "

Two-sample t-test to see whether there is a difference in scores between two groups when the questions are asked in a different order.

", "licence": "Creative Commons Attribution 4.0 International"}, "statement": "

An educational psychologist claimed that the order in which questions were asked affected students’ ability to answer them correctly and hence their total score. To test this, $20$ students were randomly divided into two groups of $10$. The first group were given questions in increasing order of difficulty and the second group in decreasing order of difficulty. The test scores obtained were:

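\n

\\[\\begin{array}{l|rrrrrrrrrr}\\text{Group 1}&\\var{r1[0]}&\\var{r1[1]}&\\var{r1[2]}&\\var{r1[3]}&\\var{r1[4]}&\\var{r1[5]}&\\var{r1[6]}&\\var{r1[7]}&\\var{r1[8]}&\\var{r1[9]}\\\\\\text{Group 2}&\\var{r2[0]}&\\var{r2[1]}&\\var{r2[2]}&\\var{r2[3]}&\\var{r2[4]}&\\var{r2[5]}&\\var{r2[6]}&\\var{r2[7]}&\\var{r2[8]}&\\var{r2[9]}\\end{array}\\]

\n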

Carry out a two-sample t-test to decide if there is evidence of a difference in the average test scores for the two sets of students.

", "advice": "

We test the following hypotheses:

\n

$H_0:\\; \\mu_1=\\mu_2$ versus $H_1:\\; \\mu_1 \\neq \\mu_2$

\n

We find that the mean score of Group 1 is $\\overline{x}_1=\\var{mean1}$ with standard deviation $s_1=\\var{sd1}$ and the mean score of Group 2 is $\\overline{x}_2=\\var{mean2}$ with standard deviation $s_2=\\var{sd2}$.

\n

(All calculated to 3 decimal places.)

\n

We use the formula for the two-sample $t$-statistic as shown above, with $n_1=n_2=10$.

\n

The estimate of the pooled variance is calculated to be:

\n

\\[s^2=\\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}= \\frac{\\var{n1-1}\\times \\var{sd1}^2+\\var{n2-1}\\times \\var{sd2}^2}{\\var{n1+n2-2}}=\\var{s^2}.\\] 

\n

Hence $s = \\sqrt{\\var{s^2}}=\\var{s}$ to 3 decimal places.

\n

We find that the t-statistic has value:

\n

\\begin{align}
T &= \\frac{(\\overline{x}_1-\\overline{x}_2)-(\\mu_1-\\mu_2)}{s\\sqrt{\\frac{1}{n_1}+\\frac{1}{n_2}}} \\\\
&= \\frac{(\\var{mean1}-\\var{mean2})-(0)}{\\var{s}\\sqrt{\\frac{1}{\\var{n1}}+\\frac{1}{\\var{n2}}}} \\\\
&= \\var{t_statistic}
\\end{align}

\n

Our test statistic is $|T|=\\var{abs(t_statistic)}$.

\n

Given that we have $n_1+n_2-2=18$ degrees of freedom, we look up this value in the t-distribution table for $t_{18}$:

\n

\\[\\begin{array}{r|rrrr}\\text{d.f.}&0.10&0.05&0.01&0.001\\\\\\hline18&1.734&2.101&2.878&3.922\\end{array}\\]

\n

We see that $|T|$ {t_statistic_range}, and the table tells us that the $p$-value {p_value_range}.

\n

Hence we conclude that we {reject} the null hypothesis. There is {evidence_strength} evidence of a difference between the average scores of the two groups.
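
\n

The calculation above can also be checked numerically. The following is a minimal Python sketch, not part of the question itself: the function names and the two score lists are hypothetical, and the critical values are the $t_{18}$ values from the table above. Unlike the worked solution, it does not round intermediate values to 3 decimal places, so its result may differ slightly in the last digit.

```python
import math
import statistics

# Two-tailed critical values for t with 18 degrees of freedom (from the table above).
CRITICAL_VALUES = [1.734, 2.101, 2.878, 3.922]


def pooled_t_statistic(sample1, sample2):
    # Sample sizes and sample means.
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.stdev uses the n - 1 divisor (sample standard deviation).
    sd1, sd2 = statistics.stdev(sample1), statistics.stdev(sample2)
    # Pooled variance estimate s^2.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # T under the null hypothesis mu1 - mu2 = 0.
    return (mean1 - mean2) / (math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2))


def p_value_range(t):
    # Count how many critical values |t| exceeds, then pick the matching range.
    exceeded = sum(abs(t) > c for c in CRITICAL_VALUES)
    ranges = ['p > 0.10', '0.05 < p < 0.10', '0.01 < p < 0.05',
              '0.001 < p < 0.01', 'p < 0.001']
    return ranges[exceeded]


# Hypothetical scores, made up for illustration only.
group1 = [52, 58, 61, 63, 66, 68, 70, 73, 75, 79]
group2 = [60, 64, 67, 70, 72, 74, 77, 79, 82, 85]
t = pooled_t_statistic(group1, group2)
print(round(t, 3), p_value_range(t))
```

Swapping the two groups only changes the sign of $T$, which is why the lookup uses $|T|$.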

", "rulesets": {"std": ["all", "fractionNumbers", "!collectNumbers", "!noLeadingMinus"]}, "extensions": ["stats"], "builtin_constants": {"e": true, "pi,\u03c0": true, "i": true}, "constants": [], "variables": {"p_value_range": {"name": "p_value_range", "group": "Advice messages", "definition": "['is less than $0.001$','lies between $0.001$ and $0.01$','lies between $0.01$ and $0.05$','lies between $0.05$ and $0.10$','is greater than $0.10$'][scenario]", "description": "

Describe where the p-value lies in relation to the critical values

", "templateType": "anything", "can_override": false}, "sigma2": {"name": "sigma2", "group": "Setup", "definition": "random(8..10#0.2)", "description": "

Population standard deviation of sample 2

", "templateType": "anything", "can_override": false}, "t99": {"name": "t99", "group": "Critical t-values", "definition": "2.878", "description": "", "templateType": "anything", "can_override": false}, "n2": {"name": "n2", "group": "Setup", "definition": "10", "description": "

Size of sample 2

", "templateType": "anything", "can_override": false}, "t999": {"name": "t999", "group": "Critical t-values", "definition": "3.922", "description": "", "templateType": "anything", "can_override": false}, "t_statistic_range": {"name": "t_statistic_range", "group": "Advice messages", "definition": "['is greater than $\\\\var{t999}$','lies between $\\\\var{t99}$ and $\\\\var{t999}$','lies between $\\\\var{t95}$ and $\\\\var{t99}$','lies between $\\\\var{t90}$ and $\\\\var{t95}$','is less than $\\\\var{t90}$'][scenario]", "description": "

Describe where the t-statistic lies in relation to the critical values

", "templateType": "anything", "can_override": false}, "mu2": {"name": "mu2", "group": "Setup", "definition": "random(65..75#0.5)", "description": "

Population mean of sample 2

", "templateType": "anything", "can_override": false}, "evidence_strength": {"name": "evidence_strength", "group": "Advice messages", "definition": "['very strong','strong','some','weak','no'][scenario]", "description": "

How much evidence is there against the null hypothesis?

", "templateType": "anything", "can_override": false}, "n1": {"name": "n1", "group": "Setup", "definition": "10", "description": "

Size of sample 1

", "templateType": "anything", "can_override": false}, "mu1": {"name": "mu1", "group": "Setup", "definition": "random(55..75#0.5)", "description": "

Population mean of sample 1 (we'll generate samples from different distributions to produce different outcomes)

", "templateType": "anything", "can_override": false}, "reject": {"name": "reject", "group": "Advice messages", "definition": "if(scenario<3,'do reject','do not reject')", "description": "

Do we reject the null hypothesis?

", "templateType": "anything", "can_override": false}, "p_value": {"name": "p_value", "group": "Stats", "definition": "ttest(abs(t_statistic),19,2)", "description": "

p-value corresponding to the t-statistic

", "templateType": "anything", "can_override": false}, "t_statistic": {"name": "t_statistic", "group": "Stats", "definition": "(mean1-mean2)*sqrt(n1*n2)/(s*sqrt(n1+n2))", "description": "", "templateType": "anything", "can_override": false}, "t90": {"name": "t90", "group": "Critical t-values", "definition": "1.734", "description": "", "templateType": "anything", "can_override": false}, "decision_marking_matrix": {"name": "decision_marking_matrix", "group": "Advice messages", "definition": "[\n [1,0,0,0,0],\n [0,1,0,0,0],\n [0,0,1,0,0],\n [0,0,0,1,0],\n [0,0,0,0,1]\n][scenario]", "description": "

Marking matrix for the multiple choice questions

", "templateType": "anything", "can_override": false}, "sd2": {"name": "sd2", "group": "Stats", "definition": "precround(pstdev(r2),3)", "description": "

Sample standard deviation of sample 2

", "templateType": "anything", "can_override": false}, "t95": {"name": "t95", "group": "Critical t-values", "definition": "2.101", "description": "", "templateType": "anything", "can_override": false}, "r1": {"name": "r1", "group": "Samples", "definition": "repeat(round(normalsample(mu1,sigma1)),n1)", "description": "

Sample 1

", "templateType": "anything", "can_override": false}, "mean1": {"name": "mean1", "group": "Stats", "definition": "mean(r1)", "description": "

Sample mean of sample 1

", "templateType": "anything", "can_override": false}, "mean2": {"name": "mean2", "group": "Stats", "definition": "mean(r2)", "description": "

Sample mean of sample 2

", "templateType": "anything", "can_override": false}, "r2": {"name": "r2", "group": "Samples", "definition": "repeat(round(normalsample(mu2,sigma2)),n2)", "description": "

Sample 2

", "templateType": "anything", "can_override": false}, "scenario": {"name": "scenario", "group": "Advice messages", "definition": "sum(map(award(1,abs(t_statistic)<x),x,[t90,t95,t99,t999]))", "description": "

Which scenario are we in? Counts how many critical values of the t distribution the absolute value of t_statistic falls below, so 0 means it exceeds all of them.

", "templateType": "anything", "can_override": false}, "sigma1": {"name": "sigma1", "group": "Setup", "definition": "random(8..10#0.2)", "description": "

Population standard deviation of sample 1

", "templateType": "anything", "can_override": false}, "sd1": {"name": "sd1", "group": "Stats", "definition": "precround(pstdev(r1),3)", "description": "

Sample standard deviation of sample 1

", "templateType": "anything", "can_override": false}, "s": {"name": "s", "group": "Stats", "definition": "precround(sqrt(((n1-1)*sd1^2+(n2-1)*sd2^2)/(n1+n2-2)),3)", "description": "

Pooled standard deviation, rounded to 3 decimal places; used in the formula for the t statistic

", "templateType": "anything", "can_override": false}}, "variablesTest": {"condition": "", "maxRuns": 100}, "ungrouped_variables": [], "variable_groups": [{"name": "Setup", "variables": ["n1", "n2", "mu1", "sigma1", "mu2", "sigma2"]}, {"name": "Samples", "variables": ["r1", "r2"]}, {"name": "Stats", "variables": ["mean1", "sd1", "mean2", "sd2", "s", "t_statistic", "p_value"]}, {"name": "Advice messages", "variables": ["scenario", "decision_marking_matrix", "reject", "evidence_strength", "t_statistic_range", "p_value_range"]}, {"name": "Critical t-values", "variables": ["t90", "t95", "t99", "t999"]}], "functions": {"pstdev": {"parameters": [["l", "list"]], "type": "number", "language": "jme", "definition": "sqrt(len(l)/(len(l)-1))*stdev(l)"}}, "preamble": {"js": "", "css": ""}, "parts": [{"type": "gapfill", "useCustomName": false, "customName": "", "marks": 0, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [], "variableReplacementStrategy": "originalfirst", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "prompt": "

Find the mean and standard deviation of the scores of each group. Round your answers to 3 decimal places.

\n

Group 1: Mean [[0]], Standard deviation [[1]]

\n

Group 2: Mean [[2]], Standard deviation [[3]]

\n

Now find the two-sample t-test statistic $T$ using the values you have just calculated and enter it here: [[4]]

", "stepsPenalty": 0, "steps": [{"type": "information", "useCustomName": false, "customName": "", "marks": 0, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [], "variableReplacementStrategy": "originalfirst", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "prompt": "

The two-sample t-statistic for two independent sets of data where one set has $n_1$ data points and the other set $n_2$ data points is calculated as follows:

\n

\\[T = \\frac{(\\overline{x}_1-\\overline{x}_2)-(\\mu_1-\\mu_2)}{s\\times\\sqrt{\\frac{1}{n_1}+\\frac{1}{n_2}}}\\;\\;\\;\\]

\n

where $\\overline{x}_1,\\;\\overline{x}_2$ are the sample means and 

\n

\\[s^2=\\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}\\]

\n

where $s_1,\\;s_2$ are the sample standard deviations.

\n

Use the values you calculated to 3 decimal places in order to find $T$.
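
\n

As an illustration only, take hypothetical values (not drawn from this question's data) $\\overline{x}_1=70$, $\\overline{x}_2=61$, $s_1=9$, $s_2=8$ and $n_1=n_2=10$, and test $\\mu_1-\\mu_2=0$. Then

\n

\\[s^2=\\frac{9\\times 9^2+9\\times 8^2}{18}=72.5, \\qquad s=\\sqrt{72.5}=8.515 \\text{ to 3 decimal places,}\\]

\n

\\[T=\\frac{(70-61)-0}{8.515\\times\\sqrt{\\frac{1}{10}+\\frac{1}{10}}}=2.363 \\text{ to 3 decimal places.}\\]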

"}], "gaps": [{"type": "numberentry", "useCustomName": false, "customName": "", "marks": 0.5, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [], "variableReplacementStrategy": "originalfirst", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "minValue": "mean1", "maxValue": "mean1", "correctAnswerFraction": false, "allowFractions": false, "mustBeReduced": false, "mustBeReducedPC": 0, "displayAnswer": "", "precisionType": "dp", "precision": "3", "precisionPartialCredit": 0, "precisionMessage": "You have not given your answer to the correct precision.", "strictPrecision": false, "showPrecisionHint": false, "notationStyles": ["plain", "en", "si-en"], "correctAnswerStyle": "plain"}, {"type": "numberentry", "useCustomName": false, "customName": "", "marks": 0.5, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [], "variableReplacementStrategy": "originalfirst", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "minValue": "sd1", "maxValue": "sd1", "correctAnswerFraction": false, "allowFractions": false, "mustBeReduced": false, "mustBeReducedPC": 0, "displayAnswer": "", "precisionType": "dp", "precision": "3", "precisionPartialCredit": 0, "precisionMessage": "You have not given your answer to the correct precision.", "strictPrecision": false, "showPrecisionHint": false, "notationStyles": ["plain", "en", "si-en"], "correctAnswerStyle": "plain"}, {"type": "numberentry", "useCustomName": false, "customName": "", "marks": 0.5, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [], 
"variableReplacementStrategy": "originalfirst", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "minValue": "mean2", "maxValue": "mean2", "correctAnswerFraction": false, "allowFractions": false, "mustBeReduced": false, "mustBeReducedPC": 0, "displayAnswer": "", "precisionType": "dp", "precision": "3", "precisionPartialCredit": 0, "precisionMessage": "You have not given your answer to the correct precision.", "strictPrecision": false, "showPrecisionHint": false, "notationStyles": ["plain", "en", "si-en"], "correctAnswerStyle": "plain"}, {"type": "numberentry", "useCustomName": false, "customName": "", "marks": 0.5, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [], "variableReplacementStrategy": "originalfirst", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "minValue": "sd2", "maxValue": "sd2", "correctAnswerFraction": false, "allowFractions": false, "mustBeReduced": false, "mustBeReducedPC": 0, "displayAnswer": "", "precisionType": "dp", "precision": "3", "precisionPartialCredit": 0, "precisionMessage": "You have not given your answer to the correct precision.", "strictPrecision": false, "showPrecisionHint": false, "notationStyles": ["plain", "en", "si-en"], "correctAnswerStyle": "plain"}, {"type": "numberentry", "useCustomName": false, "customName": "", "marks": 1, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [{"variable": "mean1", "part": "p0g0", "must_go_first": false}, {"variable": "sd1", "part": "p0g1", "must_go_first": false}, {"variable": "mean2", "part": "p0g2", "must_go_first": false}, {"variable": "sd2", "part": "p0g3", "must_go_first": false}], "variableReplacementStrategy": "originalfirst", 
"nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "minValue": "t_statistic", "maxValue": "t_statistic", "correctAnswerFraction": false, "allowFractions": false, "mustBeReduced": false, "mustBeReducedPC": 0, "displayAnswer": "", "precisionType": "dp", "precision": "3", "precisionPartialCredit": 0, "precisionMessage": "

You have not given your answer to the correct precision.

", "strictPrecision": false, "showPrecisionHint": false, "notationStyles": ["plain", "en", "si-en"], "correctAnswerStyle": "plain"}], "sortAnswers": false}, {"type": "1_n_2", "useCustomName": false, "customName": "", "marks": 0, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [{"variable": "t_statistic", "part": "p0g4", "must_go_first": false}], "variableReplacementStrategy": "alwaysreplace", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "prompt": "

Given the value $|T|$ of the t-statistic you have found, choose the range for the $p$-value by looking it up in the t-tables:

", "minMarks": 0, "maxMarks": 0, "shuffleChoices": false, "displayType": "radiogroup", "displayColumns": "1", "showCellAnswerState": true, "choices": ["

$p$ is less than $0.1\\%$

", "

$p$ lies between $0.1\\%$ and $1\\%$

", "

$p$ lies between $1 \\%$ and $5\\%$

", "

$p$ lies between $5 \\%$ and $10\\%$

", "

$p$ is greater than $10\\%$

"], "matrix": "decision_marking_matrix"}, {"type": "1_n_2", "useCustomName": false, "customName": "", "marks": 0, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [{"variable": "scenario", "part": "p1", "must_go_first": false}], "variableReplacementStrategy": "alwaysreplace", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "prompt": "

Given the range you have found for the $p$-value, what is the strength of evidence against the null hypothesis that there is no difference in the average test scores of the two groups?

", "minMarks": 0, "maxMarks": 0, "shuffleChoices": false, "displayType": "radiogroup", "displayColumns": 0, "showCellAnswerState": true, "choices": ["

Very Strong Evidence

", "

Strong Evidence

", "

Evidence

", "

Weak Evidence

", "

No Evidence

"], "matrix": "decision_marking_matrix"}, {"type": "1_n_2", "useCustomName": false, "customName": "", "marks": 0, "scripts": {}, "customMarkingAlgorithm": "", "extendBaseMarkingAlgorithm": true, "unitTests": [], "showCorrectAnswer": true, "showFeedbackIcon": true, "variableReplacements": [{"variable": "scenario", "part": "p2", "must_go_first": false}], "variableReplacementStrategy": "alwaysreplace", "nextParts": [], "suggestGoingBack": false, "adaptiveMarkingPenalty": 0, "exploreObjective": null, "prompt": "

What do you decide based on the above analysis?

", "minMarks": 0, "maxMarks": 0, "shuffleChoices": false, "displayType": "radiogroup", "displayColumns": "1", "showCellAnswerState": true, "choices": ["

We reject the null hypothesis at the $0.1\\%$ level.

", "

We reject the null hypothesis at the $1\\%$ level.

", "

We reject the null hypothesis at the $5\\%$ level.

", "

We do not reject the null hypothesis but consider further investigation.

", "

We do not reject the null hypothesis.

"], "matrix": "decision_marking_matrix"}], "partsMode": "all", "maxMarks": 0, "objectives": [], "penalties": [], "objectiveVisibility": "always", "penaltyVisibility": "always", "contributors": [{"name": "Christian Lawson-Perfect", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/7/"}]}]}], "contributors": [{"name": "Christian Lawson-Perfect", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/7/"}]}