// Numbas version: exam_results_page_options {"name": "Quadratic regression", "extensions": ["stats", "jsxgraph"], "custom_part_types": [], "resources": [], "navigation": {"allowregen": true, "showfrontpage": false, "preventleave": false, "typeendtoleave": false}, "question_groups": [{"pickingStrategy": "all-ordered", "questions": [{"functions": {"regfun": {"definition": "\nvar div = Numbas.extensions.jsxgraph.makeBoard('400px','400px',\n{boundingBox:[-15,maxy,maxx,-400],\n axis:false,\n showNavigation:false,\n grid:false,\n });\n\n var board = div.board; \nvar xaxis = board.create('line',[[0,0],[1,0]], { strokeColor: 'black', fixed: true,name:\"Advertising spend\",withLabel:false});\nvar xticks = board.create('ticks',[xaxis,10],{\n drawLabels: true,\n label: {offset: [-4, -10]},\n minorTicks: 0\n});\n\n// create the y-axis\nvar yaxis = board.create('line',[[0,0],[0,1]], { strokeColor: 'black', fixed: true,name:\"Revenue\",withLabel:false });\nvar yticks = board.create('ticks',[yaxis,500],{\ndrawLabels: true,\nlabel: {offset: [-20, 0]},\nminorTicks: 0\n});\nfor (j=0;ja) We have  $\\displaystyle a = \\frac{SPXY}{SSX}$ where:

\n

$\\displaystyle SPXY=\\sum xy - \\frac{(\\sum x)\\times (\\sum y)}{\\var{n}}=\\var{sxy}-\\frac{\\var{t[0]}\\times \\var{t[1]}}{\\var{n}}=\\var{spxy}$

\n

$\\displaystyle SSX=\\sum x^2 - \\frac{(\\sum x)^2}{\\var{n}}=\\var{ssq[0]}- \\frac{\\var{t[0]}^2}{\\var{n}}=\\var{ss[0]}$

\n

Hence $\\displaystyle a=\\frac{\\var{spxy}}{\\var{ss[0]}}=\\var{spxy/ss[0]}$.

\n

\n

Then $\\displaystyle b = \\frac{1}{\\var{n}}\\left[\\sum y-a \\sum x\\right]=\\frac{1}{\\var{n}}\\left[\\var{t[1]}-\\var{atrue}\\times\\var{t[0]}\\right]=\\var{btrue}$

\n

", "rulesets": {"std": ["all", "fractionNumbers", "!collectNumbers", "!noLeadingMinus"]}, "parts": [{"stepsPenalty": 0, "prompt": "

{regfun(r1,r2,max(r1)+30,max(r2)+30,rsquared,sumr,n,1)}

\n

First we perform a linear regression $y=ax+b$ to find the equation of the fitted line shown in the diagram above.

\n

You have to calculate the coefficients $a$ and $b$.

\n

If you input your line (before submitting), it will be drawn on the graph above so that you can see how it compares with the fitted line.

\n

Note carefully that the line drawn only gives a rough guide to the fitted line and you should always calculate $a$ and $b$ to ensure you obtain the required result.

\n

Input your line in the form $y=ax+b$, with the values of $a$ and $b$ accurate to 3 decimal places:

\n

$y=\\;$[[0]].    

\n

Click on Show steps if you want more information on calculating $a$ and $b$. You will not lose any marks by doing so.

\n

\n

 

", "marks": 0, "gaps": [{"expectedvariablenames": [], "checkingaccuracy": 0.001, "vsetrange": [0, 1], "showpreview": true, "vsetrangepoints": 5, "showCorrectAnswer": true, "answersimplification": "all,!noLeadingMinus", "scripts": {}, "answer": "{a}*x+{b}", "marks": 1, "checkvariablenames": false, "checkingtype": "absdiff", "type": "jme"}], "showCorrectAnswer": true, "scripts": {}, "steps": [{"type": "information", "showCorrectAnswer": true, "scripts": {}, "prompt": "

To find $a$ and $b$, first calculate $\\displaystyle a = \\frac{SPXY}{SSX}$ where:

\n

$\\displaystyle SPXY=\\sum xy - \\frac{(\\sum x)\\times (\\sum y)}{\\var{n}}$

\n

$\\displaystyle SSX=\\sum x^2 - \\frac{(\\sum x)^2}{\\var{n}}$

\n

Then $\\displaystyle b = \\frac{1}{\\var{n}}\\left[\\sum y-a \\sum x\\right]$
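
\n

Equivalently, taking the factor $\\frac{1}{\\var{n}}$ inside the bracket gives $\\displaystyle b=\\bar{y}-a\\,\\bar{x}$, where $\\displaystyle \\bar{x}=\\frac{1}{\\var{n}}\\sum x$ and $\\displaystyle \\bar{y}=\\frac{1}{\\var{n}}\\sum y$ are the sample means. The symbols $\\bar{x}$ and $\\bar{y}$ are introduced here only for illustration and are not used elsewhere in this question.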

\n

It is advisable to do all your calculations to at least 5 decimal places, to ensure that $a$ and $b$ are accurate to 3 decimal places at the end of the calculations. In particular, when you use $a$ to calculate $b$, the value of $a$ should be accurate to at least 5 decimal places.

\n

\n

 

", "marks": 0}], "type": "gapfill"}, {"type": "information", "showCorrectAnswer": true, "scripts": {}, "prompt": "

Next we fit the data using quadratic regression; that is, we look for a function of the form $y=ax^2+bx+c$, with constants $a,\\;b,\\;c$, which best fits the data.

\n

You are not expected to find this quadratic and we display it below.

\n

Note the SSE is displayed, which is a measure of how well the quadratic regression fits the data. Compare this with the SSE for the linear regression, also displayed.
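
\n

As a reminder, and assuming the usual definition of the sum of squared errors, for a fitted function $f$ we have $\\displaystyle SSE=\\sum \\left(y-f(x)\\right)^2$, summed over all $\\var{n}$ data points. For the linear regression $f(x)=ax+b$, and for the quadratic regression $f(x)=ax^2+bx+c$. The symbol $f$ is introduced here only for illustration. A smaller SSE indicates a closer fit.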

\n

{regfun(r1,r2,max(r1)+30,max(r2)+30,rsquared,sumr,n,2)}

\n

You can increase the degree of the polynomial fitted to the data by moving the slider; the updated SSE measures how well the new polynomial fits. Note, however, that increasing the degree is not worthwhile if it gives only a marginal reduction in the SSE. The rule is that you should always go for the simplest model!

", "marks": 0}], "statement": "

It is believed that there is a quadratic relationship between the amount a company spends on advertising ($X$, in thousands of pounds) and its revenue ($Y$, also in thousands of pounds).

\n

To investigate, $X$ and $Y$ were recorded for a random sample of $\\var{n}$ companies in the U.K.

\n

\n

\n

\n

\n

\n

\n

\n

Your first task is to find the equation of the fitted linear regression, shown as the black line. You are given the information below in order to complete the task.

\n

$X$: $\\sum x=\\;\\var{t[0]}$, $\\sum x^2=\\;\\var{ssq[0]}$

\n

$Y$: $\\sum y=\\;\\var{t[1]}$, $\\sum y^2=\\;\\var{ssq[1]}$

\n

Also you are given $\\sum xy = \\var{sxy}$.

\n

\n

", "variable_groups": [], "progress": "ready", "preamble": {"css": "", "js": ""}, "variables": {"ch": {"definition": "random(0..n-1)", "templateType": "anything", "group": "Ungrouped variables", "name": "ch", "description": ""}, "atrue": {"definition": "spxy/ss[0]", "templateType": "anything", "group": "Ungrouped variables", "name": "atrue", "description": ""}, "b1": {"definition": "random(2..5#0.1)", "templateType": "anything", "group": "Ungrouped variables", "name": "b1", "description": ""}, "sxy": {"definition": "sum(map(r1[x]*r2[x],x,0..n-1))", "templateType": "anything", "group": "Ungrouped variables", "name": "sxy", "description": ""}, "res": {"definition": "map(precround(r2[x]-(b+a*r1[x]),2),x,0..n-1)", "templateType": "anything", "group": "Ungrouped variables", "name": "res", "description": ""}, "spxy": {"definition": "sxy-t[0]*t[1]/n", "templateType": "anything", "group": "Ungrouped variables", "name": "spxy", "description": ""}, "ls": {"definition": "precround(b+a*sc,2)", "templateType": "anything", "group": "Ungrouped variables", "name": "ls", "description": ""}, "tol": {"definition": 0.001, "templateType": "anything", "group": "Ungrouped variables", "name": "tol", "description": ""}, "a": {"definition": "precround(atrue,3)", "templateType": "anything", "group": "Ungrouped variables", "name": "a", "description": ""}, "btrue": {"definition": "1/n*(t[1]-spxy/ss[0]*t[0])", "templateType": "anything", "group": "Ungrouped variables", "name": "btrue", "description": ""}, "ssq": {"definition": "[sum(map(x^2,x,r1)),sum(map(x^2,x,r2))]", "templateType": "anything", "group": "Ungrouped variables", "name": "ssq", "description": ""}, "sumr": {"definition": "round(sum(map(res[x]^2,x,0..n-1)))", "templateType": "anything", "group": "Ungrouped variables", "name": "sumr", "description": ""}, "a1": {"definition": "random(2..5#0.5)", "templateType": "anything", "group": "Ungrouped variables", "name": "a1", "description": ""}, "c1": {"definition": "random(0.1..1#0.1)", 
"templateType": "anything", "group": "Ungrouped variables", "name": "c1", "description": ""}, "tsqovern": {"definition": "[t[0]^2/n,t[1]^2/n]", "templateType": "anything", "group": "Ungrouped variables", "name": "tsqovern", "description": ""}, "b": {"definition": "precround(btrue,3)", "templateType": "anything", "group": "Ungrouped variables", "name": "b", "description": ""}, "obj": {"definition": "['A','B','C','D','E','F','G','H']", "templateType": "anything", "group": "Ungrouped variables", "name": "obj", "description": ""}, "r1": {"definition": "repeat(round(normalsample(50,10)),n)", "templateType": "anything", "group": "Ungrouped variables", "name": "r1", "description": ""}, "r2": {"definition": "map(round((a1+b1*x+c1*x^2+normalsample(0,20))),x,r1)", "templateType": "anything", "group": "Ungrouped variables", "name": "r2", "description": ""}, "ss": {"definition": "[ssq[0]-t[0]^2/n,ssq[1]-t[1]^2/n]", "templateType": "anything", "group": "Ungrouped variables", "name": "ss", "description": ""}, "n": {"definition": "random(20..35)", "templateType": "anything", "group": "Ungrouped variables", "name": "n", "description": ""}, "t": {"definition": "[sum(r1),sum(r2)]", "templateType": "anything", "group": "Ungrouped variables", "name": "t", "description": ""}, "sc": {"definition": "r1[ch]", "templateType": "anything", "group": "Ungrouped variables", "name": "sc", "description": ""}, "rsquared": {"definition": "precround(spxy^2/(ss[0]*ss[1]),3)", "templateType": "anything", "group": "Ungrouped variables", "name": "rsquared", "description": ""}}, "metadata": {"notes": "

10/02/2014:

\n

Created. Based on the Numbas question JSXGraph line of best fit plot 5

\n

14/02/2014:

\n

Improved display and modified so that $20 \\le n \\le 35$. Also sharpened up the error term to be N(0,20) rather than N(0,100) (see variable r2)  so that the quadratic regression would clearly be best.

", "description": "

The data is fitted by linear and quadratic regression. Students first find a linear regression equation for the $n$ data points, where $20 \\le n \\le 35$.

\n

They are then shown that the quadratic regression is often a better fit, as measured by the SSE. Users can also experiment with fitting polynomials of higher degree.

\n

", "licence": "Creative Commons Attribution 4.0 International"}, "type": "question", "showQuestionGroupNames": false, "question_groups": [{"name": "", "pickingStrategy": "all-ordered", "pickQuestions": 0, "questions": []}], "contributors": [{"name": "Bill Foster", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/6/"}]}]}], "contributors": [{"name": "Bill Foster", "profile_url": "https://numbas.mathcentre.ac.uk/accounts/profile/6/"}]}