Final exam data (Winter 2022).xlsx

DATA

Sales manager ID #	Sales	Wonder	SCIIT	Experience (yrs)
798	96	27	42	5
178	90	35	46	8
264	113	30	55	8
589	98	26	47	2
392	76	28	45	7
476	117	24	56	11
620	118	35	63	4
653	101	33	50	9
237	95	27	54	4
333	94	38	41	8
497	119	31	62	3
257	120	31	79	1
378	115	32	52	9
260	131	31	62	4
514	99	34	45	3
343	102	25	59	0
213	66	26	40	6
754	129	25	64	11
696	100	25	39	6
132	111	33	52	2
820	128	39	74	5
615	104	28	45	9
676	133	33	61	5
905	125	37	66	8
861	99	23	46	8
944	90	31	46	5
890	122	36	63	9
158	62	32	54	11
468	98	37	46	11
421	100	25	49	3
993	123	30	62	0
640	120	36	57	5
298	83	28	41	2
724	71	24	34	9
388	102	34	54	4
212	89	35	48	8
690	75	31	53	1
304	106	30	54	5
559	80	30	36	0
149	99	25	49	8
290	104	38	56	11
220	105	26	55	4
283	87	24	43	13
686	105	26	50	5
535	90	37	41	5

Final exam (Winter 2022).docx

MGMT 2262

Final exam

Winter 2022

Contents

General Information
Rules
Outside sources
Scenario
What you need to do
Part 1 – Exploratory data analysis
Table 1
Part 2 – Training and testing set (sample)
Table 2
Part 3a – Simple linear regression
Table 3
Part 3b – Choosing between models
Table 4
Part 4 – Multiple linear regression
Table 5
Submission Guidelines
Breakdown of marks
Notes on plagiarism and cheating (and how to avoid it)

Two very important notes:

1. This is a statistics course, and the goal of this final exam is to demonstrate your understanding of the whole course. When you are reviewing your work, ask yourself: “Am I demonstrating my understanding of the relevant topics?”

2. Related to note 1: although the rubric is in the middle of the document, it is the most important part of the exam because it tells you specifically what you are being graded on. As you complete each step, check your work against the rubric to make sure you are maximizing your grade. The rubric also indicates where to put most of your effort (i.e. the portion of the exam that is worth the most should be where you do most of your work).

) for help that involves clarification. For example, if you do not understand what an instruction means, you can ask for clarification.

· Similar to assignments 1 and 3, a has been created. Please check there for questions and answers.

4. You cannot ask your instructor for help doing the exam because it is expected that you know how to do it. For example, if you do not know how to make a histogram, you need to figure it out on your own. Or if you are not sure what model to use in Step 3 of a hypothesis test, you need to figure it out on your own.

· This relates to the majority of Excel issues as well. For example, if you don’t have the Data Analysis Toolpak properly installed prior to the final exam, that suggests you aren’t prepared to write the final and need to figure out the problem yourself. As another example, it is expected that you have actually used Excel to do a similar analysis prior to the final exam. Therefore, if you are having problems with doing the analysis, you need to figure it out on your own.

5. You cannot ask your instructor for feedback.

6. For all parts, you can work on the exam as much or as little as you want, as long as it is completed by the end of the day on April 20th.

· You have been given over ten days to complete this exam. It is expected that you work on the exam throughout this period. If you choose to wait until late on Wednesday to start the exam and run into problems, then you need to accept the consequences.

· If you studied for the final exam before writing it, it would take 3 to 4 hours to complete. But most of you will study as you write it (because it is an open-book exam), so plan to spend at least 12 hours working on the exam. Starting this exam three hours before it is due is like showing up to an exam two hours after it has started.

· This is a final exam. It is worth 25% of your mark. Behave accordingly.

7. This is not a complete list of rules, as a complete list is hard to write. Instead, keep in mind the spirit of the rules: this is an open-book, individual exam.

for more details.

? Sounds like a super hard area of computer science that is way too hard for a first course. But actually, you’ve already engaged in machine learning! How, you ask? Well, linear regression is a type of supervised machine learning.

The goal of machine learning is to build a model that learns, or changes, as new information is provided. In regression, the model is built from data and can be improved as new data becomes available. For example, if we build a regression model to predict the sales index of sales managers, then as we hire new sales managers we can add their information to the data, re-run the regression analysis, and potentially get a better prediction model.

Another big part of machine learning is testing the accuracy of our model. We often do this by dividing our data set into two parts: a training set and a testing set. The training set is used to build the model, which in our case means using the Data Analysis Toolpak to get the regression coefficients. Then we plug the values from the testing set into the model to see how well it predicts a different set of data. In short, the training set is used to build the model (in this case the regression model), while the testing set is used to test the model’s ability to make predictions. If you are interested in finding out more, check out this (note: this isn’t needed to do this exam but is provided purely for interest).
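To make the training/testing idea concrete outside of Excel, here is a minimal Python sketch. It fits a simple least-squares line on a training set and then measures how far its predictions fall from the actual values in a testing set. The choice of SCIIT as the predictor, the helper names `fit_simple_regression` and `mean_absolute_error`, and the use of mean absolute error as the accuracy measure are all assumptions made for illustration; the exam itself has you build the model with the Data Analysis Toolpak. The ten rows used are simply the first ten rows of the DATA table, not a random split.

```python
def fit_simple_regression(xs, ys):
    """Least-squares line y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def mean_absolute_error(model, xs, ys):
    """Average size of the prediction errors |y - y_hat| on a data set."""
    b0, b1 = model
    errors = [abs(y - (b0 + b1 * x)) for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# (SCIIT, Sales) pairs copied from the first ten rows of the DATA table.
# This split is NOT random; it only shows the mechanics of train vs. test.
training = [(42, 96), (46, 90), (55, 113), (47, 98), (45, 76),
            (56, 117), (63, 118), (50, 101)]
testing = [(54, 95), (41, 94)]

# Build the model using the training set only...
model = fit_simple_regression([x for x, _ in training], [y for _, y in training])
# ...then judge it on data it has not seen.
test_mae = mean_absolute_error(model, [x for x, _ in testing], [y for _, y in testing])
print("b0, b1 =", model)
print("Mean absolute error on testing set:", test_mae)
```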

A common rule for dividing the data is the 80/20 split: the training set is made up of 80% of the data, while the testing set is made up of 20% of the data.
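Assuming the DATA table above (45 sales managers) is the full data set, the 80/20 split works out to whole numbers:

```latex
n_{\text{test}} = 0.20 \times 45 = 9, \qquad
n_{\text{train}} = 45 - 9 = 36 \quad \left(\tfrac{36}{45} = 0.80\right)
```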

In this first step, divide the data set to make the training and testing set.

· Goal: Divide the data into two random samples. The first sample is called the training set and will contain 80% of the data values. The second sample is called the testing set and will contain 20% of the data values.

· How: Collect a random sample.

· Step 1: Choose a random sampling technique.

· Step 2: Apply the random sampling technique to the data set to randomly select 20% of the sales managers and their associated data. Copy and paste those into the “Testing set” part of the table below.

· Though this is the “second sample”, we are collecting it first for efficiency – it is faster to collect a 20% sample instead of collecting an 80% sample.

· Step 3: Then take the remaining 80% of the sales managers and their associated data, and copy and paste those into the “Training set” part of the table below.

· Step 4: In the row provided at the top of the table, briefly explain how you collected your sample.
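The steps above leave the choice of random sampling technique to you. As one illustration, here is a minimal Python sketch of a simple random sample (every group of 9 managers is equally likely to be chosen) applied to the Sales manager ID # column: it draws the testing set first, as in Step 2, and keeps the remaining managers as the training set, as in Step 3. The function name `split_train_test` and its `seed` parameter are hypothetical, chosen for this sketch; in the exam you carry out whichever technique you pick and copy the selected rows into the table yourself.

```python
import random


def split_train_test(ids, test_fraction=0.2, seed=None):
    """Hypothetical helper: simple random sample (without replacement).

    Returns (training_ids, testing_ids). The testing set is drawn first,
    mirroring Step 2; the remaining IDs form the training set (Step 3).
    """
    rng = random.Random(seed)               # seed=None -> different split each run
    n_test = round(test_fraction * len(ids))
    testing = rng.sample(ids, n_test)       # each subset of size n_test is equally likely
    chosen = set(testing)
    training = [i for i in ids if i not in chosen]
    return training, testing


# "Sales manager ID #" column from the DATA table (45 managers).
manager_ids = [
    798, 178, 264, 589, 392, 476, 620, 653, 237, 333, 497, 257, 378, 260, 514,
    343, 213, 754, 696, 132, 820, 615, 676, 905, 861, 944, 890, 158, 468, 421,
    993, 640, 298, 724, 388, 212, 690, 304, 559, 149, 290, 220, 283, 686, 535,
]

training_ids, testing_ids = split_train_test(manager_ids, seed=2022)
print("Testing set (20%):", sorted(testing_ids))
print("Training set (80%):", len(training_ids), "managers")
```

Fixing `seed` makes the split reproducible, which helps if you need to re-create it later; leaving it as `None` gives a different split on each run.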

). One sentence makes sense to you: “Standard deviation measures the dispersion of a dataset relative to its mean.” What is the right way to use it so that you are not engaging in plagiarism?

Options	Result
We found the standard deviation of income to be $4000. Standard deviation measures the dispersion of a dataset relative to its mean.	Plagiarism! This is a direct copy and paste without any indication of the source. This is work presented as your own when it is not.
We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean.	Plagiarism! Though it is not a direct copy, it is still close to the website’s wording, and it is still presented as your work when it is not, as there is no citation.
We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean (Hargrave & Westfall, 2020).	Not obviously plagiarism but still borderline. A correct in-text citation was used, but the quote was insufficiently paraphrased. Changing one word is not paraphrasing.
We found the standard deviation of income to be $4000. This measure indicates how much the incomes vary from the mean (Hargrave & Westfall, 2020).	Not plagiarism : ) There is a correct APA in-text citation and the sentence was paraphrased.
We found the standard deviation of income to be $4000. “Standard deviation measures the dispersion of a dataset relative to its mean” (Hargrave & Westfall, 2020, para. 2).	Not plagiarism : ) A direct quote is used (and indicated by quotation marks) and a correct APA in-text citation was used. BUT in this exam, you should avoid using direct quotes and instead focus on what these definitions mean in context.

Note: If outside sources are used, a proper APA reference needs to be included at the end of the document. For this example, the APA reference would look like:

Hargrave, M., & Westfall, P. (2020, July 21). Standard deviation definition. Investopedia.

Here are some good habits:

· Never copy and paste a sentence straight into your exam document. Instead, paraphrase it immediately and include the reference. A lot of students copy and paste and then forget to change it, and that is still plagiarism.

· If you spend any time on a website while doing this exam, write down the website’s name and URL (for this exam, use the Outside Sources table).

Scenario 2: Your friend asks to see your exam because they just want some ideas on what they could do.

There are no good options for this one. A more accurate way to write the scenario is probably: Your “friend” asks to see your exam because they just want to copy and paste your work.

Do NOT share your work with anyone. If you do share your work and your “friend” copies and pastes it (even without your knowledge), you have committed academic misconduct. Friends don’t ask to borrow your work, because they understand how unfair it is to put you in that position. Also, simply sharing work between friends, even when no copying is done, is cheating (but not plagiarism), because the people involved gain an unfair advantage.

In previous assignments, you have been allowed to help each other. This is NOT the case for this exam.
