statsassignmentinstructions.zip

Final exam data (Winter 2022).xlsx

DATA

Sales manager ID # Sales Wonder SCIIT Experience (yrs)
798 96 27 42 5
178 90 35 46 8
264 113 30 55 8
589 98 26 47 2
392 76 28 45 7
476 117 24 56 11
620 118 35 63 4
653 101 33 50 9
237 95 27 54 4
333 94 38 41 8
497 119 31 62 3
257 120 31 79 1
378 115 32 52 9
260 131 31 62 4
514 99 34 45 3
343 102 25 59 0
213 66 26 40 6
754 129 25 64 11
696 100 25 39 6
132 111 33 52 2
820 128 39 74 5
615 104 28 45 9
676 133 33 61 5
905 125 37 66 8
861 99 23 46 8
944 90 31 46 5
890 122 36 63 9
158 62 32 54 11
468 98 37 46 11
421 100 25 49 3
993 123 30 62 0
640 120 36 57 5
298 83 28 41 2
724 71 24 34 9
388 102 34 54 4
212 89 35 48 8
690 75 31 53 1
304 106 30 54 5
559 80 30 36 0
149 99 25 49 8
290 104 38 56 11
220 105 26 55 4
283 87 24 43 13
686 105 26 50 5
535 90 37 41 5

Final exam (Winter 2022).docx

MGMT 2262

Final exam

Winter 2022

Contents

General Information
Rules
Outside sources
Scenario
What you need to do
Part 1 – Exploratory data analysis
Table 1
Part 2 – Training and testing set (sample)
Table 2
Part 3a – Simple linear regression
Table 3
Part 3b – Choosing between models
Table 4
Part 4 – Multiple linear regression
Table 5
Submission Guidelines
Breakdown of marks
Notes on plagiarism and cheating (and how to avoid it)

Two very important notes: 

1. This is a statistics course and the goal of this final exam is to demonstrate your understanding of the whole course. When you are reviewing your work, ask yourselves “are we demonstrating our understanding of relevant topics?”

2. Related to 1, though the rubric is in the middle of the document, it is the most important part of the exam as it specifically tells you what you are being graded on. As you complete each step, check your work against the rubric to make sure you are maximizing your grade. The rubric also indicates where to put most of your effort (i.e. the portion of the exam that is worth the most should be where you put most of your work).

) for help that involves clarification. For example, if you do not understand what an instruction means, you can ask for clarification. 

· Similar to assignments 1 and 3, a has been created. Please check there for questions and answers.

4. You cannot ask your instructor for help doing the exam because it is expected that you know how to do it. For example, if you do not know how to make a histogram, you need to figure it out on your own. Or if you are not sure what model to use in Step 3 of a hypothesis test, you need to figure it out on your own.

· This relates to the majority of Excel issues as well. For example, if you don’t have the Data Analysis Toolpak properly installed prior to the final exam, that suggests you aren’t prepared to write the final and need to figure out the problem yourself. As another example, it is expected that you have actually used Excel to do a similar analysis prior to the final exam. Therefore, if you are having problems with doing the analysis, you need to figure it out on your own.

5. You cannot ask your instructor for feedback. 

6. For all parts, you can work on it as much or as little as you want, as long as it is completed by the end of the day on April 20th.

· You have been given over ten days to complete this exam. It is expected that you work on the exam throughout this period. If you choose to wait until late on Wednesday to start the exam and run into problems, then you need to accept the consequences.

· If you studied for the final exam prior to writing it, it will take 3 to 4 hours to complete. But most of you will study as you are writing it (because it is an open book exam), so plan to spend at least 12 hours working on the exam. Starting this exam three hours before it is due is like showing up to an exam two hours after it has started.

· This is a final exam. It is worth 25% of your mark. Behave accordingly.

7. This is not a complete list of rules, as that is hard to write. Instead, please keep in mind the spirit of the rules, which is that this is an open book, individual exam.

for more details.

? Sounds like a super hard area of computer science that is way too hard for a first class. But actually, you’ve already engaged in machine learning! How, you ask? Well, linear regression is a type of supervised machine learning.

The goal of machine learning is to build a model that learns or changes as new information is provided. In regression, the model is built from data and it can be improved upon as new data is provided. For example, if we build a regression model to predict the sales index for sales managers, as we hire new sales managers, we can add their information to the model, re-run the regression analysis, and get an even better prediction model.

Another big part of machine learning is testing the accuracy of our model. We often do this by taking our data set and dividing it into two parts: a training set and a testing set. The training set is used to build the model, which in our case means using the Data Analysis Toolpak to get the regression values. Then we plug the values from the testing set into the model to see how good the model is at making predictions for a different set of data. In short, the training data set is used to build the model (in this case the regression model), while the testing data set is used to test the ability of the model to make predictions. If you are interested in finding out more, check out this (note: this isn’t needed to do this exam but is provided purely for interest).
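The train-then-test idea can be sketched in a few lines of Python. This is only an illustration under several assumptions and is not the method the exam expects (the exam uses Excel and a random split). It uses the Sales and SCIIT values of the first ten sales managers in the DATA table, treats SCIIT as the single predictor of Sales, and uses a fixed split of eight training rows and two testing rows. The function names are hypothetical.

```python
# (Sales, SCIIT) pairs for the first ten sales managers in the DATA table.
rows = [
    (96, 42), (90, 46), (113, 55), (98, 47), (76, 45),
    (117, 56), (118, 63), (101, 50), (95, 54), (94, 41),
]

# Illustrative fixed split: first 8 rows train, last 2 rows test.
# (Assumption for brevity; the exam asks for a random split.)
train, test = rows[:8], rows[8:]


def fit_simple_regression(pairs):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(pairs)
    mean_x = sum(x for _, x in pairs) / n
    mean_y = sum(y for y, _ in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in pairs)
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Plug a predictor value into the fitted line."""
    intercept, slope = model
    return intercept + slope * x


# Build the model from the training set only.
model = fit_simple_regression(train)

# Check the model on rows it has not seen: actual minus predicted Sales.
test_errors = [y - predict(model, x) for y, x in test]
mean_abs_error = sum(abs(e) for e in test_errors) / len(test_errors)

print("intercept, slope:", model)
print("mean absolute test error:", mean_abs_error)
```

A small mean absolute error on the testing rows suggests the model predicts new data reasonably well; a large one suggests it does not generalize beyond the training rows.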

The common rule for dividing the data is called the 80/20 split. That is, the training set is made up of 80% of the data while the testing set is made up of 20% of the data.

In this first step, divide the data set to make the training and testing set.

· Goal: Divide the data into two random samples. The first sample is called the training set and will contain 80% of the data values. The second sample is called the testing set and will contain 20% of the data values.

· How: Collect a random sample.

· Step 1: Choose a random sampling technique.

· Step 2: Apply the random sampling technique to the data set to randomly select 20% of the sales managers and their associated data. Copy and paste those into the “Testing set” part of the table below.

· Though this is the “second sample”, we are collecting it first for efficiency – it is faster to collect a 20% sample instead of collecting an 80% sample.

· Step 3: Then take the remaining 80% of the sales managers and their associated data, and copy and paste those into the “Training set” part of the table below.

· Step 4: In the row provided at the top of the table, briefly explain how you collected your sample.
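For readers who want to see the 80/20 split outside Excel, the steps above can be sketched in Python. This is one possible illustration, assuming simple random sampling without replacement as the technique (the choice in Step 1 is yours) and a hypothetical seed that only makes the sketch repeatable. The ID numbers are copied from the DATA table.

```python
import random

# Sales manager ID numbers copied from the DATA table (45 managers).
manager_ids = [
    798, 178, 264, 589, 392, 476, 620, 653, 237, 333, 497, 257, 378, 260, 514,
    343, 213, 754, 696, 132, 820, 615, 676, 905, 861, 944, 890, 158, 468, 421,
    993, 640, 298, 724, 388, 212, 690, 304, 559, 149, 290, 220, 283, 686, 535,
]

random.seed(2022)  # hypothetical seed, only so the sketch is repeatable

# 20% of 45 managers is 9, so the testing set gets 9 and the training set 36.
test_size = round(0.20 * len(manager_ids))

# Simple random sampling without replacement for the testing set.
testing_ids = random.sample(manager_ids, test_size)

# Everyone not drawn for the testing set goes into the training set.
testing_lookup = set(testing_ids)
training_ids = [m for m in manager_ids if m not in testing_lookup]

print("testing set:", sorted(testing_ids))
print("training set size:", len(training_ids))
```

Drawing the smaller 20% sample first and assigning the remainder to training mirrors Steps 2 and 3, and guarantees that no sales manager appears in both sets.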

). One sentence makes sense to you: “Standard deviation measures the dispersion of a dataset relative to its mean.” What is the right way to deal with it so that you are not engaging in plagiarism?

Options and results

Option: We found the standard deviation of income to be $4000. Standard deviation measures the dispersion of a dataset relative to its mean.
Result: Plagiarism! This is a direct copy and paste without any indication of the source. This is work presented as your own when it is not.

Option: We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean.
Result: Plagiarism! Though it is not a direct copy, it is still close to the website’s wording, and with no citation it is still presented as your own work when it is not.

Option: We found the standard deviation of income to be $4000. Standard deviation measures the scatter of a dataset relative to its mean (Hargrave & Westfall, 2020).
Result: Not obviously plagiarism but still borderline. A correct in-text citation was used, but the quote was insufficiently paraphrased. Changing one word is not paraphrasing.

Option: We found the standard deviation of income to be $4000. This measure indicates how much the incomes vary from the mean (Hargrave & Westfall, 2020).
Result: Not plagiarism : ) There is a correct APA in-text citation and the sentence was paraphrased.

Option: We found the standard deviation of income to be $4000. “Standard deviation measures the dispersion of a dataset relative to its mean” (Hargrave & Westfall, 2020, para. 2).
Result: Not plagiarism : ) A direct quote is used (and indicated by quotation marks) and a correct APA in-text citation was used. BUT in this exam, you should avoid using direct quotes and instead focus on what these definitions mean in context.

Note: A proper APA reference needs to be included at the end of the document if outside sources are used. For this example, the APA reference would look like:

Hargrave, M. & Westfall, P. (2020, July 21). Standard deviation definition. Investopedia.

Here are some good habits: 

· Never copy and paste a sentence straight into your exam document. Instead, immediately paraphrase it and include the reference. A lot of students copy and paste and then forget to change it; that is still plagiarism.

· If you spend any time on a website as you are doing this exam, write down the website’s name and URL in a document (use the Outside Sources table for this exam).

Scenario 2: Your friend asks to see your exam because they just want some ideas on what they could do. 

There are no good options for this one. Probably a more accurate way to write the scenario is: Your “friend” asks to see your exam because they just want to copy and paste your work.

Do NOT share your work with anyone. If you do share your work and your “friend” copies and pastes it (even if you don’t know), you have committed academic misconduct. Friends don’t ask to borrow your work because they get how unfair it is to put you in that spot. Also, simply sharing work between friends, even when no copying is done, is cheating (but not plagiarism) because those involved are getting an unfair advantage.

In previous assignments, you have been allowed to help each other. This is NOT the case for this exam.