The assignment must be at least 2,100 words.
It is due within 12 hours.
Stats 1 Assignment
The purpose of the statistics project is to go through the process that a statistician follows when analyzing new data. You will be asked to do some Exploratory Data Analysis, describe the data using descriptive statistics and plots, answer some questions about the probability of future observations, create confidence intervals, and describe relationships in the data using correlations and linear regression.
For this project you will use StatCrunch (or any other software you are comfortable using). You will use the data set from the StatCrunch website entitled “Sample College Data.” This data was sampled from universities in a few states on the East Coast, and you will use it to work through the sections below. To find the data, follow the link to StatCrunch from MyMathLab. In the Data box, click the link to “Data sets” and use the search bar to find the data set “Sample College Data”, then open it in StatCrunch.
Section 1 Exploratory Data Analysis: In this section of your analysis you will need to open the data, describe the data by data type, and determine the sample size, the number of data points for each row, etc.
· Think about displaying the categorical (qualitative) data using pie charts or bar charts.
· For numerical (quantitative) data, show the data using histograms or other appropriate plots.
For every plot or piece of information about the data you uncover, please summarize what it says about the “shape” of the data.
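For anyone working outside StatCrunch, the counts behind a bar chart and a histogram can be sketched in a few lines of Python. This is only an illustration: the column names and every value below are invented placeholders, not figures from the "Sample College Data" set, and the bin width of 10 is an arbitrary choice.

```python
from collections import Counter
import statistics

# Hypothetical values; the real ones come from the "Sample College Data" set.
school_type = ["Public", "Private", "Private", "Public", "Private", "Private"]
admission_rate = [62.0, 71.5, 55.0, 80.0, 68.5, 91.0]

# Categorical column: a frequency table holds the numbers behind a bar or pie chart.
type_counts = Counter(school_type)

def histogram_counts(values, start, width, n_bins):
    """Count values falling in n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Numerical column: bin counts are the numbers behind a histogram.
bins = histogram_counts(admission_rate, start=50, width=10, n_bins=5)

# A quick text histogram for eyeballing the shape.
for i, count in enumerate(bins):
    low = 50 + 10 * i
    print(f"{low}-{low + 10}: {'*' * count}")

# A mean well above the median hints at right skew; well below hints at left skew.
mean_minus_median = statistics.mean(admission_rate) - statistics.median(admission_rate)
```

The sign of `mean_minus_median` is only a rough cue for skew; the plot itself should carry the description of shape.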
Section 2 Descriptive Statistics: In this section you will describe the data using the methods we learned for qualitative and quantitative data. This includes frequencies, measures of central tendency, measures of dispersion, and any other statistics you may find within the data (percentiles, outliers, etc.).
For every descriptive statistic that you find, make sure that you summarize it and explain what it describes in the data.
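As a rough sketch of the quantities listed above, Python's standard `statistics` module covers central tendency, dispersion, and quartiles. The graduation rates below are made up for illustration, and the 1.5 × IQR fence is one common outlier convention, assumed here rather than taken from the assignment.

```python
import statistics

# Hypothetical graduation rates (percent); not taken from the real data set.
graduation_rate = [48.0, 52.0, 55.0, 57.0, 60.0, 61.0, 63.0, 95.0]

# Measures of central tendency.
mean = statistics.mean(graduation_rate)
median = statistics.median(graduation_rate)

# Measures of dispersion (sample versions, dividing by n - 1).
variance = statistics.variance(graduation_rate)
stdev = statistics.stdev(graduation_rate)
value_range = max(graduation_rate) - min(graduation_rate)

# Quartiles; statistics.quantiles uses the "exclusive" method by default,
# so results can differ slightly from other software.
q1, q2, q3 = statistics.quantiles(graduation_rate, n=4)
iqr = q3 - q1

# Flag outliers with the 1.5 * IQR fences (an assumed convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in graduation_rate if x < lower_fence or x > upper_fence]

print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.2f} IQR={iqr:.2f}")
print("outliers:", outliers)
```

Quartile definitions vary between packages, so small differences from StatCrunch output are to be expected.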
Section 3 Probabilities: In this section you will find the probability of several possible outcomes listed below. You will need to use the data to create these probabilities.
1. Find the probability that a randomly selected university will be public.
2. Find the probability that a randomly selected university will have an Admissions Rate of at least 70% but less than 80%.
3. Find the probability that a randomly selected university will have an admissions rate below 60% given that it is a private institution.
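The three probabilities above are empirical proportions, and the third is a conditional probability computed within the private schools only. A minimal sketch follows, using invented records; it reads item 2 as 70% ≤ rate < 80%, which is an assumption about the intended wording.

```python
# Hypothetical (type, admissions rate in percent) records; not the real data.
schools = [
    ("Public", 72.0), ("Private", 58.0), ("Private", 75.5), ("Public", 81.0),
    ("Private", 45.0), ("Public", 66.0), ("Private", 79.9), ("Private", 62.0),
]
n = len(schools)

# 1. P(public) = number of public schools / total number of schools.
p_public = sum(1 for kind, _ in schools if kind == "Public") / n

# 2. P(70 <= rate < 80), assuming "at least 70% but less than 80%" is meant.
p_70_to_80 = sum(1 for _, rate in schools if 70 <= rate < 80) / n

# 3. P(rate < 60 | private): restrict the sample to private schools first,
#    then take the proportion within that subset.
private_rates = [rate for kind, rate in schools if kind == "Private"]
p_below_60_given_private = sum(1 for r in private_rates if r < 60) / len(private_rates)

print(p_public, p_70_to_80, p_below_60_given_private)
```

Dividing by the number of private schools rather than by `n` is what makes the last value conditional.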
Section 4 Confidence Intervals: In this section you will need to construct 95% confidence intervals for the mean and the variance of each numerical (quantitative) variable. For each, please interpret the confidence interval and explain any surprising results.
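One way to compute both intervals outside StatCrunch is sketched below with SciPy. It assumes the usual textbook setup: a t interval for the mean and a chi-square interval for the variance, both of which rely on the data being roughly normal. The retention rates are invented for illustration.

```python
import math
import statistics
from scipy import stats

# Hypothetical retention rates (percent); not taken from the real data set.
retention_rate = [71.0, 78.5, 82.0, 69.5, 88.0, 75.0, 80.5, 77.0, 84.5, 73.0]

n = len(retention_rate)
mean = statistics.mean(retention_rate)
s2 = statistics.variance(retention_rate)  # sample variance (n - 1 denominator)
alpha = 0.05                              # 95% confidence level

# Mean: x_bar +/- t_{alpha/2, n-1} * s / sqrt(n)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
margin = t_crit * math.sqrt(s2 / n)
mean_ci = (mean - margin, mean + margin)

# Variance: ((n-1)s^2 / chi2_{1-alpha/2}, (n-1)s^2 / chi2_{alpha/2})
chi2_upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
chi2_lower = stats.chi2.ppf(alpha / 2, df=n - 1)
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

print(f"95% CI for the mean:     ({mean_ci[0]:.2f}, {mean_ci[1]:.2f})")
print(f"95% CI for the variance: ({var_ci[0]:.2f}, {var_ci[1]:.2f})")
```

Note that the variance interval is not symmetric around the sample variance, because the chi-square distribution is skewed.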
Section 5 Hypothesis Testing: In this section you will make some hypotheses about the data and then use what we learned in lecture/homework to test these hypotheses.
1. Is the average Admissions Rate greater than 70?
2. Is the average Retention Rate less than 80?
3. Is the average Graduation Rate equal to 55?
4. Is the average Admissions Rate for Private universities the same as that for Public?
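The questions above map onto one-sample and two-sample t tests. The sketch below uses SciPy on invented admissions rates; the choice of Welch's unequal-variance test for question 4 and the 0.05 significance level are assumptions, not requirements stated in the assignment.

```python
from scipy import stats

# Hypothetical admissions rates (percent) by school type; not the real data.
admit_private = [58.0, 75.5, 45.0, 79.9, 62.0, 68.0]
admit_public = [72.0, 81.0, 66.0, 77.0, 70.5]
admit_all = admit_private + admit_public
alpha = 0.05  # assumed significance level

# Question 1: H0: mu = 70 versus H1: mu > 70 (right-tailed one-sample t test).
# Question 2 would use alternative="less"; question 3 the default "two-sided".
res1 = stats.ttest_1samp(admit_all, popmean=70, alternative="greater")
reject_q1 = res1.pvalue < alpha

# Question 4: H0: mu_private = mu_public versus H1: they differ.
# equal_var=False selects Welch's t test, which does not pool the variances.
res4 = stats.ttest_ind(admit_private, admit_public, equal_var=False)
reject_q4 = res4.pvalue < alpha

print(f"Q1: t={res1.statistic:.3f}, p={res1.pvalue:.3f}, reject H0: {reject_q1}")
print(f"Q4: t={res4.statistic:.3f}, p={res4.pvalue:.3f}, reject H0: {reject_q4}")
```

In each case the write-up should state the null and alternative hypotheses, the test statistic, the p-value, and the conclusion in words.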
Section 6 Correlation and Regression: In this section you will calculate the correlation matrix and discuss any interesting findings or expected findings. You will also build a regression model and compare the coefficients to the correlation matrix.
1. Calculate the correlation matrix for the quantitative columns.
2. Build a regression model that uses Admissions Rate as the dependent variable:
a. Do a simple linear regression with SAT Reading as the independent variable.
b. Do a simple linear regression with Average Financial Aid as the independent variable.
c. Compare the two above outputs to the correlation matrix entries and discuss what you see.
3. Build a regression model that uses Graduation Rate as the dependent variable:
a. Repeat a-c from part 2 and discuss the results.