Quantitative variables
1. Quantitative variables can be discrete or continuous. Explain the difference between discrete data and continuous data, and give one example of each.
2. A measure of location is a quantity which is ‘typical’ of the data. Give the names of three such measures, and explain (in words, not formulae) how each is found.
3. What is a measure of spread? Give the names of three such measures.
4. A random sample of a particular attribute yields the histogram shown in figure 1. Suggest a suitable measure of location and a suitable measure of spread for these data.
5. The probability that a ship has a defective radar is 0.05. The probability that a ship has a defective echo is 0.06. Three in one hundred ships have both a defective echo and a defective radar. Find the probability that a randomly chosen ship has either a defective echo or a defective radar.
6. Under what conditions might we use a binomial distribution as a probability model for our data?
7. Under what conditions might we use a normal distribution as a probability model for our data?
8. In hypothesis testing, the p-value can be thought of as the chance of obtaining the observed results, or more extreme results, if the alternative hypothesis is correct. TRUE or FALSE?
9. Write down the assumptions implicit in ANOVA.
10. The scatterplot shown in figure 2 is obtained when observations from two variables are plotted against each other. Choose two words from the following list which might be used to describe the relationship between these two variables:
Linear Negative Significant Positive Regression Indirect
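For reference, question 5 in this section rests on the addition rule for two events that can occur together. Writing E for a defective echo and R for a defective radar (labels chosen here for illustration, not taken from the paper), the arithmetic can be sketched as:

```latex
P(E \cup R) = P(E) + P(R) - P(E \cap R) = 0.06 + 0.05 - 0.03 = 0.08
```

The joint probability is subtracted because ships with both defects are counted once in P(E) and once in P(R).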
Section B: Data response
Questions 1, 2, 3 and 4 require you to use your own personal dataset. You should have already made a note of your personal dataset number; if not, follow the instructions on the front of this paper.
As above, enter Blackboard and click on MAR8001. Click on Topic 3: Statistics, click on ‘Project’ and click the file ‘Project data 2011’. Then, if your personal number is, say, 21, scroll through until you reach the part beginning
Data Display
K1 21.0000
Copy the following data until you reach the next set of data (K1 22.0000 in the above case) into a file in your workspace.
1. Input the data entitled ship speed and ice thrust into Minitab. The data are the results obtained from 30 ice breaker trials, where the ice thrust, y (in thousand Newtons), was recorded for various ship speeds, x (in metres per second). We are interested in making predictions of ice thrust based on ship speed.
(a) Use a scatterplot to determine whether there is any association between ice thrust and ship speed. Include the plot in your solutions, and make appropriate comments. (4 marks)
(b) Calculate the correlation coefficient between ship speed and ice thrust. Does your correlation coefficient support what you see in the plot in part (a)? Is your correlation coefficient significantly different from zero? (4 marks)
(c) Perform a regression analysis on these data, and include the regression table in your solutions. State the estimated regression equation, and interpret your R² statistic. (4 marks)
(d) Using the Minitab output in (c), test the null hypothesis that the population slope, β, is equal to zero. (2 marks)
(e) Check the assumptions implicit in your regression, i.e. that the residuals are normally distributed and independent of fitted values. (5 marks)
(f) Use your estimated regression equation to predict the ice thrust of a ship traveling at a speed of 6.8 metres per second. (1 mark)
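The quantities asked for in parts (b) to (f) can be cross-checked outside Minitab. The following is a minimal Python sketch, not the Minitab procedure the question requires. The function name `fit_line` and the seven data pairs are hypothetical and are NOT the project data; substitute your own 30 observations.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                    # estimated slope
    a = mean_y - b * mean_x          # estimated intercept
    r = sxy / math.sqrt(sxx * syy)   # sample correlation coefficient
    return a, b, r


# Hypothetical values for illustration only, NOT the project data.
speed = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]    # metres per second
thrust = [1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]   # thousand Newtons

a, b, r = fit_line(speed, thrust)
r_squared = r ** 2   # proportion of variation in thrust explained by speed

# t statistic for H0: slope = 0, on n - 2 degrees of freedom.
# In simple linear regression this equals the test of zero correlation.
t_slope = r * math.sqrt((len(speed) - 2) / (1 - r_squared))

predicted = a + b * 6.8   # predicted thrust at 6.8 metres per second
print(a, b, r, r_squared, t_slope, predicted)
```

The t statistic would then be compared with a t distribution on n - 2 degrees of freedom; the residual checks in part (e) still need plots and are not covered by this sketch.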
2. Enter the data entitled muzzle into another column in Minitab. The data are the muzzle velocities (in metres per second) of 50 shells tested with a new gunpowder.
(a) Summarise the data numerically using suitable measures of location and spread. Produce appropriate graphical summaries of your data, and include these in your solutions. Comment on the shape of the sample distribution. Does the data appear to be approximately normally distributed? (6 marks)
(b) Assuming approximate normality, produce a 95% confidence interval for the mean muzzle velocity. From your interval, is there any evidence of a departure from the target mean of 3000 metres per second? (3 marks)
(c) Use a one-sample t-test to test the null hypothesis that the population mean muzzle velocity is equal to 3000 metres per second. Use your graphs in part (a) (or otherwise) to check the assumption implicit in this test, and clearly state your null and alternative hypotheses. (6 marks)
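As a rough cross-check on parts (b) and (c), the interval and the test statistic can be computed by hand. This is a minimal Python sketch, not Minitab output. The function name `one_sample_t` and the eight sample values are hypothetical, and the critical value must be supplied from t tables: 2.365 is the two-sided 95% value on 7 degrees of freedom, whereas 50 shells give 49 degrees of freedom and a value of about 2.01.

```python
import math
import statistics


def one_sample_t(data, mu0, t_crit):
    """Return (t statistic for H0: mean = mu0, confidence interval).

    t_crit is the tabulated critical value for len(data) - 1 degrees
    of freedom; it is passed in rather than computed here.
    """
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # standard error of the mean
    t_stat = (mean - mu0) / se
    ci = (mean - t_crit * se, mean + t_crit * se)
    return t_stat, ci


# Hypothetical muzzle velocities (metres per second), NOT the project data.
sample = [2995.0, 3004.0, 2998.0, 3010.0, 2989.0, 3001.0, 3007.0, 2993.0]

t_stat, ci = one_sample_t(sample, mu0=3000.0, t_crit=2.365)
print(t_stat, ci)
```

If 3000 lies inside the interval, the two-sided test at the 5% level does not reject the null hypothesis, so parts (b) and (c) should agree.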
3. In Minitab, after the ‘MTB >’ prompt, type Read 8 8 M1. Then type in the following numbers:
0.0 8.7 59.3 50.0 12.5 30.2 53.2 10.1
8.7 0.0 55.3 44.5 12.0 27.9 49.0 10.1
59.3 55.3 0.0 48.6 60.9 62.7 11.9 55.3
50.0 44.5 48.6 0.0 50.7 26.0 45.3 49.7
12.5 12.0 60.9 50.7 0.0 27.5 56.7 10.0
30.2 27.9 62.7 26.0 27.5 0.0 28.6 26.0
53.2 49.0 11.9 45.3 56.7 28.6 0.0 49.0
10.1 10.1 55.3 49.7 10.0 26.0 49.0 0.0
Now type Print M1. The data in the session window are in the form of a distance matrix, the values in this matrix representing the distance (in metres) between eight colonies of tropical plants found in the Maracay region of Venezuela. For reference, the names of these plants are:
1: Abuta 2: Cascarilla 3: Brazilian Pepper Tree 4: Cedro Rosa
5: Maracuza 6: Zanga Tempo 7: Tiririca 8: Gervão
(a) From the distance matrix in Minitab,
i. write down the distance between the Abuta and Maracuza colonies;
ii. write down the distance between the Gervão and Brazilian Pepper Tree colonies;
iii. write down the names of the two colonies closest together. (3 marks)
(b) Which multivariate analysis technique could be used to recover the co-ordinates of each colony, and so produce a map of the locations of all the colonies? (don’t attempt this analysis) (1 mark)
(c) Perform a nearest neighbour (or single linkage) cluster analysis to identify clusters of colonies at various distances. Include the Minitab output and dendrogram in your solutions. (5 marks)
(d) By referring to the dendrogram in (c),
i. at what distance do the Abuta, Cascarilla, Maracuza and Gervão colonies form a single cluster? (1 mark)
ii. if we choose a distance of 15 metres to separate clusters, how many clusters do we have? Name the plant colonies within each cluster. (2 marks)
iii. find the minimum distance that should be chosen in order to obtain exactly two clusters of colonies. Name the plant colonies within each cluster. (3 marks)
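The idea behind nearest neighbour clustering in parts (c) and (d) can be illustrated without Minitab. At a given cut-off distance, single linkage clusters are the connected groups obtained by linking any two colonies whose distance is at most the cut-off. The following is a minimal Python sketch under that reading; the function name `clusters_at` is hypothetical, and it does not replace the Minitab output and dendrogram the question asks for.

```python
# Distance matrix (metres) as typed into Minitab in this question.
D = [
    [0.0, 8.7, 59.3, 50.0, 12.5, 30.2, 53.2, 10.1],
    [8.7, 0.0, 55.3, 44.5, 12.0, 27.9, 49.0, 10.1],
    [59.3, 55.3, 0.0, 48.6, 60.9, 62.7, 11.9, 55.3],
    [50.0, 44.5, 48.6, 0.0, 50.7, 26.0, 45.3, 49.7],
    [12.5, 12.0, 60.9, 50.7, 0.0, 27.5, 56.7, 10.0],
    [30.2, 27.9, 62.7, 26.0, 27.5, 0.0, 28.6, 26.0],
    [53.2, 49.0, 11.9, 45.3, 56.7, 28.6, 0.0, 49.0],
    [10.1, 10.1, 55.3, 49.7, 10.0, 26.0, 49.0, 0.0],
]
NAMES = ["Abuta", "Cascarilla", "Brazilian Pepper Tree", "Cedro Rosa",
         "Maracuza", "Zanga Tempo", "Tiririca", "Gervão"]


def clusters_at(dist, threshold):
    """Single linkage clusters at a cut-off, as lists of row indices."""
    n = len(dist)
    parent = list(range(n))   # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    # Join every pair of colonies no further apart than the cut-off.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


for group in clusters_at(D, 15.0):
    print([NAMES[i] for i in group])
```

Calling `clusters_at` over a range of cut-offs shows where clusters merge, which is the information a dendrogram displays on its distance axis.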
4. Read the four datasets entitled ‘Palms: …’ into four further columns and name each column after the location it comes from. These data correspond to the yields (in kilograms per hectare) of Cocos nucifera, or coconut palm, for plantations across four locations in the Caribbean: Jamaica, Turks & Caicos Islands, Granada and Puerto Rico. Perform a one-way ANOVA in Minitab to test the null hypothesis that there is no difference between population mean yields for the four locations. Remember to
• check the assumption that each sample is drawn from a normal distribution (by using normal probability plots, for example);
• check the assumption of equal population variances;
• clearly state your null and alternative hypotheses;
• interpret your p-value using table 3 in the lecture notes.
If you find that there is a significant difference between the mean yields observed at the four locations, use the follow-up procedure of multiple comparisons (or Tukey’s test) to find out between which pair(s) of locations these differences lie. You should write this question up as a report, with a short introduction and conclusion which summarises your findings. Include any relevant Minitab output, and remember to make appropriate comments as you go along. (20 marks)
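The F statistic reported in a one-way ANOVA table can be reproduced by hand from the between-group and within-group sums of squares. This is a minimal Python sketch of that calculation, not the Minitab analysis the question requires. The function name `one_way_anova` and all yield values are hypothetical, and the sketch covers neither the assumption checks nor Tukey’s test.

```python
def one_way_anova(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations about their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g)
                    for g, m in zip(groups, means))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical yields (kilograms per hectare), NOT the project data.
yields = {
    "Jamaica": [510.0, 495.0, 520.0, 505.0],
    "Turks & Caicos Islands": [480.0, 470.0, 490.0, 475.0],
    "Granada": [500.0, 515.0, 498.0, 507.0],
    "Puerto Rico": [530.0, 525.0, 540.0, 518.0],
}

f_stat, df_between, df_within = one_way_anova(list(yields.values()))
print(f_stat, df_between, df_within)
```

The statistic would be compared with an F distribution on the two degrees of freedom returned; a large value counts as evidence against equal population means.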
Section C: Critical appraisal
Find a technical paper of interest to you that has some statistical analysis in it. Describe the statistical methods used. What assumptions have the authors made in doing this analysis? Comment critically on the data collection, data presentation, statistical analysis and presentation of the results. You should try to cover these details in no more than two sides of A4 paper. Attach a copy of the paper to your report. Please try to ensure that you are describing a different paper to your colleagues.
(20 marks)