CML
May 25, 2020
Effects of the current economic crisis and the state of global economy on the environment
May 25, 2020

sales data

analyze sales data from the Executive Express Café using simple and multiple regression as well as time series and forecasting techniques. It is strongly recommended that you complete the project in parts as we complete each unit of material.

Groups: It is recommended that you work with a partner; however, you may choose to work by yourself.

Assignment: Follow the instructions below for the three project parts:
€¢ Part 1: Simple Regression
€¢ Part 2: Multiple Regression and
€¢ Part 3: Time Series & Forecasting
Produce a report in Word (only one per team needed) that:
€¢ Clearly states all team members at the top of the page
€¢ Shows results of each analysis. You should paste the Excel and/or StatCrunch output directly into your Word file with identification of which question you are answering. For example:
2. Results of regression analysis to predict ____ based on _____
)
€¢ Answers any questions or includes any discussion that the assignment specifies
€¢ Uses high quality, professional writing standards. See SCOB writing guidelines posted on Blackboard.

Submitting your Project
1. Submit your REPORT (Word file) on paper in class on the due date.
2. Name your REPORT (Word file) lastname_lastname_project.docx, where the last names of the group members (up to two) are substituted as appropriate, then submit this REPORT electronically, no later than 11:59 pm on the due date, to Turnitin using the Project View/Submit link in Blackboard shown in the Project folder in Assignments.
3. Submit your work in Excel (or other applications) using the Project Work link in Blackboard shown in the Project folder in Assignments. This file should be named lastname_lastname_work.

Evaluation: Your report will be evaluated on these criteria:
€¢ Correctness of your statistical analyses and conclusions
€¢ Clarity and content of discussion or answers to questions posed
€¢ Quality of the writing (grammar, punctuation, spelling, etc.)
€¢ Point value: 60 points total

 

Part 1: Simple Regression Analysis of Trend

Note: for problems with a *, paste Excel or StatCrunch outputs into your report.

[22 points total] In these analyses, you will consider factors that influence trends in sales, both overall and in certain products, and therefore the business aspects of these café data.

1. [3 points]* Create the following scatterplots and write one or two sentences describing any patterns that appear in the graphs:
€¢ Number of sodas sold (Y) vs. time (X)
€¢ Number of coffees sold (Y) vs. time (X)
2. [3 points]* Use separate regression models to describe the relationship between:
€¢ number of sodas and time and
€¢ number of coffees sold and time
What do you notice about the slopes in each of these regression models?
3. [3 points] A slope over time is called a trend. Trends can be either positive or negative. Compare the two regression models you constructed above. Are the trends significant (check p-values for slope)? Examine the data and comment on why the r-squared values are low. Hint: look at patterns in the data that repeat each week.
4. [2 points] Consider the time period (January to April) over which these data were collected. What factors could be affecting coffee and soda sales?
5. [3 points]* Run an analysis to calculate all of the correlations between the variables soda, coffee and maximum daily temperature. Are there significant correlations (check p-values)? Between which variables? Are these variables positively or negatively correlated?
6. [4 points]* If sodas and coffees are combined (the variable names Total Soda and Coffee), does the regression analysis indicate a significant trend (check p-value for slope) in the number of drinks sold per day? From a business perspective, use these and earlier results to try to explain what is happening with your coffee and soda sales over the course of the semester.
7. [4 points]* Does a regression analysis indicate a significant trend (check p-value for slope) in overall café Sales? Comment on the r-squared value and the desirability of using only time to predict overall sales figures. If the café managers wanted to use such a model to predict sales, would you recommend it?
HINTS:
€¢ When it says is there a trend it indicates that you should use time (t) as the independent (X) variable. This applies to, for example, #3, #6 and # 7.
€¢ A correlation analysis can be run in StatCrunch using Stat > Summary Stats > Correlation. Then choose all three of the variables specified. The output will tell you the correlation coefficient, r, between each pair of variables (i.e. the correlation between soda and time, the correlation between coffee and time, and the correlation between coffee and soda.)

Part 2: Multiple Regression Analysis with Dummy Variables for Day of Week

Note: for problems with a *, paste Excel or StatCrunch outputs into your report.

[22 points total] In this analysis, we will to incorporate day of the week into a multiple regression model and then try to predict sales using time and day of week. We will then add the temperature variable to see how much value it adds to the model.

8. [4 points]* Create a multiple regression model that uses time and days of the week to predict sales. You will need to use four dummy variables (Monday, Tuesday, Wednesday, and Thursday) instead of using the DayofWeek variable as it currently appears in the data set. How good is this model?
9. [3 points] Interpret the regression coefficient for time. That is, by how much do you estimate that sales change as the time (day) increases by 1? Does the time variable have a significant relationship with Sales (check p-value)? Why or why not?
10. [2 points] Note that the dummy variables for each day of the week are all equal to 0 for Fridays. What this means is that the coefficient of Monday tells you how much sales differ from Monday to Friday (with Monday = 1 and Friday = 0), and the coefficient of Tuesday tells you how much sales differ from Tuesdays to Fridays, etc. Using this method, describe:
a. How much higher or lower sales are each day of the week, compared to Friday.
b. Which day (M-F) are sales highest? Which days (M-F) are lowest? How do the others compare?
11. [3 points] Calculate predictions (forecasts) for Sales for each period t = 1, 2, €¦, 48 using the Multiple Regression equation you developed.
€¢ Show all of your work in Excel.
€¢ In your report, use one or two sentences to explain how the predictions were calculated.
12. [4 points] Use your model to predict Sales for time periods t = 49 through 53.
€¢ Show your work in Excel but briefly describe or show equations to demonstrate how the predictions were calculated in your report.
€¢ Remember that when you have, for example, a day that is, for example, Monday, you will treat the X variable representing Monday as a 1 and the other X variables representing the other days of the week as 0s. Note that when you get to Fridays, all X variables for days of the weeks will be entered as 0.
€¢ Based on your evaluation of the quality of your multiple regression model, discuss how accurate you think these forecasts might be.
13. [3 points]* Try adding Max Daily Temperature to your regression model that already includes time and the four day of the week dummy variables. How much more variation in Sales can be explained if the Max Daily Temperature variable is included?
14. [3 points] Which model (either the Multiple Regression model from # 8 or the one in # 13 that adds max daily temperature) would you recommend to the café’s managers? What did you learn about the café’s business from this analysis that you could share with its managers to help them better manage the business? State at least two observations that would be meaningful to them.

HINTS:
€¢ In #8, a lot of students are using the wrong dependent variable. It is asking you to predict sales, so sales is the dependent (Y) variable. The independent variables are time (X1), Mon (X2), Tues (X3), Wed (X4), and Thurs (X5).
€¢ On #13, remember the exam question that asked you to compare two models and use the one that is most efficient. In general, insignificant variables that do not increase r-squared very much or at all are generally not desirable.
Part 3: Using Time Series Analysis to Forecast Café Sales

[16 points total] In this analysis, we will use time series and forecasting techniques to forecast café sales. We will lastly compare the accuracy of these time series forecasts with the accuracy of the multiple regression forecasts to determine which method works best for this data set.

Time Series Forecasting Approach

15. [5 points] Develop a time series decomposition model to forecast Sales using trend and seasonality for days of the week. That is, calculate the seasonal indices for each day (MTWTF) and specify the trend equation. Hints: use Applet # 16, with 5 days of the week to get seasonal indices. Remember that the FIRST index will refer to a Tuesday, since that is the first data point. Also, remember that you must do something to the original sales data before computing a trend line.
€¢ Show your work in Excel or otherwise submit output from applets or StatCrunch.
€¢ In your report, state the trend equation and the seasonal indices.
16. [1 point] Discuss the trend component. Are total sales increasing or decreasing over time? If so, by how much, on average?
17. [1 point] Discuss any seasonality that is present. Which days appear to have higher or lower sales and by how much? (Note that data, t = 1, starts on a Tuesday so this should be indicated in the model that the first observation corresponds to the 2nd period during the week.)
18. [2 points] Calculate the Seasonally Adjusted Trend Forecasts for Sales periods t= 1, 2, €¦, 48 using the results of your decomposition in #1. Hint: you will need to use trend and seasonality to generate these forecasts.
€¢ Show your work in Excel.
€¢ In your report, use one or two sentences to explain how the forecasts were calculated.
19. [2 points] Calculate the MAD for the Seasonally Adjusted Forecasts for t = 1, 2, €¦, 48 (from #4) and discuss how close the forecasts were to actual observed values. (Unlike exponential smoothing, you should use all 48 error values to calculate the MAD.)
€¢ Show your work in Excel.
€¢ In your report, report the value of the MAD and discuss.
20. [2 points] Calculate the Seasonally Adjusted Forecasts for Sales for one week into the future (t = 49, 50, 51, 52, 53. Be careful, t = 49 is a Friday). Based on your MAD, discuss how accurate you think these forecasts might be, and how might this information be used by the café’s managers?
€¢ Show your calculations in Excel.
€¢ In your report, display your five forecasts and briefly explain how they were calculated.

Comparing to Multiple Regression Forecasts for Café Sales

21. [2 points] Compute the MAD for the Multiple Regression predictions/forecasts for Sales from # 11 above for the periods t = 1, 2, 3, €¦ 48. (Unlike exponential smoothing, you should use all 48 error values to calculate the MAD.)
€¢ Show your work in Excel.
€¢ In your report, report the value of the MAD.
22. [1 point] Compare the MAD of the Multiple Regression forecasts (calculated in #21 above) to the MAD for your Seasonally Adjusted forecasts (calculated in # 19 above). Comment on which method appears to produce more accurate forecasts and why.