There are 4 tasks in this assignment (20 points). The dataset is “GroceryPurchase.sav” (SPSS data file).

Task 1: Data Descriptive (2 points)

Task 2: Clustering k-means (5.5 points)

Task 3: Regression Analysis (6 points)

Task 4: Logistic Regression Analysis (6.5 points)

Note: 10 points will be deducted if you do not submit the SPSS output file. Alternatively, if you are coding in R, submit your R script.

The Problem: A grocery retailer wants to run a price promotion on products during Christmas and wants to understand the purchase likelihood of shoppers. To make a solid, evidence-based decision on whether or not to run a Christmas promotion campaign, the retailer entrusts the business analyst to analyze shopper information in the retailer’s database and shopping statistics over the last year, including how shoppers responded to the Christmas promotion campaign last year. As the business analyst entrusted with the analysis for the grocery retailer, perform the following four tasks to help understand consumer categories and responses. The dataset is named “GroceryPurchase.sav” (SPSS data file) and contains demographics and past behavior of 2100 shoppers.

Data Description: Following is the description of all shopper-related variables (including continuous and categorical variables in the dataset):

Variable 1: Age: shopper’s age in years

Variable 2: Education: shopper’s education level; 1 = "High School Graduate", 2 = "College Graduate", 3 = "Master’s Degree", 4 = "PhD"

Variable 3: Married: shopper’s marital status; 1 = “Married”, 0 = “Not married”

Variable 4: Income: shopper’s annual income in $

Variable 5: Kids: are there kids in the shopper’s household: 1 = “household has kids”, 0 = “household has no kids”

Variable 6: Membership: number of days since the shopper became a member of the grocery chain

Variable 7: Days_last: number of days since the shopper’s last shopping trip

Variable 8: DealsPurchased: number of purchases made on deals by shopper in last year

Variable 9: WebPurchases: number of purchases shopper made online in last year

Variable 10: StorePurchases: number of purchases shopper made in-store in last year

Variable 11: WebVisits: number of shopper visits to grocery website in last year

Variable 12: Seafood_and_Bakery: seafood and bakery purchase tier of customer in last year; 1 = “low purchase in seafood and bakery”, 2 = “average purchase in seafood and bakery”, 3 = “high purchase in seafood and bakery”

Variable 13: Purchase: shopper’s purchase decision in response to Christmas products promotion by grocery retailer last year; 1 = “purchase”, 0 = “no purchase”

Task 1: Data Descriptive (3 tasks: 2 points)

a) Discuss descriptive statistics for all continuous variables in the dataset. Also create histogram plots for all the continuous variables and discuss approximately how many observations have the highest value for each of the continuous variables. (0.5 point)

b) Check frequencies for all categorical variables in the dataset. (0.5 point)

c) Check correlations between all variables (both continuous and categorical) in the dataset and present the results in a correlation table. Discuss which pairs of variables are significantly correlated with a correlation above 0.5. (1 point)
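
As a rough illustration of part (c), the sketch below computes Pearson correlations for every pair of columns and keeps the pairs whose absolute correlation exceeds 0.5. It is written in Python rather than SPSS or R, the column values are made-up toy numbers (not taken from GroceryPurchase.sav), and reading "above 0.5" as an absolute-value threshold is an assumption. Significance tests are left to the SPSS or R output.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Toy columns (hypothetical values, not from the assignment dataset).
toy = {
    "WebPurchases": [1, 2, 3, 4, 5],
    "StorePurchases": [2, 4, 6, 8, 10],
    "WebVisits": [3, 1, 4, 1, 5],
}

# Keep only the pairs whose absolute correlation exceeds 0.5
# (the absolute-value reading of the threshold is an assumption).
strong_pairs = {}
for a, b in combinations(toy, 2):
    r = pearson(toy[a], toy[b])
    if abs(r) > 0.5:
        strong_pairs[(a, b)] = r
```

With these toy columns only the WebPurchases and StorePurchases pair passes the threshold, which is the kind of pair Task 2 (a) then asks about.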

Task 2: Clustering for Market Segmentation (4 tasks: 5.5 points)

The goal here is to segment the customers of the grocery chain based on their past behavior and demographics. Consider variables from Variable 1 to Variable 12.

a) For the pair of variables with correlation > 0.5 (as per the results in Task 1 c), discuss which variable you would drop from the clustering analysis and which variable you would retain for clustering the shoppers. (1 point)

b) For the continuous variables, calculate the standardized variables (z-scores). In your submission, print a screenshot of the first few rows of calculated z-scores. (0.5 point)

c) Identify customer segments/clusters based on the categorical variables and the standardized continuous variables. For this purpose, run k-means clustering with 3 clusters. Do you get at least 2 good-sized clusters of customers (with a sizeable number of members in at least 2 clusters)? What is the difference in the mean values of the clustering variables for the 2 sizeable clusters identified? (2 points)

d) Repeat the above task (c) with the categorical variables and the unstandardized continuous variables (original values of the continuous variables). Are there differences in the results compared with task (c)? Discuss the differences, if any. (2 points)
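
To make parts (b) and (d) concrete, here is a minimal Python sketch (not SPSS or R) of z-score standardization and of why it matters for a distance-based method such as k-means. The ages and incomes are hypothetical toy values, not rows from GroceryPurchase.sav, and the helper names `z_scores` and `euclidean` are made up for this illustration. The sketch assumes the sample standard deviation (n − 1 denominator), which is the convention R's `scale()` follows.

```python
import math
import statistics

def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Toy shoppers (hypothetical values, not from the assignment dataset).
age = [25, 35, 45]
income = [30000, 50000, 70000]

z_age = z_scores(age)
z_income = z_scores(income)

# Distance between the first two shoppers, raw versus standardized.
raw_distance = euclidean((age[0], income[0]), (age[1], income[1]))
std_distance = euclidean((z_age[0], z_income[0]), (z_age[1], z_income[1]))
```

On the raw scale the distance is almost entirely the income gap (about 20000, with age contributing next to nothing), while on the standardized scale both variables contribute equally. This is one reason the clusters in (c) and (d) can differ.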

Task 3: Regression Analysis for Shopper Purchase Volume Prediction (2 tasks: 6 points)

The goal in this task is to understand how demographics and shopping patterns affect a shopper’s purchase volume of products on deals.

The outcome variable of interest is variable 8, “DealsPurchased” and the predictor variables of interest are Variable 1, Variable 3 to Variable 7, Variable 10 to Variable 11 (i.e., variables 1, 3, 4, 5, 6, 7, 10, 11).

a) Run Regression Variable Plots to plot the relation between the dependent/outcome variable, DealsPurchased, i.e., the number of purchases made on deals by the shopper in the last year (input as the vertical axis variable), and each of the continuous predictor variables (input as the horizontal axis variable). Discuss the overall trend of the relations. (3 points)

b) Run a regression analysis with the dependent variable DealsPurchased, i.e., the number of purchases made on deals by the shopper in the last year, and the above-mentioned predictor variables. Looking at the Model Summary, what is the value of R square and what does it denote? From the ANOVA table, is the F-test, and thus the model, statistically significant, and if so, at what level: p < .001, p < .01, or p < .05? Discuss which predictor variables significantly affect the dependent variable DealsPurchased and interpret the significant effects. Are there lessons or managerial implications for the grocery retailer from the findings on what type of customers to target for deals? (3 points)
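
As a reminder of what R square measures, the following Python sketch fits a least-squares line and computes R² as the share of the outcome's variance explained by the model. It is a simplification under stated assumptions: a single predictor instead of the eight in the task, made-up toy numbers instead of GroceryPurchase.sav, and hypothetical helper names (`fit_simple_ols`, `r_squared`). The SPSS Model Summary or R's `summary(lm(...))` remains the source for the actual answer.

```python
def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

# Toy data (hypothetical values, not from the assignment dataset).
web_visits = [1, 2, 3, 4, 5]
deals_purchased = [2, 4, 5, 4, 5]

slope, intercept = fit_simple_ols(web_visits, deals_purchased)
r2 = r_squared(web_visits, deals_purchased, slope, intercept)
```

Here the fitted line has slope 0.6 and intercept 2.2, and R² is 0.6, meaning the predictor accounts for 60% of the variation in the toy outcome.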

Task 4: Binary Logistic Regression for predicting Shopper’s Purchase Choice (2 tasks: 6.5 points)

The goal in this task is to understand how demographics and shopping patterns affect a shopper’s choice to purchase products during the retailer’s Christmas promotion.

The outcome variable of interest is variable 13, “Purchase”, i.e., a shopper’s purchase decision in response to the Christmas products promotion, and the predictor variables of interest are Variable 1, Variable 3 to Variable 7, Variable 10 to Variable 11 (i.e., variables 1, 3, 4, 5, 6, 7, 10, 11).

a) Run a binary logistic regression analysis to predict whether a shopper purchased in response to the Christmas promotion in the past year, based on demographics and other behaviors. Looking at the results, which predictor variables significantly determine the log-odds of purchase in response to the Christmas promotion at the grocery retailer? By how much, and what is the percentage change (increase or decrease) in probability of purchase based on the significant predictor variables? Are there lessons or managerial implications for the retail store? (3.5 points)

b) Comment on the accuracy of the binary logistic regression in task (a). More specifically, answer the following: (3 points)

i. What is the number of true positive cases classified?

ii. What is the number of false positive cases classified?

iii. What is the number of true negative cases classified?

iv. What is the number of false negative cases classified?

v. What is the overall prediction accuracy rate?

Purchase: 1 indicates positive case (purchased in past year) and 0 indicates negative case (did not purchase in the past year).
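
The two parts of Task 4 can be sketched in Python (not SPSS or R) as follows. For part (a), a common convention is to convert a logistic coefficient B into a percentage change in the odds, (exp(B) − 1) × 100. Note that this is a change in odds, and the corresponding change in probability depends on the baseline. For part (b), the four classification counts and the accuracy follow directly from comparing actual and predicted labels. The helper names and the label vectors are hypothetical toy inputs, not output from the assignment's model.

```python
import math

def percent_change_in_odds(b):
    """Percent change in the odds for a one-unit increase in a predictor
    whose logistic regression coefficient is b."""
    return (math.exp(b) - 1.0) * 100.0

def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN with 1 as the positive class (purchase)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }

# Toy labels (hypothetical, not from the assignment dataset).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
accuracy = (counts["TP"] + counts["TN"]) / len(actual)

# A coefficient of ln(2) doubles the odds, i.e. a 100% increase.
doubling = percent_change_in_odds(math.log(2))
```

For these toy labels the counts are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, giving an accuracy of 0.75. In SPSS the same four numbers are read off the Classification Table.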
