# Descriptive Data Mining

CH 4 – Descriptive Data Mining

As part of the quarterly reviews, the manager of a retail store analyzes the quality of customer service based on the periodic customer satisfaction ratings (on a scale of 1 to 10 with 1 = Poor and 10 = Excellent). To understand the level of service quality, which includes the waiting times of the customers in the checkout section, he collected the data shown below on 100 customers who visited the store.

Use the data in tab “Prb47-50” for problems 47 through 50.

47. Using the data given, apply k-means clustering with k = 5 using Wait Time (min), Purchase Amount ($), Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Analyze the resultant clusters. What is the smallest cluster? What is the least dense cluster (as measured by the average distance in the cluster)? What reasons do you see for low customer satisfaction ratings?

Print program output.
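
For readers without XLMiner, the procedure in problem 47 can be approximated with a short pure-Python sketch. It is not XLMiner's implementation: it assumes that "Normalize input data" means converting each variable to z-scores, it picks each random start by sampling k observations as initial centroids, and it runs on synthetic placeholder rows rather than the "Prb47-50" data. The names `zscore_columns`, `kmeans_once`, `kmeans`, and `cluster_summary` are illustrative only.

```python
import math
import random
import statistics


def zscore_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def kmeans_once(points, k, max_iter, rng):
    """Run k-means from one random start; return (labels, centroids, sse)."""
    centroids = rng.sample(points, k)  # k observations serve as initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, sse


def kmeans(points, k, max_iter=50, n_starts=10, seed=0):
    """Keep the random start with the smallest within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, max_iter, rng) for _ in range(n_starts)]
    return min(runs, key=lambda run: run[2])


def cluster_summary(points, labels, centroids):
    """Map cluster id -> (size, average distance of members to the centroid)."""
    summary = {}
    for j, center in enumerate(centroids):
        dists = [math.dist(p, center) for p, lab in zip(points, labels) if lab == j]
        if dists:
            summary[j] = (len(dists), sum(dists) / len(dists))
    return summary


# Synthetic placeholder rows: [wait time (min), purchase amount ($), age, rating].
# These are NOT the textbook data; substitute the rows from the "Prb47-50" tab.
data_rng = random.Random(42)
rows = [
    [data_rng.uniform(1, 20), data_rng.uniform(5, 300),
     data_rng.randint(18, 75), data_rng.randint(1, 10)]
    for _ in range(100)
]

z = zscore_columns(rows)
labels, centroids, sse = kmeans(z, k=5, max_iter=50, n_starts=10)
summary = cluster_summary(z, labels, centroids)

smallest = min(summary, key=lambda j: summary[j][0])
least_dense = max(summary, key=lambda j: summary[j][1])
print("sizes and average distances:", summary)
print("smallest cluster:", smallest, "least dense cluster:", least_dense)
```

The smallest cluster is the one with the fewest members, and the least dense cluster is the one with the largest average distance to its centroid. Cluster numbering and membership depend on the random seed, so the output will not match XLMiner's labels even on the same data.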

48. Using the data given, apply hierarchical clustering with 5 clusters using Wait Time (min), Purchase Amount ($), Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward’s method as the clustering method.

a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the five clusters in the hierarchical clustering.

b. Identify the cluster with the largest average waiting time. Using all the variables, how would you characterize this cluster?

c. Identify the smallest cluster.

d. By examining the dendrogram on the HC_Dendrogram worksheet (as well as the sequence of clustering stages in HC_Output1), what number of clusters seems to be the most natural fit based on the distance?
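
The steps in problem 48 can be sketched outside XLMiner as follows. This is a naive pure-Python version of Ward's method, offered under stated assumptions rather than as XLMiner's algorithm: inputs are z-scored, the cost of merging clusters A and B is taken as |A||B| / (|A| + |B|) times the squared distance between their centroids (the increase in within-cluster sum of squares), and the rows are synthetic placeholders. The function names are hypothetical, and `cluster_centers` plays the role of the PivotTable in part a.

```python
import random
import statistics


def zscore_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def ward_clusters(points, n_clusters):
    """Merge clusters by Ward's criterion until n_clusters remain.

    Returns (clusters, merge_costs): clusters is a list of row-index lists,
    merge_costs holds the Ward cost of each merge in the order performed.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    merge_costs = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ia) * len(ib) / (len(ia) + len(ib)) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        n = len(ia) + len(ib)
        merged = [(len(ia) * x + len(ib) * y) / n for x, y in zip(ca, cb)]
        clusters[a] = (ia + ib, merged)
        del clusters[b]  # b > a, so index a is unaffected
        merge_costs.append(cost)
    return [members for members, _ in clusters], merge_costs


def cluster_centers(rows, clusters):
    """Average each original (unscaled) variable within each cluster."""
    n_vars = len(rows[0])
    return [
        [sum(rows[i][c] for i in members) / len(members) for c in range(n_vars)]
        for members in clusters
    ]


# Synthetic placeholder rows: [wait time (min), purchase amount ($), age, rating].
# These are NOT the textbook data; substitute the rows from the "Prb47-50" tab.
data_rng = random.Random(42)
rows = [
    [data_rng.uniform(1, 20), data_rng.uniform(5, 300),
     data_rng.randint(18, 75), data_rng.randint(1, 10)]
    for _ in range(100)
]

z = zscore_columns(rows)
clusters, _ = ward_clusters(z, 5)
centers = cluster_centers(rows, clusters)

longest_wait = max(range(5), key=lambda j: centers[j][0])   # part b
smallest = min(range(5), key=lambda j: len(clusters[j]))    # part c
print("sizes:", [len(c) for c in clusters])
print("cluster with largest average wait:", longest_wait, centers[longest_wait])

# Part d: merge everything and inspect the last few merge costs.
_, all_costs = ward_clusters(z, 1)
print("costs of the final merges:", [round(c, 2) for c in all_costs[-6:]])
```

A large jump between consecutive merge costs plays the same role as a long vertical gap in the dendrogram: stopping just before that merge suggests a natural number of clusters.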

49. a. Using the data given, apply hierarchical clustering with 5 clusters using Wait Time (min) and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure, and specify single linkage as the clustering method. Analyze the resulting clusters by computing the cluster size. It may be helpful to use a PivotTable on the data in the HC_Clusters worksheet generated by XLMiner to compute descriptive measures of the Wait Time and Customer Satisfaction Rating variables in each cluster. You can also visualize the clusters by creating a scatter plot with Wait Time (min) as the x-variable and Customer Satisfaction Rating as the y-variable.

b. Repeat part a using average linkage as the clustering method. Compare the clusters to the previous method.
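
The comparison in problem 49 can be sketched in pure Python as below. The sketch assumes z-score normalization, defines single linkage as the minimum pairwise distance between two clusters and average linkage as the mean pairwise distance, and uses synthetic placeholder rows; XLMiner's own definitions and tie-breaking may differ. The function name `agglomerate` is hypothetical.

```python
import math
import random
import statistics


def zscore_columns(rows):
    """Rescale every column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def agglomerate(points, n_clusters, method):
    """Naive agglomerative clustering; method is 'single' or 'average'."""
    if method not in ("single", "average"):
        raise ValueError("method must be 'single' or 'average'")
    # Precompute all pairwise Euclidean distances once.
    dist = [[math.dist(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]

    def between(a, b):
        pairs = [dist[i][j] for i in a for j in b]
        return min(pairs) if method == "single" else sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = (
            (between(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        _, a, b = min(candidates)  # closest pair of clusters
        clusters[a] += clusters[b]
        del clusters[b]            # b > a, so index a is unaffected
    return clusters


# Synthetic placeholder rows: [wait time (min), satisfaction rating].
# These are NOT the textbook data; substitute the rows from the "Prb47-50" tab.
data_rng = random.Random(7)
rows = [[data_rng.uniform(1, 20), data_rng.randint(1, 10)] for _ in range(100)]

z = zscore_columns(rows)
single = agglomerate(z, 5, "single")
average = agglomerate(z, 5, "average")

sizes_single = sorted((len(c) for c in single), reverse=True)
sizes_average = sorted((len(c) for c in average), reverse=True)
print("single linkage sizes: ", sizes_single)
print("average linkage sizes:", sizes_average)
```

Single linkage is prone to chaining, which often yields one very large cluster plus a few tiny ones, so comparing the two size lists is a quick first check before building the scatter plot.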

50. Using the data given, apply k-means clustering using Wait Time (min) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters for waiting time.

a. For the observations composing the cluster with the low waiting time, apply hierarchical clustering with Ward’s method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.

b. For the observations composing the cluster with the medium waiting time, apply hierarchical clustering with Ward’s method to form three clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.

c. For the observations composing the cluster with the high waiting time, apply hierarchical clustering with Ward’s method to form two clusters using Purchase Amount, Customer Age, and Customer Satisfaction Rating as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters, report the characteristics of each cluster.

Copyright Cengage Learning. Powered by Cognero.

To examine the local housing market in a particular region, a sample of 120 homes sold during a year is collected. The data are given below.

Use the data in tab “Prb51-54” for problems 51 through 54.

51. Using the data given, apply k-means clustering with k = 10 using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. What is the smallest cluster? What is the least dense cluster (as measured by the average distance in the cluster)?

52. Using the data given, apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Use Ward’s method as the clustering method.

a. Use a PivotTable on the data in the HC_Clusters1 worksheet to compute the cluster centers for the clusters in the hierarchical clustering.

b. Identify the cluster with the largest average price. Using all the variables, how would you characterize this cluster?

c. Identify the smallest cluster.

53. a. Using the data given, apply hierarchical clustering with 10 clusters using LandValue ($), BuildingValue ($), Acres, Age, and Price ($) as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure, and specify complete linkage as the clustering method. Analyze the resulting clusters by computing the cluster size. It may be helpful to use a PivotTable on the data in the HC_Clusters worksheet generated by XLMiner. You can also visualize the clusters by creating a scatter plot with Acres as the x-variable and Price ($) as the y-variable.

b. Repeat part a using average group linkage as the clustering method. Compare the clusters to the previous method.

54. Using the data given, apply k-means clustering using Price ($) as the variable with k = 3. Be sure to Normalize input data, and specify 50 iterations and 10 random starts in Step 2 of the XLMiner k-Means Clustering procedure. Then create one distinct data set for each of the three resulting clusters of price.

a. For the observations composing the cluster with low home price, apply hierarchical clustering with Ward’s method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster.

b. For the observations composing the cluster with medium home price, apply hierarchical clustering with Ward’s method to form three clusters using Acres and Age as variables. Be sure to Normalize input data in Step 2 of the XLMiner Hierarchical Clustering procedure. Using a PivotTable on the data in HC_Clusters1, report the characteristics of each cluster.

c. Comment on the cluster with high home price.

**See the uploaded file for the data.**