Database Exercises
Chapter
1
2
3
4
5
6
7
8
9
10
11
12
- Find at least two commercial OLAP tools and compare the features.
- Find a commercial data mining tool and outline the steps needed to extract and transform data from a typical DBMS so it is usable by the system.
- Find a commercial data mining tool and outline the steps needed to perform a market basket analysis.
Most of the following questions require an OLAP cube processor. You should have access to SQL, an OLAP browser within the DBMS, or a PivotTable. For the data mining tools, if you do not have access to specialized software, you can use Excel for simple analyses.
Sally’s Pet Store
- Create a cube to browse merchandise sales by time, state, employee, and item category.
- Create a cube to browse animal sales by time, category, breed, gender, and registration.
- Create a cube to browse both animal and merchandise sales by time, state, and category.
- Create a cube to browse purchases of merchandise from suppliers based on time, employee, and location. As facts, include the value of the purchase, the shipping cost, and the delay between order and receipt.
- What is the correlation between the age of an animal at the time of the sale and its price?
- If you have access to market basket software, evaluate the sales tables to see if any associations exist.
- As a simple time series analysis, extract the merchandise sales by week, plot the data, and estimate the trend line.
- Are purchases of some categories of items significantly larger than others? For instance, does the company sell substantially more dog products?
- Are there certain categories of customers who purchase more than others? This is a general clustering problem: If you do not have access to the specialized software, you can test attributes using regression.
- Using weekly sales of merchandise, forecast sales for the next three weeks.
- Is there a geographic pattern to sales? Do some states or regions have more sales?
Rolling Thunder Bicycles
- Create an OLAP cube to evaluate sales (value and quantity) by model type, state, time, and sales employee.
- Create a new size dimension to reduce the dimension of frame size down to three (small, medium, and large). Classify the bicycles into this dimension. If available, use clustering software; otherwise, use the mean and standard deviation. Remember that road bikes are measured in centimeters and mountain bikes in inches.
- Add the reduced size dimension created in Exercise 16 to the OLAP cube created in Exercise 15.
- Create an OLAP cube to evaluate production time by order date (time), model type, month, and employee who assembled the frame.
- Create an OLAP cube to evaluate purchases of components by time, manufacturer, road or mountain bike, and component category.
- What is the correlation between sales by city and its population? Evaluate both for quantity and value of the bicycles.
- What is the correlation between the size of a bicycle (frame size) and its price?
- Using monthly sales by model type, forecast sales for the next six months.
- If you have market basket analysis software, evaluate the purchases of components. Do any patterns exist—outside of the defined groupo relationships?
- Create an OLAP cube to evaluate sales (quantity) by paint type, letter style, and model type.
- Ignoring capital costs but including salary, evaluate profit by month and forecast it for 6 months.
Corner Med
- Use association software or computations to see if some diagnoses commonly arise together.
- If you have clustering software or decision-tree software, analyze the patient visits to see if patients can be split into separate groups.
- Using categorization software, such as regression, neural network, or decision tree, try to identify features of patients that spend the most money.
- Create an OLAP cube to explore physician data in terms of patients and procedures. Managers want to focus on revenue and patients visited per day, week, and month.
- Forecast the number of patients expected for a specific month. Hint: Use simple regression unless you have access to a time series analyzer.