Introduction
Today, to be successful, business organizations need rapid and easy access to information about their customers, internal finances, and external market conditions, information collectively known as Business Intelligence (BI). The modern business environment requires timely and accurate insight into current business conditions. Without relevant information extracted from data, decisions carry more risk and may not be the best response to the challenges at hand.
This course provides a basic introduction to key areas such as OLAP (On-Line Analytical Processing) design, Data Warehousing (DW), and Data Mining (DM), with an emphasis on DM, so that students gain the background and skills needed to turn available data into valuable and useful information.
The course consists of lectures, tutorials (demonstrations) followed by laboratory work, and seminars in which students are required to present their group work.
Objectives of the Course
- Provide a basic introduction to key areas such as
- OLAP (On-Line Analytical Processing) design,
- Data Warehousing (DW), and
- Data Mining (DM)
- Provide an overview of the most common tasks and application areas of DM
- Prediction and knowledge discovery
- Provide an overview of the most common techniques used in DM
- Building and evaluating predictive and descriptive models
- Ensure that students gain the necessary background and skills
- to turn available data into valuable and useful information
Overview of the Course
- Lectures 15*2 = 30 hours
- Wed 8:15 – 10:00, Thu 12:15 – 14:00, Fri 10:15 – 12:00
- all in Ag Beeta
- Nov 17, 12.00 – 14.00: Sami’s public PhD examination (after the lecture)
- Tutorial followed by an assignment 5*2 = 10 hours
- Tue 14.15 – 16.00, Ag B212.2 (Mountains),
- but week 50: Wed 8:15 – 10:00
- Seminar 2 hours
- Ag B212.2 (Mountains)
- 5-10 min presentation by each student about the final assignment
- Final assignment (no final exam)
- to be sent to mpechen{AT}cs{dot}jyu{dot}fi and samiayr{AT}mit{dot}jyu{dot}fi by the end of Jan’07 (always use the TIES443 keyword in the subject field)
Lectures & Handouts
Date and Time | Lecture title | Instructor | Handouts
Wed, 1.11.06, 8:15-10:00 | Introduction to the course | Mykola Pechenizkiy | Handouts
Thu, 2.11.06, 12:15-14:00 | Business Intelligence | Mykola Pechenizkiy | Handouts
Fri, 3.11.06, 10:15-12:00 | Data Warehousing | Mykola Pechenizkiy | Handouts
Wed, 8.11.06, 08:15-10:00 | Data Mining: basic concepts | Sami Äyrämö | Handouts
Thu, 9.11.06, 12:15-14:00 | Data Mining process | Sami Äyrämö | Handouts
Fri, 10.11.06, 10:15-12:00 | Data preprocessing | Sami Äyrämö | Han & Kamber's book: Chapt. 3: "Data Preparation"; Tan et al. book: Topic 2: Data
Wed, 15.11.06 | Clustering. Part I | Sami Äyrämö | Han & Kamber's book: Chapter 8; Tan et al. book: Present. 8
Thu, 16.11.06 | Clustering. Part II | Sami Äyrämö | Tan et al. book: Presentation 9
Fri, 17.11.06 | Visualization | Mykola Pechenizkiy | Handouts
Wed, 22.11.06 | Classification. Part I | Mykola Pechenizkiy | Handouts
Thu, 23.11.06 | Classification. Part II | Mykola Pechenizkiy | request by e-mail
Fri, 24.11.06 | Evaluation | Mykola Pechenizkiy | Piatetsky-Shapiro & Parker course: Modules DM10-DM11
Wed, 29.11.06 | Association rules | Mykola Pechenizkiy | Tan et al. book: Present. 6-7; Piatetsky-Shapiro & Parker course: Beginning of module DM9
Thu, 30.11.06 | Advanced and Miscellaneous issues | Mykola Pechenizkiy | Handouts
Fri, 1.12.06 | Closing lecture | Sami Äyrämö | Handouts
Tutorials & Assignments
Date and Time | Title of Tutorial | Instructor | Handouts | Assignment
Tue, 7.11.06, 14:15-16:00 | Prototyping DM techniques with WEKA and YALE open-source software | Mykola Pechenizkiy | Handouts | Assignment
Tue, 14.11.06, 14:15-16:00 | Prototyping DM techniques with Matlab environment | Sami Äyrämö | Handouts | Iris dataset
Tue, 21.11.06, 14:15-16:00 | Mining time-series data | Mykola Pechenizkiy | request by e-mail | Assignment
Tue, 28.11.06, 14:15-16:00 | Mining image data | Sami Äyrämö | Handouts | Assignment
Wed, 13.12.06, 08:15-10:00 | Mining textual data | Miika Nurminen | Slides and reading | Assignment
Seminar & Final Assignment
Start to Prepare for the Final Assignment
The seminar will be on Tuesday, Nov 28, right after the tutorial (16.15-18.00)
Send me an e-mail if you already know what you would like to do for the final assignment
If you do not have any ideas, then:
- Decide on an application domain that interests you
- Two types of assignments: (1) mining a real dataset, and (2) comparison of techniques
- Search for a dataset (e.g. from KDD Cups) if you decide to analyze data
- Search for a collection of datasets if you decide to compare DM techniques
- Define the problem: (1) goal, (2) input, (3) constraints or background knowledge, if any, (4) expected outcome, (5) how you plan to apply DM (if 1st type) or compare different techniques (if 2nd type), in as much detail as possible
- 5-10 min presentation
- Mykola and Sami will give you feedback
- You will try to do what you promised during the seminar, report your work, and send it by e-mail to Mykola and Sami
Additional summary information will be posted on the course webpage after the seminar if needed.
Literature
Introduction to the field
- Witten I., Frank E. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco. (book & software page)
- Crawford D. 1996. Special Issue on Data Mining. Communications of the ACM, Volume 39, Number 11, November 1996.
- Reinartz T. 1999. Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.
- Han J., Kamber M. 2000. Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, 550 pages. ISBN 1-55860-489-8 (ppt slides to the book)
- Data Mining: A Practitioner’s Approach, ELCA Informatique SA, 2001
- CRISP-DM 1.0: Step-by-step data mining guide, SPSS Inc.
Advanced reading
Links
Under Construction!
- Business Intelligence
- 1
- 2
- Data Warehouses
- 1
- 2
News
All materials for Tutorial 5 (Text Mining) are available online!
Handouts for Lecture 10 (Classification) are available online!
Handouts for Lecture 9 (Visualization) are available online!
The task for the third assignment is available online!
The next lecture will be on Thursday, November 23, at 12:15.
The next tutorial will be on Tuesday, November 28, at 14:15 in Ag B212.2 (Mountains)!