Mykola Pechenizkiy

TU/e | JYU

TIES 443. Introduction to Data Mining

Introduction

To be successful today, business organizations need rapid and easy access to information about their customers, internal finances, and external market conditions. This information is collectively known as Business Intelligence (BI). The modern business environment requires timely and accurate insight into current business conditions. Without relevant information extracted from data, decisions carry more risk and may not be the best response to the challenges at hand.
This course provides a basic introduction to key areas such as OLAP (On-Line Analytical Processing) design, Data Warehousing (DW), and Data Mining (DM), with an emphasis on DM, so that students gain the background and skills needed to turn available data into valuable and useful information.
The course consists of lectures, tutorials (demonstrations) followed by laboratory work, and a seminar where students are required to present their group work.

Objectives of the Course

  • Provide a basic introduction to key areas such as
    • OLAP (On-Line Analytical Processing) design,
    • Data Warehousing (DW), and
    • Data Mining (DM)
  • Provide an overview of the most common tasks and application areas of DM
    • Prediction and knowledge discovery
  • Provide an overview of the most common techniques used in DM
    • Building and evaluating predictive and descriptive models
  • Ensure that students gain the necessary background and skills
    • to turn available data into valuable and useful information

Overview of the Course

  • Lectures 15*2 = 30 hours
    • Wed 8:15 – 10:00, Thu 12:15 – 14:00, Fri 10:15 – 12:00
    • all in Ag Beeta
    • Nov 17, 12:00 – 14:00: Sami’s public PhD examination (after the lecture)
  • Tutorial followed by an assignment 5*2 = 10 hours
    • Tue 14:15 – 16:00, Ag B212.2 (Mountains),
    • except in week 50: Wed 8:15 – 10:00
  • Seminar 2 hours
    • Ag B212.2 (Mountains)
    • 5-10 min presentation by each student about the final assignment
  • Final assignment (no final exam)
    • to be sent to mpechen{AT}cs{dot}jyu{dot}fi and samiayr{AT}mit{dot}jyu{dot}fi by the end of January 2007 (always include the keyword TIES443 in the subject field)

Instructors

Mykola Pechenizkiy
E-mail: mpechen{AT}cs{dot}jyu{dot}fi
Office: AgC 414.3
Tel.: 014 260 4907
Home Page

Sami Äyrämö
E-mail: sami.ayramo{AT}mit{dot}jyu{dot}fi
Office: AgC 416.2
Tel.: 014 260 2533
Home Page

Lectures & Handouts

Date and Time | Lecture title | Instructor | Handouts
Wed, 1.11.06, 8:15-10:00 | Introduction to the course | Mykola Pechenizkiy | Handouts
Thu, 2.11.06, 12:15-14:00 | Business Intelligence | Mykola Pechenizkiy | Handouts
Fri, 3.11.06, 10:15-12:00 | Data Warehousing | Mykola Pechenizkiy | Handouts
Wed, 8.11.06, 8:15-10:00 | Data Mining: basic concepts | Sami Äyrämö | Handouts
Thu, 9.11.06, 12:15-14:00 | Data Mining process | Sami Äyrämö | Handouts
Fri, 10.11.06, 10:15-12:00 | Data preprocessing | Sami Äyrämö | Han & Kamber's book: Chapter 3, "Data Preparation"; Tan et al. book: Topic 2, "Data"
Wed, 15.11.06 | Clustering. Part I | Sami Äyrämö | Han & Kamber's book: Chapter 8; Tan et al. book: Presentation 8
Thu, 16.11.06 | Clustering. Part II | Sami Äyrämö | Tan et al. book: Presentation 9
Fri, 17.11.06 | Visualization | Mykola Pechenizkiy | Handouts
Wed, 22.11.06 | Classification. Part I | Mykola Pechenizkiy | Handouts
Thu, 23.11.06 | Classification. Part II | Mykola Pechenizkiy | Request by e-mail
Fri, 24.11.06 | Evaluation | Mykola Pechenizkiy | Piatetsky-Shapiro & Parker course: Modules DM10-DM11
Wed, 29.11.06 | Association rules | Mykola Pechenizkiy | Tan et al. book: Presentations 6-7; Piatetsky-Shapiro & Parker course: beginning of Module DM9
Thu, 30.11.06 | Advanced and miscellaneous issues | Mykola Pechenizkiy | Handouts
Fri, 1.12.06 | Closing lecture | Sami Äyrämö | Handouts

Tutorials & Assignments

Date and Time | Title of Tutorial | Instructor | Handouts | Assignment
Tue, 7.11.06, 14:15-16:00 | Prototyping DM techniques with WEKA and YALE open-source software | Mykola Pechenizkiy | Handouts | Assignment
Tue, 14.11.06, 14:15-16:00 | Prototyping DM techniques in the Matlab environment | Sami Äyrämö | Handouts | Iris dataset
Tue, 21.11.06, 14:15-16:00 | Mining time-series data | Mykola Pechenizkiy | Request by e-mail | Assignment
Tue, 28.11.06, 14:15-16:00 | Mining image data | Sami Äyrämö | Handouts | Assignment
Wed, 13.12.06, 8:15-10:00 | Mining textual data | Miika Nurminen | Slides and reading | Assignment

Seminar & Final Assignment

Start Preparing for the Final Assignment
The seminar will be on Tuesday, Nov 28, right after the tutorial (16:15-18:00).

    Send me an e-mail if you already know what you would like to do as a final assignment.
    If you do not have any ideas, then:
  1. Decide on an application domain that interests you
  2. Choose one of two types of assignment: (1) mining a real dataset, or (2) comparing techniques
  3. Search for a dataset (e.g. from KDD Cups) if you decide to analyze data
  4. Search for a collection of datasets if you decide to compare DM techniques
  5. Define the problem: (1) goal, (2) input, (3) constraints or background knowledge, if any, (4) expected outcome, (5) how you plan to apply DM (if 1st type) or compare different techniques (if 2nd type), in as much detail as possible
  6. Give a 5-10 min presentation
  7. Mykola and Sami will give you feedback
  8. Try to do what you promised during the seminar, report your work, and send it by e-mail to Mykola and Sami
If needed, additional summary information will be posted on the course webpage after the seminar.

Literature

Introduction to the field

  1. Witten I., Frank E. 2000. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco. (book & software page)
  2. Crawford D. 1996. Special Issue on Data Mining. Communications of the ACM, Volume 39, Number 11, November 1996.
  3. Reinartz T. 1999. Focusing Solutions for Data Mining. LNAI 1623, Berlin Heidelberg.
  4. Han J., Kamber M. 2000. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, 550 pages. ISBN 1-55860-489-8. (ppt slides to the book)
  5. Data Mining: A Practitioner’s Approach. ELCA Informatique SA, 2001.
  6. CRISP-DM 1.0: Step-by-step data mining guide. SPSS Inc.

Advanced reading

News
All materials for Tutorial 5 (Text Mining) are available online!

Handouts for Lecture 10 (Classification) are available online!

Handouts for Lecture 9 (Visualization) are available online!

The task for the third assignment is available online!

The next lecture will be on Thursday, November 23 at 12:15.

The next Tutorial will be on Tuesday, November 28 at 14:15 in Ag B212.2 (Mountains)!
© 2006 Mykola Pechenizkiy