ABSTRACT

In any particular data mining problem, the most important task is to define patterns operationally. The model (the definition of patterns) should ultimately lead to useful, understandable patterns. Designing a good model and evaluating the patterns that a particular model generates are both difficult.

In this dissertation, we closely examine the problem of mining sequential patterns. Sequential pattern mining finds interesting patterns in sequences of sets. Since it was proposed in 1995, mining sequential patterns in large databases has become an important data mining task with broad applications.

Sequential pattern mining is commonly defined as finding the complete set of frequent subsequences. However, such a conventional model may face inherent difficulties when mining databases with long sequences and noise. It may generate a huge number of short, trivial patterns but fail to find the underlying interesting patterns.

Motivated by these observations, we examine an entirely different model for analyzing sequential data. In this dissertation,