ApproxMAP: Sequence of sets

Exploratory Data Analysis of {A,B,D} {B} {B,C}


Does your data look like {A,B,D} {B} {B,C} ?
Would you like to see what your data looks like?

Some examples
Life path analysis: {Graduation, Job1, Marriage} {Job1} {Job1, Birth}
Welfare service analysis: {Foster Care, Medicaid, Foodstamp} {Medicaid} {Medicaid, TANF}
Market basket analysis: {Bread, Milk, Jam} {Milk} {Milk, Banana}


Table 1: Basic Terms - sequence of sets

Term      Notation  Example              Definition
Items     I         {A,B,C,D}            Let items be a set of literals I = {i1, i2, i3, ..., il}.
Itemset   s13       {B,C}                Then an itemset is a set of items (unordered list).
Sequence  seq1      {A,B,D} {B} {B,C}    And a sequence is an ordered list of itemsets.
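
The terms in Table 1 map directly onto ordinary container types. The following is a minimal sketch, assuming Python, with an itemset stored as a frozenset and a sequence as a tuple of frozensets. The names `SetSequence`, `make_sequence`, and `format_sequence` are hypothetical helpers chosen for illustration and are not part of ApproxMAP.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumed representation (not prescribed by the text):
# an itemset is unordered, so a frozenset; a sequence is ordered, so a tuple.
Itemset = FrozenSet[str]
SetSequence = Tuple[Itemset, ...]


def make_sequence(*itemsets: Iterable[str]) -> SetSequence:
    """Build a sequence of sets from any iterables of items."""
    return tuple(frozenset(s) for s in itemsets)


def format_sequence(seq: SetSequence) -> str:
    """Render a sequence in the {A,B,D} {B} {B,C} notation used here."""
    return " ".join("{" + ",".join(sorted(s)) + "}" for s in seq)


# The three rows of Table 1.
items: Itemset = frozenset({"A", "B", "C", "D"})           # I
s13: Itemset = frozenset({"B", "C"})                       # an itemset
seq1: SetSequence = make_sequence("ABD", "B", "BC")        # a sequence

print(format_sequence(seq1))
```

Order matters between itemsets but not inside one: `frozenset({"C", "B"}) == s13` holds, while swapping two itemsets of a sequence produces a different sequence.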

Sequential pattern mining is an important data mining task with broad applications. We propose approximate sequential pattern mining, roughly defined as identifying patterns approximately shared by many sequences. We present an efficient and effective algorithm, ApproxMAP (APPROXimate Multiple Alignment Pattern mining), to find maximal approximate sequential patterns in large databases. Our goal is to assist the data analyst in exploratory data analysis through organization and data reduction. The method works in three steps.

  1. First, similar sequences are grouped together using kNN clustering.
  2. Then we organize and compress sequences within each cluster into weighted sequences using multiple alignment.
  3. In the last step, the longest consensus pattern best fitting each cluster is generated from the weighted sequences.
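
Two pieces of the steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the ApproxMAP implementation: the text here does not specify the similarity measure or how a consensus is read off, so the sketch assumes (a) an edit distance between sequences of sets whose replacement cost is the normalized set difference, with an insert/delete cost of 1, and (b) a consensus that keeps every item appearing in at least a `min_strength` fraction of the aligned sequences. The kNN clustering and the multiple alignment itself are not implemented; the aligned rows are supplied by hand, with `None` marking a gap. All function and parameter names are hypothetical.

```python
from collections import Counter


def replace_cost(x, y):
    """Assumed cost: normalized set difference (0 if equal, 1 if disjoint)."""
    return len(x ^ y) / (len(x) + len(y))


def sequence_distance(a, b, indel=1.0):
    """Edit distance between two sequences of non-empty itemsets (assumed measure)."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * indel]
        for j in range(1, len(b) + 1):
            cur.append(min(
                prev[j] + indel,                                # delete a[i-1]
                cur[j - 1] + indel,                             # insert b[j-1]
                prev[j - 1] + replace_cost(a[i - 1], b[j - 1]),  # replace
            ))
        prev = cur
    return prev[-1]


def consensus_pattern(aligned, min_strength):
    """Keep items present in at least min_strength of the aligned rows.

    `aligned` is a list of equal-length rows; each position holds an
    itemset or None for a gap (alignment is assumed to be given).
    """
    n = len(aligned)
    pattern = []
    for column in zip(*aligned):
        counts = Counter(item for itemset in column if itemset for item in itemset)
        kept = frozenset(i for i, c in counts.items() if c / n >= min_strength)
        if kept:  # drop positions where no item is frequent enough
            pattern.append(kept)
    return pattern


a = [frozenset("ABD"), frozenset("B"), frozenset("BC")]
b = [frozenset("AB"), frozenset("BC")]
print(sequence_distance(a, b))

rows = [
    [frozenset("ABD"), frozenset("B"), frozenset("BC")],
    [frozenset("AB"), None, frozenset("BC")],
    [frozenset("A"), frozenset("B"), frozenset("C")],
]
print(consensus_pattern(rows, min_strength=0.6))
```

In this sketch, a lower `min_strength` admits more items into the pattern and a higher one keeps only items shared by nearly every sequence in the cluster.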

Our extensive experimental results on synthetic and real data show that ApproxMAP is very robust to noise and does well in mapping high-dimensional noisy data into approximate sequential patterns.


An example: given below are 10 lexically sorted sequences.



