Association Rules
Association says that if a certain set of conditions are present, then another condition is likely as well.
Confidence is essentially like P(B | A) - the probability.

Monotonicity 
If a set I of items is frequent, then so is every subset of I.
Triples Method
Store hash map of (i, j) -> c
Frequent Itemsets
Counting the number of times a set of items occur together. Assume:
- file size is HUGE. Can't fit entire thing in memory. Read in chunks. We only care about the number of passes taken by algorithm.
- Our algorithm must run in main memory (not from hard disk).
Naive Algorithm - Triangular matrix
Read file once.
Maintain hash table from items in file to integers (ints are a compact storage mechanism).
Triangular matrix. Generates
A Priori
First pass
- Table 1: translates item name -> integers (from 1 to n)
- Table 2: counts for each item name (mapped to integer)
- Create a new numbering from 1 to m where m is number of frequent items.
Second pass
Count all pairs that consist of two frequent items:
- For each basket, examine only frequent items, and generate all pairs. For each pair, add one to count. 
SON
- Partition data into N sets of equal sizes
- Pass 1: Run A Priori on those sets, with support s/N, to identify candidate sets
- Pass 2: Then test candidates sets to see whether they're actually frequent.
Market-Basket Model
Sets of items & baskets (sets of items)
Support
Support for itemset I = # of baskets containing all items in I. 
- given support threshold s, a set of items appearing in  at least s baskets is called a frequent itemset
PCY
Store bitmap of counts in first pass
Optimizations
Toivonen's Algorithm
- Uses slightly lower support threshold than s/N
- Uses negative border 
   Login to remove ads X
Feedback | How-To