
Apriori Algorithm Explained.

Frequent pattern mining is an important subfield of data mining, and the Apriori algorithm is one of the most basic and well-known approaches to mining frequent patterns. This article explains how the algorithm works using a simple example.

First we need to clarify two terms used in this field: support and confidence. We will use the following example (Table 1) to illustrate them. The table contains a list of items bought from a supermarket. Single-letter codes (A, B, C, ...) identify the items, and each Transaction ID corresponds to one transaction done by a customer.

Transaction ID Items Bought
T100 M,O,N,K,E,Y
T200 D,O,N,K,E,Y
T300 M,A,K,E
T400 M,U,C,K,Y
T500 C,O,O,K,I,E

Table 1

Support

Support is the probability that a transaction contains a given k-item set. It is usually given as a percentage, a fraction (probability), or an integer (number of occurrences), and is denoted by sup(X). For example,

sup(M) = 3/5 = 0.6 (60%, or 3 occurrences)

sup(M,E) = 2/5 = 0.4 (40%, or 2 occurrences)
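
To make the definition concrete, here is a minimal Python sketch that computes these two values from Table 1 (the `transactions` list and `support` function are names chosen here for illustration, not part of any library):

```python
# The transactions from Table 1, each as a set of items
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},  # the repeated O in T500 collapses in a set
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"M"}, transactions))       # 0.6, i.e. 60% (3 occurrences)
print(support({"M", "E"}, transactions))  # 0.4, i.e. 40% (2 occurrences)
```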

Confidence

Confidence is the conditional probability that a transaction containing X also contains Y. It is likewise given as a percentage, fraction, or integer.

Confidence(X => Y) = sup(X ∪ Y) / sup(X), where sup(X ∪ Y) is the support of the item set containing all items of both X and Y.

As an example,

Confidence(K => Y) = sup(K,Y) / sup(K) = (3/5) / (5/5) = 3/5 = 0.6 (60%)
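
Building on the previous sketch (this reuses the `transactions` list and `support` function defined there), confidence can be computed like this:

```python
def confidence(X, Y, transactions):
    """conf(X => Y): of the transactions containing X, the fraction
    that also contain Y. Reuses `support` from the sketch above."""
    return support(set(X) | set(Y), transactions) / support(set(X), transactions)

print(confidence({"K"}, {"Y"}, transactions))  # (3/5) / (5/5) = 0.6
```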

The Apriori Pruning Principle

To understand how the Apriori algorithm works, it is important to understand the Apriori pruning principle:

If any item set is infrequent, its supersets should not be generated or tested.

This simply means that if {A} is infrequent, we can avoid checking whether {A, B} is a frequent item set.
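
As a rough sketch of how this pruning check could look in code (function and variable names here are illustrative, not from any particular library):

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_k):
    """Apriori pruning: a (k+1)-item candidate is worth testing only if
    every k-item subset of it is already known to be frequent."""
    k = len(candidate) - 1
    return any(frozenset(s) not in frequent_k
               for s in combinations(candidate, k))

# If {A} turned out to be infrequent, {A, B} is pruned without a database scan:
frequent_1 = {frozenset({"B"})}                       # {A} is missing, i.e. infrequent
print(has_infrequent_subset({"A", "B"}, frequent_1))  # True -> skip {A, B}
```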

Method

Step 1:

Scan the database for frequent 1-item sets (k = 1 initially). Remove the item sets that do not satisfy the minimum support requirement.

For the example above, let's assume a minimum support of 60% and a minimum confidence of 80%.

1-item set Support
M 60%
O 60%
N 40%
K 100%
E 80%
Y 60%
D 20%
A 20%
U 20%
C 20%
I 20%

Table 2

From this table we remove the 1-item sets whose support is below the minimum. After that operation we get Table 3.

1-item set Support
M 60%
O 60%
K 100%
E 80%
Y 60%

Table 3
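
A short sketch of this first pass, reusing the `transactions` list from the earlier snippet (`MIN_SUPPORT` and `frequent_1` are illustrative names), reproduces Table 3:

```python
from collections import Counter

MIN_SUPPORT = 0.6  # the 60% minimum support assumed above

# Count how many transactions contain each item
counts = Counter(item for t in transactions for item in t)
frequent_1 = {item: n / len(transactions)
              for item, n in counts.items()
              if n / len(transactions) >= MIN_SUPPORT}
print(frequent_1)  # M: 0.6, O: 0.6, K: 1.0, E: 0.8, Y: 0.6 -- matches Table 3
```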

Step 2:

Generate candidate 2-item sets from the frequent 1-item sets.

By looking at Table 3 we can see that the possible combinations for 2-item sets are MO, MK, ME, MY, OK, OE, OY, KE, KY and EY.

Then we need to calculate the support for each of them as well (see Table 4).

2-item set Support
MO 20%
MK 60%
ME 40%
MY 40%
OK 60%
OE 60%
OY 40%
KE 80%
KY 60%
EY 40%

Table 4

Remove the item sets whose support is below the minimum. Doing so gives Table 5.

2-item set Support
MK 60%
OK 60%
OE 60%
KE 80%
KY 60%

Table 5
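
This step can be sketched as follows, reusing `transactions`, `support`, `frequent_1` and `MIN_SUPPORT` from the earlier snippets; the result matches Table 5:

```python
from itertools import combinations

# Pair up the frequent single items from Step 1 (M, O, K, E, Y)
candidates_2 = [frozenset(p) for p in combinations(sorted(frequent_1), 2)]

frequent_2 = {c: support(c, transactions)
              for c in candidates_2
              if support(c, transactions) >= MIN_SUPPORT}
# Keeps MK, OK, OE, KE and KY -- the contents of Table 5
```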

Step 3:

Generate candidate (k+1)-item sets from the frequent k-item sets. If a candidate (k+1)-item set contains any k-item subset that is not in the previous table of frequent k-item sets, remove it (according to the Apriori pruning principle). Then check support as usual and remove the item sets without enough support. This process continues until no frequent candidate set is generated.

The candidate 3-item sets generated from Table 5 are KMO, EKM, KMY, KOY, EKY and EKO. KMO cannot be in the list since MO is not in the 2-item list. The same applies to EKM (ME is not frequent), KMY (MY), KOY (OY) and EKY (EY). EKO is left in the table since all of its 2-item subsets (KE, KO and EO) are in the list.

Then we calculate the support for the remaining 3-item set (see Table 6).

3-item set Support
EKO 60%

Table 6

EKO has the required minimum support, so it remains in the table.

Clearly we cannot generate a 4-item set from this table, since we only have one 3-item set, so the process terminates here.
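
Putting the whole method together, here is a compact, self-contained Python sketch of the level-wise loop described above (the `apriori` function is an illustrative implementation written for this post, not a library routine). Run on the Table 1 transactions with 60% minimum support, it reproduces Tables 3, 5 and 6:

```python
from itertools import combinations

# Table 1 again, so this sketch runs on its own
transactions = [
    {"M", "O", "N", "K", "E", "Y"},
    {"D", "O", "N", "K", "E", "Y"},
    {"M", "A", "K", "E"},
    {"M", "U", "C", "K", "Y"},
    {"C", "O", "K", "I", "E"},
]

def apriori(transactions, min_support):
    """Level-wise search for all frequent item sets, as described above."""
    n = len(transactions)
    sup = lambda c: sum(1 for t in transactions if c <= t) / n

    items = {i for t in transactions for i in t}
    current = {frozenset({i}) for i in items
               if sup(frozenset({i})) >= min_support}  # Step 1
    frequent, k = {}, 1
    while current:
        frequent.update({c: sup(c) for c in current})
        k += 1
        # Join: combine frequent (k-1)-item sets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-item subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if sup(c) >= min_support}  # Steps 2/3
    return frequent

print(apriori(transactions, 0.6))
# Includes frozenset({'E', 'K', 'O'}): 0.6 -- the 3-item set from Table 6
```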

Thank you for reading this post. I will soon create a new post showing how confidence affects this process and how it is used in practice.



