Association Rule Mining#

Imagine you own a small grocery store. Every day, hundreds of customers walk through the aisles, picking up different combinations of products.

Some buy:

Bread and Butter
Milk and Cereal
Chips and Soda
Diapers and Baby Wipes

After months of sales, you start wondering:

“Are there certain products customers frequently buy together?”

This is exactly the type of problem solved by Association Rule Mining.

Association Rule Mining is a data mining technique used to discover relationships between items in large datasets. Instead of manually checking thousands of receipts, we allow algorithms to automatically discover patterns such as:

Customers who buy bread often buy butter.
Customers who buy pasta may also buy tomato sauce.
Customers who buy diapers may also buy baby wipes.

These discovered relationships are called association rules.

What Are Association Rules?#

An association rule expresses a pattern of the form “if a transaction contains these items, it tends to contain those items as well.” Association rules are most commonly used in market basket analysis, where we want to understand which items are frequently purchased together. For example:

If many customers who buy bread also buy butter, we may discover the rule: Bread → Butter

This means: If a customer buys bread, they are likely to buy butter.

Retail stores, online shopping websites, streaming platforms, and recommendation systems use these ideas every day.

Understanding Itemsets#

Before finding patterns, we first need to understand something called an itemset.

An itemset is simply a set of one or more items, such as the items purchased together in a single transaction.

Examples:

{Bread}
{Bread, Butter}
{Milk, Bread, Eggs}

Now imagine looking through thousands of shopping transactions.

Some combinations appear only once.
Others appear again and again.

The combinations that appear frequently are called frequent itemsets.
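As a minimal sketch of how itemsets might be represented in code, Python's built-in `frozenset` is a natural fit, because an itemset is unordered and contains no duplicates. The basket below is a hypothetical transaction invented for illustration.

```python
# An itemset is an unordered collection of distinct items, so a frozenset fits well.
itemset = frozenset({"Bread", "Butter"})

# Hypothetical shopping basket (not taken from any real dataset).
transaction = frozenset({"Milk", "Bread", "Eggs", "Butter"})

# A transaction contains an itemset when every item of the itemset is in the basket.
contains = itemset <= transaction
print(contains)  # True

# Order does not matter: both literals describe the same itemset.
same = frozenset({"Butter", "Bread"}) == itemset
print(same)  # True
```

The subset test `itemset <= transaction` is the basic operation that all later counting builds on.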

Support Count vs Support#

Suppose the itemset X = {Milk, Bread, Diaper} appears in 2 out of a total of 5 transactions.

Support Count#

The support count (also called frequency) is the number of transactions that contain the itemset.

\[\sigma(X) = 2\]

This means the itemset {Milk, Bread, Diaper} appears 2 times in the dataset.

Support#

The support of an itemset measures the fraction (or percentage) of transactions that contain the itemset. The formula for support is:

\[s(X) = \frac{\sigma(X)}{|T|}\]

where:

\(\sigma(X)\) = support count of itemset \(X\)
\(|T|\) = total number of transactions

For this example:

\[s(\{\text{Milk}, \text{Bread}, \text{Diaper}\}) = \frac{2}{5} = 0.4 = 40\%\]

This means 40% of all transactions contain the itemset {Milk, Bread, Diaper}.
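The two definitions can be sketched in a few lines of Python. The text does not list the five transactions behind this example, so the ones below are an assumption, chosen only so that {Milk, Bread, Diaper} appears in exactly 2 of them. The function names `support_count` and `support` are likewise illustrative.

```python
# Hypothetical transactions (assumed for illustration; the text gives only the counts).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """s(X) = sigma(X) / |T|: fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


X = {"Milk", "Bread", "Diaper"}
print(support_count(X, transactions))  # 2
print(support(X, transactions))        # 0.4
```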

Frequent Itemset#

Before discovering association rules, we first need to identify which groups of items appear frequently in the dataset.

An itemset is considered a frequent itemset if its support is greater than or equal to a minimum support threshold.

If the minimum support threshold is 0.3, then the itemset {Milk, Bread, Diaper} from the previous example, whose support is 0.4, is a frequent itemset because 0.4 ≥ 0.3.

Now let’s look at a small grocery store example. Suppose the store has the following transactions:

Transaction ID	Items Purchased
T1	Bread, Milk
T2	Bread, Butter
T3	Bread, Milk, Butter
T4	Milk, Butter
T5	Bread, Milk, Butter

As the store owner, you begin counting how often each item appears.

Item	Count
Bread	4
Milk	4
Butter	4
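The item counts can be reproduced with a short sketch using `collections.Counter`. The variable names are illustrative. Each transaction is stored as a set, so an item is counted at most once per transaction.

```python
from collections import Counter

# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},            # T1
    {"Bread", "Butter"},          # T2
    {"Bread", "Milk", "Butter"},  # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Milk", "Butter"},  # T5
]

# Count, for every item, the number of transactions it appears in.
item_counts = Counter(item for t in transactions for item in t)

for item in sorted(item_counts):
    print(item, item_counts[item])
```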

If we set the minimum support count to 3 (equivalently, a minimum support of 3/5 = 0.6), then all three individual items are frequent itemsets, since each appears 4 times.

Next, we check pairs of items.

Itemset	Count
Bread, Milk	3
Bread, Butter	3
Milk, Butter	3

These pairs are also frequent because each appears at least 3 times.
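The pair counts can be checked the same way. This sketch uses `itertools.combinations` to list every pair inside each transaction. Sorting the items first gives each pair one canonical order, so that (Bread, Milk) and (Milk, Bread) are not counted separately.

```python
from collections import Counter
from itertools import combinations

# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},            # T1
    {"Bread", "Butter"},          # T2
    {"Bread", "Milk", "Butter"},  # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Milk", "Butter"},  # T5
]

pair_counts = Counter()
for t in transactions:
    # Every 2-item combination inside this transaction, in alphabetical order.
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

for pair in sorted(pair_counts):
    print(pair, pair_counts[pair])
```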

Finally, we check the larger 3-itemset:

Itemset	Count
Bread, Milk, Butter	2

This combination appears only twice (a support of 2/5 = 0.4), so it is not a frequent itemset when the minimum support count is 3.

The main goal of frequent itemset mining is to identify the combinations of items that occur together often enough to reveal meaningful patterns in the data.
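The whole walk-through can be sketched as a brute-force search that tests every possible itemset against the minimum support count. This is only an illustration for a tiny dataset. The function names are hypothetical, and the approach does not scale, because the number of candidate itemsets doubles with each additional distinct item.

```python
from itertools import combinations

# The five transactions from the grocery store example.
transactions = [
    {"Bread", "Milk"},            # T1
    {"Bread", "Butter"},          # T2
    {"Bread", "Milk", "Butter"},  # T3
    {"Milk", "Butter"},           # T4
    {"Bread", "Milk", "Butter"},  # T5
]

MIN_SUPPORT_COUNT = 3  # threshold used in the example


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_count):
    """Return {itemset (as a sorted tuple): support count} for all frequent itemsets."""
    items = sorted(set().union(*transactions))
    result = {}
    # Try every itemset size, from single items up to all items together.
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support_count(frozenset(candidate), transactions)
            if count >= min_count:
                result[candidate] = count
    return result


frequent = frequent_itemsets(transactions, MIN_SUPPORT_COUNT)
for itemset, count in frequent.items():
    print(itemset, count)
```

With a minimum support count of 3, the result contains the three single items and the three pairs, while (Bread, Butter, Milk) is left out because it appears in only 2 transactions.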