2. Collaborative Filtering Recommender System#

Collaborative filtering recommends items based on patterns of user behavior, such as ratings, clicks, purchases, or views.

Unlike content-based filtering, it does not use item features (e.g., genre, keywords). Instead, it relies on the idea:

Users who behaved similarly in the past may like similar items in the future.

How is Collaborative Filtering Different from Content-Based Filtering?#

| Method | Main Idea | Uses Item Features? | Uses Other Users? |
|---|---|---|---|
| Content-Based Filtering | Recommend items similar to what the user liked before | Yes | No |
| Collaborative Filtering | Recommend items based on similar users/items | No | Yes |

In short: content-based filtering uses item features, while collaborative filtering uses the user–item interaction matrix.

2.1 User-Based Collaborative Filtering (User–User CF)#

User-based collaborative filtering asks:

“Who else has similar taste as me?”

It finds the k most similar users (neighbors) and recommends items those users liked that the target user has not seen.

“Find people who think like you, then recommend what they loved.”

User-Based Collaborative Filtering Diagram

Figure: Illustration of a User-Based Collaborative Filtering recommender system. Source: Medium.

Key Observation: We do not use item features (like genre). We only use rating patterns.

Similarity Measures#

Similarity between users is computed using:

Pearson Correlation#

\[ \text{Pearson}(A,B)=\frac{\sum_{i}(A_i-\bar{A})(B_i-\bar{B})}{\sqrt{\sum_{i}(A_i-\bar{A})^2}\;\sqrt{\sum_{i}(B_i-\bar{B})^2}} \]

Intuition#

  • Measures linear relationship

  • Removes rating bias (mean-centering)

  • Focuses on relative preferences


Why Pearson over Cosine?#

Pearson handles user rating bias. Example:

  • Alice rates everything 4–5

  • Bob rates everything 1–2

  • But both like the same movies

Pearson → identifies them as similar
Cosine → may treat them as different

Example:

Cosine Similarity: Cosine compares the raw vectors (no mean adjustment).

  • Alice = [5, 5, 4]

  • Bob = [2, 2, 1]

Even though the pattern is similar, the magnitudes are different, so cosine may not fully capture their similarity.

Pearson Correlation: Pearson first subtracts the mean:

  • Alice mean = 4.67 → [0.33, 0.33, -0.67]

  • Bob mean = 1.67 → [0.33, 0.33, -0.67]

Now they are identical after mean-centering.

Pearson → similarity ≈ 1 (very similar)

So we can say:

  • Cosine similarity compares absolute values

  • Pearson correlation compares relative preferences (after removing bias)

Use cosine similarity → item features / embeddings
Use Pearson correlation → user ratings
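The Alice/Bob comparison above can be checked numerically. A minimal sketch in plain NumPy (no recommender library assumed):

```python
import numpy as np

alice = np.array([5.0, 5.0, 4.0])
bob   = np.array([2.0, 2.0, 1.0])

def cosine(u, v):
    # compares the raw vectors -- sensitive to each user's rating scale
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson(u, v):
    # mean-center first, then take cosine of the centered vectors
    uc, vc = u - u.mean(), v - v.mean()
    return float(uc @ vc / (np.linalg.norm(uc) * np.linalg.norm(vc)))

print(round(cosine(alice, bob), 3))   # 0.985 -- high, but not perfect
print(round(pearson(alice, bob), 3))  # 1.0   -- identical after mean-centering
```

Both scores are high here, but Pearson recovers the perfect agreement in relative preferences that raw cosine misses.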


Algorithm#

  1. Construct a user–item rating matrix

  2. Compute similarity between users (on commonly rated items)

  3. For a target user, find k nearest neighbors

  4. Predict ratings using a weighted average of neighbors’ ratings

  5. Recommend items with the highest predicted ratings

Example#

| User | Movie A | Movie B | Movie C |
|---|---|---|---|
| User 1 | 5 | 4 | ? |
| User 2 | 5 | 4 | 5 |
| User 3 | 1 | 2 | 1 |

User 1 ≈ User 2 → similar preferences

Recommend Movie C to User 1
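The five steps above, applied to this exact table, can be sketched in a few lines. This is a minimal NumPy illustration (using a mean-centered weighted average, one common prediction formula), not a production implementation:

```python
import numpy as np

# user-item rating matrix from the example; np.nan marks the unknown rating
R = np.array([
    [5, 4, np.nan],   # User 1
    [5, 4, 5],        # User 2
    [1, 2, 1],        # User 3
])

def pearson(u, v):
    mask = ~np.isnan(u) & ~np.isnan(v)          # commonly rated items only
    uc = u[mask] - u[mask].mean()
    vc = v[mask] - v[mask].mean()
    return float(uc @ vc / (np.linalg.norm(uc) * np.linalg.norm(vc)))

def predict(R, user, item):
    # mean-centered weighted average over neighbors who rated the item
    mu = np.nanmean(R[user])
    num = den = 0.0
    for v in range(R.shape[0]):
        if v == user or np.isnan(R[v, item]):
            continue
        sim = pearson(R[user], R[v])
        num += sim * (R[v, item] - np.nanmean(R[v]))
        den += abs(sim)
    if den == 0:
        return mu
    return mu + num / den

print(round(predict(R, 0, 2), 2))  # 4.83 -- well above User 1's mean of 4.5
```

User 2 (similarity +1) pulls the prediction up, and User 3 (similarity −1) also contributes positively once its sign is flipped, so Movie C gets a high predicted rating and is recommended.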


Important Clarification: In user-based collaborative filtering, we compare:

  • one user’s rating pattern
    with

  • other users’ rating patterns

We do not use item features.


Advantages#

The intuition is powerful; it enables serendipitous discovery beyond your existing taste profile.

  • No need for item features

  • Can recommend new and diverse items

  • Captures collective intelligence of users

But it’s expensive to compute, since comparing every user pair scales as O(N²).

Disadvantages#

  • Requires sufficient user interaction data

  • Cold start problem (new users/items)

  • Computationally expensive (can scale as O(N²))

  • Similarity unreliable with sparse ratings

Key Insight: User-based collaborative filtering works best when we have:

  • Many users

  • Sufficient rating history

It answers: “What did similar users like that this user hasn’t seen yet?”

2.2 Item-Based Collaborative Filtering (Item–Item CF)#

“Find movies that are always rated the same way by the same people, then use that.”

Item-based collaborative filtering flips the perspective: instead of finding similar users, it finds similar items based on user interaction patterns.

At recommendation time, it looks at what a user has already rated, finds items most similar to those, and recommends them.

The Key Idea:

“Which items are similar to the ones this user liked?”
“If you liked this item, you may also like similar items.”

Item-Based Collaborative Filtering Diagram

Figure: Illustration of an Item-Based Collaborative Filtering recommender system. Source: Medium.

This is what Amazon uses (“customers who bought X also bought Y”).

Key Observation: We do not use item features; instead, we compute item similarity using user behavior (ratings, clicks, purchases) via cosine similarity or Pearson correlation.

Intuition: Two items are similar if many users interacted with both items in a similar way.

Algorithm#

  1. Construct a user–item rating matrix

  2. Compute item–item similarity (compare columns using cosine or Pearson similarity)

  3. For a target user and an unseen item, find items most similar to that item that the user has already rated

  4. Predict the rating using a weighted average of the user’s ratings on similar items

  5. Recommend items with the highest predicted scores

Predict how much a user will like an item based on how similar it is to items they already liked.

Example#

| User | Movie A | Movie B | Movie C |
|---|---|---|---|
| User 1 | 5 | 4 | ? |
| User 2 | 5 | 4 | 5 |
| User 3 | 1 | 2 | 1 |

Movie A and Movie B have similar rating patterns across users

  • If a user liked Movie A, recommend Movie B
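The same table can be scored column-wise. A minimal sketch, using cosine similarity between item columns over the users who rated both (adjusted cosine, which subtracts each user's mean first, is a common refinement):

```python
import numpy as np

# same user-item matrix; columns are items
R = np.array([
    [5, 4, np.nan],   # User 1
    [5, 4, 5],        # User 2
    [1, 2, 1],        # User 3
])

def item_sim(R, i, j):
    mask = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])   # users who rated both
    a, b = R[mask, i], R[mask, j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item):
    # weighted average of the user's own ratings on similar items
    num = den = 0.0
    for j in range(R.shape[1]):
        if j == item or np.isnan(R[user, j]):
            continue
        sim = item_sim(R, item, j)
        num += sim * R[user, j]
        den += abs(sim)
    return num / den if den else np.nan

print(round(predict(R, 0, 2), 2))  # predicted from User 1's ratings of A and B
```

Movie C's columns agree closely with Movies A and B across Users 2 and 3, so User 1's high ratings for A and B carry over to a high prediction (≈ 4.5) for C.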


Important Clarification: In item-based collaborative filtering, we compare:

  • item rating patterns (columns)
    with

  • other item rating patterns

We are not comparing users directly.

Advantages#

  • More scalable than user-based CF

  • Item similarities are more stable over time

  • Can be precomputed for faster recommendations

  • Works well in real-world systems (e.g., e-commerce)

Disadvantages#

  • Requires sufficient interaction data

  • Cold start problem for new items (no data → no recommendation)

  • Cannot recommend items with no prior interactions


Key advantage over User-Based: Item similarity is stable (the relationship between Inception and Interstellar doesn't change over time), so it can be precomputed once and looked up cheaply. This is why Amazon and Netflix prefer item-based CF.
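The precomputation point can be made concrete. A toy sketch (dense matrix with no missing ratings, purely for illustration):

```python
import numpy as np

# toy user-item matrix with no missing entries, for simplicity
R = np.array([
    [5.0, 4.0, 5.0],
    [5.0, 4.0, 5.0],
    [1.0, 2.0, 1.0],
])

# Offline step: precompute the full item-item cosine similarity matrix once.
norms = np.linalg.norm(R, axis=0)
S = R.T @ R / np.outer(norms, norms)

# Online step: serving a recommendation is now a cheap table lookup.
print(round(S[0, 1], 3))  # similarity of Movie A and Movie B
```

The expensive pairwise computation happens offline in batch; at serving time the system only reads `S`, which is why item-based CF scales well in production.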


Key Insight: Item-based collaborative filtering works best when:

  • Items are relatively stable

  • There is sufficient user interaction data

It answers: “What items are similar to the ones this user already liked?”


Cold Start Problem in Collaborative Filtering#

The cold start problem occurs when a recommender system has insufficient data to make reliable recommendations.

Types of Cold Start#

  1. New User Problem

    • A new user joins the system

    • No history (ratings, clicks, purchases)

    • System doesn’t know their preferences

  2. New Item Problem

    • A new item is added

    • No user interactions yet

    • System cannot recommend it

  3. New System Problem

    • Entire system is new

    • Very little interaction data overall

Why it happens#

Collaborative filtering relies on:

  • user–item interactions

  • similarity between users/items

Without data → no similarity can be computed

Example: A new user signs up on Netflix and has not watched anything yet, so the system cannot recommend movies effectively.

Some Common Solutions:

  • Ask users for initial preferences (onboarding questionnaire)

  • Use content-based filtering initially

  • Recommend popular/trending items

  • Use hybrid systems
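These fallbacks are easy to combine. A toy sketch of the popularity fallback for brand-new users (illustrative function and variable names, not any real library's API):

```python
def recommend(user_history, item_popularity, k=2):
    """Fall back to globally popular items when a user has no history."""
    if not user_history:  # cold start: brand-new user, nothing to go on
        ranked = sorted(item_popularity, key=item_popularity.get, reverse=True)
        return ranked[:k]
    # with enough history, a collaborative-filtering model would run here
    return []

popularity = {"A": 120, "B": 300, "C": 45}
print(recommend([], popularity))  # ['B', 'A'] -- most popular items first
```

Once the user accumulates interactions, the system switches from the popularity fallback to collaborative filtering, which is the essence of the hybrid approach listed above.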