Example: Recommender Systems — Complete Walkthrough#

Content-Based · User-Based CF · Item-Based CF#

What you’ll build: Three recommender systems from scratch, with every calculation shown as a printed table you can trace step by step.


Dataset: 5 users (Alice, Bob, Carol, Dave, Eve) rate 5 movies on a scale of 1–5.
A ? means they haven’t seen it yet; that’s what we want to predict.

| User  | Inception | Interstellar | The Notebook | Alien | Titanic |
|-------|-----------|--------------|--------------|-------|---------|
| Alice | 5         | 4            | 2            | ?     | ?       |
| Bob   | 5         | 4            | 1            | 5     | 2       |
| Carol | 1         | 2            | 5            | 2     | 5       |
| Dave  | 4         | 5            | 2            | 4     | 1       |
| Eve   | 2         | 1            | 5            | 3     | 4       |

1. Content-Based Recommender Systems#

Step 1: The Rating Matrix (Alice’s History)

| User  | Inception (Action, Sci-fi) | Interstellar (Sci-fi) | The Notebook (Romance) | Alien (Action, Sci-fi) | Titanic (Romance) |
|-------|----------------------------|-----------------------|------------------------|------------------------|-------------------|
| Alice | 5                          | 4                     | 2                      | ?                      | ?                 |
| Bob   | 5                          | 4                     | 1                      | 5                      | 2                 |
| Carol | 1                          | 2                     | 5                      | 2                      | 5                 |

Notice that Alice rated the Sci-fi movies Inception (5) and Interstellar (4) highly, and rated the Romance movie The Notebook (2) low. We need to predict her ratings for Alien and Titanic.

Step 2: Build Alice’s Genre Preference Profile (User Profile)

Average rating per genre across movies she has seen:

| Genre   | Movies seen                  | Avg rating        |
|---------|------------------------------|-------------------|
| Sci-fi  | Inception (5), Interstellar (4) | (5 + 4) / 2 = 4.5 |
| Action  | Inception (5)                | 5 / 1 = 5.0       |
| Romance | The Notebook (2)             | 2 / 1 = 2.0       |

Detailed Calculation:

  • Sci-fi: Alice watched Inception and Interstellar → (5 + 4) / 2 = 4.5

  • Action: Alice watched Inception → 5 / 1 = 5.0

  • Romance: Alice watched The Notebook → 2 / 1 = 2.0

Alice’s profile: likes Sci-fi (4.5) and Action (5.0); dislikes Romance (2.0).

Step 3: Score Unseen Movies

General Formula: Predicted score for a movie:

\[ \text{Score(movie)} = \frac{\sum (\text{user's average rating for each genre in the movie})}{\text{number of genres in the movie}} \]

| Movie   | Genres         | Predicted score | Calculation     | Recommend? |
|---------|----------------|-----------------|-----------------|------------|
| Alien   | Action, Sci-fi | 4.75            | (5.0 + 4.5) / 2 | Yes        |
| Titanic | Romance        | 2.0             | 2.0 / 1         | No         |

Step-by-Step Calculation:

  • Alien (2 genres: Action, Sci-fi): Score(Alien) = (5.0 + 4.5) / 2 = 4.75

  • Titanic (1 genre: Romance): Score(Titanic) = 2.0 / 1 = 2.0

Final Recommendation: Recommend “Alien” because its genres closely match Alice’s demonstrated preferences.
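The three steps above can be reproduced in a short, self-contained Python sketch (the variable and function names here are our own, not part of any library):

```python
# Content-based scoring: build Alice's genre profile from her ratings,
# then score unseen movies by averaging her per-genre preferences.
genres = {
    "Inception": ["Action", "Sci-fi"],
    "Interstellar": ["Sci-fi"],
    "The Notebook": ["Romance"],
    "Alien": ["Action", "Sci-fi"],
    "Titanic": ["Romance"],
}
alice_ratings = {"Inception": 5, "Interstellar": 4, "The Notebook": 2}

# Step 2: average Alice's rating per genre over the movies she has seen
per_genre = {}
for movie, rating in alice_ratings.items():
    for g in genres[movie]:
        per_genre.setdefault(g, []).append(rating)
profile = {g: sum(rs) / len(rs) for g, rs in per_genre.items()}

# Step 3: score an unseen movie as the mean of its genres' profile values
def score(movie):
    return sum(profile[g] for g in genres[movie]) / len(genres[movie])

print(profile)                           # {'Action': 5.0, 'Sci-fi': 4.5, 'Romance': 2.0}
print(score("Alien"), score("Titanic"))  # 4.75 2.0
```

Running this reproduces the hand calculation exactly: Alien scores 4.75 and Titanic scores 2.0.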

2. User-Based Collaborative Filtering Recommender Systems#

Step 1: User-Item Rating Matrix where (? = unseen)

| User / Item (Movie) | Inception | Interstellar | The Notebook | Alien | Titanic |
|---------------------|-----------|--------------|--------------|-------|---------|
| Alice               | 5         | 4            | 2            | ?     | ?       |
| Bob                 | 5         | 4            | 1            | 5     | 2       |
| Carol               | 1         | 2            | 5            | 2     | 5       |
| Dave                | 4         | 5            | 2            | 4     | 1       |
| Eve                 | 2         | 1            | 5            | 3     | 4       |

We use the 3 movies that Alice and other users have both rated (Inception, Interstellar, and The Notebook) to compute similarity. Then we use ratings from the most similar users to predict Alice’s missing ratings.

Step 2: Compute Similarity Between Alice and Each User

We compute similarity using only the movies that both Alice and other users have rated (overlapping / co-rated items): Inception, Interstellar, The Notebook.

  • Similarity is computed using cosine similarity or Pearson correlation

  • Then, we select the top-k most similar users to predict Alice’s missing ratings.

Alice’s rating vector:

\[ Alice = [5, 4, 2] \]

Option A: Cosine Similarity#

\[ \text{similarity}(A,B) = \frac{A \cdot B}{||A|| \times ||B||} \]

Example: Alice vs Bob#

\[ \frac{(5×5 + 4×4 + 2×1)}{\sqrt{5^2+4^2+2^2} \cdot \sqrt{5^2+4^2+1^2}} = \frac{43}{\sqrt{45} \cdot \sqrt{42}} \approx 0.989 \]

Cosine Similarity Results:#

| User  | Their ratings (Inception / Interstellar / Notebook) | Alice’s ratings | Similarity          |
|-------|-----------------------------------------------------|-----------------|---------------------|
| Bob   | 5 / 4 / 1                                           | 5 / 4 / 2       | 0.99 (very similar) |
| Dave  | 4 / 5 / 2                                           | 5 / 4 / 2       | 0.98 (similar)      |
| Carol | 1 / 2 / 5                                           | 5 / 4 / 2       | 0.63                |
| Eve   | 2 / 1 / 5                                           | 5 / 4 / 2       | 0.65                |

Bob and Dave are Alice’s nearest neighbors.
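The cosine computations above can be checked with a few lines of plain Python (a sketch; the `cosine` helper is our own, not a library call):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Vectors over the co-rated movies only: Inception, Interstellar, The Notebook
alice = [5, 4, 2]
others = {"Bob": [5, 4, 1], "Dave": [4, 5, 2], "Carol": [1, 2, 5], "Eve": [2, 1, 5]}
sims = {name: round(cosine(alice, v), 2) for name, v in others.items()}
print(sims)  # {'Bob': 0.99, 'Dave': 0.98, 'Carol': 0.63, 'Eve': 0.65}
```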

Option B: Pearson Correlation#

\[ \text{Pearson}(A,B) = \frac{\sum (A_i - \bar{A})(B_i - \bar{B})} {\sqrt{\sum (A_i - \bar{A})^2} \sqrt{\sum (B_i - \bar{B})^2}} \]

Alice’s mean:

\[ \bar{A} = \frac{5 + 4 + 2}{3} = 3.67 \]

Example: Alice vs Bob#

Bob’s mean:

\[ \bar{B} = \frac{5 + 4 + 1}{3} = 3.33 \]

Centered vectors:

\[ Alice = [1.33, 0.33, -1.67] \]
\[ Bob = [1.67, 0.67, -2.33] \]
\[ \text{Pearson}(Alice, Bob) \approx 0.996 \]

Pearson Similarity Results#

| User  | Ratings (I / IS / N) | Similarity            |
|-------|----------------------|-----------------------|
| Bob   | 5 / 4 / 1            | 1.00 (very similar)   |
| Dave  | 4 / 5 / 2            | 0.79 (similar)        |
| Carol | 1 / 2 / 5            | -1.00 (very different) |
| Eve   | 2 / 1 / 5            | -0.84 (very different) |


Key Insight:

  • Cosine similarity → measures angle (raw rating patterns)

  • Pearson correlation → measures similarity after removing user bias (mean-centered)

For recommendation systems, Pearson is often preferred because it handles users with different rating scales.
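Pearson correlation is just cosine similarity applied to mean-centered vectors, which the sketch below makes explicit (our own helper; note that exact values may differ in the last digit from hand-rounded intermediate steps, e.g. Pearson(Alice, Bob) is about 0.996):

```python
import math

def pearson(a, b):
    """Pearson correlation = cosine similarity of the mean-centered vectors."""
    ca = [x - sum(a) / len(a) for x in a]
    cb = [y - sum(b) / len(b) for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(ca) * norm(cb))

alice = [5, 4, 2]
others = {"Bob": [5, 4, 1], "Dave": [4, 5, 2], "Carol": [1, 2, 5], "Eve": [2, 1, 5]}
for name, v in others.items():
    print(name, round(pearson(alice, v), 2))  # Bob 1.0, Dave 0.79, Carol -1.0, Eve -0.84
```

Notice how mean-centering flips Carol and Eve from mildly positive cosine values (0.63, 0.65) to strongly negative correlations: once each user's average is removed, their tastes are clearly opposite to Alice's.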


Step 3: Choose Top-k Neighbors

We compute similarity using the movies Alice and the other users have both rated. Then we use the top-k most similar users to predict Alice’s missing ratings.

Since we use:

\[ k = 2 \]

we select the top 2 most similar users:

\[ \text{Nearest neighbors} = \{Bob, Dave\} \]

| Neighbor | Similarity (cosine) |
|----------|---------------------|
| Bob      | 0.99                |
| Dave     | 0.98                |


Step 4: Predict Alice’s Missing Ratings

We use the weighted average formula:

\[ \text{Predicted rating} = \frac{\sum(\text{similarity} \times \text{neighbor rating})}{\sum(\text{similarity})} \]

| Unseen Movie | Similar Users Used                            | Prediction Formula                  | Score |
|--------------|-----------------------------------------------|-------------------------------------|-------|
| Alien        | Bob (sim 0.99, rated 5), Dave (sim 0.98, rated 4) | (0.99×5 + 0.98×4) / (0.99 + 0.98) | 4.50  |
| Titanic      | Bob (sim 0.99, rated 2), Dave (sim 0.98, rated 1) | (0.99×2 + 0.98×1) / (0.99 + 0.98) | 1.50  |


Step-by-Step Calculation#

Alien: Predict Alice’s Rating for Alien#

Bob rated Alien = 5
Dave rated Alien = 4

\[ \text{Predicted rating for Alien} = \frac{(0.99 \times 5) + (0.98 \times 4)}{0.99 + 0.98} = \frac{4.95 + 3.92}{1.97} = 4.50 \]

Titanic: Predict Alice’s Rating for Titanic#

Bob rated Titanic = 2
Dave rated Titanic = 1

\[ \text{Predicted rating for Titanic} = \frac{(0.99 \times 2) + (0.98 \times 1)}{0.99 + 0.98} = \frac{1.98 + 0.98}{1.97} = 1.50 \]
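The weighted-average formula is a one-liner in Python; this sketch (our own `predict` helper) reproduces both predictions from the similarities and ratings above:

```python
def predict(neighbors):
    """Similarity-weighted average of the neighbors' ratings."""
    return sum(s * r for s, r in neighbors) / sum(s for s, _ in neighbors)

# (similarity, rating) pairs for Bob and Dave, from the tables above
alien = predict([(0.99, 5), (0.98, 4)])
titanic = predict([(0.99, 2), (0.98, 1)])
print(round(alien, 2), round(titanic, 2))  # 4.5 1.5
```

Dividing by the sum of similarities keeps the prediction on the original 1–5 rating scale.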

Final Recommendation#

| Movie   | Predicted Rating | Recommend? |
|---------|------------------|------------|
| Alien   | 4.50             | Yes        |
| Titanic | 1.50             | No         |

Recommendation: Recommend Alien (predicted 4.50), because Alice’s nearest neighbors, Bob and Dave, both rated Alien highly.

Remember: In user-based collaborative filtering, we rely entirely on similar users’ behavior; we do not use genre or content information.

3. Item-Based Collaborative Filtering Recommender Systems#

Item-based CF flips the question: instead of asking “who has similar taste to Alice?”, it asks “which movies tend to get rated similarly by the same people?”

Idea: Instead of finding similar users, we find similar items (movies).

We predict Alice’s ratings by finding movies similar to those she already rated, and using her past ratings to estimate new ones.

  • If two movies are rated similarly by many users → they are similar

  • We predict a user’s rating based on movies they already liked

Step 1: Represent Movies as Vectors

Instead of comparing users, we compare how movies are rated across users.

| Movie (Item) / User | Alice | Bob | Carol | Dave | Eve |
|---------------------|-------|-----|-------|------|-----|
| Inception           | 5     | 5   | 1     | 4    | 2   |
| Interstellar        | 4     | 4   | 2     | 5    | 1   |
| The Notebook        | 2     | 1   | 5     | 2    | 5   |
| Alien               | ?     | 5   | 2     | 4    | 3   |
| Titanic             | ?     | 2   | 5     | 1    | 4   |

Here, we have transposed the matrix: rows are now movies, columns are users. Why? Instead of comparing users, we compare how movies were rated across all users.

Here, as you can see, each movie is represented using ratings from all users:

  • Inception = [5, 5, 1, 4, 2]

  • Interstellar = [4, 4, 2, 5, 1]

  • The Notebook = [2, 1, 5, 2, 5]

  • Alien = [?, 5, 2, 4, 3] (Here, Alice has not rated Alien) → ignore Alice → [5, 2, 4, 3]

  • Titanic = [?, 2, 5, 1, 4] → ignore Alice → [2, 5, 1, 4]

We ignore Alice’s missing values (?) when computing similarity.


Why Do We Ignore “?” (Missing Ratings) When Building These Vectors?#

When computing similarity between two movies, we must use only users who have rated both movies.

Missing values (“?”) mean the user has not seen or rated the movie, so we do not know their preference. Including them would introduce incorrect or undefined values in the calculation.

Key Idea: Similarity is computed only on co-rated users (users who rated both items). You can’t compare two movies based on a user who hasn’t watched one of them.

Example#

We want to compute similarity between Inception and Alien.

Original vectors:

  • Inception = [5, 5, 1, 4, 2]

  • Alien = [?, 5, 2, 4, 3]

Here, Alice has not rated Alien. Problem:

  • “?” is unknown → cannot multiply or compute distance

  • Leads to invalid similarity

So, we remove Alice’s entry and use only:

  • Inception = [5, 1, 4, 2]

  • Alien = [5, 2, 4, 3]

(users: Bob, Carol, Dave, Eve)

Now both vectors have:

  • Same length

  • Only known values

  • Valid comparison

Summary#

  • Ignore missing values (“?”)

  • Use only overlapping users

  • Ensures fair and correct similarity computation
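Dropping missing ratings to get co-rated vectors can be sketched as follows (the `corated` helper is our own; `None` stands in for “?”):

```python
def corated(u, v):
    """Keep only the positions where both movies have a known rating."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]

# Ratings in user order Alice, Bob, Carol, Dave, Eve; None marks a "?"
inception = [5, 5, 1, 4, 2]
alien = [None, 5, 2, 4, 3]
print(corated(inception, alien))  # ([5, 1, 4, 2], [5, 2, 4, 3])
```

The result matches the vectors used in the worked example: Alice’s position is dropped from both movies, leaving equal-length vectors over Bob, Carol, Dave, and Eve.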


Step 2: Compute Item-Item Similarity

So, we compute similarity between movies using only co-rated users (ignore “?”).

From Step 1, we have the co-rated movie vectors (example: Inception vs Alien).

Using users: Bob, Carol, Dave, Eve

  • Inception = [5, 1, 4, 2]

  • Alien = [5, 2, 4, 3]

So, we compare each movie’s rating vector using only users who rated both movies (Bob, Carol, Dave, Eve).


Similarity Table#

| Movie Pair              | Raters                                        | Cosine Similarity | Pearson Similarity | Interpretation                  |
|-------------------------|-----------------------------------------------|-------------------|--------------------|---------------------------------|
| Inception vs Alien      | Bob (5,5), Carol (1,2), Dave (4,4), Eve (2,3) | 0.983             | 0.99               | Very similar audiences          |
| Interstellar vs Alien   | Bob (4,5), Carol (2,2), Dave (5,4), Eve (1,3) | 0.943             | 0.71               | Similar audiences               |
| Inception vs Titanic    | Bob (5,2), Carol (1,5), Dave (4,1), Eve (2,4) | 0.587             | -0.90              | Opposite preferences by Pearson |
| The Notebook vs Titanic | Bob (1,2), Carol (5,5), Dave (2,1), Eve (5,4) | 0.974             | 0.89               | Very similar audiences          |


Step-by-Step Example Calculation (Inception vs Alien)#

Option A: Cosine Similarity#

\[ \text{Cosine}(i,j) = \frac{i \cdot j}{||i|| \times ||j||} \]

Calculation

\[ \frac{(5\times5 + 1\times2 + 4\times4 + 2\times3)} {\sqrt{5^2+1^2+4^2+2^2} \cdot \sqrt{5^2+2^2+4^2+3^2}} \]
\[ = \frac{25 + 2 + 16 + 6}{\sqrt{46} \cdot \sqrt{54}} = \frac{49}{\sqrt{46} \cdot \sqrt{54}} \approx 0.98 \]

Option B: Pearson Correlation#

\[ \text{Pearson}(i,j) = \frac{\sum (i_k - \bar{i})(j_k - \bar{j})} {\sqrt{\sum (i_k - \bar{i})^2} \cdot \sqrt{\sum (j_k - \bar{j})^2}} \]

Step 1: Compute Means

\[ \bar{i} = \frac{5 + 1 + 4 + 2}{4} = 3 \]
\[ \bar{j} = \frac{5 + 2 + 4 + 3}{4} = 3.5 \]

Step 2: Centered Vectors

\[ i - \bar{i} = [2, -2, 1, -1] \]
\[ j - \bar{j} = [1.5, -1.5, 0.5, -0.5] \]

Step 3: Compute Pearson

\[ \frac{(2\times1.5) + (-2\times-1.5) + (1\times0.5) + (-1\times-0.5)} {\sqrt{(2^2+(-2)^2+1^2+(-1)^2)} \cdot \sqrt{(1.5^2+(-1.5)^2+0.5^2+(-0.5)^2)}} \]
\[ = \frac{3 + 3 + 0.5 + 0.5}{\sqrt{10} \cdot \sqrt{5}} = \frac{7}{\sqrt{10} \cdot \sqrt{5}} \approx 0.99 \]

Method Comparison#

| Method  | Similarity |
|---------|------------|
| Cosine  | 0.98       |
| Pearson | 0.99       |

Key Insight#

  • Cosine similarity → compares raw rating patterns

  • Pearson correlation → compares rating patterns after removing bias (mean-centered)

Pearson is often better when users have different rating scales.
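The item-item numbers in the similarity table can be checked with the same two measures applied to the co-rated vectors (a sketch with our own helpers, not library calls):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def pearson(a, b):
    """Pearson correlation = cosine of the mean-centered vectors."""
    ca = [x - sum(a) / len(a) for x in a]
    cb = [y - sum(b) / len(b) for y in b]
    return cosine(ca, cb)

# Co-rated vectors over Bob, Carol, Dave, Eve (Alice's "?" dropped)
inception, alien = [5, 1, 4, 2], [5, 2, 4, 3]
print(round(cosine(inception, alien), 3))   # 0.983
print(round(pearson(inception, alien), 2))  # 0.99
```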


Step 3: Predict Alice’s Ratings (Item-Based)

We use movies Alice has already rated as anchors:

  • Inception (5)

  • Interstellar (4)

  • The Notebook (2)

Top-N Selection (N = 2)#

For each target movie, we select the top-2 most similar movies among the movies Alice has already rated.

Here, we use:

\[ N = 2 \]

Using cosine similarity, the selected neighbors are:

  • For Alien → Inception (0.98), Interstellar (0.94)

  • For Titanic → The Notebook (0.97), Inception (0.59)

(Neighbors are selected using cosine similarity for prediction. Pearson correlation can also be used; however, negative similarities should be ignored. An example is provided below.)

General Formula#

\[ \text{Score}(i) = \frac{\sum (\text{sim}(i,j) \times r_{Alice,j})}{\sum \text{sim}(i,j)} \]

Predict Alien#

\[ \frac{(0.98 \times 5) + (0.94 \times 4)}{0.98 + 0.94} = \frac{4.90 + 3.76}{1.92} = \frac{8.66}{1.92} \approx 4.51 \]

Predict Titanic#

\[ \frac{(0.97 \times 2) + (0.59 \times 5)}{0.97 + 0.59} = \frac{1.94 + 2.95}{1.56} = \frac{4.89}{1.56} \approx 3.13 \]
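The same weighted-average helper from the user-based section works here; only the neighbors change from similar users to similar movies (a sketch; note that with these two-decimal similarities, the Titanic prediction rounds to 3.13):

```python
def predict(neighbors):
    """Similarity-weighted average of Alice's ratings of the neighbor movies."""
    return sum(s * r for s, r in neighbors) / sum(s for s, _ in neighbors)

# (cosine similarity, Alice's rating) for each target's top-2 neighbor movies
alien = predict([(0.98, 5), (0.94, 4)])    # Inception, Interstellar
titanic = predict([(0.97, 2), (0.59, 5)])  # The Notebook, Inception
print(round(alien, 2), round(titanic, 2))  # 4.51 3.13
```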

Final Results#

| Movie   | Predicted Score | Recommend?     |
|---------|-----------------|----------------|
| Alien   | 4.51            | Yes            |
| Titanic | 3.13            | Maybe / weaker |


Option B: Prediction Using Pearson, Ignoring Negative Similarities#

When using Pearson correlation, negative similarity means opposite preference. So for prediction, we ignore negative similarities and use only positive similarities.


Predict Alien using Pearson#

Positive similarities:

  • sim(Alien, Inception) = 0.99

  • sim(Alien, Interstellar) = 0.71

\[ \frac{(0.99 \times 5) + (0.71 \times 4)}{0.99 + 0.71} = \frac{4.95 + 2.84}{1.70} = \frac{7.79}{1.70} \approx 4.58 \]

Predict Titanic using Pearson#

Positive similarity:

  • sim(Titanic, The Notebook) = 0.89

  • sim(Titanic, Inception) = -0.90 → ignored

\[ \frac{0.89 \times 2}{0.89} = 2.00 \]
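Filtering out negative similarities before taking the weighted average can be sketched as follows (our own `predict_positive` helper):

```python
def predict_positive(neighbors):
    """Weighted average over neighbors with positive Pearson similarity only."""
    pos = [(s, r) for s, r in neighbors if s > 0]
    return sum(s * r for s, r in pos) / sum(s for s, _ in pos)

alien = predict_positive([(0.99, 5), (0.71, 4)])
titanic = predict_positive([(0.89, 2), (-0.90, 5)])  # the -0.90 pair is dropped
print(round(alien, 2), round(titanic, 2))  # 4.58 2.0
```

For Titanic, only The Notebook survives the filter, so the prediction collapses to Alice’s rating of The Notebook (2.0).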

Pearson Results After Ignoring Negative Similarities#

| Movie   | Pearson Score | Recommend? |
|---------|---------------|------------|
| Alien   | 4.58          | Yes        |
| Titanic | 2.00          | No         |


Interpretation#

Using Pearson, Titanic receives a low predicted score because its strongest positive match is The Notebook, which Alice rated low.


Final Recommendation#

Recommendation: Recommend Alien (predicted 4.51) because it is highly similar to movies Alice already rated highly.

Intuition: If Alice liked movies similar to Alien, she will likely like Alien as well.

Remember: Item-based uses movie similarity, not users. Since movies don’t change, their similarity stays consistent. This allows us to compute similarities once and reuse them, making the system efficient.

  • We use Top-N similar items (N = 2)

  • Predictions are based on weighted averages of similar movies

  • Item-based filtering leverages movie similarity, not user similarity

Here’s the core intuition from all three methods: all three predicted Alice should watch Alien, but for completely different reasons.

  • Content-based said “Alien is Sci-fi, and Alice likes Sci-fi.”

  • User-based said “Bob and Dave love Alien, and they think just like Alice.”

  • Item-based said “Alien gets rated by the same people who rated Inception highly.”

The practical reason why real systems like Amazon and Netflix prefer item-based CF is scalability. With 100 million users, you’d need to compare every pair of users (about 5 quadrillion comparisons). But movies are far fewer, their similarity scores are stable, and you can pre-compute them once overnight and reuse them for every user lookup in milliseconds.