1. Content-Based Filtering Recommender System
Content-based filtering builds a profile of each item (e.g., a movie’s genre, keywords, director, runtime) and a profile of each user based on the items they have liked.
It then recommends items whose profiles are most similar to the user’s profile, typically using cosine similarity.
The key idea:
“Show me more of what I’ve already shown I like.”
Recommend items similar to what the user liked before.
Figure: Two visual representations of Content-Based Filtering. The system builds a user profile from previously liked items and recommends new items with similar features. (Sources: shaped.ai, spotintelligence.com)
Algorithm:
Each item has features (e.g., genre, keywords).
Build a user profile: a weighted average of the feature vectors of the movies the user has rated, so the profile reflects their past preferences.
For each unseen movie, compute the cosine similarity between its feature vector and the user profile.
Recommend the unseen movies with the highest similarity, i.e., the ones whose features best match the profile.
For example, if a user likes Action and Sci-Fi movies, the system recommends more Action and Sci-Fi movies.
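The steps above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation: the function names (`build_user_profile`, `recommend`), the movie catalogue, and the ratings are all hypothetical, and using the raw ratings as weights is just one possible weighting scheme.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def build_user_profile(item_features, ratings):
    """Weighted average of the rated items' feature vectors (weights = ratings)."""
    titles = list(ratings)
    vectors = np.array([item_features[t] for t in titles], dtype=float)
    weights = np.array([ratings[t] for t in titles], dtype=float)
    return np.average(vectors, axis=0, weights=weights)

def recommend(item_features, ratings, top_n=1):
    """Rank unseen items by cosine similarity to the user profile."""
    profile = build_user_profile(item_features, ratings)
    unseen = [t for t in item_features if t not in ratings]
    scored = [(t, cosine_similarity(item_features[t], profile)) for t in unseen]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Hypothetical catalogue; features = [Action, Romance, Sci-Fi]
item_features = {
    "Movie A": [1, 0, 1],
    "Movie B": [1, 0, 0],
    "Movie C": [1, 0, 1],
    "Movie D": [0, 1, 0],
}
ratings = {"Movie A": 5, "Movie B": 4}  # hypothetical ratings for seen movies

top = recommend(item_features, ratings, top_n=2)
print(top)  # Movie C (Action + Sci-Fi) ranks above Movie D (Romance)
```

Note that only this one user's ratings are used; no other user's data enters the computation.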
Example (User Profile)
Suppose features = [Action, Romance, Sci-Fi]
If a user liked:
Movie A → [1, 0, 1]
Movie B → [1, 0, 0]
Then the user profile can be computed as the element-wise average of the two vectors: [1, 0, 0.5]
This means the user:
strongly prefers Action
somewhat likes Sci-Fi
does not prefer Romance
The system then compares this user profile with new items and recommends those with the highest similarity (typically using cosine similarity).
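The worked example can be reproduced with NumPy as a quick check. Movie A and Movie B come from the example; the two unseen movies ("Movie X" and "Movie Y") and their feature vectors are hypothetical and were added only to show the comparison step.

```python
import numpy as np

# Features = [Action, Romance, Sci-Fi]
liked = np.array([
    [1, 0, 1],  # Movie A
    [1, 0, 0],  # Movie B
], dtype=float)

# Unweighted average of the liked movies' feature vectors
user_profile = liked.mean(axis=0)
print(user_profile)  # [1.  0.  0.5]

# Hypothetical unseen movies
candidates = {
    "Movie X": np.array([1.0, 0.0, 1.0]),  # Action + Sci-Fi
    "Movie Y": np.array([0.0, 1.0, 0.0]),  # Romance only
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {name: cosine(vec, user_profile) for name, vec in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Movie X scores about 0.95 against the profile, while Movie Y scores 0 because the profile has no Romance component, so Movie X would be recommended.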
Key summary: Content-based Filtering recommends items whose features are most similar to the user profile, where the user profile is a vector that summarizes the types of items the user has liked in the past.
Important Clarification: In content-based filtering, we do NOT compare users with other users. We compare the user’s profile (their preferences) with item features.
Advantages and Disadvantages:
The main strength is that it needs only the user’s own history, with no data from other users. The main weakness is the “filter bubble”: users only ever see more of what they already like.
Advantages:
Uses only the user’s own history (no need for other users)
Provides personalized recommendations
Works well with smaller datasets
Disadvantages:
Limited diversity (recommends similar items repeatedly)
Filter bubble problem (hard to discover new interests)
Depends on the quality of item features
Cosine Similarity: quick refresher
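Cosine similarity measures the angle between two vectors, ignoring their lengths. It returns a value between -1 (pointing in opposite directions) and 1 (pointing in the same direction), with 0 meaning the vectors are orthogonal. We want values close to 1.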
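As a quick illustration, cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(θ) = (a · b) / (‖a‖ ‖b‖). The sketch below uses a hypothetical helper, `cosine_similarity`, and made-up vectors to show the three reference points of the range.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms (non-zero inputs)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Same direction, different length: similarity is (numerically) 1
same_direction = cosine_similarity([1, 0, 1], [2, 0, 2])

# No shared features: similarity is 0
orthogonal = cosine_similarity([1, 0, 0], [0, 1, 0])

# Opposite direction: similarity is (numerically) -1
opposite = cosine_similarity([1, 0, 1], [-1, 0, -1])

print(same_direction, orthogonal, opposite)
```

When all features are non-negative, as with the 0/1 genre indicators used in the example above, the dot product cannot be negative, so scores fall between 0 and 1 in practice.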

