Example input
[ programming language or framework ]: Python (Scikit-Learn, Pandas)
[ specify recommendation method, e.g., collaborative filtering, content-based, or hybrid approach ]: collaborative filtering
[ specify user data like purchase history, browsing history, or ratings ]: purchase history
Example output
Here’s a Python program that implements an e-commerce product recommendation system using collaborative filtering. The code uses Pandas to build the user-product matrix and Scikit-Learn to scale it and compute cosine similarity. Recommendations are generated from each user's purchase history.
The approach here is item-based collaborative filtering, a commonly used method in recommendation systems where items are recommended based on their similarity to what the user has previously interacted with.
```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler
import numpy as np
# Sample data structure
# Let's assume we have a dataset of user purchases with the following structure
# user_id | product_id | purchase_count
# ------- | ---------- | --------------
# 1 | 101 | 2
# 1 | 102 | 1
# 2 | 101 | 1
# 2 | 103 | 2
class ProductRecommendationSystem:
    def __init__(self, data):
        """
        Initializes the recommendation system with purchase data.
        :param data: A DataFrame containing user purchase history with columns:
                     'user_id', 'product_id', and 'purchase_count'.
        """
        self.data = data
        self.user_product_matrix = None
        self.product_similarity = None

    def preprocess_data(self):
        """
        Preprocesses data to create a user-product matrix. Each row represents
        a user and each column a product, with values being the purchase count.
        """
        try:
            # Create the user-product matrix; aggfunc='sum' adds up repeated
            # (user, product) rows instead of averaging them (the default)
            self.user_product_matrix = self.data.pivot_table(index='user_id',
                                                             columns='product_id',
                                                             values='purchase_count',
                                                             aggfunc='sum',
                                                             fill_value=0)
            print("User-Product matrix created successfully.")
        except Exception as e:
            print(f"Error in data preprocessing: {e}")

    def compute_product_similarity(self):
        """
        Calculates item-item (product-product) similarity matrix using cosine similarity.
        This will be used for item-based collaborative filtering.
        """
        try:
            # Standardize each product column (zero mean, unit variance across users);
            # cosine similarity on the result equals the Pearson correlation
            scaler = StandardScaler()
            user_product_matrix_scaled = scaler.fit_transform(self.user_product_matrix)
            # Compute cosine similarity between products
            self.product_similarity = cosine_similarity(user_product_matrix_scaled.T)
            print("Product similarity matrix computed successfully.")
        except Exception as e:
            print(f"Error in computing product similarity: {e}")

    def recommend_products(self, user_id, top_n=5):
        """
        Recommends products for a given user based on collaborative filtering.
        :param user_id: The ID of the user for whom to generate recommendations.
        :param top_n: The number of top recommendations to return.
        :return: A list of recommended product IDs.
        """
        if self.user_product_matrix is None or self.product_similarity is None:
            print("Data not processed. Please run preprocess_data() and compute_product_similarity() first.")
            return []
        if user_id not in self.user_product_matrix.index:
            print("User not found in data.")
            return []
        try:
            # Get the user's purchase history
            user_purchase_vector = self.user_product_matrix.loc[user_id]
            user_purchase_history = user_purchase_vector[user_purchase_vector > 0].index
            # Sum up similarity scores for all items based on user's purchase history
            similarity_scores = np.zeros(self.product_similarity.shape[0])
            for product_id in user_purchase_history:
                product_idx = self.user_product_matrix.columns.get_loc(product_id)
                similarity_scores += self.product_similarity[product_idx]
            # Remove already purchased products from the recommendations.
            # Use -inf rather than 0: scores can be negative after standardization,
            # so a 0 would still outrank unpurchased products.
            for product_id in user_purchase_history:
                product_idx = self.user_product_matrix.columns.get_loc(product_id)
                similarity_scores[product_idx] = -np.inf
            # Get top N product indices, skipping the masked (purchased) ones
            ranked_indices = similarity_scores.argsort()[::-1]
            recommended_product_indices = [i for i in ranked_indices
                                           if np.isfinite(similarity_scores[i])][:top_n]
            recommended_product_ids = [self.user_product_matrix.columns[i] for i in recommended_product_indices]
            # Print recommendation results
            if not recommended_product_ids:
                print("No recommendations available for this user.")
            else:
                print(f"Top {top_n} recommended products for user {user_id}: {recommended_product_ids}")
            return recommended_product_ids
        except Exception as e:
            print(f"Error in generating recommendations: {e}")
            return []

# Sample Usage
# Assuming we have a purchase history DataFrame with columns: 'user_id', 'product_id', 'purchase_count'
data = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3, 4],
    'product_id': [101, 102, 101, 103, 104, 105, 106],
    'purchase_count': [2, 1, 1, 2, 3, 1, 5]
})
# Initialize and set up the recommendation system
recommender = ProductRecommendationSystem(data)
recommender.preprocess_data()
recommender.compute_product_similarity()
# Generate recommendations for a specific user
user_id = 1
recommendations = recommender.recommend_products(user_id, top_n=3)
```
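How `pivot_table` treats repeated (`user_id`, `product_id`) rows matters for the user-product matrix: its default aggregation is the mean, whereas purchase counts are usually meant to be added up. The sketch below illustrates the difference on a hypothetical `purchases` frame (not part of the original data) in which one user bought the same product on two occasions. Whether a real purchase log contains such duplicates is an assumption to check against the data.

```python
import pandas as pd

# Hypothetical purchase log: user 1 bought product 101 on two occasions
purchases = pd.DataFrame({
    'user_id': [1, 1, 2],
    'product_id': [101, 101, 102],
    'purchase_count': [2, 4, 1],
})

# Default aggregation (mean) averages the duplicate rows
mean_matrix = purchases.pivot_table(index='user_id', columns='product_id',
                                    values='purchase_count', fill_value=0)

# aggfunc='sum' adds them up, giving the total count per user and product
sum_matrix = purchases.pivot_table(index='user_id', columns='product_id',
                                   values='purchase_count', aggfunc='sum',
                                   fill_value=0)

print(mean_matrix.loc[1, 101])  # average of 2 and 4
print(sum_matrix.loc[1, 101])   # total of 2 and 4
```

If the input is already aggregated to one row per user and product, as in the sample data, both variants produce the same matrix.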
### Explanation of Key Sections
1. **Data Preprocessing**:
- The `preprocess_data()` function creates a **user-product matrix** from purchase history data. This matrix has users as rows and products as columns, with values representing purchase counts.
2. **Computing Similarity**:
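- `compute_product_similarity()` calculates a **cosine similarity matrix** between products. Before that, `StandardScaler` standardizes each product column (zero mean, unit variance across users), which makes the cosine similarity equivalent to the Pearson correlation between products. This scaling is per product, not per user, so on its own it does not correct for users with higher purchase volumes.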
3. **Generating Recommendations**:
- `recommend_products(user_id, top_n=5)` generates recommendations by summing similarity scores of all products based on a user's past purchases and filtering out products the user has already bought.
- Only the `top_n` highest-scoring products are returned, ordered from most to least similar.
4. **Error Handling**:
- The code wraps data preprocessing, similarity computation, and recommendation generation in `try`/`except` blocks that print the error message. `recommend_products()` also returns an empty list when the matrices have not been built yet or the user ID is not in the data.
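The effect of the scaling step in `compute_product_similarity()` can be checked directly. The sketch below rebuilds the user-product matrix from the sample data and compares cosine similarity on the standardized columns with the Pearson correlation from `np.corrcoef`. The variable names are illustrative, and the comparison assumes no product column is constant (a constant column has no defined correlation).

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler

# Same sample purchases as in the usage example above
data = pd.DataFrame({
    'user_id': [1, 1, 2, 2, 3, 3, 4],
    'product_id': [101, 102, 101, 103, 104, 105, 106],
    'purchase_count': [2, 1, 1, 2, 3, 1, 5],
})
matrix = data.pivot_table(index='user_id', columns='product_id',
                          values='purchase_count', fill_value=0)
values = matrix.to_numpy(dtype=float)

# StandardScaler works column by column, i.e. per product, not per user
scaled = StandardScaler().fit_transform(values)
similarity = cosine_similarity(scaled.T)

# Cosine similarity of mean-centered columns is the Pearson correlation
pearson = np.corrcoef(values, rowvar=False)

print(np.allclose(similarity, pearson))
print(similarity.min() < 0)  # standardized similarities can be negative
```

If the aim is to damp the influence of heavy buyers instead, scaling each user row to unit length with `sklearn.preprocessing.normalize(values, axis=1)` would be one option. That is an alternative design, not what the code above does.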
Each stage of the recommendation process is documented in its docstring, and failures are reported with a printed message rather than raised.
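One robustness detail in the recommendation step is the value used to mask products the user has already bought. Because similarities computed on standardized data can be negative, a mask of 0 can leave purchased products ranked above unpurchased ones, while `-np.inf` cannot. The sketch below uses made-up scores and positions to show the difference.

```python
import numpy as np

# Hypothetical summed similarity scores for five products; negative values
# are possible when similarities come from standardized data
scores = np.array([0.9, -0.2, 0.7, -0.5, -0.1])
purchased = [0, 2]  # positions of products the user already bought
top_n = 2

# Masking with 0 leaves purchased items above every negative score
zero_masked = scores.copy()
zero_masked[purchased] = 0
top_zero = zero_masked.argsort()[-top_n:][::-1]

# Masking with -inf pushes purchased items to the bottom of the ranking
inf_masked = scores.copy()
inf_masked[purchased] = -np.inf
ranked = inf_masked.argsort()[::-1]
top_inf = [int(i) for i in ranked if np.isfinite(inf_masked[i])][:top_n]

print(sorted(top_zero.tolist()))  # purchased positions 0 and 2 come back
print(top_inf)                    # positions 4 and 1, best unpurchased first
```

Filtering on `np.isfinite` also keeps purchased products out of the result when `top_n` exceeds the number of unpurchased products.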