# Anirecs

Welcome to Anirecs! This is a production-ready, end-to-end anime recommendation engine designed to help users discover their next favorite anime based on content similarity. Built with a focus on scalability, efficiency, and user-friendliness, Anirecs leverages natural language processing (NLP) techniques to analyze anime synopses, genres, and themes. Whether you're a casual viewer or a hardcore otaku, Anirecs makes personalized recommendations effortless.
This project demonstrates a complete machine learning pipeline—from data acquisition to model deployment—making it a practical showcase of full-stack ML skills.
## Table of Contents

- Project Overview
- Key Features
- Tech Stack
- Installation
- Usage
- Data Pipeline
- Model Building and Training
- Deployment
- Testing and Evaluation
- Contributing
- License
- Contact
- Acknowledgments
## Project Overview

Anirecs is an intelligent recommendation system that suggests anime based on textual content analysis. It processes over 37,000 anime entries, combining synopses, genres, and themes into a unified "tags" feature. Using TF-IDF vectorization and cosine similarity, it computes content-based recommendations in real time.

The system is built for production: the model is exportable via Pickle for easy integration into web apps (e.g., a Flask/Django backend). The accompanying Jupyter Notebook (`Anirecs.ipynb`) walks through the entire process step by step, from raw data to a live recommender.
### Why Anirecs?
- **Personalized & Accurate:** Focuses on content similarity for meaningful suggestions.
- **Scalable:** Handles large datasets efficiently without precomputing full similarity matrices.
- **End-to-End:** Covers data ingestion, processing, modeling, and deployment—perfect for demonstrating full-stack ML skills.
## Key Features

- **Content-Based Recommendations:** Suggests anime similar to your favorites based on synopses, genres, and themes.
- **Real-Time Querying:** Fast similarity computation on the fly.
- **Robust Data Handling:** Manages missing values, duplicates, and inconsistencies gracefully.
- **Exportable Model:** Pickle-serialized for seamless deployment.
- **Tested with Popular Anime:** Includes examples like "Toradora!" and "Cowboy Bebop" for validation.
## Tech Stack

- **Programming Language:** Python 3.8+
- **Data Processing:** Pandas, NumPy
- **NLP & ML:** Scikit-learn (TF-IDF vectorizer, cosine similarity)
- **Data Sources:** Web scraping (e.g., MyAnimeList), Jikan API (MAL's unofficial API)
- **Visualization/Notebooks:** Jupyter Notebook
- **Deployment:** Pickle for model serialization; compatible with Flask, FastAPI, or Streamlit
- **Version Control:** Git
## Installation

To get started locally, follow these steps:

1. **Clone the repository:**

   ```bash
   git clone https://github.com/Genious07/Recommendation-System
   cd anirecs
   ```

2. **Set up a virtual environment** (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

   If `requirements.txt` isn't present, install manually: `pip install pandas numpy scikit-learn`.

4. **Download data** (if not included): run the data gathering scripts (see Data Gathering) or use the pre-processed CSVs from the repo.

You're all set! 🎉 Run the Jupyter Notebook with `jupyter notebook Anirecs.ipynb`.
## Usage

- **Run the notebook:** Open `Anirecs.ipynb` and execute cells sequentially to build and test the model.
- **Get recommendations:**

  ```python
  recommendations = get_recommendations('Toradora!')
  print(recommendations)
  ```

  Output: a list of similar anime titles.
- **Deploy locally** (e.g., via Flask):
  - Export the model:

    ```python
    import pickle

    pickle.dump((tfidf, vectors, df), open('anirecs_model.pkl', 'wb'))
    ```

  - Create a simple API endpoint (see Deployment for details).

For production, host on Heroku/AWS with a web framework.
## Data Pipeline

The backbone of Anirecs is a robust data pipeline, ensuring high-quality input for the model. We handle everything from raw data collection to feature-ready datasets. This section mirrors the workflow in `Anirecs.ipynb`.
### Data Gathering

Data is sourced from multiple places to create a comprehensive dataset:
- **Web Scraping:** Used BeautifulSoup and Requests to scrape anime details from MyAnimeList (MAL). Focused on pages like top anime lists, extracting fields such as `name`, `synopsis`, `genres`, `themes`, `score`, `members`, `favorites`, and `scored_by`.
  - Example: Scraped ~15,000 entries for MAL's top anime dataset.
  - Ethical note: Rate-limited requests to avoid server overload; complied with robots.txt.
- **Jikan API:** MAL's unofficial REST API (via `jikan.moe`) for structured data retrieval; a minimal fetch sketch follows this list.
  - Fetched anime by ID or search queries: `GET /anime/{id}` for details like episodes, studios, and demographics.
  - Integrated with scraping: Used the API to fill in missing fields (e.g., full synopses) in scraped data.
  - Handled pagination and rate limits (3 requests/second).
- **Additional CSVs:** Merged with open datasets like `anime-dataset-2023.csv` (~25,000 rows) and `myanilist.csv` (~21,000 rows) for broader coverage.
  - Total raw data: ~108,000 rows across 6 files.
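As referenced above, here is a minimal sketch of a rate-limited Jikan fetch. It assumes the public Jikan v4 base URL (`https://api.jikan.moe/v4`) and a sample MAL ID; the notebook's actual IDs, retry handling, and field selection may differ.

```python
import time

import requests

JIKAN_BASE = "https://api.jikan.moe/v4"  # assumed: public Jikan v4 endpoint

def fetch_anime(anime_id):
    """Fetch one anime record by MAL ID, staying under ~3 requests/second."""
    resp = requests.get(f"{JIKAN_BASE}/anime/{anime_id}", timeout=10)
    resp.raise_for_status()
    time.sleep(0.34)  # simple rate limiting between calls
    return resp.json()["data"]  # Jikan v4 wraps the payload in a "data" key

# Example: MAL ID 1 is "Cowboy Bebop"
record = fetch_anime(1)
print(record["title"], "->", record["synopsis"][:80])
```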
**Code Snippet (from Notebook):**

```python
import pandas as pd

# Example: loading and inspecting scraped/API data
def load_df(path):
    df = pd.read_csv(path)
    print(f"{path}: {df.shape[0]} rows, columns = {list(df.columns)}")
    return df

files = ['top_anime_dataset.csv', 'myanilist.csv']  # plus the remaining CSVs (elided here)
dfs = {f: load_df(f) for f in files}
```
### Data Cleaning and Merging

Raw data is messy—duplicates, missing values, inconsistent formats. We standardized it:
- **Schema Harmonization:** Defined target columns (e.g., `anime_id`, `name`, `synopsis`, `genres`).
- **Renaming & Selection:** Used rename maps to align columns across datasets.
- **Merging:** Concatenated the harmonized DataFrames into a unified DataFrame (~108,000 rows).
- **Deduplication:** Dropped duplicates based on `name` (reduced to ~39,000 unique anime).
- **Handling Missing Values:**
  - Filled NaNs in `synopsis` with empty strings.
  - Replaced "Unknown" in `genres`/`themes` with empty strings.
  - Median imputation for numerical fields like `score` and `members`.
- **Final Selection:** Kept key columns: `synopsis`, `genres`, `themes`, `name`, `score`, etc. (Output: `TheFinalData.csv`, ~37,000 rows.)
**Code Snippet:**

```python
# Deduplication and missing-value handling
df.drop_duplicates(subset='name', keep='first', inplace=True)
df['synopsis'] = df['synopsis'].fillna('')
```
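The remaining cleaning steps from the list above can be sketched as follows. This is a reconstruction rather than the notebook's exact code: the column names come from the bullets, and the strategy is the median fill described there.

```python
# Replace placeholder "Unknown" values in the categorical text fields
df['genres'] = df['genres'].replace('Unknown', '')
df['themes'] = df['themes'].replace('Unknown', '')

# Median imputation for numerical fields
for col in ['score', 'members']:
    df[col] = df[col].fillna(df[col].median())
```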
### Feature Engineering

Transformed raw text into a powerful "tags" feature for modeling:
- Combined `genres`, `themes`, and `synopsis` into a single lowercase string.
- Removed commas for clean tokenization.
Result: A concise, descriptive feature per anime (e.g., "action adventure fantasy during their decade-long quest...").
**Code Snippet:**

```python
df['tags'] = (df['genres'].str.replace(',', ' ') + ' ' +
              df['themes'].str.replace(',', ' ') + ' ' +
              df['synopsis']).str.lower()
```
## Model Building and Training

The core of Anirecs is a content-based recommender using NLP. No supervised training is needed—it's unsupervised similarity matching.
### TF-IDF Vectorization

- Used TF-IDF to convert `tags` into numerical vectors.
- Limited to the top 10,000 features (words) to focus on signal over noise.
- Removed English stop words for efficiency.
**Code Snippet:**

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(max_features=10000, stop_words='english')
vectors = tfidf.fit_transform(df['tags']).toarray()  # Shape: (~37k, 10k)
```
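Design note: `fit_transform` actually returns a SciPy sparse matrix, and scikit-learn's similarity functions accept sparse input directly, so the `.toarray()` call is optional and can be dropped to avoid materializing a dense ~37k × 10k array:

```python
# Memory-saving variant: keep the TF-IDF matrix sparse.
# cosine_similarity accepts SciPy sparse input, and vectors[i] is already
# a 2-D (1 x n) row, so a later .reshape(1, -1) becomes unnecessary.
vectors = tfidf.fit_transform(df['tags'])
```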
### Similarity and Recommendations

- Computed cosine similarity on the fly (efficient for large datasets).
- Avoided storing the full similarity matrix to save memory (~1.4 GB otherwise).
- Function: takes an anime name, returns the top-N similar titles.
- Logic: fetch the query vector, compute similarities, sort, and exclude the query itself.
**Code Snippet:**

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def get_recommendations(anime_name, num_recs=10):
    # Assumes df has a fresh RangeIndex (reset after deduplication), so
    # label-based .index values align with positional rows in `vectors`.
    matches = df[df['name'] == anime_name].index
    if len(matches) == 0:
        return []  # unknown title: no recommendations (see Edge Cases)
    anime_index = matches[0]
    scores = cosine_similarity(vectors[anime_index].reshape(1, -1), vectors)[0]
    # Sort descending and skip position 0, which is the query anime itself
    similar_indices = np.argsort(scores)[::-1][1:num_recs + 1]
    return df['name'].iloc[similar_indices].tolist()
```
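Because `TfidfVectorizer` L2-normalizes rows by default (`norm='l2'`), cosine similarity here reduces to a plain dot product, so scikit-learn's `linear_kernel` is an equivalent, slightly cheaper drop-in for the call above:

```python
from sklearn.metrics.pairwise import linear_kernel

# Equivalent to cosine_similarity for L2-normalized TF-IDF rows
scores = linear_kernel(vectors[anime_index].reshape(1, -1), vectors)[0]
```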
## Deployment

- **Model Export:** Serialize with Pickle for production.

  ```python
  import pickle

  pickle.dump((tfidf, vectors, df), open('anirecs_model.pkl', 'wb'))
  ```

- **Backend Setup:** Load the pickle in a Flask/FastAPI app.
  - Endpoint: `/recommend?anime=Toradora!&num=10`
  - Example Flask code:

    ```python
    import pickle

    from flask import Flask, request, jsonify

    app = Flask(__name__)
    tfidf, vectors, df = pickle.load(open('anirecs_model.pkl', 'rb'))

    @app.route('/recommend', methods=['GET'])
    def recommend():
        anime_name = request.args.get('anime')
        num = int(request.args.get('num', 10))
        recs = get_recommendations(anime_name, num_recs=num)  # as defined in Model Building
        return jsonify(recommendations=recs)
    ```

- **Hosting:** Deploy on Heroku (free tier) or AWS EC2. Add a frontend with React for a full app.
- **Scalability Tips:** Use vector databases like FAISS for faster queries on larger datasets; a minimal sketch follows.
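As a sketch of that FAISS route (an assumption; FAISS is not part of the current repo), cosine similarity over L2-normalized vectors can be served by an inner-product index:

```python
import faiss
import numpy as np

# Inner product on unit-length vectors equals cosine similarity.
dense = np.asarray(vectors, dtype='float32')  # FAISS expects float32
faiss.normalize_L2(dense)                     # normalize rows in place
index = faiss.IndexFlatIP(dense.shape[1])
index.add(dense)

# Top-10 neighbors for the anime at row 0, skipping the query itself
scores, ids = index.search(dense[0:1], 11)
print(ids[0][1:])
```

For ~37k items an exact `IndexFlatIP` is already fast; approximate indexes (e.g., `IndexIVFFlat`) only pay off at much larger scales.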
## Testing and Evaluation

- **Unit Tests:** Verified with popular anime (e.g., "Toradora!" suggests school-life rom-coms); see the sketch below.
- **Metrics:** Qualitative (relevance checks); quantitative (cosine scores > 0.5 for top recommendations).
- **Edge Cases:** Handled missing anime and empty tags.

From the notebook: tested "Toradora!" and "Cowboy Bebop" with accurate results.
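A hypothetical sketch of such checks as pytest cases (the module name `anirecs` and the expectations are assumptions drawn from the bullets above, not files in the repo):

```python
# test_anirecs.py (hypothetical; assumes get_recommendations is importable)
from anirecs import get_recommendations

def test_known_title_returns_recs():
    recs = get_recommendations('Toradora!')
    assert len(recs) == 10
    assert 'Toradora!' not in recs  # the query itself is excluded

def test_unknown_title_is_handled():
    assert get_recommendations('Definitely Not A Real Anime') == []
```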
## Contributing

We'd love your input! Fork the repo, create a branch, and submit a PR. Follow these guidelines:
- Use descriptive commit messages.
- Add tests for new features.
- Update docs for changes.
Issues? Open one with details.
## License

MIT License. Feel free to use, modify, and distribute. See LICENSE for details.
## Contact

- **Developer:** Satwik (GitHub | LinkedIn)
- **Email:** [email protected]
- **Feedback:** Star the repo or drop a message.
## Acknowledgments

- Inspired by MyAnimeList and anime communities.
- Thanks to the Jikan API creators and open datasets.
Last Updated: August 15, 2025
If this project impresses you, imagine what we could build together!