heboba/spotify_vibe

heboba e3ae678fea Add uk and nl

2026-02-26 20:40:18 +00:00

9.7 KiB

Spotify Daily Vibe Bot (Telegram + Spotify + Docker)

Telegram bot: https://t.me/spotify_vibe_bot (@spotify_vibe_bot)

Ready-to-run backend service that:

connects to your Spotify account
reads your liked tracks (Liked Songs)
uses your recent listening history
generates a Spotify playlist with a similar vibe via /generate
can optionally run on a schedule via cron
minimizes repeats and tries to keep >=80% of tracks "new" (not liked and not previously recommended by the bot)
is controlled via Telegram
runs in Docker (app, optional cron)

What's inside

FastAPI backend (OAuth callback + internal job endpoint)
python-telegram-bot (polling)
SQLite (recommendation history, liked-track cache, run log)
supercronic in a separate container for nightly cron trigger (optional)

Important note about the Spotify API

The Spotify /recommendations endpoint may be restricted or unavailable for some apps. The service therefore draws candidates from several sources and falls back when one is missing:

Spotify recommendations (if available)
top tracks by artists from your recent listening / liked library
Spotify search by seed artists (fallback when recommendations/top-tracks are unavailable)
optional Last.fm similarity (very helpful for better "vibe" quality)

For better recommendation quality, adding LASTFM_API_KEY is recommended.
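
The fallback idea above can be sketched as a loop that tries each source in turn and skips the ones that fail. This is a minimal illustration, not the bot's actual code: `collect_candidates` and the `(name, fetch)` source pairs are hypothetical names, and each source is assumed to return a list of Spotify track IDs or raise when the upstream API is restricted.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical shape: a source is a zero-argument callable returning track IDs.
Source = Callable[[], List[str]]


def collect_candidates(sources: Iterable[Tuple[str, Source]]) -> List[str]:
    """Merge track IDs from every source that works, keeping first-seen order."""
    found: List[str] = []
    seen = set()
    for name, fetch in sources:
        try:
            tracks = fetch()
        except Exception as exc:  # one failing source must not abort the run
            print(f"source {name!r} unavailable: {exc}")
            continue
        for track_id in tracks:
            if track_id not in seen:
                seen.add(track_id)
                found.append(track_id)
    return found
```

A source that raises (for example because /recommendations returns an error for the app) simply contributes nothing, and the remaining sources still fill the pool.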

Quick Start

1. Create a Telegram bot via @BotFather and get a token.
2. Create a Spotify App: https://developer.spotify.com/dashboard
3. Add a Redirect URI in the Spotify App (must match exactly), for example:
   - https://your-domain.com/auth/spotify/callback
   - or for local development via tunnel: https://xxxx.ngrok-free.app/auth/spotify/callback
4. Copy .env.example to .env and fill in the values.
5. Start the service.

   If you are using the provided docker-compose.yml as-is, first create the external Docker network once (used by the Traefik labels/network wiring):

   docker network create web || true

   Then start:

   docker compose up -d --build

   By default this starts only app (manual mode via Telegram /generate).

   If you want nightly cron, start it separately:

   docker compose --profile cron up -d cron

6. Open Telegram and message the bot:
   - /start
   - /connect (get the Spotify auth link)
   - after connecting: /generate

`.env` configuration

Minimum required fields:

TELEGRAM_BOT_TOKEN
SPOTIFY_CLIENT_ID
SPOTIFY_CLIENT_SECRET
SPOTIFY_REDIRECT_URI
INTERNAL_JOB_TOKEN

Recommended:

LASTFM_API_KEY (improves similarity quality)
APP_TIMEZONE / TZ
SPOTIFY_DEFAULT_MARKET (ISO 3166-1 alpha-2 country code, e.g. NL, DE, US)
CRON_SCHEDULE (e.g. 15 2 * * * for 02:15 every day; only needed if you enable cron)
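
A quick way to catch an incomplete .env before starting the containers is to check the required names listed above. The helper below is a hypothetical sketch (`missing_settings` is not part of the project); it only assumes that the five required variables must be non-empty.

```python
import os

# Names taken from the "Minimum required fields" list above.
REQUIRED = (
    "TELEGRAM_BOT_TOKEN",
    "SPOTIFY_CLIENT_ID",
    "SPOTIFY_CLIENT_SECRET",
    "SPOTIFY_REDIRECT_URI",
    "INTERNAL_JOB_TOKEN",
)


def missing_settings(env=None):
    """Return the required variable names that are absent or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit("missing settings: " + ", ".join(missing))
```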

Telegram commands

/connect - connect Spotify
/status - connection status and latest playlist run
/generate - generate a playlist now
/latest - latest playlist link
/setsize 30 - playlist size (5..100)
/setratio 0.8 - target new-track ratio (0.5..1.0)
/sync - force sync liked tracks
/lang ru|en|uk|nl - switch bot language
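
The ranges for /setsize and /setratio could be enforced with small parsers like the ones below. This is only a sketch of the documented limits (5..100 and 0.5..1.0); the function names are hypothetical and the bot's real handlers may validate input differently.

```python
def parse_size(arg: str) -> int:
    """Parse the /setsize argument; the documented range is 5..100."""
    size = int(arg)  # raises ValueError on non-numeric input
    if not 5 <= size <= 100:
        raise ValueError("playlist size must be between 5 and 100")
    return size


def parse_ratio(arg: str) -> float:
    """Parse the /setratio argument; the documented range is 0.5..1.0."""
    ratio = float(arg)  # raises ValueError on non-numeric input
    if not 0.5 <= ratio <= 1.0:
        raise ValueError("new-track ratio must be between 0.5 and 1.0")
    return ratio
```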

Recommendation Algorithm

This is the actual playlist generation pipeline used by the current code.

1. Input preparation

Before generation, the bot:

refreshes Spotify access token if needed
syncs liked tracks from Liked Songs into the local cache (saved_tracks)
loads recent listening history for the last RECENT_DAYS_WINDOW days (default: 5)
loads history of previously recommended tracks (recommendation_history)

2. Seed profile construction

The bot builds seeds from two sources: recent plays and liked library.

Recent plays:
- each track gets a recency-weighted score (newer plays matter more)
- weights are accumulated for both tracks and artists
Liked tracks:
- takes a slice of recent likes (~120)
- adds a random sample from older likes (for exploration/diversity)
- accumulates artist weights from this pool as well

Seed profile output includes:

seed_track_ids (up to ~10 tracks)
seed_artists (up to ~20 artists)
seed_artist_names (used by Last.fm and Spotify Search fallback)
recent_track_meta (used for Last.fm track-similar lookups)
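
To make the recency weighting concrete, here is a minimal sketch of how weights for recent plays might be accumulated per track and per artist. The exponential half-life decay and the names `recency_weight` and `build_seeds` are assumptions for illustration; the README only states that newer plays matter more and that roughly 10 tracks and 20 artists are kept.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def recency_weight(played_at, now, half_life_days=2.0):
    """Assumed decay: a play loses half its weight every `half_life_days`."""
    age_days = max((now - played_at).total_seconds(), 0.0) / 86400
    return 0.5 ** (age_days / half_life_days)


def build_seeds(plays, now, max_tracks=10, max_artists=20):
    """plays: iterable of (track_id, artist_id, played_at) tuples."""
    track_w = defaultdict(float)
    artist_w = defaultdict(float)
    for track_id, artist_id, played_at in plays:
        w = recency_weight(played_at, now)
        track_w[track_id] += w    # repeated plays accumulate
        artist_w[artist_id] += w
    top_tracks = sorted(track_w, key=track_w.get, reverse=True)[:max_tracks]
    top_artists = sorted(artist_w, key=artist_w.get, reverse=True)[:max_artists]
    return top_tracks, top_artists
```

With this decay, a single play from an hour ago outweighs two plays from three and four days ago, which matches the intent that recent listening dominates the seed profile.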

3. Candidate collection (candidate pool)

The bot builds a shared candidate pool from multiple sources and deduplicates results.

Sources (in order):

Spotify recommendations
- requested in batches
- respects Spotify limit: max 5 seeds per request (track + artist combined)
Spotify artist top tracks
- by seed artists
Spotify search by seed artists (fallback)
- used when recommendations / top-tracks are restricted or return too few results
Last.fm track similar -> Spotify search
- for recent seed tracks
Last.fm artist similar -> Spotify search
- for seed artists

If Spotify/Last.fm fails on individual calls, the bot tries to degrade gracefully (use other sources) instead of failing the whole run immediately.
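
The 5-seed limit mentioned for Spotify recommendations means seeds have to be split into batches. One possible way to do that is sketched below; `seed_batches` and the 3-tracks-per-batch split are hypothetical choices, while `seed_tracks` and `seed_artists` are the parameter names of the Spotify recommendations endpoint.

```python
def seed_batches(track_ids, artist_ids, max_seeds=5, max_tracks=3):
    """Split seeds into request-sized batches (track + artist seeds combined).

    Assumes 1 <= max_tracks <= max_seeds.
    """
    tracks, artists = list(track_ids), list(artist_ids)
    batches = []
    while tracks or artists:
        batch_tracks = tracks[:max_tracks]
        tracks = tracks[len(batch_tracks):]
        room = max_seeds - len(batch_tracks)  # remaining slots go to artists
        batch_artists = artists[:room]
        artists = artists[len(batch_artists):]
        batches.append({"seed_tracks": batch_tracks, "seed_artists": batch_artists})
    return batches
```

Each batch would then become one recommendations request, and the results of all batches go into the shared candidate pool.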

4. Candidate deduplication

Candidates are deduplicated:

by spotify_track_id
by normalized signature track_name + artist_names (to catch duplicates / alternate versions)

If the same track is found via multiple sources:

the best score is kept
the source field is merged (e.g. source1+source2)
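
The two deduplication keys and the merge rule can be illustrated as follows. This is a sketch under assumptions: the `Candidate` fields mirror the names used above, but the normalization rules in `signature` (dropping bracketed suffixes and " - Remastered ..." tails) are guesses at what "normalized" might mean, not the bot's actual rules.

```python
import re
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Candidate:
    spotify_track_id: str
    track_name: str
    artist_names: List[str]
    score: float
    source: str


def signature(c: Candidate) -> str:
    """Assumed normalization of track_name + artist_names."""
    name = re.sub(r"\s*[\(\[].*?[\)\]]", "", c.track_name.lower())  # "(Remastered)"
    name = re.sub(r"\s+-\s+.*$", "", name)                          # " - Live 2011"
    name = re.sub(r"[\W_]+", " ", name).strip()
    artists = ",".join(sorted(a.lower().strip() for a in c.artist_names))
    return f"{name}|{artists}"


def dedupe(candidates):
    merged = {}     # signature -> best Candidate so far
    sig_by_id = {}  # spotify_track_id -> signature it was filed under
    for c in candidates:
        sig = sig_by_id.get(c.spotify_track_id) or signature(c)
        sig_by_id[c.spotify_track_id] = sig
        kept = merged.get(sig)
        if kept is None:
            merged[sig] = c
            continue
        sources = kept.source.split("+")
        sources += [s for s in c.source.split("+") if s not in sources]
        best = c if c.score > kept.score else kept  # keep the best score
        merged[sig] = replace(best, source="+".join(sources))
    return list(merged.values())
```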

5. Filtering and ranking

Base logic:

first, tracks already in your likes (liked_ids) are excluded
if that leaves an empty pool, a fallback is enabled:
- already-liked tracks may be used (with a penalty) so the run does not fail with an empty result

Additional score adjustments:

penalty for tracks previously recommended by the bot (history_ids)
penalty for liked tracks (only if liked fallback is active)
small boost for collaborations / multiple artists
small boost for tracks with multiple source/reason signals
popularity scoring slightly favors mid-popularity tracks (not only mainstream and not only obscure tracks)
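
The adjustments above could be combined into a single scoring function along these lines. All numeric weights here are made up for illustration, and `adjusted_score` is a hypothetical name; only the direction of each adjustment comes from the list above. Popularity is assumed to be Spotify's 0..100 value.

```python
def adjusted_score(base, track_id, *, popularity, artist_count, source_count,
                   liked_ids, history_ids, liked_fallback=False):
    """Return the adjusted score, or None if the track is filtered out."""
    if track_id in liked_ids and not liked_fallback:
        return None                      # liked tracks are excluded by default
    score = base
    if track_id in history_ids:
        score -= 0.5                     # penalty: recommended before
    if track_id in liked_ids:
        score -= 0.7                     # penalty: liked, fallback mode only
    if artist_count > 1:
        score += 0.05                    # small boost for collaborations
    if source_count > 1:
        score += 0.05 * (source_count - 1)  # boost for multiple source signals
    # Peaks at popularity 50 and falls to zero at 0 and 100.
    score += 0.1 * (1 - abs(popularity - 50) / 50)
    return score
```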

6. Final selection

After ranking, candidates are split into:

novel - not previously recommended and not in likes
reused - previously recommended or (fallback case) already liked

Then the bot:

first tries to satisfy min_new_ratio
enforces artist caps (limit tracks per artist)
relaxes caps if there are not enough new tracks
fills the remainder with reused candidates

Result includes:

tracks - final ordered playlist tracks
new_count / reused_count
notes - explanation if the target new ratio could not be met
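
A compact sketch of the selection step is shown below. It follows the order described above (new tracks first, per-artist cap, relaxed cap, then reused tracks), but the function name, the default cap of 2, and the exact relaxation strategy are assumptions rather than the bot's real implementation.

```python
import math
from collections import Counter


def select_tracks(novel, reused, size, min_new_ratio=0.8, artist_cap=2):
    """novel / reused: ranked lists of (track_id, artist) tuples, best first."""
    # The epsilon guards against float noise pushing the ceiling up by one.
    target_new = math.ceil(size * min_new_ratio - 1e-9)
    picked, picked_ids, per_artist = [], set(), Counter()

    def take(pool, limit, cap):
        for track_id, artist in pool:
            if len(picked) >= limit:
                break
            if track_id in picked_ids:
                continue
            if cap is not None and per_artist[artist] >= cap:
                continue
            picked.append(track_id)
            picked_ids.add(track_id)
            per_artist[artist] += 1

    take(novel, size, artist_cap)       # prefer new tracks, capped per artist
    if len(picked) < target_new:
        take(novel, target_new, None)   # relax the cap to reach the target
    new_count = len(picked)
    take(reused, size, artist_cap)      # fill the remainder with reused tracks
    take(reused, size, None)            # last resort: ignore the cap
    notes = []
    if new_count < target_new:
        notes.append(f"only {new_count} new tracks available, target was {target_new}")
    return {"tracks": picked, "new_count": new_count,
            "reused_count": len(picked) - new_count, "notes": notes}
```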

7. Playlist creation and history persistence

After the final track list is selected, the bot:

creates a Spotify playlist
adds tracks to it
writes the run to playlist_runs and playlist_run_tracks
updates recommendation_history
stores latest_playlist_url for the user
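
Persisting a run could look roughly like the sketch below. The table names come from the list above, but every column is an assumption made for this example (the real schema is documented in DESIGN.md), and `save_run` is a hypothetical helper.

```python
import sqlite3

# Hypothetical columns; only the table names are taken from the README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS playlist_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    playlist_url TEXT NOT NULL,
    new_count INTEGER NOT NULL,
    reused_count INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS playlist_run_tracks (
    run_id INTEGER NOT NULL REFERENCES playlist_runs(id),
    position INTEGER NOT NULL,
    spotify_track_id TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS recommendation_history (
    user_id INTEGER NOT NULL,
    spotify_track_id TEXT NOT NULL,
    PRIMARY KEY (user_id, spotify_track_id)
);
"""


def save_run(conn, user_id, playlist_url, track_ids, new_count):
    """Write one run, its tracks, and the history update in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO playlist_runs (user_id, playlist_url, new_count, reused_count)"
            " VALUES (?, ?, ?, ?)",
            (user_id, playlist_url, new_count, len(track_ids) - new_count),
        )
        run_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO playlist_run_tracks VALUES (?, ?, ?)",
            [(run_id, pos, tid) for pos, tid in enumerate(track_ids)],
        )
        # Tracks already in the history are left untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO recommendation_history VALUES (?, ?)",
            [(user_id, tid) for tid in track_ids],
        )
    return run_id
```

Writing everything in a single transaction keeps the run log and the anti-repeat history consistent if the process is interrupted halfway.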

Anti-repeat behavior

The bot stores:

all tracks it has recommended before
all your liked tracks (cached and refreshed)

When building a new playlist:

it first excludes liked tracks (when possible)
prioritizes tracks that have not been recommended before
fills with history repeats only if there are not enough new tracks
may use a liked-track fallback instead of failing the run if all candidates are already liked
stores new / reused stats in the DB

If there are not enough new tracks to satisfy the target new-track ratio (80% by default, adjustable via /setratio), the run status includes a note explaining that.

Cron (nightly run)

cron is disabled by default (manual-first mode: run /generate manually in Telegram).

In docker-compose.yml, the cron service is under profile cron, so it does not start with a normal:

docker compose up -d --build

To enable nightly runs:

docker compose --profile cron up -d cron

cron calls the internal endpoint on schedule:

POST /internal/jobs/nightly
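
For debugging, the same endpoint can be called by hand. The sketch below assumes the endpoint is protected by INTERNAL_JOB_TOKEN sent in an `X-Internal-Token` header; that header name is a guess made for this example, so check the application code for the real one. `build_nightly_request` and `trigger_nightly` are hypothetical helpers.

```python
from urllib.request import Request, urlopen


def build_nightly_request(base_url: str, token: str) -> Request:
    """Build the POST request for the nightly job endpoint."""
    return Request(
        f"{base_url.rstrip('/')}/internal/jobs/nightly",
        data=b"",                               # empty body
        method="POST",
        headers={"X-Internal-Token": token},    # ASSUMED header name
    )


def trigger_nightly(base_url: str, token: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    with urlopen(build_nightly_request(base_url, token), timeout=timeout) as resp:
        return resp.status
```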

Change time via .env:

CRON_SCHEDULE=15 2 * * *
TZ=Europe/Amsterdam

Disable again:

docker compose stop cron

Data storage

SQLite DB: ./data/app.db

The ./data folder is mounted as a Docker volume, so data persists across container restarts.

Health check / verification

GET /health should return {"ok": true}
after /generate, Telegram should send a Spotify playlist link
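
The health check can be scripted, for example in a deployment smoke test. This is a minimal sketch using only the standard library; `is_healthy` and `check_health` are hypothetical helper names, and the only assumption about the service is the {"ok": true} response described above.

```python
import json
from urllib.request import urlopen


def is_healthy(body: str) -> bool:
    """True only if the body is a JSON object with "ok": true."""
    try:
        return json.loads(body).get("ok") is True
    except (ValueError, AttributeError):  # invalid JSON or not an object
        return False


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Call GET /health; raises URLError if the service is unreachable."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
```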

Typical deployment

VPS + Docker Compose
APP_BASE_URL = public service URL
SPOTIFY_REDIRECT_URI = ${APP_BASE_URL}/auth/spotify/callback
Telegram runs via polling (no webhook required)
cron can remain disabled if you only want manual generation

Architecture

Detailed architecture, data flow, and DB table docs are in DESIGN.md.

Feature Plans

Roadmap items that fit the current architecture well:

Explicit feedback loop:
- commands like /ban, /unban, /prefer
- separate blacklist table so "didn't like it" != "just didn't save it"
Anti-repeat controls:
- hard no-repeat window (N days/weeks)
- separate rules for liked / previously recommended tracks
Explainability / debug:
- why-this-track (source, score, reasons)
- dry-run endpoint/command without creating a playlist
Fine-tuning the algorithm:
- source weights (Spotify / Last.fm / search fallback)
- generation modes (explore / familiar / mixed)
Better candidate sources:
- additional music metadata sources
- smarter genre/artist clustering
Personal scheduler:
- per-user timezone and per-user cron schedule
- weekday / time selection
Observability:
- structured logs for source coverage and filtering reasons
- basic metrics for Spotify/Last.fm errors and latency
Storage / scaling:
- migrations (Alembic)
- Postgres instead of SQLite for multi-user usage

Limitations / future improvements

Per-user timezone support is only partially used today (cron is global, though manual per-user generation is supported)
More candidate sources could improve quality (e.g. MusicBrainz/Discogs mapping)
Postgres would be better than SQLite for higher multi-user load
