# Spotify Daily Vibe Bot (Telegram + Spotify + Docker)

Ready-to-run backend service that:

- connects to your Spotify account
- reads your liked tracks (`Liked Songs`)
- uses your recent listening history
- generates a Spotify playlist with a similar vibe via `/generate`
- can optionally run on a schedule via `cron`
- minimizes repeats and tries to keep `>=80%` of tracks "new" (not liked and not previously recommended by the bot)
- is controlled via Telegram
- runs in Docker (`app`, optional `cron`)

## What's inside

- `FastAPI` backend (OAuth callback + internal job endpoint)
- `python-telegram-bot` (polling)
- `SQLite` (recommendation history, liked-track cache, run log)
- `supercronic` in a separate container for nightly cron trigger (optional)

## Important note about the Spotify API

The Spotify `/recommendations` endpoint may be restricted or unavailable for some apps, so the service combines several candidate sources and falls back when one is missing:

- Spotify recommendations (if available)
- top tracks by artists from your recent listening / liked library
- Spotify search by seed artists (fallback when recommendations/top-tracks are unavailable)
- optional Last.fm similarity (very helpful for better "vibe" quality)

Setting `LASTFM_API_KEY` is recommended for better recommendation quality.
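
The fallback order above can be sketched as a loop over candidate sources. This is a minimal illustration, not the bot's actual code: `collect_with_fallbacks`, the source names, and the rule of stopping once enough tracks are found are all assumptions made for the example.

```python
from typing import Callable, Iterable

# Hypothetical type: a source is any zero-argument callable returning track IDs.
Source = Callable[[], list[str]]


def collect_with_fallbacks(
    sources: Iterable[tuple[str, Source]], wanted: int
) -> tuple[list[str], list[str]]:
    """Query sources in order, skip the ones that fail, stop once enough tracks are found."""
    found: list[str] = []
    used: list[str] = []
    for name, fetch in sources:
        if len(found) >= wanted:
            break
        try:
            tracks = fetch()
        except Exception:
            # Assumption: a restricted or unavailable source raises; move on to the next one.
            continue
        before = len(found)
        for track_id in tracks:
            if track_id not in found:
                found.append(track_id)
        if len(found) > before:
            used.append(name)
    return found[:wanted], used


def restricted() -> list[str]:
    # Stand-in for an endpoint the app is not allowed to call.
    raise RuntimeError("endpoint not available for this app")


tracks, used = collect_with_fallbacks(
    [
        ("recommendations", restricted),
        ("artist_top_tracks", lambda: ["t1", "t2"]),
        ("search", lambda: ["t2", "t3"]),
        ("lastfm", lambda: ["t4"]),
    ],
    wanted=3,
)
print(tracks, used)
```

In this sketch a failing source is skipped silently; a real service would probably also log which source failed and why.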

## Quick Start

1. Create a Telegram bot via `@BotFather` and get a token.
2. Create a Spotify App: https://developer.spotify.com/dashboard
3. Add a Redirect URI in the Spotify App (must match exactly), for example:
   - `https://your-domain.com/auth/spotify/callback`
   - or for local development via tunnel: `https://xxxx.ngrok-free.app/auth/spotify/callback`
4. Copy `.env.example` to `.env` and fill in the values.
5. Start:

```bash
docker compose up -d --build
```

By default this starts only `app` (manual mode via Telegram `/generate`).

If you want nightly `cron`, start it separately:

```bash
docker compose --profile cron up -d cron
```

6. Open Telegram and message the bot:
   - `/start`
   - `/connect` (get the Spotify auth link)
   - after connecting: `/generate`

## `.env` configuration

Minimum required fields:

- `TELEGRAM_BOT_TOKEN`
- `SPOTIFY_CLIENT_ID`
- `SPOTIFY_CLIENT_SECRET`
- `SPOTIFY_REDIRECT_URI`
- `INTERNAL_JOB_TOKEN`

Recommended:

- `LASTFM_API_KEY` (improves similarity quality)
- `APP_TIMEZONE` / `TZ`
- `SPOTIFY_DEFAULT_MARKET` (two-letter country code, e.g. `NL`, `DE`, `US`)
- `CRON_SCHEDULE` (e.g. `15 2 * * *`, only if you enable `cron`)

## Telegram commands

- `/connect` - connect Spotify
- `/status` - connection status and latest playlist run
- `/generate` - generate a playlist now
- `/latest` - latest playlist link
- `/setsize 30` - playlist size (5..100)
- `/setratio 0.8` - target new-track ratio (0.5..1.0)
- `/sync` - force sync liked tracks
- `/lang ru|en` - switch bot language
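
The documented ranges for `/setsize` and `/setratio` suggest argument validation along the following lines. This is a sketch only: the helper names are hypothetical, and whether the bot rejects out-of-range values (as here) or clamps them is an assumption.

```python
def parse_setsize(arg: str) -> int:
    """Parse the `/setsize` argument; accept whole numbers from 5 to 100."""
    size = int(arg)  # raises ValueError on non-numeric input
    if not 5 <= size <= 100:
        raise ValueError("playlist size must be between 5 and 100")
    return size


def parse_setratio(arg: str) -> float:
    """Parse the `/setratio` argument; accept values from 0.5 to 1.0."""
    ratio = float(arg)  # raises ValueError on non-numeric input
    if not 0.5 <= ratio <= 1.0:
        raise ValueError("new-track ratio must be between 0.5 and 1.0")
    return ratio


print(parse_setsize("30"), parse_setratio("0.8"))
```

Raising `ValueError` in both failure cases lets a command handler catch one exception type and reply with a usage hint.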

## Recommendation Algorithm

This is the actual playlist generation pipeline used by the current code.

### 1. Input preparation

Before generation, the bot:

- refreshes Spotify access token if needed
- syncs liked tracks from `Liked Songs` into the local cache (`saved_tracks`)
- loads recent listening for the `RECENT_DAYS_WINDOW` period (default `5` days)
- loads history of previously recommended tracks (`recommendation_history`)

### 2. Seed profile construction

The bot builds seeds from two sources: recent plays and liked library.

- Recent plays:
  - each track gets a recency-weighted score (newer plays matter more)
  - weights are accumulated for both tracks and artists
- Liked tracks:
  - takes a slice of recent likes (`~120`)
  - adds a random sample from older likes (for exploration/diversity)
  - accumulates artist weights from this pool as well

Seed profile output includes:

- `seed_track_ids` (up to ~10 tracks)
- `seed_artists` (up to ~20 artists)
- `seed_artist_names` (used by Last.fm and Spotify Search fallback)
- `recent_track_meta` (used for Last.fm track-similar lookups)
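
To make the recency weighting concrete, here is a minimal sketch of how play weights could be accumulated per track and per artist. The exponential decay, the `half_life_days` parameter, and the function names are assumptions for illustration; the source only states that newer plays matter more.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def build_seed_weights(plays, now, window_days=5, half_life_days=2.0):
    """Accumulate recency-weighted scores from (track_id, artist_id, played_at) tuples."""
    track_weights = Counter()
    artist_weights = Counter()
    for track_id, artist_id, played_at in plays:
        age_days = (now - played_at).total_seconds() / 86400
        if age_days < 0 or age_days > window_days:
            continue  # outside the recent-listening window
        # Assumed decay: a play loses half its weight every `half_life_days`.
        weight = 0.5 ** (age_days / half_life_days)
        track_weights[track_id] += weight
        artist_weights[artist_id] += weight
    return track_weights, artist_weights


def top_seeds(weights, limit):
    """Return the highest-weighted keys, best first."""
    return [key for key, _ in weights.most_common(limit)]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
plays = [
    ("t1", "a1", now),
    ("t2", "a1", now - timedelta(days=2)),
    ("t3", "a2", now - timedelta(days=4)),
    ("t4", "a3", now - timedelta(days=6)),  # older than the window, ignored
]
track_weights, artist_weights = build_seed_weights(plays, now)
seed_track_ids = top_seeds(track_weights, 10)
seed_artists = top_seeds(artist_weights, 20)
print(seed_track_ids, seed_artists)
```

Artist `a1` ends up with the largest weight because two of its tracks were played recently, which is the effect the accumulation step is meant to capture.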

### 3. Candidate collection (candidate pool)

The bot builds a shared candidate pool from multiple sources and deduplicates results.

Sources (in order):

1. `Spotify recommendations`
   - requested in batches
   - respects Spotify limit: max `5` seeds per request (track + artist combined)
2. `Spotify artist top tracks`
   - by seed artists
3. `Spotify search` by seed artists (fallback)
   - used when recommendations / top-tracks are restricted or return too few results
4. `Last.fm track similar` -> `Spotify search`
   - for recent seed tracks
5. `Last.fm artist similar` -> `Spotify search`
   - for seed artists

If Spotify/Last.fm fails on individual calls, the bot tries to degrade gracefully (use other sources) instead of failing the whole run immediately.
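
The 5-seed limit means the seed lists have to be split across several recommendation requests. A minimal sketch of one way to do that follows; the function name and the 3-tracks-per-batch split are assumptions, while `seed_tracks` and `seed_artists` are the query parameter names used by the Spotify recommendations endpoint.

```python
def seed_batches(track_ids, artist_ids, max_seeds=5, tracks_per_batch=3):
    """Split seeds into request batches holding at most `max_seeds` seeds combined."""
    if not 1 <= tracks_per_batch <= max_seeds:
        raise ValueError("tracks_per_batch must be between 1 and max_seeds")
    tracks = list(track_ids)
    artists = list(artist_ids)
    batches = []
    while tracks or artists:
        batch_tracks = tracks[:tracks_per_batch]
        tracks = tracks[tracks_per_batch:]
        # Whatever room the tracks leave is filled with artist seeds.
        room = max_seeds - len(batch_tracks)
        batch_artists = artists[:room]
        artists = artists[room:]
        batches.append({"seed_tracks": batch_tracks, "seed_artists": batch_artists})
    return batches


batches = seed_batches(["t1", "t2", "t3", "t4"], ["a1", "a2", "a3", "a4", "a5"])
for batch in batches:
    print(batch)
```

Each resulting dictionary could then be turned into the query string of one request, for example by joining the IDs with commas.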

### 4. Candidate deduplication

Candidates are deduplicated:

- by `spotify_track_id`
- by normalized signature `track_name + artist_names` (to catch duplicates / alternate versions)

If the same track is found via multiple sources:

- the best score is kept
- the source field is merged (e.g. `source1+source2`)
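
The two deduplication keys and the merge rule can be sketched as follows. The `Candidate` fields and the exact normalization (lowercasing, dropping bracketed text and ` - ...` suffixes, stripping punctuation) are assumptions; the bot's real signature function may normalize differently.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    track_id: str
    name: str
    artists: tuple
    score: float
    source: str


def signature(name, artists):
    """Build a normalized 'track_name + artist_names' key (assumed normalization)."""
    title = re.sub(r"[\(\[].*?[\)\]]", " ", name.lower())  # drop "(Remastered)", "[Live]"
    title = re.sub(r"\s+-\s+.*$", " ", title)  # drop " - Remastered 2011"-style suffixes
    title = re.sub(r"[^\w\s]", "", title)  # drop punctuation
    artist_key = ",".join(sorted(a.lower().strip() for a in artists))
    return " ".join(title.split()) + "|" + artist_key


def dedupe(candidates):
    """Merge candidates sharing a track ID or signature; keep best score, join sources."""
    kept = []
    index_by_id = {}
    index_by_sig = {}
    for cand in candidates:
        sig = signature(cand.name, cand.artists)
        idx = index_by_id.get(cand.track_id, index_by_sig.get(sig))
        if idx is None:
            idx = len(kept)
            kept.append(cand)
        else:
            prev = kept[idx]
            sources = prev.source.split("+")
            if cand.source not in sources:
                sources.append(cand.source)
            best = cand if cand.score > prev.score else prev
            kept[idx] = replace(best, source="+".join(sources))
        index_by_id[cand.track_id] = idx
        index_by_sig[sig] = idx
    return kept


pool = dedupe([
    Candidate("id1", "Song A", ("Artist",), 0.6, "recs"),
    Candidate("id1", "Song A", ("Artist",), 0.8, "top_tracks"),
    Candidate("id2", "Song A (Remastered)", ("Artist",), 0.5, "lastfm"),
    Candidate("id3", "Other", ("Artist",), 0.4, "search"),
])
for cand in pool:
    print(cand.track_id, cand.score, cand.source)
```

Here the remastered version collapses into the original entry through the signature key even though its Spotify ID differs.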

### 5. Filtering and ranking

Base logic:

- first, tracks already in your likes (`liked_ids`) are excluded
- if that leaves an empty pool, a fallback is enabled:
  - already-liked tracks may be used (with a penalty) so the run does not fail with an empty result

Additional score adjustments:

- penalty for tracks previously recommended by the bot (`history_ids`)
- penalty for liked tracks (only if liked fallback is active)
- small boost for collaborations / multiple artists
- small boost for tracks with multiple source/reason signals
- popularity scoring slightly favors mid-popularity tracks (not only mainstream and not only obscure tracks)
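
A rough sketch of how these adjustments might combine into one score is shown below. All constants, the triangular popularity curve, and the function names are hypothetical; only the direction of each adjustment comes from the list above. Spotify reports track popularity on a 0 to 100 scale.

```python
# Hypothetical weights; the bot's real values may differ.
HISTORY_PENALTY = 0.5
LIKED_PENALTY = 0.7
COLLAB_BOOST = 0.05
MULTI_SOURCE_BOOST = 0.05
MAX_POPULARITY_BONUS = 0.1


def popularity_bonus(popularity):
    """Triangular bonus: highest at popularity 50, zero at 0 and 100."""
    distance = abs(popularity - 50) / 50
    return MAX_POPULARITY_BONUS * max(0.0, 1.0 - distance)


def adjusted_score(base, *, popularity, in_history=False, is_liked=False,
                   liked_fallback=False, artist_count=1, source_count=1):
    """Apply penalties and boosts to a candidate's base score."""
    score = base
    if in_history:
        score -= HISTORY_PENALTY  # previously recommended by the bot
    if is_liked and liked_fallback:
        score -= LIKED_PENALTY  # liked tracks only appear when the fallback is active
    if artist_count > 1:
        score += COLLAB_BOOST
    if source_count > 1:
        score += MULTI_SOURCE_BOOST
    return score + popularity_bonus(popularity)


fresh = adjusted_score(1.0, popularity=50, artist_count=2, source_count=2)
repeat = adjusted_score(1.0, popularity=50, in_history=True)
print(fresh > repeat)
```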

### 6. Final selection

After ranking, candidates are split into:

- `novel` - not previously recommended and not in likes
- `reused` - previously recommended or (fallback case) already liked

Then the bot:

- first tries to satisfy `min_new_ratio`
- enforces artist caps (limit tracks per artist)
- relaxes caps if there are not enough new tracks
- fills the remainder with reused candidates

Result includes:

- `tracks` - final ordered playlist tracks
- `new_count` / `reused_count`
- `notes` - explanation if the target new ratio could not be met
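
The selection steps can be sketched as two passes over the ranked lists. This is an illustration under assumptions: the `(track_id, artist)` tuple shape, the `artist_cap` default, and the wording of the note are made up for the example.

```python
import math
from collections import Counter


def select_tracks(novel, reused, size, min_new_ratio=0.8, artist_cap=2):
    """Pick `size` tracks from ranked (track_id, artist) lists, preferring novel ones."""
    # Small epsilon guards against float noise such as 10 * 0.7 == 7.000000000000001.
    target_new = math.ceil(size * min_new_ratio - 1e-9)
    picked = []
    seen = set()
    per_artist = Counter()

    def take(pool, limit, cap):
        """Append tracks from `pool` until `picked` holds `limit` items."""
        for track_id, artist in pool:
            if len(picked) >= limit:
                break
            if track_id in seen:
                continue
            if cap is not None and per_artist[artist] >= cap:
                continue
            picked.append(track_id)
            seen.add(track_id)
            per_artist[artist] += 1

    take(novel, target_new, artist_cap)
    if len(picked) < target_new:
        take(novel, target_new, None)  # relax the artist cap for new tracks
    new_count = len(picked)

    take(reused, size, artist_cap)
    if len(picked) < size:
        take(reused, size, None)

    notes = []
    if new_count < target_new:
        notes.append(f"only {new_count} of {target_new} target new tracks available")
    return {
        "tracks": picked,
        "new_count": new_count,
        "reused_count": len(picked) - new_count,
        "notes": notes,
    }


result = select_tracks(
    novel=[("n1", "A"), ("n2", "A"), ("n3", "A"), ("n4", "B")],
    reused=[("r1", "C"), ("r2", "C")],
    size=5,
)
print(result)
```

In the example the third track by artist `A` is skipped on the first pass and only admitted once the cap is relaxed, so it lands after `n4` in the final order.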

### 7. Playlist creation and history persistence

After the final track list is selected, the bot:

- creates a Spotify playlist
- adds tracks to it
- writes the run to `playlist_runs` and `playlist_run_tracks`
- updates `recommendation_history`
- stores `latest_playlist_url` for the user

## Anti-repeat behavior

The bot stores:

- all tracks it has recommended before
- all your liked tracks (cached and refreshed)

When building a new playlist:

- it first excludes liked tracks (when possible)
- prioritizes tracks that have not been recommended before
- fills with history repeats only if there are not enough new tracks
- may use a liked-track fallback instead of failing the run if all candidates are already liked
- stores `new / reused` stats in the DB

If there are not enough new tracks to satisfy the `80%` target, the run status includes a note explaining that.

## Cron (nightly run)

`cron` is disabled by default (manual-first mode: run `/generate` manually in Telegram).

In `docker-compose.yml`, the `cron` service is under profile `cron`, so it does not start with a normal:

```bash
docker compose up -d --build
```

To enable nightly runs:

```bash
docker compose --profile cron up -d cron
```

`cron` calls the internal endpoint on schedule:

- `POST /internal/jobs/nightly`

Change the schedule and timezone via `.env`:

```env
CRON_SCHEDULE=15 2 * * *
TZ=Europe/Amsterdam
```

To disable nightly runs again:

```bash
docker compose stop cron
```

## Data storage

- SQLite DB: `./data/app.db`

The `./data` folder is mounted as a Docker volume, so data persists across container restarts.

## Health check / verification

- `GET /health` should return `{"ok": true}`
- after `/generate`, Telegram should send a Spotify playlist link
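
The first check can be scripted with the Python standard library. This is a minimal sketch: the helper names are hypothetical, and the base URL in the usage note is an assumption that depends on how the service is exposed.

```python
import json
from urllib.request import urlopen


def parse_health(body: bytes) -> bool:
    """Return True only for a JSON object whose "ok" field is exactly true."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Call GET /health; connection problems and HTTP errors raise from urlopen."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
        return resp.status == 200 and parse_health(resp.read())


print(parse_health(b'{"ok": true}'))
```

Against a running instance this would be called as `check_health("http://localhost:8000")`, with the host and port replaced by the address the `app` container is actually published on.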

## Typical deployment

- VPS + Docker Compose
- `APP_BASE_URL` = public service URL
- `SPOTIFY_REDIRECT_URI` = `${APP_BASE_URL}/auth/spotify/callback`
- Telegram runs via polling (no webhook required)
- `cron` can remain disabled if you only want manual generation

## Architecture

Detailed architecture, data flow, and DB table docs are in `DESIGN.md`.

## Feature Plans

Roadmap items that fit the current architecture well:

- Explicit feedback loop:
  - commands like `/ban`, `/unban`, `/prefer`
  - separate blacklist table so "didn't like it" != "just didn't save it"
- Anti-repeat controls:
  - hard no-repeat window (N days/weeks)
  - separate rules for liked / previously recommended tracks
- Explainability / debug:
  - why-this-track (source, score, reasons)
  - dry-run endpoint/command without creating a playlist
- Fine-tuning the algorithm:
  - source weights (Spotify / Last.fm / search fallback)
  - generation modes (explore / familiar / mixed)
- Better candidate sources:
  - additional music metadata sources
  - smarter genre/artist clustering
- Personal scheduler:
  - per-user timezone and per-user cron schedule
  - weekday / time selection
- Observability:
  - structured logs for source coverage and filtering reasons
  - basic metrics for Spotify/Last.fm errors and latency
- Storage / scaling:
  - migrations (Alembic)
  - Postgres instead of SQLite for multi-user usage

## Limitations / future improvements

- Per-user timezone support is only partially used today (cron is global, though manual per-user generation is supported)
- More candidate sources could improve quality (e.g. MusicBrainz/Discogs mapping)
- Postgres would be better than SQLite for higher multi-user load