2026-02-26 20:25:20 +00:00

# Design / Architecture

## Goal

The service generates a Spotify "daily vibe" playlist based on:

- the user's recent listening
- a local cache of liked tracks
- the history of tracks previously recommended by the bot

The main user interface is a Telegram bot (`/generate`, `/connect`, `/status`, etc.), with an optional nightly cron trigger.

## High-level overview

Core components:

- `FastAPI` application
  - health check
  - Spotify OAuth start/callback
  - internal endpoint for cron (`/internal/jobs/nightly`)
- `TelegramBotRunner` (polling)
  - handles user commands
  - starts generation and sends status updates
- `PlaylistJobService`
  - orchestrates a single run (token -> sync likes -> candidates -> playlist -> persist)
- `RecommendationEngine`
  - builds the seed profile
  - collects the candidate pool
  - ranks and selects tracks
- `SpotifyClient` / `LastFmClient`
  - external API calls
- `SQLite` (via async SQLAlchemy)
  - users, liked cache, recommendation history, run log

## Runtime / lifecycle

Entry point: `app/main.py`.

On startup:

1. Load `Settings` (`app/config.py`)
2. Create async SQLAlchemy engine and session factory (`app/db/session.py`)
3. Run `create_all` (auto-create tables)
4. Create shared `httpx.AsyncClient`
5. Create API clients:
   - `SpotifyClient`
   - `LastFmClient`
6. Create services:
   - `SpotifyAuthService`
   - `RecommendationEngine`
   - `PlaylistJobService`
7. Initialize `TelegramBotRunner` and start polling
8. Store runtime/service objects in `app.state.runtime` and `app.state.services`

On shutdown:

- stop Telegram polling
- close `httpx.AsyncClient`
- dispose DB engine
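
The startup and shutdown lists pair up: resources opened first are closed last. Below is a minimal, dependency-free sketch of that ordering. It makes no claim about how `app/main.py` wires it: FastAPI accepts an async context manager through its `lifespan` parameter, but this document does not say whether the project uses that or startup/shutdown events. `Resource`, `lifespan`, and `demo` are hypothetical stand-ins for the DB engine, the shared `httpx.AsyncClient`, and Telegram polling.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager
from typing import AsyncIterator, Dict, List


class Resource:
    """Hypothetical stand-in for the DB engine, httpx client, or bot poller."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    async def start(self) -> "Resource":
        self.log.append(f"start {self.name}")
        return self

    async def close(self) -> None:
        self.log.append(f"close {self.name}")


@asynccontextmanager
async def lifespan(state: Dict[str, object], log: List[str]) -> AsyncIterator[Dict[str, object]]:
    async with AsyncExitStack() as stack:
        runtime: Dict[str, Resource] = {}
        for name in ("db_engine", "http_client", "telegram_polling"):
            resource = await Resource(name, log).start()
            # Register cleanup immediately, so a failure later in startup
            # still closes everything that was already opened.
            stack.push_async_callback(resource.close)
            runtime[name] = resource
        state["runtime"] = runtime
        yield state
    # Leaving the AsyncExitStack runs the callbacks in reverse (LIFO) order.


async def demo() -> List[str]:
    log: List[str] = []
    async with lifespan({}, log):
        log.append("serving")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The teardown order is polling first, then the HTTP client, then the engine, which matches the shutdown list above.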

## Containers / deployment

`docker-compose.yml` defines:

- `app` (main service, FastAPI + Telegram polling)
- `cron` (optional service with `supercronic`)

Important:

- `cron` is under `profiles: ["cron"]` and does not start by default (it starts only when the profile is enabled, e.g. `docker compose --profile cron up`)
- the project is now manual-first: users generate playlists via Telegram `/generate`

`cron` runs `scripts/run_nightly.sh`, which calls:

- `POST /internal/jobs/nightly` with `Authorization: Bearer <INTERNAL_JOB_TOKEN>`
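
For reference, Python's standard library can express the request the cron container sends. This sketch shows what `scripts/run_nightly.sh` does, not its actual contents. The base URL `http://app:8000` (Compose service name plus a port) and the empty request body are assumptions.

```python
import urllib.request


def build_nightly_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the nightly trigger request."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/internal/jobs/nightly",
        data=b"",  # assumption: the endpoint needs no request body
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    # Hypothetical values; the real token comes from INTERNAL_JOB_TOKEN.
    req = build_nightly_request("http://app:8000", "example-token")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```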

## Application layers

### 1. API layer (`app/api/routes.py`)

Responsibilities:

- HTTP endpoints for OAuth and internal jobs

Endpoints:

- `GET /health`
- `GET /auth/spotify/start`
- `GET /auth/spotify/callback`
- `POST /internal/jobs/nightly`

Notes:

- the OAuth callback sends a Telegram notification to the user on success
- the nightly endpoint is protected by `INTERNAL_JOB_TOKEN`
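
One way to implement the `INTERNAL_JOB_TOKEN` check is a constant-time comparison of the bearer token. This sketch assumes the route reads the raw `Authorization` header. `is_authorized` is a hypothetical helper. In FastAPI it would typically sit in a dependency that responds with `401` when the check returns `False`.

```python
import hmac
from typing import Optional


def is_authorized(authorization: Optional[str], expected_token: str) -> bool:
    """Return True only for 'Bearer <expected_token>'."""
    if not expected_token or not authorization:
        # An unset INTERNAL_JOB_TOKEN keeps the endpoint closed instead of open.
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest does not short-circuit on the first differing byte,
    # so response timing does not reveal how much of the token matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```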

### 2. Bot layer (`app/bot/telegram_bot.py`)

Responsibilities:

- user-facing interface via Telegram commands and reply-keyboard buttons

Supported commands:

- `/start`
- `/help`
- `/connect`
- `/status`
- `/generate`
- `/latest`
- `/setsize`
- `/setratio`
- `/sync`
- `/lang`

Notes:

- `/generate` calls `PlaylistJobService.generate_for_user(..., force=True, notify=False)`
- `/sync` only refreshes the liked tracks cache
- each command uses a short-lived DB session from `session_factory`
- the bot UI supports `ru` and `en` (localized text/buttons)
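
Localized text and buttons can be stored in a per-language catalog, with English as the fallback. This is a minimal sketch. The keys, the wording, and the `text_for` helper are hypothetical and are not the bot's actual strings.

```python
from typing import Dict

MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {
        "generating": "Generating your playlist...",
        "not_connected": "Connect Spotify first: /connect",
    },
    "ru": {
        "generating": "Генерирую плейлист...",
        # "not_connected" is intentionally missing to show the fallback.
    },
}
DEFAULT_LANG = "en"


def text_for(lang: str, key: str) -> str:
    """Look up a UI string, falling back to English for unknown languages or keys."""
    catalog = MESSAGES.get(lang, MESSAGES[DEFAULT_LANG])
    if key in catalog:
        return catalog[key]
    return MESSAGES[DEFAULT_LANG][key]  # English is assumed to be complete
```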

### 3. Service layer

#### `SpotifyAuthService` (`app/services/spotify_auth.py`)

Responsibilities:

- create OAuth state
- exchange `code` for tokens
- refresh the access token
- ensure a valid access token before Spotify calls

Notes:

- datetimes are normalized to UTC before comparison: SQLite returns naive datetimes, and Python raises `TypeError` when ordering a naive datetime against a timezone-aware one
- stores scopes and expiry on the `users` row
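
Here is a sketch of the token-expiry check with that normalization, assuming naive values are stored in UTC. `as_utc`, `needs_refresh`, and the 60-second safety margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def as_utc(value: datetime) -> datetime:
    """Interpret naive datetimes as UTC; convert aware ones to UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def needs_refresh(
    expires_at: Optional[datetime],
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(seconds=60),
) -> bool:
    """True if the access token is missing, expired, or about to expire."""
    if expires_at is None:
        return True
    current = as_utc(now) if now is not None else datetime.now(timezone.utc)
    # Refresh slightly early so a token does not expire mid-request.
    return as_utc(expires_at) - margin <= current
```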

#### `RecommendationEngine` (`app/services/recommendation.py`)

Responsibilities:

- sync liked tracks into the local cache
- build the seed profile
- collect candidates from multiple sources
- rank/select the final track list

Current candidate sources:

- Spotify recommendations
- Spotify artist top tracks
- Spotify search (seed-artist fallback)
- Last.fm similar tracks -> Spotify search
- Last.fm similar artists -> Spotify search

Key implementation details:

- respects the Spotify recommendations seed limit: at most `5` seeds per request (seed artists, tracks, and genres combined)
- degrades gracefully when some sources fail
- includes a liked-tracks fallback (used if all candidates are already liked)
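
The seed limit and the graceful degradation can be sketched together. Seeds are split into groups of at most five, and candidate sources run independently, so a failure only removes that source's results. The names `seed_batches` and `collect_candidates` and the use of `asyncio.gather` are assumptions; the engine may call its sources sequentially.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Spotify's recommendations endpoint accepts at most 5 seed values in total
# (seed artists, tracks, and genres combined).
MAX_SEEDS_PER_REQUEST = 5

Source = Callable[[], Awaitable[List[str]]]


def seed_batches(seed_ids: List[str], size: int = MAX_SEEDS_PER_REQUEST) -> List[List[str]]:
    """Split seeds into request-sized groups."""
    return [seed_ids[i : i + size] for i in range(0, len(seed_ids), size)]


async def collect_candidates(
    sources: Dict[str, Source],
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Run all sources; keep successful results and record failures by name."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names), return_exceptions=True
    )
    candidates: Dict[str, List[str]] = {}
    errors: Dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            errors[name] = f"{type(result).__name__}: {result}"
        else:
            candidates[name] = result
    return candidates, errors
```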

#### `PlaylistJobService` (`app/services/playlist_job.py`)

Responsibilities:

- orchestrate an end-to-end playlist generation run
- create the Spotify playlist and add tracks
- persist run details and the track list
- update recommendation history
- send Telegram notifications (if a notifier is configured)

Run sequence:

1. Validate user / Spotify connection
2. Create `playlist_runs` row with `running` status
3. Get valid access token
4. Sync liked tracks
5. Build playlist via `RecommendationEngine`
6. Create playlist in Spotify
7. Add tracks to playlist
8. Persist run tracks / history / metadata
9. Commit and return `JobOutcome`

On error:

- `playlist_runs.status = failed`
- the error message is written to `notes`
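
The run sequence and its error handling follow a small pattern. The run is created as `running`, the steps execute in order, and the run ends as either `success` or `failed`, with the error text saved. This is a simplified sketch. `RunRecord` stands in for a `playlist_runs` row, `execute_run` is hypothetical, and no database is involved.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RunRecord:
    """Hypothetical in-memory stand-in for a playlist_runs row."""
    status: str = "running"
    notes: Optional[str] = None


Step = Callable[[RunRecord], None]


def execute_run(steps: List[Step]) -> RunRecord:
    run = RunRecord()  # created with status "running" before any external call
    try:
        # e.g. get token, sync likes, build playlist, create it, add items, persist
        for step in steps:
            step(run)
        run.status = "success"
    except Exception as exc:
        run.status = "failed"
        run.notes = f"{type(exc).__name__}: {exc}"
    return run
```

With a real DB session, the partial work of a failed run would normally be rolled back before the `failed` status itself is committed. That way the failure record survives. The sketch omits transactions entirely.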

## Client layer

### `SpotifyClient` (`app/clients/spotify.py`)

Encapsulates Spotify Web API calls.

Important implementation choices:

- `create_playlist()` uses `POST /me/playlists`
  - chosen because `POST /users/{id}/playlists` can return `403` for some app/account combinations
- `add_playlist_items()` uses `POST /playlists/{playlist_id}/items`
  - `/tracks` may return `403` while `/items` succeeds
- `delete_playlist()` uses `DELETE /playlists/{playlist_id}/followers`
  - this is an "unfollow" (Spotify does not support hard-deleting playlists)
- built-in retry for `429` rate limiting using the `Retry-After` header
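
Here is a minimal sketch of the `429` handling. It assumes `Retry-After` carries a number of seconds, which is what Spotify sends, and that retries are bounded. `Response` is a stand-in that exposes the two attributes an `httpx.Response` would provide. `send_with_retry`, `max_retries`, and the 1-second default are hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Mapping


@dataclass
class Response:
    """Stand-in for httpx.Response (only status_code and headers are used)."""
    status_code: int
    headers: Dict[str, str] = field(default_factory=dict)


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Parse Retry-After as seconds; fall back to a default if absent or invalid."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        return default


async def send_with_retry(
    send: Callable[[], Awaitable[Response]],
    max_retries: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Response:
    """Repeat the request while it is rate-limited, up to max_retries times."""
    response = await send()
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        await sleep(retry_after_seconds(response.headers))
        response = await send()
    return response
```

Injecting `sleep` lets tests run the retry logic without real waiting. A plain `dict` is case-sensitive, but `httpx` headers are not. With a real response, `response.headers.get("retry-after")` would also work.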

### `LastFmClient` (`app/clients/lastfm.py`)

Optional enrichment source for similarity:

- can be disabled (empty `LASTFM_API_KEY`)
- Last.fm errors should not fail the whole run if other sources still work
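
A single small wrapper can enforce both Last.fm rules: the source is disabled without a key, and its errors are never fatal. This is a sketch. `lastfm_or_empty` and the injected `fetch` coroutine are hypothetical. This document does not say whether the real client swallows errors itself or leaves that to the engine.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger("lastfm")


async def lastfm_or_empty(
    api_key: str, fetch: Callable[[], Awaitable[List[str]]]
) -> List[str]:
    """Return Last.fm-derived results, or [] if disabled or failing."""
    if not api_key:
        return []  # LASTFM_API_KEY is empty: the source is disabled
    try:
        return await fetch()
    except Exception:
        # Log and continue: other sources can still fill the playlist.
        logger.warning("Last.fm lookup failed; continuing without it", exc_info=True)
        return []


if __name__ == "__main__":
    async def failing() -> List[str]:
        raise TimeoutError("Last.fm timed out")

    print(asyncio.run(lastfm_or_empty("key", failing)))  # logs a warning, prints []
```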

## Persistence layer (SQLite + SQLAlchemy)

### Tables (`app/db/models.py`)

#### `users`

Stores:

- Telegram identity (`telegram_chat_id`, `telegram_username`)
- Spotify identity, tokens, and scopes (`spotify_user_id`, access/refresh token, expiry, scopes)
- user settings (`playlist_size`, `min_new_ratio`, timezone)
- last outputs (`last_generated_date`, `latest_playlist_id`, `latest_playlist_url`)

#### `auth_states`

Temporary OAuth state for the callback:

- `state`
- `telegram_chat_id`
- `expires_at`

#### `saved_tracks`

Local cache of the user's `Liked Songs`:

- `spotify_track_id`
- track/artist metadata, album, popularity
- `added_at`

#### `recommendation_history`

History of previously recommended tracks:

- `spotify_track_id`
- `first_recommended_at`
- `last_recommended_at`
- `times_recommended`

#### `playlist_runs`

Playlist generation run log:

- status (`running` / `success` / `failed`)
- Spotify playlist metadata
- stats (`total` / `new` / `reused`)
- `notes`

#### `playlist_run_tracks`

Snapshot of tracks in a specific run:

- track id / name / artists
- source (which candidate source produced the track)
- position
- `is_new_to_bot`
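
Updating `recommendation_history` amounts to an upsert. On a track's first recommendation a row is inserted; afterwards `last_recommended_at` and `times_recommended` are bumped. The project goes through SQLAlchemy models, so the plain-`sqlite3` version below only illustrates the row-level logic. Two details are assumptions: the `user_id` column and the composite primary key, since the column list above does not say how history is scoped per user. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS recommendation_history (
    user_id INTEGER NOT NULL,              -- assumption: history is per user
    spotify_track_id TEXT NOT NULL,
    first_recommended_at TEXT NOT NULL,
    last_recommended_at TEXT NOT NULL,
    times_recommended INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (user_id, spotify_track_id)
)
"""

UPSERT = """
INSERT INTO recommendation_history
    (user_id, spotify_track_id, first_recommended_at, last_recommended_at, times_recommended)
VALUES (?, ?, ?, ?, 1)
ON CONFLICT (user_id, spotify_track_id) DO UPDATE SET
    last_recommended_at = excluded.last_recommended_at,
    times_recommended = recommendation_history.times_recommended + 1
"""


def record_recommendations(
    conn: sqlite3.Connection, user_id: int, track_ids: List[str], now_iso: str
) -> None:
    """Insert new tracks; for known ones keep first_recommended_at and bump the counter."""
    conn.executemany(UPSERT, [(user_id, t, now_iso, now_iso) for t in track_ids])
    conn.commit()
```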

### Repository layer (`app/db/repositories.py`)

Pattern:

- thin repositories over `AsyncSession`
- isolate CRUD/query logic from the service layer

Repositories include:

- `UserRepository`
- `AuthStateRepository`
- `SavedTrackRepository`
- `RecommendationHistoryRepository`
- `PlaylistRunRepository`

## Data flows

### OAuth flow

1. Telegram `/connect`
2. `SpotifyAuthService.create_connect_url()`
3. User opens the Spotify authorization page
4. `GET /auth/spotify/callback`
5. `SpotifyAuthService.handle_callback()`
6. Tokens and Spotify profile are saved to `users`
7. User receives a Telegram confirmation message
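
Steps 2 and 5 hinge on the `state` value. It is stored in `auth_states` with the chat id and an expiry, passed through Spotify's authorization page, and matched on callback. The sketch uses Spotify's authorization-code endpoint (`https://accounts.spotify.com/authorize`). Several pieces are assumptions: the helper names, the 10-minute lifetime, the in-memory dict standing in for `auth_states`, and the scope list.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional
from urllib.parse import urlencode

AUTHORIZE_URL = "https://accounts.spotify.com/authorize"

# Hypothetical in-memory stand-in for the auth_states table: state -> row.
AuthStates = Dict[str, Dict[str, object]]


def create_connect_url(
    store: AuthStates,
    client_id: str,
    redirect_uri: str,
    scopes: List[str],
    telegram_chat_id: int,
    ttl: timedelta = timedelta(minutes=10),
) -> str:
    state = secrets.token_urlsafe(24)  # unguessable; ties the callback to this chat
    store[state] = {
        "telegram_chat_id": telegram_chat_id,
        "expires_at": datetime.now(timezone.utc) + ttl,
    }
    query = urlencode(
        {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "scope": " ".join(scopes),
            "state": state,
        }
    )
    return f"{AUTHORIZE_URL}?{query}"


def consume_state(store: AuthStates, state: str, now: Optional[datetime] = None) -> Optional[int]:
    """Return the chat id for a valid state (single use), or None if unknown or expired."""
    row = store.pop(state, None)
    if row is None:
        return None
    now = now or datetime.now(timezone.utc)
    if row["expires_at"] <= now:
        return None
    return int(row["telegram_chat_id"])
```

After `consume_state` succeeds, the callback still has to exchange `code` for tokens at `https://accounts.spotify.com/api/token` (`grant_type=authorization_code`). The sketch does not cover that step.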

### Manual generation flow (`/generate`)

1. Telegram `/generate`
2. `PlaylistJobService.generate_for_user(..., force=True)`
3. Sync likes + load recent listening + collect candidates
4. Create playlist + add items in Spotify
5. Persist run/history
6. Reply to the user in Telegram

### Nightly cron flow (optional)

1. `supercronic` in the `cron` container
2. `scripts/run_nightly.sh`
3. `POST /internal/jobs/nightly`
4. `PlaylistJobService.generate_for_all_connected_users()`
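
One plausible shape for `generate_for_all_connected_users()` is a sequential loop that isolates per-user failures, so that one broken Spotify connection does not abort the batch. This is an assumption, not the service's code; `generate_for_all` and `NightlySummary` are hypothetical. Processing users one at a time also fits the single generation lock in `PlaylistJobService`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Iterable, List


@dataclass
class NightlySummary:
    succeeded: List[int] = field(default_factory=list)
    failed: List[int] = field(default_factory=list)


async def generate_for_all(
    user_ids: Iterable[int], generate_one: Callable[[int], Awaitable[None]]
) -> NightlySummary:
    summary = NightlySummary()
    for user_id in user_ids:  # one user at a time
        try:
            await generate_one(user_id)
            summary.succeeded.append(user_id)
        except Exception:
            # Record the failure and continue with the next user.
            summary.failed.append(user_id)
    return summary


if __name__ == "__main__":
    async def fake_generate(user_id: int) -> None:
        if user_id == 2:
            raise RuntimeError("token revoked")

    print(asyncio.run(generate_for_all([1, 2, 3], fake_generate)))
```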

## Concurrency / consistency

- Generation is protected by a single `asyncio.Lock` (`generate_lock`) in `PlaylistJobService`
  - prevents overlapping runs and history update races
- Most run operations happen in one DB session
- Errors inside a run mark the run as `failed`
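
With a single `asyncio.Lock`, concurrent generation calls wait and run one at a time (for example, a `/generate` that arrives during the nightly job). This self-contained sketch uses `JobServiceSketch` and its counters as hypothetical instrumentation. An `asyncio.Lock` only serializes coroutines on one event loop in one process, which is consistent with the single-node setup.

```python
import asyncio
from typing import List, Tuple


class JobServiceSketch:
    """Hypothetical stand-in showing how generate_lock serializes runs."""

    def __init__(self) -> None:
        self.generate_lock = asyncio.Lock()
        self.active = 0
        self.max_active = 0

    async def generate_for_user(self, user_id: int) -> int:
        async with self.generate_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            await asyncio.sleep(0.01)  # stands in for Spotify calls and DB writes
            self.active -= 1
            return user_id


async def demo() -> Tuple[List[int], int]:
    # Create the service inside the running loop (safe on older Python versions).
    service = JobServiceSketch()
    results = await asyncio.gather(*(service.generate_for_user(u) for u in range(5)))
    return list(results), service.max_active


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ([0, 1, 2, 3, 4], 1)
```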

## Recommendation algorithm (summary)

A detailed explanation is in `README.md`; architecturally, the pipeline is:

1. Build seed profile (recent + liked)
2. Collect candidate pool (Spotify + Last.fm + fallback search)
3. Deduplicate
4. Rank (penalties/boosts)
5. Select (`min_new_ratio` + artist caps)
6. Persist stats/history
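
Steps 3 to 5 can be illustrated with a small, hypothetical selection routine. It deduplicates by track id (keeping the best-scored copy), then ranks with a penalty for tracks the bot already recommended. Next, it reserves `ceil(size * min_new_ratio)` slots for new tracks and enforces a per-artist cap. The scoring, `reuse_penalty`, `max_per_artist`, and the two-pass strategy are assumptions for illustration. The actual rules are in `README.md`.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    track_id: str
    artist_id: str
    score: float
    source: str


def dedupe(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep one entry per track id: the highest-scored one."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        if c.track_id not in best or c.score > best[c.track_id].score:
            best[c.track_id] = c
    return list(best.values())


def select(
    candidates: Iterable[Candidate],
    size: int,
    min_new_ratio: float,
    max_per_artist: int,
    previously_recommended: Set[str],
    reuse_penalty: float = 0.1,
) -> List[Candidate]:
    def effective(c: Candidate) -> float:
        # Rank: penalize tracks the bot has already recommended.
        return c.score - (reuse_penalty if c.track_id in previously_recommended else 0.0)

    ranked = sorted(dedupe(candidates), key=effective, reverse=True)
    # round() guards against binary floating-point noise just above a whole number.
    need_new = min(size, math.ceil(round(size * min_new_ratio, 9)))
    chosen: List[Candidate] = []
    chosen_ids: Set[str] = set()
    per_artist: Counter = Counter()

    def take(pool: List[Candidate], limit: int) -> None:
        for c in pool:
            if len(chosen) >= limit:
                return
            if c.track_id in chosen_ids or per_artist[c.artist_id] >= max_per_artist:
                continue
            chosen.append(c)
            chosen_ids.add(c.track_id)
            per_artist[c.artist_id] += 1

    # Pass 1: reserve slots for tracks new to the bot.
    take([c for c in ranked if c.track_id not in previously_recommended], need_new)
    # Pass 2: fill the remaining slots by rank (new or reused).
    take(ranked, size)
    return sorted(chosen, key=effective, reverse=True)
```

Reserving new-track slots before the general fill guarantees the ratio even when reused tracks outscore new ones. The artist cap applies in both passes.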

## Configuration

Main environment variables (`app/config.py`):

- `TELEGRAM_BOT_TOKEN`
- `SPOTIFY_CLIENT_ID`
- `SPOTIFY_CLIENT_SECRET`
- `SPOTIFY_REDIRECT_URI`
- `SPOTIFY_DEFAULT_MARKET`
- `LASTFM_API_KEY` (optional)
- `INTERNAL_JOB_TOKEN`
- `DB_PATH`
- `DEFAULT_PLAYLIST_SIZE`
- `MIN_NEW_RATIO`
- `RECENT_DAYS_WINDOW`
- `PLAYLIST_VISIBILITY`
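
Here is a stdlib-only sketch of loading these variables, covering required versus optional handling and type conversion. Which variables are mandatory, and every default value below, are assumptions. The real `Settings` in `app/config.py` may be built differently (for example, with pydantic-settings).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    telegram_bot_token: str
    spotify_client_id: str
    spotify_client_secret: str
    spotify_redirect_uri: str
    internal_job_token: str
    spotify_default_market: str
    lastfm_api_key: str
    db_path: str
    default_playlist_size: int
    min_new_ratio: float
    recent_days_window: int
    playlist_visibility: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    source: Mapping[str, str] = os.environ if env is None else env

    def required(name: str) -> str:
        value = source.get(name, "").strip()
        if not value:
            raise ValueError(f"{name} must be set")
        return value

    ratio = float(source.get("MIN_NEW_RATIO", "0.5"))  # hypothetical default
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("MIN_NEW_RATIO must be between 0 and 1")

    return Settings(
        telegram_bot_token=required("TELEGRAM_BOT_TOKEN"),
        spotify_client_id=required("SPOTIFY_CLIENT_ID"),
        spotify_client_secret=required("SPOTIFY_CLIENT_SECRET"),
        spotify_redirect_uri=required("SPOTIFY_REDIRECT_URI"),
        internal_job_token=required("INTERNAL_JOB_TOKEN"),
        # Optional values; all defaults below are hypothetical.
        spotify_default_market=source.get("SPOTIFY_DEFAULT_MARKET", "US"),
        lastfm_api_key=source.get("LASTFM_API_KEY", ""),  # empty disables Last.fm
        db_path=source.get("DB_PATH", "data/app.db"),
        default_playlist_size=int(source.get("DEFAULT_PLAYLIST_SIZE", "30")),
        min_new_ratio=ratio,
        recent_days_window=int(source.get("RECENT_DAYS_WINDOW", "7")),
        playlist_visibility=source.get("PLAYLIST_VISIBILITY", "private"),
    )
```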

## Diagnostics / observability

Current state:

- primary feedback comes from Telegram messages and `playlist_runs.notes`
- HTTP `/health` for liveness checks
- tests cover critical Spotify routes and parts of the recommendation pipeline

Possible improvements:

- structured logs for source coverage (how many candidates come from each source)
- metrics for Spotify/Last.fm errors and latency
- a dedicated debug dry-run endpoint (without creating a playlist)

## Known limitations

- SQLite is suitable for small-scale / single-node setups
- Telegram polling + FastAPI run in the same process/container
- per-user timezone support is limited (cron is global)
- external API limitations (Spotify/Last.fm) vary by app/account