29 Commits

Author SHA1 Message Date
Alexis Bruneteau
ff71d052e6 Track individual model files instead of single multitask model
Some checks failed
MLOps CI/CD Pipeline / test (push) Failing after 5m3s
MLOps CI/CD Pipeline / train (push) Has been skipped
MLOps CI/CD Pipeline / deploy (push) Has been skipped
The training script creates separate model files for each task
(match_winner, map_winner, score_team1, score_team2, round_diff, total_maps)
so DVC needs to track each file individually.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:55:28 +02:00
Alexis Bruneteau
9520395ee9 Fix DVC output path overlap in train stage
Changed from tracking entire models/ directory to specific model file
to resolve conflict with models/metrics.json metric tracking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:51:16 +02:00
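Note on 9520395ee9: the conflict comes from one tracked path (the models/ directory) containing another (models/metrics.json). A minimal sketch of how such an overlap could be detected ahead of time follows. It is not DVC's own check; the helper name `find_overlaps` and the file name `models/model.pkl` are hypothetical.

```python
from pathlib import PurePosixPath


def find_overlaps(paths):
    """Return (parent, child) pairs where one tracked path contains another."""
    parsed = [PurePosixPath(p) for p in paths]
    overlaps = []
    for a in parsed:
        for b in parsed:
            # b lies inside a when a is one of b's ancestor directories.
            if a != b and a in b.parents:
                overlaps.append((str(a), str(b)))
    return overlaps


# Tracking the whole directory collides with the metrics file inside it.
before = find_overlaps(["models/", "models/metrics.json"])
# Tracking one specific file (hypothetical name) does not.
after = find_overlaps(["models/model.pkl", "models/metrics.json"])
```

Here `before` holds one overlapping pair and `after` is empty, which mirrors the change from a directory output to a specific file output.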
Alexis Bruneteau
9440f4eecd Implement multi-task learning pipeline for CSGO predictions
Created comprehensive multi-objective modeling system:

**6 Prediction Tasks:**
1. Match Winner (Binary Classification) - Who wins the match?
2. Map Winner (Binary Classification) - Who wins this specific map?
3. Team 1 Score (Regression) - Predict exact round score for team 1
4. Team 2 Score (Regression) - Predict exact round score for team 2
5. Round Difference (Regression) - Predict score margin
6. Total Maps (Regression) - Predict number of maps in match

**Implementation:**
- Updated preprocessing to generate all target variables
- Created train_multitask.py with separate models per task
- Classification tasks use Random Forest Classifier
- Regression tasks use Random Forest Regressor
- All models logged to MLflow experiment 'csgo-match-prediction-multitask'
- Metrics tracked per task (accuracy/precision for classification, MAE/RMSE for regression)
- Updated DVC pipeline to use new training script

**No Data Leakage:**
- All features are pre-match only (rankings, map, starting side)
- Target variables properly separated and saved with 'target_' prefix

This enables comprehensive match analysis and multiple betting/analytics use cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:28:06 +02:00
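Note on 9440f4eecd: the per-task metric split (accuracy/precision for classification, MAE/RMSE for regression) can be sketched in plain Python as below. This is an illustration, not the contents of train_multitask.py. The task keys reuse the names from ff71d052e6, while `evaluate` and the choice to return 0.0 precision when nothing is predicted positive are assumptions.

```python
import math

# Task type per target, following the six tasks listed in the commit.
TASKS = {
    "match_winner": "classification",
    "map_winner": "classification",
    "score_team1": "regression",
    "score_team2": "regression",
    "round_diff": "regression",
    "total_maps": "regression",
}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    # True labels of the samples that were predicted positive.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # assumption: report 0.0 when no positive predictions exist
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def evaluate(task, y_true, y_pred):
    """Pick the metric set that matches the task type."""
    if TASKS[task] == "classification":
        return {"accuracy": accuracy(y_true, y_pred),
                "precision": precision(y_true, y_pred)}
    return {"mae": mae(y_true, y_pred), "rmse": rmse(y_true, y_pred)}
```

Calling `evaluate("match_winner", y_true, y_pred)` returns a dict with accuracy and precision, while any regression task returns MAE and RMSE, so each task can be logged with its own metric names.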
Alexis Bruneteau
a28a363dd9 Add comprehensive pre-match features for better predictions
Enhanced feature engineering with legitimate pre-match information:

New features:
- Map one-hot encoding (Dust2, Mirage, Inferno, etc.)
- rank_sum: Combined team strength indicator
- rank_ratio: Relative team strength
- team1_is_favorite: Whether team 1 has better ranking
- both_top_tier: Both teams in top 10
- underdog_matchup: Large ranking difference (>50)

All features are known before match starts - no data leakage.
Expected to improve model performance while maintaining integrity.

Current feature count: ~20 (4 base + 3 rank + ~10 maps + 3 indicators)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:24:07 +02:00
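Note on a28a363dd9: a minimal sketch of the derived rank features named in the commit, using a plain dict per match. It assumes that a lower rank number means a stronger team, that rank_ratio is rank_1 divided by rank_2, and that "top 10" is inclusive; none of these is stated in the commit. The function name `add_prematch_features` is hypothetical, and the map one-hot encoding is left out because the source column name is not given.

```python
def add_prematch_features(row):
    """Return a copy of `row` with the derived pre-match indicators added."""
    r1, r2 = row["rank_1"], row["rank_2"]
    out = dict(row)
    out["rank_sum"] = r1 + r2                           # combined strength
    out["rank_ratio"] = r1 / r2                         # assumed direction
    out["team1_is_favorite"] = int(r1 < r2)             # assumed: lower is better
    out["both_top_tier"] = int(r1 <= 10 and r2 <= 10)   # assumed inclusive top 10
    out["underdog_matchup"] = int(abs(r1 - r2) > 50)    # threshold from the commit
    return out


example = add_prematch_features({"rank_1": 3, "rank_2": 60})
```

Every value here is computed from the two rankings alone, so nothing depends on the match outcome.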
Alexis Bruneteau
6995102d76 Remove map_wins features - they contain match outcome data
The map_wins_1 and map_wins_2 columns represent maps won DURING
the current match, not historical performance. This is data leakage
as these values are only known during/after the match.

Now using only truly pre-match features:
- rank_1, rank_2: Team rankings before match
- starting_ct: Which team starts CT side
- rank_diff: Derived ranking difference

This should finally give realistic model performance based solely
on information available before the match begins.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:17:07 +02:00
Alexis Bruneteau
efaf5ff0e1 Fix critical data leakage in feature engineering
Removed features that contain match outcome information:
- result_1, result_2 (actual match scores - only known after match)
- ct_1, t_2, t_1, ct_2 (rounds won per side - only known after match)
- total_rounds, round_diff (derived from results)

These features caused perfect 1.0 accuracy because the model was
essentially "cheating" by knowing the match outcome.

Now using only pre-match information:
- Team rankings (rank_1, rank_2)
- Historical map performance (map_wins_1, map_wins_2)
- Starting side (starting_ct)
- Derived: rank_diff, map_wins_diff

This will give realistic model performance based on what would
actually be known before a match starts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:01:46 +02:00
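Note on efaf5ff0e1: the removal of outcome-derived columns can be sketched as a filter plus a guard that fails loudly if a leaking column is used as a feature. The column names come from the commit; the helper names `drop_post_match` and `assert_no_leakage` are hypothetical.

```python
# Columns the commit identifies as known only after the match.
POST_MATCH_COLUMNS = {
    "result_1", "result_2",
    "ct_1", "t_2", "t_1", "ct_2",
    "total_rounds", "round_diff",
}


def drop_post_match(row):
    """Keep only the columns that are not derived from the match outcome."""
    return {k: v for k, v in row.items() if k not in POST_MATCH_COLUMNS}


def assert_no_leakage(feature_names):
    """Raise if any post-match column is about to be used as a feature."""
    leaked = sorted(POST_MATCH_COLUMNS.intersection(feature_names))
    if leaked:
        raise ValueError(f"post-match columns used as features: {leaked}")


clean = drop_post_match({"rank_1": 5, "rank_2": 12, "result_1": 16, "ct_1": 9})
```

Commit 6995102d76 later identifies map_wins_1 and map_wins_2 as leaking too, so under this sketch they would be added to the same set.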
Alexis Bruneteau
cb7b80ca6a Fix MLflow model logging warnings
Added input_example parameter to auto-infer model signature and
explicitly set artifact_path parameter to remove deprecation warnings.

This improves MLflow tracking by:
- Auto-generating model signature from training data
- Using correct parameter names for MLflow 3.x
- Enabling better model serving and inference validation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:01:05 +02:00
Alexis Bruneteau
22db96b3eb Simplify MLflow auth to use native env var support
Reverted to simpler approach - MLflow natively supports
MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment
variables for HTTP Basic Auth.

Removed the manual URI construction since it's not needed.
The workflow already sets these env vars correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 19:57:15 +02:00
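Note on 22db96b3eb: since the credentials are read from MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD, a training script only needs them to be present in the environment. A small pre-flight check could look like the sketch below. The helper name `missing_mlflow_credentials` is hypothetical, and the check does not contact the tracking server.

```python
import os

REQUIRED_VARS = ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def missing_mlflow_credentials(environ=None):
    """List the credential variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
missing = missing_mlflow_credentials({"MLFLOW_TRACKING_USERNAME": "ci-user"})
```

A CI job could call this before training and stop with a clear message if the returned list is not empty.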
Alexis Bruneteau
a4ddfb57be Use HTTP Basic Auth for MLflow authentication
Changed MLflow authentication to use HTTP Basic Auth by embedding
credentials in the tracking URI (https://user:pass@host).

This is the standard authentication method for MLflow when using
basic auth, rather than relying on environment variables alone.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 19:53:02 +02:00
Alexis Bruneteau
bc5d96981a Fix MLflow authentication in training script
Added explicit environment variable configuration for MLflow credentials.
The credentials are now properly passed through from CI/CD environment
to the MLflow client.

Changes:
- Check for MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD env vars
- Explicitly set them in os.environ for MLflow to use
- Added connection success message for debugging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 19:47:22 +02:00
Alexis Bruneteau
8dc524af22 Fix Poetry cache path for proper dependency caching
Changed cache configuration:
- Moved Install Poetry step before cache setup
- Updated cache path to ~/.cache/pypoetry/virtualenvs (actual venv location)
- Removed **/poetry.lock wildcard in favor of direct poetry.lock reference
- This ensures the virtualenv itself is cached, not just metadata

This should significantly speed up CI/CD runs by reusing installed packages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 18:53:40 +02:00
Alexis Bruneteau
c9dbe70bdb Fix DVC pull to only fetch raw data
Changed dvc pull to specifically pull data/raw.dvc instead of all
outputs. The processed data and model files are generated by the
DVC pipeline (dvc repro), not pulled from remote storage.

This prevents errors about missing processed files that haven't
been generated yet.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 18:52:16 +02:00
Alexis Bruneteau
662d1a3b8f Configure DVC credentials explicitly in CI/CD pipeline
DVC needs credentials to be configured via 'dvc remote modify' command
rather than just environment variables. This fixes 403 Forbidden errors
when accessing MinIO/S3 storage.

Changes:
- Added dvc remote modify commands to set access_key_id and secret_access_key
- Applied to both pull and push operations in test and train jobs
- Added .dvc/config.local to .gitignore to prevent credential leaks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 18:45:29 +02:00
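Note on 662d1a3b8f: a minimal sketch of the two `dvc remote modify` invocations, built as argument lists in Python. It assumes the `--local` flag is used, which would be consistent with the commit ignoring .dvc/config.local, though the commit does not say so. The remote name `minio` and the helper name `dvc_credential_commands` are hypothetical.

```python
def dvc_credential_commands(remote, access_key_id, secret_access_key):
    """Build the argv lists that set S3 credentials on a DVC remote."""
    # --local writes to .dvc/config.local, keeping secrets out of .dvc/config.
    base = ["dvc", "remote", "modify", "--local", remote]
    return [
        base + ["access_key_id", access_key_id],
        base + ["secret_access_key", secret_access_key],
    ]


# Placeholder values; in CI these would come from the job's secrets.
commands = dvc_credential_commands("minio", "example-id", "example-secret")
```

Each list can be passed to `subprocess.run(cmd, check=True)` before `dvc pull` or `dvc push`.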
Alexis Bruneteau
3cb1b23669 Add DVC S3 credentials to CI/CD pipeline
Configure DVC to use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
from Gitea secrets (DVC_ID and DVC_PASSWORD) for MinIO/S3 access.

Changes:
- Added DVC credentials to all DVC operations (pull/push)
- Changed poetry install to use --no-root flag for faster installs
- Credentials applied to both test and train jobs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 17:52:21 +02:00
Alexis Bruneteau
d61fad678c Add dependency caching to CI/CD pipeline
Added actions/cache@v3 to cache Poetry and pip dependencies across
workflow runs. This significantly speeds up CI/CD by avoiding
full reinstallation when poetry.lock hasn't changed.

Cache strategy:
- Cache key based on OS and poetry.lock hash
- Caches ~/.cache/pypoetry and ~/.cache/pip
- Falls back to OS-specific cache if exact match not found

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 17:45:18 +02:00
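Note on d61fad678c: the cache strategy (an exact key from the OS and the poetry.lock hash, with an OS-only fallback) can be illustrated as below. The key layout `<os>-poetry-<digest>` and the helper name `cache_keys` are assumptions, and the digest is a plain SHA-256 of the file bytes, which is not guaranteed to equal what the workflow's own hashing expression produces.

```python
import hashlib
import platform


def cache_keys(lock_bytes, os_name=None):
    """Return (exact_key, fallback_prefix) for a dependency cache."""
    os_name = os_name or platform.system()
    digest = hashlib.sha256(lock_bytes).hexdigest()
    fallback = f"{os_name}-poetry-"   # matches any cache for this OS
    exact = f"{fallback}{digest}"     # changes whenever the lock file changes
    return exact, fallback


exact, fallback = cache_keys(b"example lock file contents", "Linux")
```

Because the exact key starts with the fallback prefix, a run with a changed poetry.lock can still restore the most recent cache for the same OS and install only what differs.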
Alexis Bruneteau
bb8b08500b Add dvc-s3 dependency for S3/MinIO storage support
Added dvc-s3>=3.2.0 to dependencies to enable DVC to work with
S3-compatible storage backends like MinIO.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 17:44:24 +02:00
Alexis Bruneteau
e7883b8dab Update poetry.lock with PyYAML dependency
Regenerate lock file to include pyyaml>=6.0.0 added to dependencies.
This resolves the poetry.lock sync issue with pyproject.toml.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 17:39:14 +02:00
Alexis Bruneteau
af9b700a5b secrets and mlflow should now work 2025-10-01 17:35:13 +02:00
Alexis Bruneteau
f107164b51 maybe maybe not 2025-10-01 15:04:13 +02:00
paul.roost
cce9eb29a0 Refactor CI/CD pipeline to install dependencies before setting up DVC 2025-09-30 17:12:21 +02:00
paul.roost
4df499be5c train fix 2025-09-30 17:04:43 +02:00
Alexis Bruneteau
abad691246 setup dvc 2025-09-30 17:03:15 +02:00
paul.roost
65b5b6c151 test 2025-09-30 16:38:14 +02:00
paul.roost
652f58cdb1 Add Prometheus client dependency and update README with project details 2025-09-30 16:23:29 +02:00
paul.roost
ca9c3bfce3 Add CI/CD pipeline, monitoring, and model training components for CS:GO MLOps platform 2025-09-30 16:14:56 +02:00
paul.roost
4cc5705b97 Initialize DVC 2025-09-30 15:48:38 +02:00
paul.roost
a7c884462e Initial project structure 2025-09-30 15:44:35 +02:00
Alexis Bruneteau
92032f67a4 tag 2025-09-30 15:25:32 +02:00
Alexis Bruneteau
ee9fe1bca2 init 2025-09-23 18:29:32 +02:00