The training script creates a separate model file for each task
(match_winner, map_winner, score_team1, score_team2, round_diff, total_maps),
so DVC needs to track each file individually.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
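The per-file tracking described above could be expressed in dvc.yaml roughly as follows. This is a sketch: the stage name, script path, and .pkl filenames are assumptions, not the repository's actual pipeline definition.

```yaml
stages:
  train:
    cmd: poetry run python train_multitask.py
    outs:
      # one tracked output per task, instead of the whole models/ directory
      - models/match_winner.pkl
      - models/map_winner.pkl
      - models/score_team1.pkl
      - models/score_team2.pkl
      - models/round_diff.pkl
      - models/total_maps.pkl
    metrics:
      - models/metrics.json:
          cache: false
```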
Changed from tracking the entire models/ directory to tracking specific
model files, to resolve a conflict with the models/metrics.json metrics tracking.
Created a comprehensive multi-objective modeling system:
**6 Prediction Tasks:**
1. Match Winner (Binary Classification) - Who wins the match?
2. Map Winner (Binary Classification) - Who wins this specific map?
3. Team 1 Score (Regression) - Predict exact round score for team 1
4. Team 2 Score (Regression) - Predict exact round score for team 2
5. Round Difference (Regression) - Predict score margin
6. Total Maps (Regression) - Predict number of maps in match
**Implementation:**
- Updated preprocessing to generate all target variables
- Created train_multitask.py with separate models per task
- Classification tasks use Random Forest Classifier
- Regression tasks use Random Forest Regressor
- All models logged to MLflow experiment 'csgo-match-prediction-multitask'
- Metrics tracked per task (accuracy/precision for classification, MAE/RMSE for regression)
- Updated DVC pipeline to use new training script
**No Data Leakage:**
- All features are pre-match only (rankings, map, starting side)
- Target variables properly separated and saved with 'target_' prefix
This enables comprehensive match analysis and multiple betting/analytics use cases.
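The per-task loop can be sketched as follows. This is a minimal illustration on synthetic data with assumed column names (the real logic lives in train_multitask.py, and the real targets are saved with a 'target_' prefix as described above):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error

# Synthetic stand-in for the preprocessed dataset (column names assumed).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rank_1": rng.integers(1, 100, 200),
    "rank_2": rng.integers(1, 100, 200),
    "starting_ct": rng.integers(0, 2, 200),
})
df["rank_diff"] = df["rank_1"] - df["rank_2"]
# Hypothetical targets; the real ones come from preprocessing.
df["target_match_winner"] = (df["rank_diff"] < 0).astype(int)
df["target_round_diff"] = -df["rank_diff"] / 10.0

features = ["rank_1", "rank_2", "starting_ct", "rank_diff"]
tasks = {
    "match_winner": ("classification", "target_match_winner"),
    "round_diff": ("regression", "target_round_diff"),
}

metrics = {}
for task, (kind, target) in tasks.items():
    X, y = df[features], df[target]
    if kind == "classification":
        model = RandomForestClassifier(n_estimators=50, random_state=42)
        model.fit(X, y)
        metrics[task] = {"accuracy": accuracy_score(y, model.predict(X))}
    else:
        model = RandomForestRegressor(n_estimators=50, random_state=42)
        model.fit(X, y)
        metrics[task] = {"mae": mean_absolute_error(y, model.predict(X))}
```

In the actual script each fitted model would also be saved to its own file and logged to MLflow with its per-task metrics.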
Enhanced feature engineering with legitimate pre-match information:
New features:
- Map one-hot encoding (Dust2, Mirage, Inferno, etc.)
- rank_sum: Combined team strength indicator
- rank_ratio: Relative team strength
- team1_is_favorite: Whether team 1 has better ranking
- both_top_tier: Both teams in top 10
- underdog_matchup: Large ranking difference (>50)
All features are known before match starts - no data leakage.
Expected to improve model performance while maintaining integrity.
Current feature count: ~20 (4 base + 3 rank + ~10 maps + 3 indicators)
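A minimal sketch of these features in pandas, assuming column names like rank_1/rank_2/map (the preprocessing script may use different names):

```python
import pandas as pd

df = pd.DataFrame({
    "rank_1": [3, 45, 8],
    "rank_2": [12, 7, 80],
    "map": ["Dust2", "Mirage", "Inferno"],
})

# Map one-hot encoding.
df = pd.concat([df, pd.get_dummies(df["map"], prefix="map")], axis=1)

# Ranking-based features (lower rank number = stronger team).
df["rank_sum"] = df["rank_1"] + df["rank_2"]
df["rank_ratio"] = df["rank_1"] / df["rank_2"]
df["team1_is_favorite"] = (df["rank_1"] < df["rank_2"]).astype(int)
df["both_top_tier"] = ((df["rank_1"] <= 10) & (df["rank_2"] <= 10)).astype(int)
df["underdog_matchup"] = ((df["rank_1"] - df["rank_2"]).abs() > 50).astype(int)
```

All of these are derived purely from pre-match columns, so they preserve the no-leakage property.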
The map_wins_1 and map_wins_2 columns represent maps won DURING
the current match, not historical performance. This is data leakage,
as these values are only known during or after the match.
Now using only truly pre-match features:
- rank_1, rank_2: Team rankings before match
- starting_ct: Which team starts CT side
- rank_diff: Derived ranking difference
This should finally give realistic model performance based solely
on information available before the match begins.
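The feature selection this describes amounts to an explicit allow-list of pre-match columns, roughly like this (column names assumed from the description above):

```python
import pandas as pd

# Toy frame mixing pre-match columns with the leaky map_wins_* columns.
raw = pd.DataFrame({
    "rank_1": [3, 45],
    "rank_2": [12, 7],
    "starting_ct": [1, 2],
    "map_wins_1": [1, 0],  # leaky: maps won during the current match
    "map_wins_2": [0, 1],  # leaky
})

PRE_MATCH = ["rank_1", "rank_2", "starting_ct"]
features = raw[PRE_MATCH].copy()
features["rank_diff"] = features["rank_1"] - features["rank_2"]
```

An allow-list is safer than dropping known-bad columns, since any new leaky column added later is excluded by default.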
Removed features that contain match outcome information:
- result_1, result_2 (actual match scores - only known after match)
- ct_1, t_2, t_1, ct_2 (rounds won per side - only known after match)
- total_rounds, round_diff (derived from results)
These features caused perfect 1.0 accuracy because the model was
essentially "cheating" by knowing the match outcome.
Now using only pre-match information:
- Team rankings (rank_1, rank_2)
- Historical map performance (map_wins_1, map_wins_2)
- Starting side (starting_ct)
- Derived: rank_diff, map_wins_diff
This will give realistic model performance based on what would
actually be known before a match starts.
Added the input_example parameter to auto-infer the model signature and
explicitly set the artifact_path parameter to remove deprecation warnings.
This improves MLflow tracking by:
- Auto-generating model signature from training data
- Using correct parameter names for MLflow 3.x
- Enabling better model serving and inference validation
Reverted to a simpler approach: MLflow natively supports the
MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment
variables for HTTP Basic Auth.
Removed the manual URI construction since it's not needed.
The workflow already sets these env vars correctly.
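The workflow env block this relies on would look roughly like the following; the host and secret names are placeholders, not values from the actual repository:

```yaml
env:
  MLFLOW_TRACKING_URI: https://mlflow.example.com        # placeholder host
  MLFLOW_TRACKING_USERNAME: ${{ secrets.MLFLOW_USER }}   # assumed secret name
  MLFLOW_TRACKING_PASSWORD: ${{ secrets.MLFLOW_PASSWORD }}
```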
Changed MLflow authentication to use HTTP Basic Auth by embedding
credentials in the tracking URI (https://user:pass@host).
This is the standard authentication method for MLflow when using
basic auth, rather than relying on environment variables alone.
Added explicit environment variable configuration for the MLflow credentials.
The credentials are now properly passed through from the CI/CD environment
to the MLflow client.
Changes:
- Check for MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD env vars
- Explicitly set them in os.environ for MLflow to use
- Added connection success message for debugging
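A small sketch of this check-and-set pattern (the function name is hypothetical; the real script may do this inline):

```python
import os

def configure_mlflow_credentials(env=os.environ):
    """Ensure MLflow basic-auth credentials are present before connecting.

    Reads the variables, fails loudly if either is missing, and sets them
    explicitly in os.environ so the MLflow client picks them up.
    """
    user = env.get("MLFLOW_TRACKING_USERNAME")
    password = env.get("MLFLOW_TRACKING_PASSWORD")
    if not user or not password:
        raise RuntimeError("MLflow credentials not found in environment")
    os.environ["MLFLOW_TRACKING_USERNAME"] = user
    os.environ["MLFLOW_TRACKING_PASSWORD"] = password
    print("MLflow credential check passed")  # debugging aid per the commit
    return user
```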
Changed cache configuration:
- Moved Install Poetry step before cache setup
- Updated cache path to ~/.cache/pypoetry/virtualenvs (actual venv location)
- Removed **/poetry.lock wildcard in favor of direct poetry.lock reference
- This ensures the virtualenv itself is cached, not just metadata
This should significantly speed up CI/CD runs by reusing installed packages.
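The resulting step order would look roughly like this; the paths and key pattern follow the description above, while the Poetry install command is an assumption:

```yaml
- name: Install Poetry
  run: pipx install poetry   # assumed install method
- name: Cache Poetry virtualenvs
  uses: actions/cache@v3
  with:
    path: ~/.cache/pypoetry/virtualenvs
    key: ${{ runner.os }}-poetry-${{ hashFiles('poetry.lock') }}
    restore-keys: |
      ${{ runner.os }}-poetry-
```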
Changed dvc pull to pull only data/raw.dvc instead of all
outputs. The processed data and model files are generated by the
DVC pipeline (dvc repro), not pulled from remote storage.
This prevents errors about missing processed files that haven't
been generated yet.
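As workflow steps, the targeted pull looks roughly like this (step names are illustrative):

```yaml
- name: Pull raw data only
  run: dvc pull data/raw.dvc
- name: Regenerate processed data and models
  run: dvc repro
```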
DVC needs credentials configured via the 'dvc remote modify' command
rather than environment variables alone. This fixes 403 Forbidden errors
when accessing MinIO/S3 storage.
Changes:
- Added dvc remote modify commands to set access_key_id and secret_access_key
- Applied to both pull and push operations in test and train jobs
- Added .dvc/config.local to .gitignore to prevent credential leaks
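A sketch of the added step; the remote name "storage" is an assumption, and --local writes the credentials into the gitignored .dvc/config.local:

```yaml
- name: Configure DVC remote credentials
  run: |
    dvc remote modify --local storage access_key_id "$AWS_ACCESS_KEY_ID"
    dvc remote modify --local storage secret_access_key "$AWS_SECRET_ACCESS_KEY"
```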
Configured DVC to use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
from Gitea secrets (DVC_ID and DVC_PASSWORD) for MinIO/S3 access.
Changes:
- Added DVC credentials to all DVC operations (pull/push)
- Changed poetry install to use --no-root flag for faster installs
- Credentials applied to both test and train jobs
Added actions/cache@v3 to cache Poetry and pip dependencies across
workflow runs. This significantly speeds up CI/CD by avoiding
full reinstallation when poetry.lock hasn't changed.
Cache strategy:
- Cache key based on OS and poetry.lock hash
- Caches ~/.cache/pypoetry and ~/.cache/pip
- Falls back to OS-specific cache if exact match not found
Added dvc-s3>=3.2.0 to dependencies to enable DVC to work with
S3-compatible storage backends like MinIO.
Regenerated the lock file to include pyyaml>=6.0.0, which was added to the
dependencies. This resolves the poetry.lock sync issue with pyproject.toml.