MLOps/dvc.yaml
Alexis Bruneteau 9440f4eecd Implement multi-task learning pipeline for CSGO predictions
Created comprehensive multi-objective modeling system:

**6 Prediction Tasks:**
1. Match Winner (Binary Classification) - Who wins the match?
2. Map Winner (Binary Classification) - Who wins this specific map?
3. Team 1 Score (Regression) - Predict exact round score for team 1
4. Team 2 Score (Regression) - Predict exact round score for team 2
5. Round Difference (Regression) - Predict score margin
6. Total Maps (Regression) - Predict number of maps in match
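
The six targets above could be derived from per-map round scores roughly as follows. This is only a sketch: the input layout (a list of `(team1_rounds, team2_rounds)` tuples per match), the function name `build_targets`, and the exact target column names are assumptions for illustration, not taken from the repository's `preprocess.py`. It also assumes no map ends in a tie and that a label of 1 means team 1 won.

```python
from typing import Dict, List, Tuple


def build_targets(maps: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """Return one row of target values per map played in a match.

    `maps` holds (team1_rounds, team2_rounds) for each map (hypothetical layout).
    """
    team1_maps = sum(1 for t1, t2 in maps if t1 > t2)
    team2_maps = len(maps) - team1_maps
    # 1 if team 1 took more maps than team 2, otherwise 0.
    match_winner = 1 if team1_maps > team2_maps else 0

    rows = []
    for t1, t2 in maps:
        rows.append({
            "target_match_winner": match_winner,       # task 1
            "target_map_winner": 1 if t1 > t2 else 0,  # task 2
            "target_team_1_score": t1,                 # task 3
            "target_team_2_score": t2,                 # task 4
            "target_round_diff": t1 - t2,              # task 5
            "target_total_maps": len(maps),            # task 6
        })
    return rows


# A made-up best-of-three: team 1 wins maps 1 and 3.
rows = build_targets([(16, 12), (9, 16), (16, 14)])
```

Match-level targets (match winner, total maps) repeat on every map row, while the other four vary per map.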

**Implementation:**
- Updated preprocessing to generate all target variables
- Created train_multitask.py with separate models per task
- Classification tasks use Random Forest Classifier
- Regression tasks use Random Forest Regressor
- All models logged to MLflow experiment 'csgo-match-prediction-multitask'
- Metrics tracked per task (accuracy/precision for classification, MAE/RMSE for regression)
- Updated DVC pipeline to use new training script
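
For reference, the per-task metrics named above (accuracy/precision and MAE/RMSE) can be computed as in the sketch below. It is hand-rolled with the standard library purely for illustration; the actual `train_multitask.py` presumably relies on scikit-learn's metric functions, and the helper names here are hypothetical.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and precision for binary labels, with 1 as the positive class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pred_pos = sum(1 for p in y_pred if p == 1)
    return {
        "accuracy": correct / len(y_true),
        # Report 0.0 when nothing was predicted positive, to avoid dividing by zero.
        "precision": true_pos / pred_pos if pred_pos else 0.0,
    }


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error and root mean squared error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}


# Made-up labels and predictions, not results from the project.
clf = classification_metrics([1, 0, 1, 1], [1, 1, 1, 0])
reg = regression_metrics([16, 9, 16], [14, 10, 16])
```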

**No Data Leakage:**
- All features are pre-match only (rankings, map, starting side)
- Target variables properly separated and saved with 'target_' prefix
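
The `target_` prefix convention makes it straightforward to keep targets out of the feature matrix. A minimal sketch, assuming each row is a plain dict; the feature column names (`rank_1`, `rank_2`, `map`, `starting_side`) and the helper `split_features_targets` are hypothetical, chosen only to mirror the pre-match features listed above:

```python
from typing import Any, Dict, Tuple

TARGET_PREFIX = "target_"


def split_features_targets(row: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one row into (features, targets) based on the column-name prefix."""
    features = {k: v for k, v in row.items() if not k.startswith(TARGET_PREFIX)}
    targets = {k: v for k, v in row.items() if k.startswith(TARGET_PREFIX)}
    return features, targets


# Made-up example row.
row = {
    "rank_1": 3,
    "rank_2": 11,
    "map": "Inferno",
    "starting_side": "CT",
    "target_map_winner": 1,
    "target_round_diff": 4,
}
features, targets = split_features_targets(row)
```

Selecting by prefix means a newly added target column never leaks into the features by accident, as long as it follows the naming rule.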

This enables comprehensive match analysis and multiple betting/analytics use cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 20:28:06 +02:00

stages:
  preprocess:
    cmd: python src/data/preprocess.py
    deps:
      - src/data/preprocess.py
      - data/raw
    params:
      - preprocess.test_size
      - preprocess.random_state
    outs:
      - data/processed/features.csv
      - data/processed/train.csv
      - data/processed/test.csv
    metrics:
      - data/processed/data_metrics.json:
          cache: false
  train:
    cmd: python src/models/train_multitask.py
    deps:
      - src/models/train_multitask.py
      - data/processed/train.csv
      - data/processed/test.csv
    params:
      - train.n_estimators
      - train.max_depth
      - train.random_state
    outs:
      - models/
    metrics:
      - models/metrics.json:
          cache: false
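
The `train` stage declares `models/metrics.json` as a metrics file, so the training script has to write it on every run. A minimal sketch of that step, assuming a nested per-task layout; the helper `write_metrics`, the JSON structure, and the numbers are placeholders, not the project's real output. The example writes to a temporary directory rather than the repository's `models/` folder.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def write_metrics(metrics: Dict[str, Dict[str, float]], path: Path) -> None:
    """Write per-task metrics as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


# Placeholder values for illustration only.
example_metrics = {
    "match_winner": {"accuracy": 0.5, "precision": 0.6},
    "round_diff": {"mae": 1.0, "rmse": 1.3},
}

out_dir = Path(tempfile.mkdtemp())
metrics_path = out_dir / "models" / "metrics.json"
write_metrics(example_metrics, metrics_path)
loaded = json.loads(metrics_path.read_text())
```

Keeping the file small with `cache: false` lets it be tracked by Git directly and compared across runs with `dvc metrics show` and `dvc metrics diff`.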