The map_wins_1 and map_wins_2 columns represent maps won DURING the current match, not historical performance. This is data leakage, as these values are only known during/after the match.

Now using only truly pre-match features:
- rank_1, rank_2: Team rankings before match
- starting_ct: Which team starts CT side
- rank_diff: Derived ranking difference

This should finally give realistic model performance based solely on information available before the match begins.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
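As a rough illustration of the feature selection described in the commit message, the sketch below keeps only the pre-match columns and derives rank_diff. The column names come from the message itself. The helper name `build_pre_match_features`, the dict-based row layout, the 0/1 encoding of starting_ct, and the sign convention (rank_1 minus rank_2) are assumptions made for illustration, not the repository's actual code.

```python
# Columns named in the commit message as known before the match starts.
PRE_MATCH_FEATURES = ["rank_1", "rank_2", "starting_ct"]

# Columns named in the commit message as leaking in-match information.
LEAKY_COLUMNS = ["map_wins_1", "map_wins_2"]


def build_pre_match_features(row):
    """Return only the features available before the match begins.

    `row` is assumed to be a dict holding one match record
    (hypothetical layout; the real pipeline may use a DataFrame).
    """
    features = {name: row[name] for name in PRE_MATCH_FEATURES}
    # Assumed sign convention: rank of team 1 minus rank of team 2.
    features["rank_diff"] = row["rank_1"] - row["rank_2"]
    return features


# Made-up example record, including the leaky columns to show they are dropped.
example_row = {
    "rank_1": 3,
    "rank_2": 10,
    "starting_ct": 1,  # assumed encoding: 1 means team 1 starts on the CT side
    "map_wins_1": 2,
    "map_wins_2": 1,
}

features = build_pre_match_features(example_row)
print(features)
```

Selecting features from an explicit allow-list, rather than dropping known leaky columns, means any in-match column added to the data later is excluded by default.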
MLOps Project
This is an MLOps project for CSGO data analysis and model training.
Features
- Data pipeline with Apache Airflow
- Model training with PyTorch and scikit-learn
- MLflow for experiment tracking
- DVC for data versioning
- Monitoring with Prometheus
- FastAPI for API serving
Setup
- Install dependencies:
  poetry install
- Run the data pipeline:
  airflow dags unpause csgo_data_pipeline
Project Structure
- dags/: Airflow DAGs
- src/: Source code
- models/: Trained models
- data/: Data files
- notebooks/: Jupyter notebooks
- tests/: Test files
- config/: Configuration files
- docker/: Docker files
- kubernetes/: Kubernetes manifests
Languages
- Python: 73.3%
- Typst: 25.9%
- Dockerfile: 0.8%