Add comprehensive pre-match features for better predictions
Enhanced feature engineering with legitimate pre-match information:

New features:
- Map one-hot encoding (Dust2, Mirage, Inferno, etc.)
- rank_sum: Combined team strength indicator
- rank_ratio: Relative team strength
- team1_is_favorite: Whether team 1 has better ranking
- both_top_tier: Both teams in top 10
- underdog_matchup: Large ranking difference (>50)

All features are known before match starts - no data leakage.
Expected to improve model performance while maintaining integrity.

Current feature count: ~20 (4 base + 3 rank + ~10 maps + 3 indicators)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
parent 6995102d76
commit a28a363dd9
@@ -22,16 +22,26 @@ def load_raw_data():
def engineer_features(df):
    """Create features for match prediction"""
    # Only use features that would be known BEFORE the match starts
    # Removing ALL match outcome features (data leakage):
    # - result_1, result_2, ct_1, t_2, t_1, ct_2 (round scores)
    # - map_wins_1, map_wins_2 (maps won in THIS match, not historical)

    # Base features
    features = df[[
        'starting_ct',  # Which team starts as CT (known before match)
        'rank_1', 'rank_2',  # Team rankings (known before match)
    ]].copy()

    # Engineered features based on pre-match information
    # Rank-based features
    features['rank_diff'] = features['rank_1'] - features['rank_2']
    features['rank_sum'] = features['rank_1'] + features['rank_2']
    features['rank_ratio'] = features['rank_1'] / (features['rank_2'] + 1)  # +1 to avoid division by zero

    # Map encoding (one-hot encoding for map types)
    map_dummies = pd.get_dummies(df['_map'], prefix='map')
    features = pd.concat([features, map_dummies], axis=1)

    # Team strength indicators
    features['team1_is_favorite'] = (features['rank_1'] < features['rank_2']).astype(int)
    features['both_top_tier'] = ((features['rank_1'] <= 10) & (features['rank_2'] <= 10)).astype(int)
    features['underdog_matchup'] = (abs(features['rank_diff']) > 50).astype(int)

    # Target: match_winner (1 or 2) -> convert to 0 or 1
    target = df['match_winner'] - 1

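To see what the new columns look like in practice, here is a minimal sketch that mirrors the visible lines of the hunk and runs them on a three-row toy frame. The function name `engineer_features_sketch`, the `return features, target` line, and all toy values are assumptions for illustration; the rest of the real function is not shown in this diff.

```python
import pandas as pd


def engineer_features_sketch(df):
    """Hypothetical stand-in for engineer_features, built from the hunk above."""
    # Base features known before the match starts
    features = df[['starting_ct', 'rank_1', 'rank_2']].copy()

    # Rank-based features
    features['rank_diff'] = features['rank_1'] - features['rank_2']
    features['rank_sum'] = features['rank_1'] + features['rank_2']
    features['rank_ratio'] = features['rank_1'] / (features['rank_2'] + 1)

    # One-hot encode the map column; one 'map_<name>' column per distinct map
    map_dummies = pd.get_dummies(df['_map'], prefix='map')
    features = pd.concat([features, map_dummies], axis=1)

    # Team strength indicators
    features['team1_is_favorite'] = (features['rank_1'] < features['rank_2']).astype(int)
    features['both_top_tier'] = ((features['rank_1'] <= 10) & (features['rank_2'] <= 10)).astype(int)
    features['underdog_matchup'] = (features['rank_diff'].abs() > 50).astype(int)

    # match_winner is 1 or 2; shift to 0 or 1
    target = df['match_winner'] - 1
    return features, target  # assumed return shape, not confirmed by the diff


# Toy data (made up): a top-10 clash, a lopsided matchup, and equal ranks
toy = pd.DataFrame({
    'starting_ct': [1, 2, 1],
    'rank_1': [3, 80, 12],
    'rank_2': [7, 5, 12],
    '_map': ['Dust2', 'Mirage', 'Inferno'],
    'match_winner': [1, 2, 2],
})

features, target = engineer_features_sketch(toy)
print(features.columns.tolist())
print(features[['team1_is_favorite', 'both_top_tier', 'underdog_matchup']])
```

Two things the toy rows make visible: with equal ranks (third row) `team1_is_favorite` is 0, because the comparison is strict, and the number of `map_*` columns depends on which maps appear in the input, so the "~10 maps" part of the feature count holds only when the data contains that many distinct maps.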