5. XGBoost

베짱이28호 2024. 2. 7. 22:45

1. XGBoost

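XGBoost (eXtreme Gradient Boosting) is a high-performance implementation of gradient boosting. It builds a strong predictive model by training many decision trees sequentially, with each new tree fitted to correct the errors of the trees before it.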
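To make "training trees sequentially" concrete, here is a minimal sketch of plain gradient boosting with squared error, using one-split trees (stumps) written in NumPy. It is not XGBoost's actual algorithm, which also uses second-order gradients and regularization. The toy data, the helper names `fit_stump` and `predict_stump`, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold on a 1-D feature that best fits the residuals with two constants."""
    best = None
    for threshold in np.unique(x)[:-1]:  # skip the max so the right side is never empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

# Toy regression data (hypothetical): a noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

learning_rate = 0.3
prediction = np.full_like(y, y.mean())  # start from a constant model
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(50):
    residual = y - prediction  # negative gradient of the squared error
    stump = fit_stump(x, residual)  # the new tree targets what is still unexplained
    prediction += learning_rate * predict_stump(stump, x)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"MSE before boosting: {mse_history[0]:.4f}")
print(f"MSE after 50 rounds: {mse_history[-1]:.4f}")
```

Each round adds a small correction rather than a full fit, which is why the learning rate and the number of trees are usually tuned together.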

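Pros

High predictive accuracy
Fast training through parallel processing (split finding within each tree is parallelized; the trees themselves are still built one after another)
Resistance to overfitting, thanks to built-in regularization
Automatic handling of missing values
A wide range of hyperparameters for tuning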

Cons

Hyperparameter tuning is complex
High memory usage
Can still overfit on small datasets

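2. Key Features of XGBoost

Cache optimization: intermediate results such as gradient statistics are kept in cache-friendly buffers for fast access
Sparse data handling: missing values and sparse data are handled efficiently by learning a default direction for them at each split
Regularization: L1 and L2 regularization (reg_alpha and reg_lambda in the scikit-learn API) to prevent overfitting
Tree pruning: splits whose gain does not exceed a threshold (gamma) are removed to simplify the model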
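The regularization and pruning features both show up in the split gain that XGBoost evaluates for each candidate split. Below is a small worked sketch of that formula as given in XGBoost's "Introduction to Boosted Trees" tutorial, where G and H are the sums of first-order and second-order gradients on each side of the split. The L1 term is left out for simplicity, and the gradient values are made up for illustration.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a split: 0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda)
    - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    children = score(g_left, h_left) + score(g_right, h_right)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (children - parent) - gamma

# Hypothetical gradient sums for the two sides of a candidate split
stats = dict(g_left=-4.0, h_left=5.0, g_right=3.0, h_right=5.0)

gain_weak_l2 = split_gain(**stats, reg_lambda=1.0)
gain_strong_l2 = split_gain(**stats, reg_lambda=10.0)
gain_with_gamma = split_gain(**stats, reg_lambda=10.0, gamma=1.0)

print(f"lambda=1,  gamma=0: gain={gain_weak_l2:.4f}")
print(f"lambda=10, gamma=0: gain={gain_strong_l2:.4f}")
print(f"lambda=10, gamma=1: gain={gain_with_gamma:.4f}")
print("Split kept?", gain_with_gamma > 0)
```

A larger lambda shrinks the gain of every split, and gamma acts as a fixed cost per split, so a split whose gain is not positive after subtracting gamma is not worth keeping.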

3. Hands-On Code

import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, 
                         n_informative=15, n_redundant=5, 
                         random_state=42)

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2, 
                                                    random_state=42)

# Create the XGBoost model
xgb_model = xgb.XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=5,
    min_child_weight=1,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42
)

# Train the model
xgb_model.fit(X_train, y_train)

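# Visualize feature importance
# plot_importance draws on a new figure unless an Axes is passed,
# so create the Axes with the desired size and hand it over
fig, ax = plt.subplots(figsize=(10, 6))
xgb.plot_importance(xgb_model, max_num_features=10, ax=ax)
ax.set_title('XGBoost Feature Importance')
plt.tight_layout()
plt.show()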

# Predict and evaluate performance
from sklearn.metrics import accuracy_score, classification_report

y_pred = xgb_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.4f}")
print("\nClassification report:")
print(classification_report(y_test, y_pred))