Phase 2 Model Package - 20260106_011633

Model Information

Model Type: Decision Tree
Dataset: FINAL_DATASET_WITH_TEXT_BACKUP_20260105_213507.xlsx
Total Samples: 454
Total Features: 92 (Numeric: 46, Text SVD: 50; note that 46 + 50 = 96, so one of these counts does not match the total)

Performance Metrics

Validation Set

R² Score: 0.3339
MAE: 2.5525
RMSE: 4.2579

Test Set

R² Score: 0.2893
MAE: 3.5730
RMSE: 5.6899
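The three metrics use the standard definitions, the same ones behind scikit-learn's `r2_score`, `mean_absolute_error`, and `mean_squared_error`. A minimal pure-Python sketch on toy values (not the project's data), where `regression_metrics` is an illustrative helper:

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R², MAE, RMSE) for paired lists of actual and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares around the mean
    r2 = 1 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse

# Toy data for illustration only
r2, mae, rmse = regression_metrics([1, 2, 3, 4], [2, 2, 3, 3])
print(f"R²={r2:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

Because RMSE squares errors before averaging, it is always at least as large as MAE; the wide gap on both sets (4.26 vs 2.55 on validation, 5.69 vs 3.57 on test) suggests a minority of predictions miss by much more than the typical error.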

Files Included

best_model_Decision_Tree_20260106_011633.pkl - Trained model
scaler_20260106_011633.pkl - StandardScaler for numeric features
tfidf_vectorizer_20260106_011633.pkl - TF-IDF vectorizer for text
svd_model_20260106_011633.pkl - SVD dimensionality reduction
feature_names_20260106_011633.pkl - List of all feature names
model_metadata_20260106_011633.pkl - Complete metadata dictionary
phase2_complete_package_20260106_011633.joblib - All-in-one package (recommended for deployment)
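Before deploying, it can help to confirm that the all-in-one package holds the objects the loading code expects and that the feature counts line up (the numeric and SVD counts listed above sum to 96, while the total is given as 92). A minimal sketch, assuming the package is a dict with the keys used in Option 1 and that `svd` exposes `n_components` as scikit-learn's `TruncatedSVD` does; `check_package` is a hypothetical helper:

```python
REQUIRED_KEYS = {"model", "scaler", "tfidf", "svd", "feature_names"}

def check_package(package, n_numeric):
    """Return a list of problems found in a loaded Phase 2 package (empty if none)."""
    problems = []
    missing = REQUIRED_KEYS - set(package)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
        return problems
    n_text = package["svd"].n_components  # assumed: TruncatedSVD-style attribute
    expected = n_numeric + n_text
    n_names = len(package["feature_names"])
    if n_names != expected:
        problems.append(
            f"feature_names has {n_names} entries, "
            f"expected {n_numeric} numeric + {n_text} SVD = {expected}"
        )
    return problems
```

Usage: after `package = joblib.load('phase2_complete_package_20260106_011633.joblib')`, call `check_package(package, n_numeric=46)` and inspect any messages it returns.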

How to Load and Use

Option 1: Load Complete Package (Recommended)

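import joblib
import numpy as np
import pandas as pd

# Load package
package = joblib.load('phase2_complete_package_20260106_011633.joblib')
model = package['model']
scaler = package['scaler']
tfidf = package['tfidf']
svd = package['svd']
feature_names = package['feature_names']

# Make prediction
# 1. Process text: apply the same preprocessing used in training
#    (lowercase, remove special characters, combine task columns)
text_combined = "your text here"  # Combined task text
tfidf_features = tfidf.transform([text_combined])  # sparse, shape (1, vocabulary size)
text_svd = svd.transform(tfidf_features)            # shape (1, 50)

# 2. Prepare numeric features as ONE row (2-D), in training column order
numeric_features = np.array([[...]])  # placeholder: shape (1, n_numeric)
n_numeric = numeric_features.shape[1]

# 3. Combine (numeric columns first, then SVD columns) and scale.
#    Assumes the scaler was fitted on this combined matrix; if it was fitted
#    on the numeric features only, scale those before concatenating instead.
all_features = pd.concat([
    pd.DataFrame(numeric_features, columns=feature_names[:n_numeric]),
    pd.DataFrame(text_svd, columns=feature_names[n_numeric:]),
], axis=1)
all_features_scaled = scaler.transform(all_features)

# 4. Predict
prediction = model.predict(all_features_scaled)
print(f"Predicted staff count: {prediction[0]:.0f}")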

Option 2: Load Individual Files

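import pickle

stamp = '20260106_011633'

def load_pickle(path):
    """Load one pickled artifact from disk."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# The text pipeline (tfidf, svd) is needed as well as the model and scaler
model = load_pickle(f'best_model_Decision_Tree_{stamp}.pkl')
scaler = load_pickle(f'scaler_{stamp}.pkl')
tfidf = load_pickle(f'tfidf_vectorizer_{stamp}.pkl')
svd = load_pickle(f'svd_model_{stamp}.pkl')
feature_names = load_pickle(f'feature_names_{stamp}.pkl')

# ... (same prediction process as above)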
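Whichever loading option is used, the four prediction steps can be wrapped in one reusable function. A minimal sketch, where `predict_staff_count` is a hypothetical helper: it takes already-preprocessed text and one row of numeric features, places numeric columns before SVD columns, and assumes the scaler was fitted on that combined matrix (the file list describes it as a scaler for numeric features, so check this against the training code).

```python
import numpy as np

def predict_staff_count(model, scaler, tfidf, svd, text, numeric_row):
    """Predict one staff count from preprocessed text plus one row of numeric features.

    Steps: TF-IDF -> SVD, append after the numeric features, scale, predict.
    Assumes the scaler was fitted on the combined [numeric | SVD] matrix.
    """
    text_svd = svd.transform(tfidf.transform([text]))              # shape (1, n_components)
    numeric = np.asarray(numeric_row, dtype=float).reshape(1, -1)  # shape (1, n_numeric)
    features = np.hstack([numeric, text_svd])                      # numeric first, then SVD
    scaled = scaler.transform(features)
    return float(model.predict(scaled)[0])
```

Usage: `predict_staff_count(package['model'], package['scaler'], package['tfidf'], package['svd'], text_combined, numeric_row)`, where `numeric_row` holds one value per numeric feature in training order. Passing a plain array keeps the helper independent of pandas; if the scaler was fitted on a DataFrame, scikit-learn may warn about missing feature names.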

Model Configuration

TF-IDF Parameters

max_features: 200
ngram_range: (1, 2)
min_df: 2
max_df: 0.95
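To make these parameters concrete, here is a pure-Python approximation of how scikit-learn's `TfidfVectorizer` builds its vocabulary under them: lowercase and tokenize with the library's default token pattern, form unigrams and bigrams, drop terms found in fewer than 2 documents or in more than 95% of documents, then keep the 200 most frequent. It assumes all other vectorizer settings were library defaults (this README does not say), omits the TF-IDF weighting itself, and `select_vocabulary` is an illustrative helper.

```python
import re
from collections import Counter

def select_vocabulary(docs, max_features=200, ngram_range=(1, 2), min_df=2, max_df=0.95):
    """Approximate TfidfVectorizer's vocabulary selection.

    Integer min_df/max_df are document counts; floats are fractions of the corpus.
    """
    lo, hi = ngram_range
    doc_freq = Counter()   # number of documents containing each term
    term_freq = Counter()  # total occurrences of each term across the corpus
    for doc in docs:
        tokens = re.findall(r"(?u)\b\w\w+\b", doc.lower())  # scikit-learn's default token pattern
        terms = [
            " ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)
        ]
        term_freq.update(terms)
        doc_freq.update(set(terms))
    n_docs = len(docs)
    low = min_df if isinstance(min_df, int) else min_df * n_docs
    high = max_df if isinstance(max_df, int) else max_df * n_docs
    kept = [t for t, df in doc_freq.items() if low <= df <= high]
    # Most frequent terms first; ties broken alphabetically here (scikit-learn may differ)
    kept.sort(key=lambda t: (-term_freq[t], t))
    return sorted(kept[:max_features])
```

With only 454 documents, `min_df: 2` already removes every term that appears in a single task description, which is one reason the vocabulary can end up smaller than `max_features`.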

SVD Parameters

n_components: 50
explained_variance: 89.66%
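The explained variance is the share of the TF-IDF matrix's variance kept by the 50 SVD components. Assuming `svd` is scikit-learn's `TruncatedSVD` (the usual choice for sparse TF-IDF input, though the README does not name the class), that figure corresponds to `svd.explained_variance_ratio_.sum()`. The NumPy sketch below reproduces the idea on toy data with an exact SVD; `TruncatedSVD` defaults to a randomized solver that approximates the same components, and component signs may differ.

```python
import numpy as np

def truncated_svd(X, n_components):
    """Project X onto its top singular vectors and report explained-variance ratios.

    Like TruncatedSVD, X is not mean-centered; each ratio is the variance of one
    projected column divided by the total variance of X.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    components = vt[:n_components]   # shape (n_components, n_features)
    X_reduced = X @ components.T     # shape (n_samples, n_components)
    ratios = X_reduced.var(axis=0) / X.var(axis=0).sum()
    return X_reduced, ratios

rng = np.random.default_rng(42)
X = rng.random((100, 20))            # toy stand-in for a TF-IDF matrix
X_reduced, ratios = truncated_svd(X, n_components=5)
print(X_reduced.shape, f"{ratios.sum():.2%} of variance retained")
```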

Training Parameters

random_state: 42
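train_size: 289 (63.7% of 454 samples)
val_size: 62 (13.7%)
test_size: 62 (13.7%)
(These splits total 413 samples; the other 41 of the 454 are not assigned to any split in this record.)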
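The percentages are each split's share of all 454 samples, and `random_state: 42` makes the shuffle reproducible. A hypothetical sketch of a seeded three-way split with these sizes; `three_way_split` is illustrative only, and Python's `random` module will not reproduce the exact split that a library routine such as scikit-learn's `train_test_split` gives with `random_state=42`.

```python
import random

def three_way_split(n_samples, sizes, seed=42):
    """Shuffle indices 0..n_samples-1 with a fixed seed and cut them into consecutive chunks."""
    if sum(sizes) > n_samples:
        raise ValueError("split sizes exceed the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # same seed -> same split every run
    splits, start = [], 0
    for size in sizes:
        splits.append(indices[start:start + size])
        start += size
    return splits

train_idx, val_idx, test_idx = three_way_split(454, (289, 62, 62))
for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    print(f"{name}: {len(idx)} ({len(idx) / 454:.1%})")
```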

Phase 1 vs Phase 2 Comparison

Phase 1 (numeric only): R² = 0.4136
Phase 2 (with text): R² = 0.3339
Change: -0.0797 (adding text features lowered R²)
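The change is Phase 2 R² minus Phase 1 R², so a negative value means the text features reduced R². A small sketch that also expresses the change relative to Phase 1, using the rounded values above (the originally recorded -0.07973710… was computed from unrounded scores); `r2_change` is an illustrative helper:

```python
def r2_change(phase1_r2, phase2_r2):
    """Return the absolute and relative change in R² from Phase 1 to Phase 2."""
    delta = phase2_r2 - phase1_r2
    return delta, delta / phase1_r2

delta, relative = r2_change(0.4136, 0.3339)
print(f"Change: {delta:+.4f} ({relative:+.1%})")
```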

Notes

This model includes text features extracted from task descriptions
Text preprocessing: lowercase, remove special characters, combine task columns
Feature engineering: TF-IDF → SVD → StandardScaler
Use the same preprocessing pipeline for new predictions
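The preprocessing steps listed above could look roughly like the sketch below. It is a hypothetical reconstruction: the exact definition of "special characters" used in training is not recorded here, so the regex is an assumption, and `preprocess_task_text` is an illustrative name.

```python
import re

def preprocess_task_text(*task_columns):
    """Combine task text columns into one lowercase string without special characters.

    Hypothetical reconstruction; match the regex to the original training code.
    """
    parts = [c for c in task_columns if isinstance(c, str) and c.strip()]  # skip None/NaN/empty
    text = " ".join(parts).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # assumed definition of "special characters"
    return " ".join(text.split())             # collapse repeated whitespace
```

Usage: `text_combined = preprocess_task_text(row["task_description"], row["task_notes"])`, where both column names are hypothetical. Note that this regex also strips accented letters; if task descriptions contain them, the training code's exact rule matters for matching the TF-IDF vocabulary.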

Generated: 2026-01-06 01:16:33
