sharkpy.learning
Attributes
PREDICTION_INTROS
Functions
learn | Train a machine learning model using the provided data and parameters.
_create_optimized_xgboost | Create and optimize an XGBoost model using Optuna.
_create_optimized_lightgbm | Create and optimize a LightGBM model using Optuna.
_create_optimized_catboost | Create and optimize a CatBoost model using Optuna.
Module Contents
- sharkpy.learning.PREDICTION_INTROS = ['🦈 Diving into {project_name}! Time to make some waves! 🌊', '🦈 Sharpening teeth on...
- sharkpy.learning.learn(self, data: str | pandas.DataFrame, project_name: str = 'your data', target: str | None = None, problem_type: str | None = None, model: Any | None = None, model_choice: str | None = None, detailed_stats: bool = False, n_trials: int = 30, verbose: bool = False) -> Shark
Train a machine learning model using the provided data and parameters.
- Parameters:
self (Shark) – The Shark instance.
data (str or pandas.DataFrame) – The dataset to use for training. Can be a file path (CSV) or a DataFrame.
project_name (str, optional) – Name of the project for tracking and reporting.
target (str, optional) – Name of the column to predict. If None, uses the last column.
problem_type (str, optional) – Type of problem: “regression” or “classification”. If None, tries to infer automatically.
model (sklearn.base.BaseEstimator, optional) – A custom scikit-learn compatible model instance to use. If provided, overrides model_choice.
model_choice (str, optional) – String identifier for built-in model selection. Options:
“random_forest”: RandomForestRegressor or RandomForestClassifier
“svm”: SVR or SVC
“ridge”: Ridge Regression (L2 regularization)
“lasso”: Lasso Regression (L1 regularization)
“knn”: K-Nearest Neighbors
“xgboost”: XGBoost with Optuna optimization
“lightgbm”: LightGBM with Optuna optimization
“catboost”: CatBoost with Optuna optimization
None: LinearRegression or LogisticRegression (default)
detailed_stats (bool, optional) – If True, uses statsmodels for detailed statistical analysis (default: False).
n_trials (int, optional) – Number of Optuna optimization trials for the boosting models (default: 30).
verbose (bool, optional) – If True, enables verbose logging for Optuna optimization (default: False).
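A call using these parameters might look like the sketch below. It assumes that learn is exposed as a method on a Shark instance (the signature takes self as its first argument). The import path in the comments, the file name houses.csv, the price column, and the train helper are hypothetical.

```python
# Hypothetical settings; the keyword names come from the documented signature.
LEARN_SETTINGS = {
    "project_name": "house prices",
    "target": "price",              # hypothetical column name
    "problem_type": "regression",
    "model_choice": "xgboost",      # one of the documented built-in options
    "n_trials": 50,                 # documented as applying to boosting models
    "verbose": False,
}


def train(shark, data, **overrides):
    """Call shark.learn() with the settings above, optionally overridden."""
    settings = {**LEARN_SETTINGS, **overrides}
    return shark.learn(data, **settings)


# Assumed usage (the import path is not confirmed by this page):
#   from sharkpy import Shark
#   shark = train(Shark(), "houses.csv")
#   shark = train(Shark(), "houses.csv", model_choice="random_forest")
```

Passing a model instance instead would take precedence over model_choice, per the parameter description.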
Notes
Encodes categorical features and target automatically for classification.
Performs K-Fold cross-validation and prints mean and std of scores.
Fits the selected model on the entire dataset after cross-validation.
Sets self.model, self.problem_type, self.features, self.target, and self.encoders.
Warning: Avoid loading untrusted CSV files, as they may contain malicious data.
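The workflow in these notes (encode, cross-validate, then refit on all rows) can be pictured with the pure-Python sketch below. It is a stand-in for illustration only: the helper names, the five-fold default, and the toy majority-class model are assumptions, features are omitted to keep it short, and sharkpy's actual implementation is not shown on this page.

```python
import random
import statistics
from collections import Counter


def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_splits] for k in range(n_splits)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def fit_majority(y):
    """Toy model: always predict the most common training label."""
    return Counter(y).most_common(1)[0][0]


def accuracy(prediction, y):
    """Fraction of labels equal to the single predicted label."""
    return sum(label == prediction for label in y) / len(y)


def cross_validate_then_refit(y, n_splits=5):
    """Score on held-out folds, then fit once more on every row."""
    scores = []
    for train, test in k_fold_indices(len(y), n_splits):
        model = fit_majority([y[i] for i in train])
        scores.append(accuracy(model, [y[i] for i in test]))
    final_model = fit_majority(y)
    return statistics.mean(scores), statistics.stdev(scores), final_model


labels = ["cat", "dog", "cat", "cat", "dog", "cat", "cat", "dog", "cat", "cat"]
encoded, encoder = label_encode(labels)
mean_score, std_score, model = cross_validate_then_refit(encoded)
print(f"CV accuracy: {mean_score:.2f} +/- {std_score:.2f}; final model predicts {model}")
```

The cross-validation scores estimate how the model generalizes, while the final fit uses every row so that no training data is left out of the model that is kept.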
- sharkpy.learning._create_optimized_xgboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) -> Any
Create and optimize an XGBoost model using Optuna.
- Parameters:
X (pd.DataFrame) – Features DataFrame
y (pd.Series) – Target series
problem_type (str) – Type of problem: “regression” or “classification”
n_trials (int) – Number of optimization trials (default: 30)
- Returns:
model – Trained XGBoost model with optimized parameters
- Return type:
Any
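A minimal sketch of what an Optuna-driven search of this shape could look like follows. The search space, the three-fold cross-validation scoring, and both function names are assumptions for illustration, not sharkpy's actual implementation. optuna, xgboost, and scikit-learn are imported lazily so that the search-space helper can be read and exercised on its own.

```python
def suggest_xgboost_params(trial):
    """Hypothetical search space; the ranges are illustrative only."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 300),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }


def optimize_xgboost(X, y, problem_type="regression", n_trials=30):
    """Tune with Optuna, then refit the best configuration on all rows."""
    import optuna
    import xgboost as xgb
    from sklearn.model_selection import cross_val_score

    estimator = xgb.XGBRegressor if problem_type == "regression" else xgb.XGBClassifier

    def objective(trial):
        model = estimator(**suggest_xgboost_params(trial))
        # Default scoring: R^2 for regressors, accuracy for classifiers.
        return cross_val_score(model, X, y, cv=3).mean()

    # Both default scores are "higher is better", so the study maximizes.
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=n_trials)

    best_model = estimator(**study.best_params)
    best_model.fit(X, y)
    return best_model
```

Each trial samples one parameter set and scores it, so n_trials bounds the total number of configurations evaluated. The LightGBM and CatBoost helpers documented below share the same signature and would presumably differ mainly in the estimator and search space.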
- sharkpy.learning._create_optimized_lightgbm(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) -> Any
Create and optimize a LightGBM model using Optuna.
- Parameters:
X (pd.DataFrame) – Features DataFrame
y (pd.Series) – Target series
problem_type (str) – Type of problem: “regression” or “classification”
n_trials (int) – Number of optimization trials (default: 30)
- Returns:
model – Trained LightGBM model with optimized parameters
- Return type:
Any
- sharkpy.learning._create_optimized_catboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) -> xgboost.XGBRegressor | xgboost.XGBClassifier
Create and optimize a CatBoost model using Optuna.
- Parameters:
X (pd.DataFrame) – Features DataFrame
y (pd.Series) – Target series
problem_type (str) – Type of problem: “regression” or “classification”
n_trials (int) – Number of optimization trials (default: 30)
- Returns:
model – Trained CatBoost model with optimized parameters
- Return type:
Any