sharkpy.learning

Attributes

PREDICTION_INTROS

Functions

`learn`(→ Shark)	Train a machine learning model using the provided data and parameters.
`_create_optimized_xgboost`(→ Any)	Create and optimize an XGBoost model using Optuna.
`_create_optimized_lightgbm`(→ Any)	Create and optimize a LightGBM model using Optuna.
`_create_optimized_catboost`(...)	Create and optimize a CatBoost model using Optuna.

Module Contents

sharkpy.learning.PREDICTION_INTROS = ['🦈 Diving into {project_name}! Time to make some waves! 🌊', '🦈 Sharpening teeth on...

sharkpy.learning.learn(self, data: str | pandas.DataFrame, project_name: str = 'your data', target: str | None = None, problem_type: str | None = None, model: Any | None = None, model_choice: str | None = None, detailed_stats: bool = False, n_trials: int = 30, verbose: bool = False) → Shark

Train a machine learning model using the provided data and parameters.

Parameters:

self (Shark) – The Shark instance.
data (str or pandas.DataFrame) – The dataset to use for training. Can be a file path (CSV) or a DataFrame.
project_name (str, optional) – Name of the project for tracking and reporting.
target (str, optional) – Name of the column to predict. If None, uses the last column.
problem_type (str, optional) – Type of problem: “regression” or “classification”. If None, tries to infer automatically.
model (sklearn.base.BaseEstimator, optional) – A custom scikit-learn compatible model instance to use. If provided, overrides model_choice.
model_choice (str, optional) – String identifier for built-in model selection. Options:
- "random_forest": RandomForestRegressor or RandomForestClassifier
- "svm": SVR or SVC
- "ridge": Ridge Regression (L2 regularization)
- "lasso": Lasso Regression (L1 regularization)
- "knn": K-Nearest Neighbors
- "xgboost": XGBoost with Optuna optimization
- "lightgbm": LightGBM with Optuna optimization
- "catboost": CatBoost with Optuna optimization
- None: LinearRegression or LogisticRegression (default)
detailed_stats (bool, optional) – If True, uses statsmodels for detailed statistical analysis (default: False).
n_trials (int, optional) – Number of Optuna optimization trials for the boosting models ("xgboost", "lightgbm", "catboost") (default: 30).
verbose (bool, optional) – If True, enables verbose logging for Optuna optimization (default: False).
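
The precedence between model and model_choice described above can be sketched without any dependencies. This is only an illustration of the documented rules, not sharkpy's actual code: resolve_model_label, BUILTIN_MODELS and SINGLE_LABEL are hypothetical names, and raising ValueError for an unknown identifier is an assumption.

```python
from typing import Any, Dict, Optional, Tuple

# (regression, classification) pairs copied from the model_choice option list.
BUILTIN_MODELS: Dict[Optional[str], Tuple[str, str]] = {
    "random_forest": ("RandomForestRegressor", "RandomForestClassifier"),
    "svm": ("SVR", "SVC"),
    None: ("LinearRegression", "LogisticRegression"),  # documented default
}

# Options the list does not split by problem type keep a single label.
SINGLE_LABEL: Dict[str, str] = {
    "ridge": "Ridge",
    "lasso": "Lasso",
    "knn": "K-Nearest Neighbors",
    "xgboost": "XGBoost (Optuna-tuned)",
    "lightgbm": "LightGBM (Optuna-tuned)",
    "catboost": "CatBoost (Optuna-tuned)",
}


def resolve_model_label(
    model: Optional[Any] = None,
    model_choice: Optional[str] = None,
    problem_type: str = "regression",
) -> str:
    """Hypothetical helper: name the estimator the documented rules would pick."""
    if model is not None:
        # A custom instance overrides model_choice.
        return type(model).__name__
    if model_choice in BUILTIN_MODELS:
        regressor, classifier = BUILTIN_MODELS[model_choice]
        return regressor if problem_type == "regression" else classifier
    if model_choice in SINGLE_LABEL:
        return SINGLE_LABEL[model_choice]
    # Assumption: the real function's handling of unknown names is not documented.
    raise ValueError(f"unknown model_choice: {model_choice!r}")


print(resolve_model_label(model_choice="svm", problem_type="classification"))
print(resolve_model_label())
```

Keeping the None key in the same table as the named options makes the default a normal lookup rather than a special case.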

Notes

Encodes categorical features and target automatically for classification.
Performs K-Fold cross-validation and prints mean and std of scores.
Fits the selected model on the entire dataset after cross-validation.
Sets self.model, self.problem_type, self.features, self.target, and self.encoders.
Warning: Avoid loading untrusted CSV files, as they may contain malicious data.
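
The cross-validate-then-refit flow in the notes above can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: the helper names (kfold_indices, fit_line, cross_validate), the five unshuffled folds, the one-feature least-squares model, and mean squared error as the score. The real function works on DataFrames with the selected estimator.

```python
import statistics
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_splits: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validate(xs: List[float], ys: List[float], n_splits: int = 5) -> List[float]:
    """Return the held-out mean squared error of each fold."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(xs), n_splits):
        slope, intercept = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        scores.append(statistics.fmean(errors))
    return scores


# Toy data: y = 2x + 1 with a small deterministic wobble.
xs = [float(x) for x in range(20)]
ys = [2 * x + 1 + (0.5 if int(x) % 2 else -0.5) for x in xs]

scores = cross_validate(xs, ys)
print(f"CV MSE: mean={statistics.fmean(scores):.3f}, std={statistics.pstdev(scores):.3f}")

# After cross-validation, refit on the entire dataset.
final_slope, final_intercept = fit_line(xs, ys)
```

The fold scores only estimate how well the model generalizes. The model that is kept is the one refit on all rows, which is why the final fit happens after the loop rather than inside it.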

sharkpy.learning._create_optimized_xgboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) → Any

Create and optimize an XGBoost model using Optuna.

Parameters:

X (pd.DataFrame) – Features DataFrame
y (pd.Series) – Target series
problem_type (str) – Type of problem: “regression” or “classification”
n_trials (int) – Number of optimization trials (default: 30)

Returns:

model – Trained XGBoost model with optimized parameters

Return type:

Any
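
The role of n_trials can be illustrated with a dependency-free sketch in which plain random search stands in for Optuna's sampler. This is not the function's implementation. The search space, the toy objective, and the names random_search and toy_loss are assumptions; learning_rate and max_depth are used only as familiar XGBoost hyperparameter names.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(
    objective: Callable[[Params], float],
    space: Dict[str, Tuple[float, float]],
    n_trials: int = 30,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Sample n_trials parameter sets uniformly and keep the lowest-loss one."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_loss(params: Params) -> float:
    """Stand-in for a cross-validated model loss; smallest near lr=0.1, depth=6."""
    return (params["learning_rate"] - 0.1) ** 2 + ((params["max_depth"] - 6.0) / 10.0) ** 2


# Assumed search space, for illustration only.
space = {"learning_rate": (0.01, 0.3), "max_depth": (3.0, 10.0)}

best_params, best_loss = random_search(toy_loss, space, n_trials=30)
print(best_params, best_loss)
```

Each trial costs one full model evaluation, so n_trials trades search quality against run time. A real tuner such as Optuna uses earlier trials to guide later ones instead of sampling independently.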

sharkpy.learning._create_optimized_lightgbm(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) → Any

Create and optimize a LightGBM model using Optuna.

Parameters:

X (pd.DataFrame) – Features DataFrame
y (pd.Series) – Target series
problem_type (str) – Type of problem: “regression” or “classification”
n_trials (int) – Number of optimization trials (default: 30)

Returns:

model – Trained LightGBM model with optimized parameters

Return type:

Any

sharkpy.learning._create_optimized_catboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) → xgboost.XGBRegressor | xgboost.XGBClassifier

Create and optimize a CatBoost model using Optuna.

Parameters:

X (pd.DataFrame) – Features DataFrame
y (pd.Series) – Target series
problem_type (str) – Type of problem: “regression” or “classification”
n_trials (int) – Number of optimization trials (default: 30)

Returns:

model – Trained CatBoost model with optimized parameters

Return type:

Any