sharkpy.learning

Attributes

PREDICTION_INTROS

Functions

learn(→ Shark)

Train a machine learning model using the provided data and parameters.

_create_optimized_xgboost(→ Any)

Create and optimize an XGBoost model using Optuna.

_create_optimized_lightgbm(→ Any)

Create and optimize a LightGBM model using Optuna.

_create_optimized_catboost(...)

Create and optimize a CatBoost model using Optuna.

Module Contents

sharkpy.learning.PREDICTION_INTROS = ['🦈 Diving into {project_name}! Time to make some waves! 🌊', '🦈 Sharpening teeth on...
sharkpy.learning.learn(self, data: str | pandas.DataFrame, project_name: str = 'your data', target: str | None = None, problem_type: str | None = None, model: Any | None = None, model_choice: str | None = None, detailed_stats: bool = False, n_trials: int = 30, verbose: bool = False) → Shark

Train a machine learning model using the provided data and parameters.

Parameters:
  • self (Shark) – The Shark instance.

  • data (str or pandas.DataFrame) – The dataset to use for training. Can be a file path (CSV) or a DataFrame.

  • project_name (str, optional) – Name of the project for tracking and reporting.

  • target (str, optional) – Name of the column to predict. If None, uses the last column.

  • problem_type (str, optional) – Type of problem: “regression” or “classification”. If None, tries to infer automatically.

  • model (sklearn.base.BaseEstimator, optional) – A custom scikit-learn compatible model instance to use. If provided, overrides model_choice.

  • model_choice (str, optional) –

    String identifier for built-in model selection. Options:
    • “random_forest”: RandomForestRegressor or RandomForestClassifier

    • “svm”: SVR or SVC

    • “ridge”: Ridge Regression (L2 regularization)

    • “lasso”: Lasso Regression (L1 regularization)

    • “knn”: K-Nearest Neighbors

    • “xgboost”: XGBoost with Optuna optimization

    • “lightgbm”: LightGBM with Optuna optimization

    • “catboost”: CatBoost with Optuna optimization

    • None: LinearRegression or LogisticRegression (default)

  • detailed_stats (bool, optional) – If True, uses statsmodels for detailed statistical analysis (default: False).

  • n_trials (int, optional) – Number of Optuna optimization trials for the boosting models “xgboost”, “lightgbm”, and “catboost” (default: 30).

  • verbose (bool, optional) – If True, enables verbose logging for Optuna optimization (default: False).
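The way model and model_choice interact can be illustrated with a small lookup table. This is only a sketch built on scikit-learn, not the sharkpy source: the names ESTIMATORS and pick_estimator are hypothetical, and “ridge”, “lasso”, and the boosting options are left out because the list above does not say which classes back them for each problem type.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.svm import SVC, SVR

# Hypothetical table: identifier -> (regressor class, classifier class).
ESTIMATORS = {
    None: (LinearRegression, LogisticRegression),
    "random_forest": (RandomForestRegressor, RandomForestClassifier),
    "svm": (SVR, SVC),
    "knn": (KNeighborsRegressor, KNeighborsClassifier),
}


def pick_estimator(model_choice=None, problem_type="regression", model=None):
    """Return an unfitted estimator; a custom `model` overrides `model_choice`."""
    if model is not None:
        return model
    if problem_type not in ("regression", "classification"):
        raise ValueError(f"unknown problem_type: {problem_type!r}")
    try:
        regressor_cls, classifier_cls = ESTIMATORS[model_choice]
    except KeyError:
        raise ValueError(f"unknown model_choice: {model_choice!r}") from None
    chosen_cls = regressor_cls if problem_type == "regression" else classifier_cls
    return chosen_cls()


default_model = pick_estimator()                       # LinearRegression instance
svm_classifier = pick_estimator("svm", "classification")  # SVC instance
custom = SVR(C=2.0)
overridden = pick_estimator("knn", model=custom)       # custom instance wins
```

Keeping the pairs in one table means a new identifier only needs one extra entry, and an unknown identifier fails early with a clear error.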

Notes

  • Encodes categorical features and target automatically for classification.

  • Performs K-Fold cross-validation and prints the mean and standard deviation of the fold scores.

  • Fits the selected model on the entire dataset after cross-validation.

  • Sets self.model, self.problem_type, self.features, self.target, and self.encoders.

  • Warning: Avoid loading untrusted CSV files, as they may contain malicious data.
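The cross-validate-then-refit flow described in the notes can be sketched with scikit-learn as follows. The helper name cross_validate_then_refit, the five folds, the shuffling, and the synthetic data are all assumptions made for illustration; the notes do not state which fold count or scoring metric learn uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def cross_validate_then_refit(model, X, y, n_splits=5, seed=0):
    """Score `model` with K-Fold cross-validation, then fit it on all rows."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # cross_val_score clones the estimator per fold; for regressors the
    # default scorer is R^2.
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"CV score: {scores.mean():.3f} +/- {scores.std():.3f}")
    model.fit(X, y)  # final fit uses the entire dataset
    return model, scores


# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

fitted_model, cv_scores = cross_validate_then_refit(LinearRegression(), X, y)
```

The fold scores estimate how well the model generalizes, while the final fit on all rows is the model that would be kept for later predictions.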

sharkpy.learning._create_optimized_xgboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) → Any

Create and optimize an XGBoost model using Optuna.

Parameters:
  • X (pd.DataFrame) – Features DataFrame

  • y (pd.Series) – Target series

  • problem_type (str) – Type of problem: “regression” or “classification”

  • n_trials (int) – Number of optimization trials (default: 30)

Returns:

model – Trained XGBoost model with optimized parameters

Return type:

Any

sharkpy.learning._create_optimized_lightgbm(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) → Any

Create and optimize a LightGBM model using Optuna.

Parameters:
  • X (pd.DataFrame) – Features DataFrame

  • y (pd.Series) – Target series

  • problem_type (str) – Type of problem: “regression” or “classification”

  • n_trials (int) – Number of optimization trials (default: 30)

Returns:

model – Trained LightGBM model with optimized parameters

Return type:

Any

sharkpy.learning._create_optimized_catboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) → xgboost.XGBRegressor | xgboost.XGBClassifier

Create and optimize a CatBoost model using Optuna.

Parameters:
  • X (pd.DataFrame) – Features DataFrame

  • y (pd.Series) – Target series

  • problem_type (str) – Type of problem: “regression” or “classification”

  • n_trials (int) – Number of optimization trials (default: 30)

Returns:

model – Trained CatBoost model with optimized parameters

Return type:

Any