sharkpy.learning
================

.. py:module:: sharkpy.learning


Attributes
----------

.. autoapisummary::

   sharkpy.learning.PREDICTION_INTROS


Functions
---------

.. autoapisummary::

   sharkpy.learning.learn
   sharkpy.learning._create_optimized_xgboost
   sharkpy.learning._create_optimized_lightgbm
   sharkpy.learning._create_optimized_catboost


Module Contents
---------------

.. py:data:: PREDICTION_INTROS
   :value: ['🦈 Diving into {project_name}! Time to make some waves! 🌊', '🦈 Sharpening teeth on...


.. py:function:: learn(self, data: Union[str, pandas.DataFrame], project_name: str = 'your data', target: Optional[str] = None, problem_type: Optional[str] = None, model: Optional[Any] = None, model_choice: Optional[str] = None, detailed_stats: bool = False, n_trials: int = 30, verbose: bool = False) -> Shark

   Train a machine learning model using the provided data and parameters.

   :param self: The Shark instance.
   :type self: Shark
   :param data: The dataset to use for training. Can be a file path (CSV) or a DataFrame.
   :type data: str or pandas.DataFrame
   :param project_name: Name of the project for tracking and reporting.
   :type project_name: str, optional
   :param target: Name of the column to predict. If None, uses the last column.
   :type target: str, optional
   :param problem_type: Type of problem: "regression" or "classification". If None, tries to infer automatically.
   :type problem_type: str, optional
   :param model: A custom scikit-learn compatible model instance to use. If provided, overrides model_choice.
   :type model: sklearn.base.BaseEstimator, optional
   :param model_choice: String identifier for built-in model selection.
      Options:

      - "random_forest": RandomForestRegressor or RandomForestClassifier
      - "svm": SVR or SVC
      - "ridge": Ridge Regression (L2 regularization)
      - "lasso": Lasso Regression (L1 regularization)
      - "knn": K-Nearest Neighbors
      - "xgboost": XGBoost with Optuna optimization
      - "lightgbm": LightGBM with Optuna optimization
      - "catboost": CatBoost with Optuna optimization
      - None: LinearRegression or LogisticRegression (default)

   :type model_choice: str, optional
   :param detailed_stats: If True, uses statsmodels for detailed statistical analysis.
   :type detailed_stats: bool, optional
   :param n_trials: Number of optimization trials for boosting models (default: 30).
   :type n_trials: int, optional
   :param verbose: If True, enables verbose logging for Optuna optimization (default: False).
   :type verbose: bool, optional

   .. rubric:: Notes

   - Encodes categorical features and target automatically for classification.
   - Performs K-Fold cross-validation and prints mean and std of scores.
   - Fits the selected model on the entire dataset after cross-validation.
   - Sets self.model, self.problem_type, self.features, self.target, and self.encoders.
   - Warning: Avoid loading untrusted CSV files, as they may contain malicious data.


.. py:function:: _create_optimized_xgboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) -> Any

   Create and optimize an XGBoost model using Optuna.

   :param X: Features DataFrame
   :type X: pandas.DataFrame
   :param y: Target series
   :type y: pandas.Series
   :param problem_type: Type of problem: "regression" or "classification"
   :type problem_type: str
   :param n_trials: Number of optimization trials (default: 30)
   :type n_trials: int

   :returns: **model** -- Trained XGBoost model with optimized parameters
   :rtype: Any


.. py:function:: _create_optimized_lightgbm(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) -> Any

   Create and optimize a LightGBM model using Optuna.

   :param X: Features DataFrame
   :type X: pandas.DataFrame
   :param y: Target series
   :type y: pandas.Series
   :param problem_type: Type of problem: "regression" or "classification"
   :type problem_type: str
   :param n_trials: Number of optimization trials (default: 30)
   :type n_trials: int

   :returns: **model** -- Trained LightGBM model with optimized parameters
   :rtype: Any


.. py:function:: _create_optimized_catboost(X: pandas.DataFrame, y: pandas.Series, problem_type: str = 'regression', n_trials: int = 30) -> Any

   Create and optimize a CatBoost model using Optuna.

   :param X: Features DataFrame
   :type X: pandas.DataFrame
   :param y: Target series
   :type y: pandas.Series
   :param problem_type: Type of problem: "regression" or "classification"
   :type problem_type: str
   :param n_trials: Number of optimization trials (default: 30)
   :type n_trials: int

   :returns: **model** -- Trained CatBoost model with optimized parameters
   :rtype: Any
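
The defaults documented for ``learn`` (``target`` falls back to the last column, a custom ``model`` overrides ``model_choice``, and ``model_choice=None`` selects a linear or logistic model) can be sketched in plain Python. This is only an illustration of the documented rules, not sharkpy's actual implementation: ``resolve_target``, ``describe_model``, and ``BUILTIN_MODELS`` are hypothetical names, and raising ``ValueError`` on unknown input is an assumption.

```python
from typing import Any, List, Optional

# Built-in identifiers mapped to the estimator families listed in the
# ``model_choice`` documentation. The dict itself is hypothetical.
BUILTIN_MODELS = {
    "random_forest": "RandomForestRegressor or RandomForestClassifier",
    "svm": "SVR or SVC",
    "ridge": "Ridge Regression (L2 regularization)",
    "lasso": "Lasso Regression (L1 regularization)",
    "knn": "K-Nearest Neighbors",
    "xgboost": "XGBoost with Optuna optimization",
    "lightgbm": "LightGBM with Optuna optimization",
    "catboost": "CatBoost with Optuna optimization",
    None: "LinearRegression or LogisticRegression",
}


def resolve_target(columns: List[str], target: Optional[str] = None) -> str:
    """Hypothetical helper: use the last column when ``target`` is None."""
    if not columns:
        raise ValueError("data has no columns")
    if target is None:
        return columns[-1]
    if target not in columns:
        # Assumption: an unknown column is treated as an error.
        raise ValueError(f"unknown target column: {target!r}")
    return target


def describe_model(model: Optional[Any] = None,
                   model_choice: Optional[str] = None) -> str:
    """Hypothetical helper: a custom ``model`` overrides ``model_choice``."""
    if model is not None:
        return f"custom model: {type(model).__name__}"
    if model_choice not in BUILTIN_MODELS:
        # Assumption: an unknown identifier is treated as an error.
        raise ValueError(f"unknown model_choice: {model_choice!r}")
    return BUILTIN_MODELS[model_choice]


if __name__ == "__main__":
    columns = ["sqft", "rooms", "price"]
    print(resolve_target(columns))
    print(describe_model(model_choice="xgboost"))
```

Keeping the lookup in a single table makes it easy to compare against the option list above; how sharkpy chooses between the regressor and classifier variant for a given ``problem_type`` is not shown here because the documentation does not describe it.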