sharkpy.core
============

.. py:module:: sharkpy.core


Attributes
----------

.. autoapisummary::

   sharkpy.core.explain_with_shapash


Classes
-------

.. autoapisummary::

   sharkpy.core.Shark


Module Contents
---------------

.. py:data:: explain_with_shapash
   :value: None


.. py:class:: Shark

   A machine learning model manager that simplifies training, prediction, and analysis.

   .. attribute:: model

      The trained machine learning model (e.g., LogisticRegression, RandomForestClassifier).

      :type: object or None

   .. attribute:: problem_type

      Type of ML problem ('classification' or 'regression').

      :type: str or None

   .. attribute:: features

      Input features used for training.

      :type: pd.DataFrame or None

   .. attribute:: target

      Target variable (encoded for classification, original for regression).

      :type: pd.Series or np.ndarray or None

   .. attribute:: target_name

      Name of the target column in the input data.

      :type: str or None

   .. attribute:: data

      Original input DataFrame, including features and target.

      :type: pd.DataFrame or None

   .. attribute:: project_name

      Name of the current project for tracking and reporting.

      :type: str or None

   .. attribute:: feature_names

      Names of feature columns.

      :type: list of str or None

   .. attribute:: encoders

      Dictionary storing feature encoders (e.g., for categorical features).

      :type: dict

   .. attribute:: label_encoder

      Encoder for categorical target variable (for classification).

      :type: LabelEncoder or None

   .. attribute:: stats_model

      Statistical model for detailed analysis (optional).

      :type: object or None

   .. attribute:: statistical_summary

      Summary of statistical analysis (optional).

      :type: str or None

   .. attribute:: p_values

      P-values from statistical analysis (optional).

      :type: pd.Series or None

   .. attribute:: conf_intervals

      Confidence intervals from statistical analysis (optional).

      :type: pd.DataFrame or None

   .. rubric:: Examples

   >>> from sharkpy import Shark
   >>> import pandas as pd
   >>> shark = Shark()
   >>> data = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv', header=None)
   >>> data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
   >>> shark.learn(data=data, target='species', model_choice='logistic_regression')
   >>> predictions = shark.predict(data)
   >>> shark.explain(export_path='explanation.pdf', format='pdf', depth='simple')
   >>> cv_results, train_metrics = shark.report(cv_folds=5)


   .. py:attribute:: model
      :value: None


   .. py:attribute:: features
      :value: None


   .. py:attribute:: target
      :value: None


   .. py:attribute:: problem_type
      :value: None


   .. py:attribute:: target_name
      :value: None


   .. py:attribute:: data
      :value: None


   .. py:attribute:: label_encoder
      :value: None


   .. py:attribute:: project_name
      :value: None


   .. py:attribute:: feature_names
      :value: None


   .. py:attribute:: encoders


   .. py:attribute:: stats_model
      :value: None


   .. py:attribute:: statistical_summary
      :value: None


   .. py:attribute:: p_values
      :value: None


   .. py:attribute:: conf_intervals
      :value: None


   .. py:method:: learn(data: Union[str, pandas.DataFrame], project_name: str = 'your data', target: Optional[str] = None, problem_type: Optional[str] = None, model: Optional[object] = None, model_choice: Optional[str] = None, detailed_stats: bool = False, n_trials: int = 30, verbose: bool = False) -> Shark

      Train a machine learning model on the provided data.

      :param data: Dataset for training. Can be a file path (CSV) or a pandas DataFrame.
      :type data: str or pd.DataFrame
      :param project_name: Name of the project for tracking and reporting (default: "your data").
      :type project_name: str, optional
      :param target: Name of the target column to predict (default: None).
      :type target: str, optional
      :param problem_type: Type of problem: 'regression', 'classification', or None for auto-detection (default: None).
      :type problem_type: str, optional
      :param model: Custom model instance to use (default: None).
      :type model: object, optional
      :param model_choice: Built-in model to use (e.g., 'logistic_regression', 'random_forest', 'xgboost') (default: None).
      :type model_choice: str, optional
      :param detailed_stats: Whether to compute detailed statistical analysis (e.g., p-values, confidence intervals) (default: False).
      :type detailed_stats: bool, optional
      :param n_trials: Number of optimization trials for boosting models (e.g., XGBoost) (default: 30).
      :type n_trials: int, optional
      :param verbose: Whether to print detailed output during training (default: False).
      :type verbose: bool, optional

      :returns: The current Shark instance with trained model and updated attributes.
      :rtype: Shark

      .. rubric:: Notes

      - Automatically encodes categorical features and target (for classification).
      - Stores the original DataFrame in `self.data` and target name in `self.target_name`.
      - For classification, stores the `LabelEncoder` in `self.label_encoder` to preserve category names.
      - Performs K-Fold cross-validation and prints mean and standard deviation of scores.
      - Fits the selected model on the entire dataset after cross-validation.
      - Warning: Avoid loading untrusted CSV files, as they may contain malicious data.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'a']})
      >>> shark.learn(data, target='y', model_choice='logistic_regression')
      🦈 Looks like a classification problem (non-numeric target: y)
      🦈 Encoding categorical target 'y' to numeric labels
      ...
      >>> shark.target_name
      'y'
      >>> shark.label_encoder.classes_
      array(['a', 'b'], dtype=object)
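
      The target handling described in the Notes can be pictured with a short standard-library sketch. It is not SharkPy's implementation: `detect_problem_type` and `encode_labels` are hypothetical helpers, and the rule that any non-numeric target means classification is an assumption based on the log message above. Numeric class labels such as 0/1 would need an extra rule that this sketch omits.

      .. code-block:: python

         from numbers import Number


         def detect_problem_type(target_values):
             """Guess the problem type from the target values (assumed rule)."""
             # bool is a Number subclass in Python, so it is excluded explicitly.
             numeric = all(
                 isinstance(v, Number) and not isinstance(v, bool)
                 for v in target_values
             )
             return "regression" if numeric else "classification"


         def encode_labels(target_values):
             """Map each distinct label to an integer code, in sorted label order."""
             classes = sorted(set(target_values))
             codes = {label: index for index, label in enumerate(classes)}
             return [codes[v] for v in target_values], classes


         y = ["a", "b", "a"]
         problem_type = detect_problem_type(y)
         encoded, classes = encode_labels(y)

      Sorting the labels before assigning codes matches the `classes_` order shown in the example, where 'a' comes before 'b'.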


   .. py:method:: predict(X: Optional[Union[Dict, pandas.DataFrame, List[Dict], numpy.ndarray]] = None) -> Union[float, str, numpy.ndarray]

      Make predictions using the trained model.

      :param X: Input samples to predict. If None, predicts on the training data. Accepted forms:

                - dict: Single prediction (e.g., {'feature1': value1, 'feature2': value2}).
                - list of dict: Multiple scenarios (e.g., [{'feature1': value1}, {'feature1': value2}]).
                - pd.DataFrame: Multiple samples with feature columns.
                - np.ndarray: Raw feature values (must match training feature count).
      :type X: dict, pd.DataFrame, list of dict, np.ndarray, or None, optional

      :returns: Predicted values. For classification, returns original category names if `label_encoder` is available.
      :rtype: float, str, or np.ndarray

      :raises ValueError: If no model is trained or input data is invalid.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x1': [1, 2], 'x2': [3, 4], 'y': ['cat', 'dog']})
      >>> shark.learn(data, target='y')
      >>> shark.predict({'x1': 1, 'x2': 3})
      'cat'
      >>> shark.predict(data[['x1', 'x2']])
      array(['cat', 'dog'], dtype=object)
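
      As a rough illustration of how dict and list-of-dict inputs could be turned into rows in training-feature order, here is a minimal sketch. The helper `to_rows` is hypothetical and uses plain lists instead of pandas. Whether SharkPy raises `ValueError` for a missing feature in exactly this way is an assumption.

      .. code-block:: python

         def to_rows(X, feature_names):
             """Convert a dict or a list of dicts into rows ordered like feature_names."""
             samples = [X] if isinstance(X, dict) else list(X)
             rows = []
             for sample in samples:
                 missing = [name for name in feature_names if name not in sample]
                 if missing:
                     raise ValueError(f"Missing features: {missing}")
                 # Key order in the input does not matter; feature_names decides.
                 rows.append([sample[name] for name in feature_names])
             return rows


         features = ["x1", "x2"]
         single = to_rows({"x2": 3, "x1": 1}, features)
         batch = to_rows([{"x1": 1, "x2": 3}, {"x1": 2, "x2": 4}], features)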


   .. py:method:: predict_baseline() -> Union[float, str]

      Make a baseline prediction using the minimum values of the training features.

      :returns: Baseline prediction for regression (mean) or classification (most frequent class).
      :rtype: float or str

      :raises ValueError: If no model is trained.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
      >>> shark.learn(data, target='y')
      >>> shark.predict_baseline()
      20.0
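
      The return description (mean for regression, most frequent class for classification) can be sketched with the standard library alone. The helper `baseline_prediction` is hypothetical and follows only that return description. It does not reproduce how SharkPy computes the value internally.

      .. code-block:: python

         from collections import Counter
         from statistics import mean


         def baseline_prediction(target_values, problem_type):
             """Return a feature-independent baseline for the given target values."""
             if problem_type == "regression":
                 return float(mean(target_values))
             # Most frequent class; ties resolve to the first one encountered.
             return Counter(target_values).most_common(1)[0][0]


         reg = baseline_prediction([10, 20, 30], "regression")
         clf = baseline_prediction(["cat", "dog", "cat"], "classification")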


   .. py:method:: plot(kind: str = 'prediction', show: bool = True, save_path: Optional[str] = None, colors: Optional[Dict[str, str]] = None)

      Visualize model behavior based on the specified plot type.

      :param kind: Type of plot: 'prediction', 'residuals', 'confusion_matrix', 'roc',
                   'pr_curve', 'proba_hist', or 'feature_importance' (default: 'prediction').
      :type kind: str, optional
      :param show: Whether to display the plot (default: True).
      :type show: bool, optional
      :param save_path: Path to save the plot (default: None).
      :type save_path: str, optional
      :param colors: Custom color specifications for the plot. If None, uses default SharkPy colors.
                     Available keys: 'primary', 'secondary', 'accent', 'background', 'grid', 'text', 'bars'.
      :type colors: dict, optional

      :rtype: None

      :raises ValueError: If no model is trained or the plot type is invalid.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [0, 1, 0]})
      >>> shark.learn(data, target='y')
      >>> shark.plot(kind='confusion_matrix')

      >>> # Custom colors example
      >>> custom_colors = {
      ...     'primary': '#FF6B6B',    # Coral red
      ...     'secondary': '#4ECDC4',  # Turquoise
      ...     'background': '#F7FFF7'  # Light green
      ... }
      >>> shark.plot(kind='feature_importance', colors=custom_colors)
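
      A partial `colors` dict, as in the example above, implies that unspecified keys fall back to defaults. One way such a merge could work is sketched below. The helper `resolve_colors` and the placeholder default values are hypothetical, and rejecting unknown keys is an assumption rather than documented SharkPy behavior.

      .. code-block:: python

         ALLOWED_KEYS = {
             "primary", "secondary", "accent", "background", "grid", "text", "bars",
         }

         # Placeholder defaults for illustration only; the real palette may differ.
         DEFAULT_COLORS = {key: "#000000" for key in ALLOWED_KEYS}


         def resolve_colors(colors=None):
             """Overlay user-supplied colors on top of the defaults."""
             colors = colors or {}
             unknown = set(colors) - ALLOWED_KEYS
             if unknown:
                 raise ValueError(f"Unknown color keys: {sorted(unknown)}")
             return {**DEFAULT_COLORS, **colors}


         palette = resolve_colors({"primary": "#FF6B6B", "background": "#F7FFF7"})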


   .. py:method:: report(cv_folds: int = 5, export_path: Optional[str] = None, format: str = 'txt') -> tuple

      Generate a comprehensive performance report with cross-validation metrics.

      :param cv_folds: Number of cross-validation folds (default: 5).
      :type cv_folds: int, optional
      :param export_path: Path to export the report (txt, docx, or pdf) (default: None).
      :type export_path: str, optional
      :param format: Export format: 'txt', 'docx', or 'pdf' (default: 'txt').
      :type format: str, optional

      :returns: (cv_results, train_metrics), where cv_results is a dict of cross-validation metrics and train_metrics is a dict of training metrics.
      :rtype: tuple

      :raises ValueError: If no model is trained or the format is invalid.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [0, 1, 0]})
      >>> shark.learn(data, target='y')
      >>> cv_results, train_metrics = shark.report(cv_folds=5)
      >>> print(cv_results['test_accuracy'].mean())
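
      To summarize every cross-validation metric at once, the per-fold scores can be reduced to a mean and a spread. The sketch below assumes that `cv_results` maps metric names to sequences of per-fold scores and that score keys start with `test_`, as `test_accuracy` does. The `fit_time` key and the fold scores are made up for illustration.

      .. code-block:: python

         from statistics import mean, stdev


         def summarize_cv(cv_results):
             """Return {metric: (mean, std)} for every key that starts with 'test_'."""
             summary = {}
             for key, scores in cv_results.items():
                 if not key.startswith("test_"):
                     continue  # skip non-score entries such as timings
                 scores = [float(s) for s in scores]
                 # stdev is the sample standard deviation, which differs slightly
                 # from the population value that numpy's .std() returns by default.
                 spread = stdev(scores) if len(scores) > 1 else 0.0
                 summary[key] = (mean(scores), spread)
             return summary


         cv_results = {
             "test_accuracy": [0.5, 1.0, 0.75, 0.75],
             "fit_time": [0.01, 0.01, 0.02, 0.01],
         }
         summary = summarize_cv(cv_results)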


   .. py:method:: explain(cv_results=None, train_metrics=None, export_path: Optional[str] = None, format: str = 'txt', depth: str = 'deep', verbose: int = 1) -> Optional[pandas.DataFrame]

      Explain the model's behavior and performance with customizable depth and export options.

      :param cv_results: Cross-validation results from report(), containing metrics like test_r2 or test_accuracy.
      :type cv_results: dict, optional
      :param train_metrics: Training metrics from report(), containing metrics like r2 or accuracy.
      :type train_metrics: dict, optional
      :param export_path: Path to export the explanation (txt, docx, or pdf) (default: None).
      :type export_path: str, optional
      :param format: Export format: 'txt', 'docx', or 'pdf' (default: 'txt').
      :type format: str, optional
      :param depth: Explanation depth: 'simple' (beginner overview), 'mechanics' (technical details),
                    'interpretation' (performance analysis), 'actionable' (recommendations),
                    'deep' (all levels, default), or 'shapash' (interactive SHAP dashboard).
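      :type depth: str, optional
      :param verbose: Verbosity level of the printed explanation output (default: 1).
      :type verbose: int, optional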

      :returns: Feature importance DataFrame if available, else None.
      :rtype: pd.DataFrame or None

      .. rubric:: Notes

      - Requires a trained model (call `learn` first).
      - For classification, uses `label_encoder` to display original category names (e.g., 'Iris-setosa' instead of 0).
      - If `export_path` is provided, saves the explanation in the specified format.
      - 'shapash' depth requires the `shapash` package to be installed.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x1': [1, 2], 'x2': [3, 4], 'y': ['cat', 'dog']})
      >>> shark.learn(data, target='y')
      >>> shark.explain(depth='simple', export_path='explanation.txt')
      🦈 Sharky is diving into the LogisticRegression model explanation...
      ...
      >>> # explanation.txt contains: "This model predicts one of 2 categories (cat, dog)..."
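
      The relationship between the `depth` values can be sketched as a lookup in which 'deep' expands to all four text levels. The helper `resolve_depth` is hypothetical, and raising `ValueError` for an unrecognized value is an assumption, since this method's documentation lists no exceptions.

      .. code-block:: python

         TEXT_LEVELS = ["simple", "mechanics", "interpretation", "actionable"]


         def resolve_depth(depth="deep"):
             """Return the text sections implied by a depth value."""
             if depth == "deep":
                 return list(TEXT_LEVELS)  # 'deep' covers every text level
             if depth in TEXT_LEVELS:
                 return [depth]
             if depth == "shapash":
                 return []  # no text sections; the dashboard is used instead
             raise ValueError(f"Unknown depth: {depth!r}")


         sections = resolve_depth("deep")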


   .. py:method:: save_model(name: str = 'shark_model', directory: str = 'models') -> str

      Save the trained model to a .joblib file.

      :param name: Filename without extension (default: "shark_model").
      :type name: str, optional
      :param directory: Folder where the model will be saved (default: "models").
      :type directory: str, optional

      :returns: Path to the saved model file.
      :rtype: str

      :raises ValueError: If no model is trained.
      :raises OSError: If directory creation or file writing fails.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
      >>> shark.learn(data, target='y')
      >>> shark.save_model(name='my_model')
      'models/my_model.joblib'
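
      The returned path in the example suggests that `directory` and `name` are joined and a `.joblib` suffix is appended. A minimal sketch of that path handling follows. The helper `build_model_path` is hypothetical, and the actual serialization step is left out.

      .. code-block:: python

         import os
         import tempfile


         def build_model_path(name="shark_model", directory="models"):
             """Create the target directory if needed and return the .joblib path."""
             os.makedirs(directory, exist_ok=True)
             return os.path.join(directory, f"{name}.joblib")


         # A temporary directory keeps the demonstration free of side effects.
         with tempfile.TemporaryDirectory() as tmp:
             path = build_model_path("my_model", os.path.join(tmp, "models"))
             directory_created = os.path.isdir(os.path.dirname(path))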


   .. py:method:: load_model(model_path: str) -> object

      Load a saved SharkPy model from a .joblib file.

      :param model_path: Path to the saved .joblib model file.
      :type model_path: str

      :returns: The loaded model object.
      :rtype: object

      :raises FileNotFoundError: If the model file does not exist.
      :raises ValueError: If the file is not a valid model.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> shark.load_model('models/my_model.joblib')
      <sklearn.linear_model.LinearRegression object at ...>
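
      The two documented exceptions suggest that the path is checked before loading. The sketch below shows one way to do that. The helper `check_model_path` is hypothetical, and treating a wrong file extension as an invalid model is an assumption, since the real validity check is not documented here.

      .. code-block:: python

         import os
         import tempfile


         def check_model_path(model_path):
             """Validate a model path before attempting to load it."""
             if not os.path.isfile(model_path):
                 raise FileNotFoundError(f"Model file not found: {model_path}")
             if not model_path.endswith(".joblib"):
                 raise ValueError(f"Expected a .joblib file, got: {model_path}")
             return model_path


         with tempfile.TemporaryDirectory() as tmp:
             good = os.path.join(tmp, "my_model.joblib")
             open(good, "wb").close()  # empty stand-in file for the demonstration
             checked = check_model_path(good)
             try:
                 check_model_path(os.path.join(tmp, "missing.joblib"))
                 missing_error = None
             except FileNotFoundError as exc:
                 missing_error = type(exc).__name__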


   .. py:method:: battle(data: pandas.DataFrame, target: str, models: List[str] = ['linear_regression', 'random_forest', 'xgboost'], metric: str = 'r2', n_trials: int = 30, early_stopping: bool = False, min_score: float = 0.5, verbose: int = 0) -> Dict

      Compare multiple models and select the best performer.

      :param data: Input data for training.
      :type data: pd.DataFrame
      :param target: Name of the target column.
      :type target: str
      :param models: List of model names to compare (e.g., ['linear_regression', 'random_forest']) (default: ['linear_regression', 'random_forest', 'xgboost']).
      :type models: list of str, optional
      :param metric: Metric to compare models (e.g., 'r2', 'accuracy') (default: 'r2').
      :type metric: str, optional
      :param n_trials: Number of optimization trials for boosting models (default: 30).
      :type n_trials: int, optional
      :param early_stopping: If True, stops as soon as any model's score exceeds `min_score`. Not recommended, because a better model later in the list would never be evaluated (default: False).
      :type early_stopping: bool, optional
      :param min_score: Minimum score to trigger early stopping (default: 0.5).
      :type min_score: float, optional
      :param verbose: Verbosity level for model training (default: 0).
      :type verbose: int, optional

      :returns: Dictionary containing champion model name, model object, score, all results, details, and comparison plot.
      :rtype: dict

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})
      >>> result = shark.battle(data, target='y', models=['linear_regression', 'random_forest'])
      >>> result['champion']
      'linear_regression'
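
      The caveat about `early_stopping` is easier to see with numbers. The sketch below selects a champion from precomputed scores. The helper `pick_champion` and the scores are made up for illustration, and it assumes that a higher score is better and that models are evaluated in list order.

      .. code-block:: python

         def pick_champion(scores, early_stopping=False, min_score=0.5):
             """Return (name, score) of the best model seen, honoring early stopping."""
             champion, best = None, float("-inf")
             for name, score in scores.items():
                 if score > best:
                     champion, best = name, score
                 if early_stopping and score > min_score:
                     break  # later models are never evaluated
             return champion, best


         # Hypothetical scores, listed in evaluation order.
         scores = {"linear_regression": 0.62, "random_forest": 0.81, "xgboost": 0.78}
         full = pick_champion(scores)
         stopped = pick_champion(scores, early_stopping=True, min_score=0.5)

      With these scores the full comparison picks `random_forest`, while early stopping settles for `linear_regression` because it already exceeds `min_score`.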


   .. py:method:: explain_with_shapash(title_story: Optional[str] = None, display: bool = True)

      Create an interactive Shapash dashboard for model interpretation.

      :param title_story: Title for the Shapash dashboard (default: None).
      :type title_story: str, optional
      :param display: Whether to display the dashboard (default: True).
      :type display: bool, optional

      :rtype: None

      :raises ImportError: If the `shapash` package is not installed.
      :raises ValueError: If no model is trained.

      .. rubric:: Examples

      >>> shark = Shark()
      >>> data = pd.DataFrame({'x': [1, 2, 3], 'y': [0, 1, 0]})
      >>> shark.learn(data, target='y')
      >>> shark.explain_with_shapash(title_story='My Model Analysis')
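
      Because `shapash` is an optional dependency, it can help to check for it before calling this method. The sketch below uses `importlib.util.find_spec` from the standard library. The helpers `is_installed` and `require_package` are hypothetical and are not part of SharkPy.

      .. code-block:: python

         from importlib.util import find_spec


         def is_installed(package):
             """Return True if a top-level package can be found, without importing it."""
             return find_spec(package) is not None


         def require_package(package):
             """Raise ImportError with an install hint if the package is missing."""
             if not is_installed(package):
                 raise ImportError(
                     f"'{package}' is not installed; try: pip install {package}"
                 )


         has_json = is_installed("json")  # standard-library module, always present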


   .. py:method:: available_models() -> Dict

      List all available models with their details and print a comparison table.

      :returns: Dictionary of available models and their details.
      :rtype: dict

      .. rubric:: Examples

      >>> shark = Shark()
      >>> models = shark.available_models()
      🦈 Available Models in SharkPy 🦈
      ...
      >>> print(models.keys())
      dict_keys(['linear_regression', 'random_forest', 'xgboost', ...])