=============
autoCV Module
=============

Description :
 - This module is used for model selection:

    * Automates model training with cross-validation

    * Supports two methods of hyperparameter optimization: cartesian grid search and random grid search (OptimalFlow)

    * Exports the optimized models as pkl files and saves them in the ./pkl folder

    * Validates the optimized models and selects the best one (see the usage sketch after the API reference below)

.. image:: Learning_Curve_Compare.png
   :width: 980

 - Class:

    * dynaClassifier : Focus on classification problems

        - fit() : fit method for classifier

    * dynaRegressor : Focus on regression problems

        - fit() : fit method for regressor

 - Currently available estimators:

    * clf_cv : Class focusing on classification estimators

        - lgr : Logistic Regression (aka logit, MaxEnt) classifier - LogisticRegression()
        - svm : C-Support Vector classifier - svm.SVC()
        - mlp : Multi-layer Perceptron classifier - MLPClassifier()
        - ada : AdaBoost classifier - AdaBoostClassifier()
        - rf : Random Forest classifier - RandomForestClassifier()
        - gb : Gradient Boosting classifier - GradientBoostingClassifier()
        - xgb : XGBoost classifier - xgb.XGBClassifier()
        - lsvc : Linear Support Vector classifier - LinearSVC()
        - hgboost : Histogram-based Gradient Boosting classifier - HistGradientBoostingClassifier()
        - sgd : Stochastic Gradient Descent classifier - SGDClassifier()
        - rgcv : Ridge classifier with built-in cross-validation - RidgeClassifierCV()

    * reg_cv : Class focusing on regression estimators

        - lr : Linear Regression - LinearRegression()
        - knn : Regression based on k-nearest neighbors - KNeighborsRegressor()
        - svr : Epsilon-Support Vector Regression - svm.SVR()
        - rf : Random Forest regressor - RandomForestRegressor()
        - ada : AdaBoost regressor - AdaBoostRegressor()
        - gb : Gradient Boosting regressor - GradientBoostingRegressor()
        - tree : Decision Tree regressor - DecisionTreeRegressor()
        - mlp : Multi-layer Perceptron regressor - MLPRegressor()
        - xgb : XGBoost regressor - XGBRegressor()
        - hgboost : Histogram-based Gradient Boosting regressor - HistGradientBoostingRegressor()
        - huber : Huber regressor - HuberRegressor()
        - rgcv : Ridge regression with built-in cross-validation - RidgeCV()
        - cvlasso : Lasso with built-in cross-validation - LassoCV()
        - sgd : Stochastic Gradient Descent regressor - SGDRegressor()

dynaClassifier
---------------------

.. autoclass:: optimalflow.autoCV.dynaClassifier
   :members:

fastClassifier
---------------------

.. autoclass:: optimalflow.autoCV.fastClassifier
   :members:

dynaRegressor
---------------------

.. autoclass:: optimalflow.autoCV.dynaRegressor
   :members:

fastRegressor
---------------------

.. autoclass:: optimalflow.autoCV.fastRegressor
   :members:

evaluate_model
---------------------

.. autoclass:: optimalflow.autoCV.evaluate_model
   :members:

clf_cv
---------------------

.. autoclass:: optimalflow.estimatorCV.clf_cv

reg_cv
---------------------

.. autoclass:: optimalflow.estimatorCV.reg_cv

data_splitting_tool
---------------------

.. autofunction:: optimalflow.utilis_func.data_splitting_tool

reset_parameters
---------------------

.. autofunction:: optimalflow.utilis_func.reset_parameters

update_parameters
---------------------

.. autofunction:: optimalflow.utilis_func.update_parameters

export_parameters
---------------------

.. autofunction:: optimalflow.utilis_func.export_parameters
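Usage Example
---------------------

A typical model-selection run chains the pieces above: fit a ``dynaClassifier`` (or ``dynaRegressor``) on the training set, which cross-validates and tunes every supported estimator and exports the winners as pkl files, then reload the exported models and compare them with ``evaluate_model``. The sketch below follows the examples in the OptimalFlow documentation; the CSV paths, the estimator subset, and the ``./pkl/<name>_clf_cv.pkl`` naming are assumptions to adapt to your own project.

.. code-block:: python

    import pandas as pd
    import joblib
    from optimalflow.autoCV import dynaClassifier, evaluate_model

    # Placeholder file paths -- substitute your own prepared datasets.
    tr_features = pd.read_csv('./data/classification/train_features.csv')
    tr_labels = pd.read_csv('./data/classification/train_labels.csv')
    val_features = pd.read_csv('./data/classification/val_features.csv')
    val_labels = pd.read_csv('./data/classification/val_labels.csv')

    # Cross-validate and tune the classification estimators; the optimized
    # models are exported as pkl files under ./pkl.
    clf_cv_demo = dynaClassifier(random_state=13, cv_num=5)
    clf_cv_demo.fit(tr_features, tr_labels)

    # Reload the exported models (assumed naming convention) and
    # compare them on the validation set to pick the best one.
    models = {}
    for name in ['lgr', 'svm', 'mlp', 'rf', 'ada', 'gb', 'xgb']:
        models[name] = joblib.load('./pkl/{}_clf_cv.pkl'.format(name))

    for name, model in models.items():
        ml_evl = evaluate_model(model_type="cls")
        ml_evl.fit(name, model, val_features, val_labels)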
Default Parameters for Classifiers/Regressors
----------------------------------------------

**Estimators' default hyperparameter search ranges:**

.. list-table:: Classification Estimators' Default Parameter Search Ranges
   :widths: 25 25 50
   :header-rows: 1

   * - Estimator
     - Parameter
     - Search Range
   * - lgr
     - 'C'
     - [0.001, 0.01, 0.1, 1, 10, 100, 1000]
   * - svm
     - 'C'
     - [0.1, 1, 10]
   * -
     - 'kernel'
     - ['linear', 'poly', 'rbf', 'sigmoid']
   * - mlp
     - 'activation'
     - ['identity', 'relu', 'logistic']
   * -
     - 'hidden_layer_sizes'
     - [10, 50, 100]
   * -
     - 'learning_rate'
     - ['constant', 'invscaling', 'adaptive']
   * -
     - 'solver'
     - ['lbfgs', 'sgd', 'adam']
   * - ada
     - 'n_estimators'
     - [50, 100, 150]
   * -
     - 'learning_rate'
     - [0.01, 0.1, 1, 5, 10]
   * - rf
     - 'max_depth'
     - [2, 4, 8, 16, 32]
   * -
     - 'n_estimators'
     - [5, 50, 250]
   * - gb
     - 'n_estimators'
     - [50, 100, 150, 200, 250, 300]
   * -
     - 'max_depth'
     - [1, 3, 5, 7, 9]
   * -
     - 'learning_rate'
     - [0.01, 0.1, 1, 10, 100]
   * - xgb
     - 'n_estimators'
     - [50, 100, 150, 200, 250, 300]
   * -
     - 'max_depth'
     - [3, 5, 7, 9]
   * -
     - 'learning_rate'
     - [0.01, 0.1, 0.2, 0.3, 0.4]
   * - lsvc
     - 'C'
     - [0.1, 1, 10]
   * - sgd
     - 'penalty'
     - ['l2', 'l1', 'elasticnet']
   * - hgboost
     - 'max_depth'
     - [3, 5, 7, 9]
   * -
     - 'learning_rate'
     - [0.1, 0.2, 0.3, 0.4]
   * - rgcv
     - 'fit_intercept'
     - [True, False]

.. list-table:: Regression Estimators' Default Parameter Search Ranges
   :widths: 25 25 50
   :header-rows: 1

   * - Estimator
     - Parameter
     - Search Range
   * - lr
     - 'normalize'
     - [True, False]
   * - knn
     - 'algorithm'
     - ['auto', 'ball_tree', 'kd_tree', 'brute']
   * -
     - 'n_neighbors'
     - [5, 10, 15, 20, 25]
   * -
     - 'weights'
     - ['uniform', 'distance']
   * - svm
     - 'C'
     - [0.1, 1, 10]
   * -
     - 'kernel'
     - ['linear', 'poly', 'rbf', 'sigmoid']
   * - mlp
     - 'activation'
     - ['identity', 'relu', 'tanh', 'logistic']
   * -
     - 'hidden_layer_sizes'
     - [10, 50, 100]
   * -
     - 'learning_rate'
     - ['constant', 'invscaling', 'adaptive']
   * -
     - 'solver'
     - ['lbfgs', 'adam']
   * - ada
     - 'n_estimators'
     - [50, 100, 150, 200, 250, 300]
   * -
     - 'loss'
     - ['linear', 'square', 'exponential']
   * -
     - 'learning_rate'
     - [0.01, 0.1, 0.2, 0.3, 0.4]
   * - tree
     - 'splitter'
     - ['best', 'random']
   * -
     - 'max_depth'
     - [1, 3, 5, 7, 9]
   * -
     - 'min_samples_leaf'
     - [1, 3, 5]
   * - rf
     - 'max_depth'
     - [2, 4, 8, 16, 32]
   * -
     - 'n_estimators'
     - [5, 50, 250]
   * - gb
     - 'n_estimators'
     - [50, 100, 150, 200, 250, 300]
   * -
     - 'max_depth'
     - [3, 5, 7, 9]
   * -
     - 'learning_rate'
     - [0.01, 0.1, 0.2, 0.3, 0.4]
   * - xgb
     - 'n_estimators'
     - [50, 100, 150, 200, 250, 300]
   * -
     - 'max_depth'
     - [3, 5, 7, 9]
   * -
     - 'learning_rate'
     - [0.01, 0.1, 0.2, 0.3, 0.4]
   * - sgd
     - 'shuffle'
     - [True, False]
   * -
     - 'penalty'
     - ['l2', 'l1', 'elasticnet']
   * -
     - 'learning_rate'
     - ['constant', 'optimal', 'invscaling']
   * - cvlasso
     - 'fit_intercept'
     - [True, False]
   * - rgcv
     - 'fit_intercept'
     - [True, False]
   * - huber
     - 'fit_intercept'
     - [True, False]
   * - hgboost
     - 'max_depth'
     - [3, 5, 7, 9]
   * -
     - 'learning_rate'
     - [0.1, 0.2, 0.3, 0.4]
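These search ranges are not fixed: ``update_parameters()`` overwrites the range for a single estimator, ``export_parameters()`` dumps the current settings for review, and ``reset_parameters()`` restores the defaults shown in the tables above. Below is a minimal sketch following the call pattern in the OptimalFlow documentation; the specific values passed here are illustrative, not recommendations.

.. code-block:: python

    from optimalflow.utilis_func import update_parameters, export_parameters, reset_parameters

    # Narrow the SVM classifier's search space to two C values and two kernels
    # (illustrative values only).
    update_parameters(mode="cv", estimator_name="svm",
                      **{'C': [0.1, 1], 'kernel': ['linear', 'rbf']})

    # Export the current parameter settings so they can be reviewed.
    export_parameters()

    # Roll every estimator back to the default search ranges.
    reset_parameters()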