H2O.ai Catalog
Extend the power of Driverless AI with custom recipes and build your own AI!
Parallel FB Prophet transformer is a time series transformer that predicts the target using FBProphet models. This transformer fits one model for each combination of time group column values and is significantly faster than the implementation available in parallel_prophet_forecast.py.
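A minimal sketch of the per-group fitting idea, assuming a pandas frame with Prophet's expected "ds"/"y" columns plus a hypothetical "group" column (the actual transformer fits groups in parallel and plugs into the DAI transformer API):

    # Sketch only: fit one Prophet model per time-group value (sequentially here).
    import pandas as pd
    from prophet import Prophet  # older FBProphet releases: `from fbprophet import Prophet`

    def fit_per_group(df: pd.DataFrame, horizon: int = 30) -> dict:
        """Return a {group: forecast frame} mapping with `horizon` future periods each."""
        forecasts = {}
        for group, frame in df.groupby("group"):
            model = Prophet()
            model.fit(frame[["ds", "y"]])                         # one model per group
            future = model.make_future_dataframe(periods=horizon)
            forecasts[group] = model.predict(future)[["ds", "yhat"]]
        return forecasts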
Data recipe to transform input audio to Mel spectrograms.
This data recipe performs the following steps:
1. Reads the audio file
2. Converts the audio file to a Mel spectrogram
3. Saves the Mel spectrogram as a .png image
4. Uploads the image dataset to DAI
The recipe is based on the Kaggle Freesound Audio Tagging 2019 challenge:
https://www.kaggle.com/c/freesound-audio-tagging-2019
To use the recipe, follow these steps:
1. Download a subsample of the audio dataset from here:
http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/freesound_audio.zip
2. Unzip it and specify the path to the dataset in the DATA_DIR global variable
3. Upload the dataset into Driverless AI using the Add Data Recipe option
The transformed dataset is also available and can be uploaded directly to Driverless AI:
http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/freesound_images.zip
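A minimal sketch of the audio-to-spectrogram conversion using librosa and matplotlib (assumed tooling; the recipe's actual implementation may differ):

    # Sketch: read an audio file, compute its Mel spectrogram, save it as a .png image.
    import numpy as np
    import librosa
    import librosa.display
    import matplotlib.pyplot as plt

    def audio_to_mel_png(wav_path: str, png_path: str) -> None:
        y, sr = librosa.load(wav_path, sr=None)           # 1. read the audio file
        mel = librosa.feature.melspectrogram(y=y, sr=sr)  # 2. Mel spectrogram (power scale)
        mel_db = librosa.power_to_db(mel, ref=np.max)     #    convert power to decibels
        plt.figure(figsize=(4, 4))
        librosa.display.specshow(mel_db, sr=sr)
        plt.axis("off")
        plt.savefig(png_path, bbox_inches="tight", pad_inches=0)  # 3. save the .png image
        plt.close()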
Speech to text using Mozilla's DeepSpeech.
Settings for this recipe:
Assign the MODEL_PATH global variable prior to usage.
Assign the WAV_COLNAME global variable with the proper column name from your dataset.
This column should contain absolute paths to the .wav files that need to be converted to text.
General requirements for the .wav files:
1 channel (mono)
16 bit
16000 Hz sample rate
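A minimal transcription sketch with the deepspeech Python package; the model path argument stands in for the recipe's MODEL_PATH setting:

    # Sketch: transcribe one mono, 16-bit, 16 kHz .wav file with a DeepSpeech model.
    import wave
    import numpy as np
    from deepspeech import Model

    def wav_to_text(wav_path: str, model_path: str) -> str:
        ds = Model(model_path)                         # path to a DeepSpeech .pbmm model
        with wave.open(wav_path, "rb") as w:
            frames = w.readframes(w.getnframes())
        audio = np.frombuffer(frames, dtype=np.int16)  # 16-bit PCM samples
        return ds.stt(audio)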
Data recipe to transform input video to images.
This data recipe performs the following steps:
1. Reads the video file
2. Samples N uniform frames from the video file
3. Detects all faces on each frame
4. Crops the faces and saves them as images
The recipe is based on the Kaggle Deepfake Detection Challenge:
https://www.kaggle.com/c/deepfake-detection-challenge
To use the recipe, follow these steps:
1. Download a small subsample of the video dataset from here:
http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/deepfake.zip
2. Unzip it and specify the path to the dataset in the DATA_DIR global variable
3. Upload the dataset into Driverless AI using the Add Data Recipe option
The transformed dataset is also available and can be uploaded directly to Driverless AI:
http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/deepfake_frames.zip
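A minimal sketch of the frame-sampling and face-cropping steps using OpenCV's bundled Haar cascade detector (the recipe itself may use a different face detector):

    # Sketch: sample N uniform frames from a video, detect faces, crop and save them.
    import cv2
    import numpy as np

    def video_to_face_crops(video_path: str, out_prefix: str, n_frames: int = 10) -> None:
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        for i, idx in enumerate(np.linspace(0, total - 1, n_frames, dtype=int)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))      # uniform frame sampling
            ok, frame = cap.read()
            if not ok:
                continue
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for j, (x, y, w, h) in enumerate(detector.detectMultiScale(gray, 1.3, 5)):
                cv2.imwrite(f"{out_prefix}_frame{i}_face{j}.png",
                            frame[y:y + h, x:x + w])        # crop each face and save it
        cap.release()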
Speech to text using Azure Cognitive Services.
Settings for this recipe:
Assign the AZURE_SERVICE_KEY and AZURE_SERVICE_REGION global variables prior to usage.
Assign the WAV_COLNAME global variable with the proper column name from your dataset.
This column should contain absolute paths to the .wav files that need to be converted to text.
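A minimal single-file transcription sketch with the Azure Speech SDK; the key and region arguments stand in for the recipe's AZURE_SERVICE_KEY and AZURE_SERVICE_REGION settings:

    # Sketch: transcribe one .wav file with Azure Cognitive Services Speech.
    import azure.cognitiveservices.speech as speechsdk

    def wav_to_text(wav_path: str, key: str, region: str) -> str:
        config = speechsdk.SpeechConfig(subscription=key, region=region)
        audio = speechsdk.audio.AudioConfig(filename=wav_path)
        recognizer = speechsdk.SpeechRecognizer(speech_config=config, audio_config=audio)
        result = recognizer.recognize_once()   # single-utterance recognition
        return result.text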