H2O.ai Catalog

Extend the power of Driverless AI with custom recipes and build your own AI!

m

Model Template

Template base class for a custom model recipe.

m

Exponential Smoothing

Linear Model on top of Exponential Weighted Moving Average Lags for Time-Series. Provide appropriate lags and past outcomes during batch scoring for best results.

m

Fb Prophet

Prophet by Facebook for TimeSeries with an example of parameter mutation.

m

Fb Prophet Parallel

Prophet by Facebook for TimeSeries with an example of parameter mutation.

m

Historic Mean

Historic Mean for Time-Series problems. Predicts the mean of the target for each timegroup for regression problems.
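
As an illustration of the idea behind the Historic Mean entry, the following is a minimal sketch, not the recipe's actual code. The function names and the `fallback` argument for unseen groups are hypothetical.

```python
from collections import defaultdict


def fit_group_means(groups, targets):
    """Compute the mean target value for each time group seen in training."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, target in zip(groups, targets):
        sums[group] += target
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


def predict_group_means(group_means, groups, fallback):
    """Predict the stored mean per group; unseen groups get `fallback` (an assumption)."""
    return [group_means.get(group, fallback) for group in groups]


means = fit_group_means(["store_a", "store_a", "store_b"], [1.0, 3.0, 10.0])
predictions = predict_group_means(means, ["store_b", "store_c"], fallback=4.0)
```

Using the overall training mean as the fallback is one reasonable choice for groups that did not appear in training.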

m

Catboost

CatBoost gradient boosting by Yandex. Currently supports regression and binary classification.

m

Daal Trees

Binary Classification and Regression for Decision Forest and Gradient Boosting based on Intel DAAL

m

Extra Trees

Extremely Randomized Trees (ExtraTrees) model from sklearn

m

H2O 3 Gbm Poisson

H2O-3 Distributed Scalable Machine Learning Models: Poisson GBM

m

H2O 3 Models

H2O-3 Distributed Scalable Machine Learning Models (DL/GLM/GBM/DRF/NB/AutoML)

m

H2O Glm Poisson

H2O-3 Distributed Scalable Machine Learning Models: Poisson GLM

m

Knearestneighbour

K-Nearest Neighbor implementation by sklearn. For small data (< 200k rows).

m

Libfm Fastfm

LibFM implementation of fastFM

m

Linear Svm

Linear Support Vector Machine (SVM) implementation by sklearn. For small data.

m

Logistic Regression

Logistic Regression based upon sklearn.

m

Nusvm

Nu-SVM implementation by sklearn. For small data.

m

Random Forest

Random Forest (RandomForest) model from sklearn

m

Lightgbm With Custom Loss

Modified version of Driverless AI's internal LightGBM implementation with a custom objective function (used for tree split finding).

m

Xgboost With Custom Loss

Modified version of Driverless AI's internal XGBoost implementation with a custom objective function (used for tree split finding).

m

Text Tfidf Model

Text classification / regression model using TFIDF

s

Huber Loss

Huber Loss for Regression or Binary Classification. Robust loss, combination of quadratic loss and linear loss.
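
The "combination of quadratic loss and linear loss" can be sketched with the standard Huber definition: quadratic for small residuals, linear beyond a cutoff. The function name and the default `delta=1.0` are assumptions for illustration, not values taken from the recipe.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: 0.5*r^2 when |r| <= delta, else delta*(|r| - 0.5*delta)."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        residual = abs(actual - predicted)
        if residual <= delta:
            total += 0.5 * residual ** 2  # quadratic region
        else:
            total += delta * (residual - 0.5 * delta)  # linear region
    return total / len(y_true)


example_loss = huber_loss([0.0, 0.0], [0.5, 3.0])
```

Because large residuals grow only linearly, a single outlier influences the score less than it would under squared error.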

s

Scorer Template

Template base class for a custom scorer recipe.

s

F3 Score

F3 Score

s

F4 Score

F4 Score

s

Precision

Precision: `TP / (TP + FP)`. Binary uses threshold of 0.5 (please adjust), multiclass uses argmax to assign labels.

s

Recall

Recall: `TP / (TP + FN)`. Binary uses threshold of 0.5 (please adjust), multiclass uses argmax to assign labels.
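
The Precision and Recall formulas for the binary case can be sketched as follows. This is an illustrative example rather than the recipes' code. The function names are hypothetical, and treating a probability equal to the threshold as positive is an assumption.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, FN, TN for 0/1 labels and predicted probabilities."""
    tp = fp = fn = tn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0  # assumption: >= counts as positive
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(y_true, y_prob, threshold=0.5):
    """TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp, fp, _, _ = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_prob, threshold=0.5):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    tp, _, fn, _ = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


labels = [1, 0, 1, 1, 0]
probs = [0.9, 0.6, 0.4, 0.8, 0.1]
```

Lowering the threshold from 0.5 to 0.3 in this example raises recall at no cost to precision, which is why the entries suggest adjusting the threshold for your data.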

s

Average Mcc

Averaged Matthews Correlation Coefficient (averaged over several thresholds, for imbalanced problems). Example of how to use Driverless AI's internal scorer.

s

Brier Loss

Brier Loss

s

Cost

Using hard-coded dollar amounts x for false positives and y for false negatives, calculate the cost of a model using: `(x * FP + y * FN) / N`
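
The cost formula above can be sketched as follows, assuming hard 0/1 predictions. The function name and the dollar amounts in the example are hypothetical.

```python
def cost_per_row(y_true, y_pred, fp_cost, fn_cost):
    """Average cost per row: (fp_cost * FP + fn_cost * FN) / N."""
    n = len(y_true)
    fp = sum(1 for actual, pred in zip(y_true, y_pred) if pred == 1 and actual == 0)
    fn = sum(1 for actual, pred in zip(y_true, y_pred) if pred == 0 and actual == 1)
    return (fp_cost * fp + fn_cost * fn) / n


# One false positive at $10 and one false negative at $50 over four rows.
example_cost = cost_per_row([1, 0, 1, 0], [1, 1, 0, 0], fp_cost=10, fn_cost=50)
```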

s

False Discovery Rate

False Discovery Rate: `FP / (FP + TP)` for binary classification - only recommended if threshold is adjusted

s

Marketing Campaign

Computes the mean profit per outbound marketing letter, given a fraction of the population addressed, and fixed cost and reward

s

Profit

Uses domain information about user behavior to calculate the profit or loss of a model.

s

Hamming Loss

Hamming Loss - Misclassification Rate (1 - Accuracy)

s

Quadratic Weighted Kappa

Quadratic Weighted Kappa

s

Wape Scorer

Weighted Absolute Percent Error

s

Cosh Loss

Hyperbolic Cosine Loss

s

Explained Variance

Explained Variance. Fraction of variance that is explained by the model.

s

Largest Error

Largest error for regression problems. Highly sensitive to outliers.

s

Log Mae

Log Mean Absolute Error for regression

s

Mean Absolute Scaled Error

Mean Absolute Scaled Error for time-series regression

s

Mean Squared Log Error

Mean Squared Log Error for regression

s

Median Absolute Error

Median Absolute Error for regression

s

Pearson Correlation

Pearson Correlation Coefficient for regression

s

Top Decile

Median Absolute Error for predictions in the top decile

t

How To Debug Transformer

Example of how to debug a transformer outside of Driverless AI (optional)

t

How To Test From Py Client

Testing a BYOR Transformer with the PyClient - works on 1.7.0 & 1.7.1-17

t

Transformer Template

Template base class for a custom transformer recipe.

t

Firstncharcvte

Target-encode high cardinality categorical text by their first few characters in the string

t

Log Scale Target Encoding

Target-encode numbers by their logarithm

t

Germany Landers Holidays

Returns a flag for whether a date falls on a holiday for each of Germany's Bundeslaender.

t

Ipaddress Features

Parses IP addresses and networks and extracts their properties.

t

Is Ramadan

Returns a flag for whether a date falls on Ramadan in Saudi Arabia

t

Singapore Public Holidays

Flag for whether a date falls on a public holiday in Singapore.

t

Usairportcode Origin Dest

Transformer to parse and augment US airport codes with geolocation info.

t

Usairportcode Origin Dest Geo Features

Transformer to augment US airport codes with geolocation info.

t

Uszipcode Features Database

Transformer to parse and augment US zipcodes with info from zipcode database.

t

Uszipcode Features Light

Lightweight transformer to parse and augment US zipcodes with info from zipcode database.

t

Auto Arima Forecast

Auto ARIMA transformer is a time series transformer that predicts target using ARIMA models

t

General Time Series Transformer

Demonstrates the API for custom time-series transformers.

t

Parallel Auto Arima Forecast

Parallel Auto ARIMA transformer is a time series transformer that predicts target using ARIMA models. In this implementation, Time Group Models are fitted in parallel.

t

Parallel Prophet Forecast

Parallel FB Prophet transformer is a time series transformer that predicts target using FBProphet models.

t

Serial Prophet Forecast

Transformer that uses FB Prophet for time series prediction. Please see the parallel implementation for more information.

t

Time Encoder Transformer

Converts the time column to an ordered integer

t

Trading Volatility

Calculates Historical Volatility for numeric features (makes assumptions on the data)

t

Datetime Diff Transformer

Difference in time between two datetime columns

t

Datetime Encoder Transformer

Converts datetime column into an integer (milliseconds since 1970)
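
A minimal sketch of the "milliseconds since 1970" conversion, using only the standard library. The function name is hypothetical, and treating naive datetimes as UTC is an assumption; the recipe may handle time zones differently.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Return whole milliseconds between 1970-01-01 UTC and `dt`."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    # Dividing one timedelta by another with // yields an integer.
    return (dt - EPOCH) // timedelta(milliseconds=1)


millis_2020 = to_epoch_millis(datetime(2020, 1, 1))
```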

t

Days Until Dec2020

Creates new feature for any date columns, by computing the difference in days between the date value and 31st Dec 2020
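
The day-difference feature can be sketched as below. The function name is hypothetical, and the sign convention (positive before the reference date, negative after it) is an assumption.

```python
from datetime import date

REFERENCE_DATE = date(2020, 12, 31)


def days_until_reference(d):
    """Days from `d` to 31 Dec 2020; negative if `d` is later (assumed convention)."""
    return (REFERENCE_DATE - d).days


days_from_dec_first = days_until_reference(date(2020, 12, 1))
```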

t

Pe Data Directory Features

Extract LIEF features from PE files

t

Pe Exports Features

Extract LIEF features from PE files

t

Pe General Features

Extract LIEF features from PE files

t

Pe Header Features

Extract LIEF features from PE files

t

Pe Imports Features

Extract LIEF features from PE files

t

Pe Normalized Byte Count

Extract LIEF features from PE files

t

Pe Section Characteristics

Extract LIEF features from PE files

t

Audio Mfcc Transformer

Extract MFCC and spectrogram features from audio files

t

Azure Speech To Text

An example of integration with Azure Speech Recognition Service

t

Image Ocr Transformer

Convert a path to an image to text using OCR based on tesseract

t

Image Url Transformer

Convert a path to an image (JPG/JPEG/PNG) to a vector of class probabilities created by a pretrained ImageNet deeplearning model (Keras, TensorFlow).

t

Matrixfactorization

Collaborative filtering features using various techniques of Matrix Factorization for recommendations. Recommended for large data.

t

Boxcox Transformer

Box-Cox Transform

t

Count Negative Values Transformer

Count of negative values per row

t

Count Positive Values Transformer

Count of positive values per row

t

Exp Diff Transformer

Exponentiated difference of two numbers

t

Log Transformer

Converts numbers to their Logarithm

t

Product

Multiplies together 3 or more numeric features

t

Random Transformer

Creates random numbers

t

Round Transformer

Rounds numbers to 1, 2 or 3 decimals

t

Square Root Transformer

Converts numbers to the square root, preserving the sign of the original numbers
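
One way to take a square root while preserving the sign is to apply it to the absolute value and copy the original sign back. This is an illustrative sketch with a hypothetical function name, not the recipe's code.

```python
import math


def signed_sqrt(x):
    """Square root of |x| carrying the sign of x, so negative inputs stay negative."""
    return math.copysign(math.sqrt(abs(x)), x)


roots = [signed_sqrt(v) for v in (9, -16, 0)]
```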

t

Sum

Adds together 3 or more numeric features

t

Yeojohnson Transformer

Yeo-Johnson Power Transformer

t

H2O3 Dl Anomaly

Anomaly score for each row based on reconstruction error of a H2O-3 deep learning autoencoder

t

Quantile Winsorizer

Winsorizes (truncates) univariate outliers outside of a given quantile threshold

t

Twosigma Winsorizer

Winsorizes (truncates) univariate outliers outside of two standard deviations from the mean.
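
The two-standard-deviation truncation can be sketched as follows: learn the bounds from training values, then clip new values to them. The function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics


def fit_two_sigma_bounds(values):
    """Return (lower, upper) = mean -/+ 2 standard deviations of the training values."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return mean - 2 * std, mean + 2 * std


def winsorize(values, lower, upper):
    """Clip each value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


lower, upper = fit_two_sigma_bounds([10, 12, 8, 10, 10])
clipped = winsorize([10, 20, 0], lower, upper)
```

Fitting the bounds on training data only and reusing them at scoring time keeps the transformation consistent between the two.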

t

Expandingmean

CatBoost-style target encoding. See https://youtu.be/d6UMEmeXB6o?t=818 for short explanation
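
As a rough sketch of the expanding-mean idea, each row is encoded with the mean target of earlier rows in the same category, so a row never sees its own target. The function name and the `prior` used when a category has no history are assumptions; the recipe may smooth differently.

```python
def expanding_mean_encode(categories, targets, prior):
    """Encode each row by the running target mean of preceding rows with the same category."""
    sums = {}
    counts = {}
    encoded = []
    for category, target in zip(categories, targets):
        if counts.get(category, 0) == 0:
            encoded.append(prior)  # no earlier rows for this category
        else:
            encoded.append(sums[category] / counts[category])
        # Update the running statistics only after encoding the current row.
        sums[category] = sums.get(category, 0.0) + target
        counts[category] = counts.get(category, 0) + 1
    return encoded


encoded = expanding_mean_encode(["a", "b", "a", "a", "b"], [1, 0, 0, 1, 1], prior=0.5)
```

The result depends on row order, so in practice rows are typically shuffled or ordered by time before encoding.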

t

Leaky Mean Target Encoder

Example implementation of an out-of-fold target encoder (leaky, not recommended)

t

Fuzzy Text Similarity Transformers

Row-by-row similarity between two text columns based on FuzzyWuzzy

t

Text Char Tfidf Count Transformers

Character level TFIDF and Count followed by Truncated SVD on text columns

t

Text Embedding Similarity Transformers

Row-by-row similarity between two text columns based on pretrained Deep Learning embedding space

t

Text Lang Detect Transformer

Detect the language for a text value using Google's 'langdetect' package

t

Text Meta Transformers

Extract common meta features from text

t

Text Named Entities Transformer

Extract the counts of different named entities in the text (e.g. Person, Organization, Location)

t

Text Pos Tagging Transformer

Extract the count of nouns, verbs, adjectives and adverbs in the text

t

Text Preprocessing Transformer

Preprocess the text column by stemming, lemmatization and stop word removal

t

Text Readability Transformers

Custom recipe to extract readability features from text data. References: https://github.com/shivam5992/textstat, http://www.readabilityformulas.com/free-readability-formula-tests.php

t

Text Sentiment Transformer

Extract sentiment from text using pretrained models from TextBlob

t

Text Similarity Transformers

Row-by-row similarity between two text columns based on common N-grams, Jaccard similarity, Dice similarity and edit distance.

t

Text Spelling Correction Transformers

Correct the spelling of text column

t

Text Topic Modeling Transformer

Extract topics from text column using LDA

t

Text Url Summary Transformer

Extracts text from a URL and summarizes it

t

Vader Text Sentiment Transformer

Extract sentiment from text using lexicon and rule-based sentiment analysis tool called VADER

t

Count Missing Values Transformer

Count of missing values per row

t

Missing Flag Transformer

Returns 1 if a value is missing, or 0 otherwise

t

Specific Column Transformer

Example of a transformer that operates on the entire original frame, and hence on any column(s) desired.

t

Simple Grok Parser

Extract column data using grok patterns

t

Strlen Transformer

Returns the string length of categorical values

t

To String Transformer

Converts numbers to strings

t

User Agent Transformer

A best effort transformer to determine browser device characteristics from a user-agent string

t

Signal Processing

This custom transformer processes signal files to create features used by Driverless AI to solve a regression problem

t

Geodesic

Calculates the distance in miles between two latitude/longitude points in space

t

Myhaversine

Computes miles between first two *_latitude and *_longitude named columns in the data set
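
The distance in miles between two latitude/longitude points can be sketched with the standard haversine formula. The function name is hypothetical, and the Earth radius of 3958.8 miles is an assumed mean value, not one taken from the recipe.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumption: mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


one_degree_latitude = haversine_miles(0.0, 0.0, 1.0, 0.0)
```

The haversine formula treats the Earth as a sphere; a geodesic calculation on an ellipsoid gives slightly different values for long distances.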

d

Groupagg

Aggregation features on numeric columns across multiple categorical columns

d

Airlines

Create airlines dataset

d

Airlines Joined Data Flights In Out

Create augmented airlines datasets

d

Airlines Joined Data Flights In Out Regression

Create augmented airlines datasets for regression

d

Airlines Multiple

Create airlines dataset

d

Catchallenge

Create cat challenge dataset

d

Data Template

Custom data recipe base class

d

Seattle Rain Modify

Transpose the Monthly Seattle Rain Inches data set for Time Series use cases

d

Seattle Rain Upload

Upload Monthly Seattle Rain Inches data set from data provided by the City of Seattle