H2O.ai Catalog

Extend the power of Driverless AI with custom recipes and build your own AI!

m

Monotonic Models

LightGBM/XGBoostGBM/DecisionTree with user-given monotonicity constraints (1/-1/0) for original numeric features

m

Exponential Smoothing

Linear Model on top of Exponential Weighted Moving Average Lags for Time-Series. Provide appropriate lags and past outcomes during batch scoring for best results.

m

Fb Prophet

Prophet by Facebook for TimeSeries with an example of parameter mutation.

m

Fb Prophet Parallel

Prophet by Facebook for TimeSeries with an example of parameter mutation.

m

Historic Mean

Historic Mean for Time-Series problems. Predicts the mean of the target for each timegroup for regression problems.

m

Calibratedclassifier

Calibrated Classifier Model: To calibrate predictions using Platt's scaling, Isotonic Regression or Splines

m

Catboost

CatBoost gradient boosting by Yandex. Currently supports regression and binary classification.

m

Daal Trees

Binary Classification and Regression for Decision Forest and Gradient Boosting based on Intel DAAL

m

Extra Trees

Extremely Randomized Trees (ExtraTrees) model from sklearn

m

Extremeclassifier

Extreme Classifier Model: Speeds up training of multiclass models (100s of classes) for LightGBM. Caution: can only be used for AUC (or GINI) and accuracy metrics. Based on: Extreme Classification in Log Memory using Count-Min Sketch: https://arxiv.org/abs/1910.13830

m

H2O 3 Gbm Poisson

H2O-3 Distributed Scalable Machine Learning Models: Poisson GBM

m

H2O 3 Models

H2O-3 Distributed Scalable Machine Learning Models (DL/GLM/GBM/DRF/NB/AutoML)

m

H2O Glm Poisson

H2O-3 Distributed Scalable Machine Learning Models: Poisson GLM

m

Knearestneighbour

K-Nearest Neighbor implementation by sklearn. For small data (< 200k rows).

m

Libfm Fastfm

LibFM implementation of fastFM

m

Linear Svm

Linear Support Vector Machine (SVM) implementation by sklearn. For small data.

m

Logistic Regression

Logistic Regression based upon sklearn.

m

Nusvm

Nu-SVM implementation by sklearn. For small data.

m

Random Forest

Random Forest (RandomForest) model from sklearn

m

Lightgbm Quantile Regression

Modified version of Driverless AI's internal LightGBM implementation for quantile regression

m

Lightgbm Tweedie

Modified version of Driverless AI's internal LightGBM implementation with Tweedie distribution

m

Lightgbm With Custom Loss

Modified version of Driverless AI's internal LightGBM implementation with a custom objective function (used for tree split finding).

m

Xgboost With Custom Loss

Modified version of Driverless AI's internal XGBoost implementation with a custom objective function (used for tree split finding).

m

Model Decision Tree Linear Combo

Decision tree plus linear model

m

Model Gam

Generalized Additive Model

m

Model Skopes Rules

Skopes rules

m

Model Ga2M

Explainable Boosting Machines (EBM), implementation of GA2M

m

Model Xnn

Explainable neural net

m

Finbert

Custom Bert model which uses FinBert pretrained weights. Can easily be adapted to other pretrained models, like SciBert.

m

Text Binary Count Logistic

Text classification model using binary count of words

m

Text Tfidf Model

Text classification / regression model using TFIDF

m

Text Tfidf Model Continuous

Text classification model using TFIDF

s

Huber Loss

Huber Loss for Regression or Binary Classification. Robust loss, combination of quadratic loss and linear loss.
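To illustrate the "combination of quadratic loss and linear loss" mentioned above, here is a minimal sketch of a mean Huber loss in plain Python. The function name `huber_loss` and the threshold parameter `delta` (default 1.0) are assumptions made for illustration; they are not taken from the recipe itself.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones.

    `delta` is an assumed threshold separating the two regimes.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            # Quadratic region: behaves like half the squared error.
            total += 0.5 * err ** 2
        else:
            # Linear region: grows proportionally to the absolute error.
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)
```

Because large errors are penalized linearly rather than quadratically, a single outlier influences this loss far less than it would influence a squared-error loss.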

s

F3 Score

F3 Score

s

F4 Score

F4 Score

s

Precision

Weighted Precision: `TP / (TP + FP)` at threshold for optimal F1 Score.

s

Recall

Weighted Recall: `TP / (TP + FN)` at threshold for optimal F1 Score.

s

Average Mcc

Averaged Matthews Correlation Coefficient (averaged over several thresholds, for imbalanced problems). Example of how to use Driverless AI's internal scorer.

s

Brier Loss

Brier Loss

s

Cost

Using hard-coded dollar amounts x for false positives and y for false negatives, calculate the cost of a model using: `(x * FP + y * FN) / N`
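The formula `(x * FP + y * FN) / N` can be sketched in a few lines of plain Python. This is only an illustration: the function name `binary_cost`, the argument names, and the assumption that predictions are already thresholded 0/1 labels are not taken from the recipe.

```python
def binary_cost(y_true, y_pred_label, fp_cost, fn_cost):
    """Average cost per row: (fp_cost * FP + fn_cost * FN) / N.

    Assumes y_true and y_pred_label contain 0/1 labels of equal length.
    """
    pairs = list(zip(y_true, y_pred_label))
    # False positive: actual 0, predicted 1.
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # False negative: actual 1, predicted 0.
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (fp_cost * fp + fn_cost * fn) / len(pairs)
```

For example, with one false positive at 10 dollars and one false negative at 50 dollars across four rows, the average cost is `(10 + 50) / 4`.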

s

Cost Access To Data

Same as CostBinary, but provides access to full Data

s

Cost Smooth

Using hard-coded dollar amounts x for false positives and y for false negatives, calculate the cost of a model using: `(1 - y_true) * y_pred * fp_cost + y_true * (1 - y_pred) * fn_cost`
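The smooth variant replaces hard 0/1 decisions with predicted probabilities. Below is a minimal sketch that applies the per-row expression above and averages it over all rows; the averaging step and the function name `smooth_cost` are assumptions for illustration, not confirmed details of the recipe.

```python
def smooth_cost(y_true, y_pred, fp_cost, fn_cost):
    """Mean of (1 - y_true) * y_pred * fp_cost + y_true * (1 - y_pred) * fn_cost.

    y_true holds 0/1 labels, y_pred holds predicted probabilities of class 1.
    Averaging over rows is an assumption made for this sketch.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Negatives contribute in proportion to the predicted probability,
        # positives in proportion to the probability mass that was missed.
        total += (1 - t) * p * fp_cost + t * (1 - p) * fn_cost
    return total / len(y_true)
```

Since the expression is differentiable in `y_pred`, it avoids the threshold choice that the hard-count cost depends on.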

s

False Discovery Rate

Weighted False Discovery Rate: `FP / (FP + TP)` at threshold for optimal F1 Score.

s

Logloss With Costs

Logloss with costs associated with each of the 4 outcome types, typically applicable to fraud use cases

s

Marketing Campaign

Computes the mean profit per outbound marketing letter, given a fraction of the population addressed, and fixed cost and reward

s

Profit

Uses domain information about user behavior to calculate the profit or loss of a model.

s

Hamming Loss

Hamming Loss - Misclassification Rate (1 - Accuracy)

s

Map@K

Mean Average Precision @ k (MAP@k)

s

Quadratic Weighted Kappa

Quadratic Weighted Kappa

s

Wape Scorer

Weighted Absolute Percent Error
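WAPE is commonly computed as the sum of absolute errors divided by the sum of absolute actual values. The sketch below follows that common definition; whether the recipe uses exactly this form is an assumption, and the function name `wape` is hypothetical.

```python
def wape(y_true, y_pred):
    """Sum of absolute errors divided by the sum of absolute actuals."""
    denominator = sum(abs(t) for t in y_true)
    if denominator == 0:
        # The ratio is undefined when all actual values are zero.
        raise ValueError("WAPE is undefined when all actual values are zero")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / denominator
```

Unlike a plain mean of per-row percentage errors, this form weights each row by the size of its actual value, so rows with tiny actuals do not dominate the score.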

s

Asymmetric Mae

MAE with a penalty that differs for positive and negative errors

s

Cosh Loss

Hyperbolic Cosine Loss

s

Explained Variance

Explained Variance. Fraction of variance that is explained by the model.

s

Largest Error

Largest error for regression problems. Highly sensitive to outliers.

s

Log Mae

Log Mean Absolute Error for regression

s

Mean Absolute Scaled Error

Mean Absolute Scaled Error for time-series regression

s

Mean Squared Log Error

Mean Squared Log Error for regression

s

Median Absolute Error

Median Absolute Error for regression

s

Pearson Correlation

Pearson Correlation Coefficient for regression

s

Quantile Loss

Quantile Loss regression
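Quantile loss is often called pinball loss. A minimal sketch under that standard definition follows; the function name `quantile_loss` and the default `quantile=0.5` are assumptions, not values taken from the recipe.

```python
def quantile_loss(y_true, y_pred, quantile=0.5):
    """Mean pinball loss for a target quantile in (0, 1)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        # Under-prediction (diff > 0) is weighted by `quantile`,
        # over-prediction (diff < 0) by `1 - quantile`.
        total += max(quantile * diff, (quantile - 1.0) * diff)
    return total / len(y_true)
```

With `quantile=0.9`, under-predicting by 2 costs 1.8 while over-predicting by 2 costs only 0.2, which pushes a model toward the 90th percentile of the target.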

s

Rmse With X

Custom RMSE Scorer that also gets X (original features) - for demo/testing purposes only

s

Top Decile

Median Absolute Error for predictions in the top decile

t

First N Char Cvte

Target-encode high cardinality categorical text by their first few characters in the string

t

Log Scale Target Encoding

Target-encode numbers by their logarithm

t

Germany Landers Holidays

Returns a flag for whether a date falls on a holiday for each of Germany's Bundeslaender.

t

Ip Address Features

Parses IP addresses and networks and extracts their properties.

t

Is Ramadan

Returns a flag for whether a date falls on Ramadan in Saudi Arabia

t

Singapore Public Holidays

Flag for whether a date falls on a public holiday in Singapore.

t

Usairportcode Origin Dest

Transformer to parse and augment US airport codes with geolocation info.

t

Usairportcode Origin Dest Geo Features

Transformer to augment US airport codes with geolocation info.

t

Uszipcode Features Database

Transformer to parse and augment US zipcodes with info from zipcode database.

t

Uszipcode Features Light

Lightweight transformer to parse and augment US zipcodes with info from zipcode database.

t

Auto Arima Forecast

Auto ARIMA transformer is a time series transformer that predicts target using ARIMA models.

t

General Time Series Transformer

Demonstrates the API for custom time-series transformers.

t

Parallel Auto Arima Forecast

Parallel Auto ARIMA transformer is a time series transformer that predicts target using ARIMA models. In this implementation, Time Group Models are fitted in parallel.

t

Parallel Prophet Forecast

Parallel FB Prophet transformer is a time series transformer that predicts target using FBProphet models.

t

Parallel Prophet Forecast Using Individual Groups

Parallel FB Prophet transformer is a time series transformer that predicts target using FBProphet models. This transformer fits one model for each time group column value and is significantly faster than the implementation available in parallel_prophet_forecast.py.

t

Serial Prophet Forecast

Transformer that uses FB Prophet for time series prediction. Please see the parallel implementation for more information.

t

Time Encoder Transformer

Converts the Time Column to an ordered integer

t

Trading Volatility

Calculates Historical Volatility for numeric features (makes assumptions on the data)

t

Datetime Diff Transformer

Difference in time between two datetime columns

t

Datetime Encoder Transformer

Converts datetime column into an integer (milliseconds since 1970)
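As a rough illustration of "milliseconds since 1970", the sketch below converts a Python `datetime` to an integer using exact `timedelta` arithmetic. The function name `to_epoch_millis` and the choice to treat timezone-naive values as UTC are assumptions for this example, not documented behavior of the recipe.

```python
from datetime import datetime, timedelta, timezone

# Reference point: midnight UTC on 1 January 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Whole milliseconds elapsed between the Unix epoch and `dt`."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)
```

Encoding a datetime this way yields a single monotonically increasing number that tree-based models can split on directly.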

t

Days Until Dec2020

Creates new feature for any date columns, by computing the difference in days between the date value and 31st Dec 2020

t

Pe Data Directory Features

Extract LIEF features from PE files

t

Pe Exports Features

Extract LIEF features from PE files

t

Pe General Features

Extract LIEF features from PE files

t

Pe Header Features

Extract LIEF features from PE files

t

Pe Imports Features

Extract LIEF features from PE files

t

Pe Normalized Byte Count

Extract LIEF features from PE files

t

Pe Section Characteristics

Extract LIEF features from PE files

t

Audio Mfcc Transformer

Extract MFCC and spectrogram features from audio files

t

Azure Speech To Text

An example of integration with Azure Speech Recognition Service

t

Image Ocr Transformer

Convert a path to an image to text using OCR based on tesseract

t

Image Url Transformer

Convert a path to an image (JPG/JPEG/PNG) to a vector of class probabilities created by a pretrained ImageNet deeplearning model (Keras, TensorFlow).

t

Matrixfactorization

Collaborative filtering features using various techniques of Matrix Factorization for recommendations. Recommended for large data.

t

Boxcox Transformer

Box-Cox Transform

t

Count Negative Values Transformer

Count of negative values per row

t

Count Positive Values Transformer

Count of positive values per row

t

Exp Diff Transformer

Exponentiated difference of two numbers

t

Log Transformer

Converts numbers to their Logarithm

t

Product

Multiplies together 3 or more numeric features

t

Random Transformer

Creates random numbers

t

Round Transformer

Rounds numbers to 1, 2 or 3 decimals

t

Square Root Transformer

Converts numbers to the square root, preserving the sign of the original numbers
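A sign-preserving square root can be sketched as the square root of the absolute value with the original sign reapplied. The function name `signed_sqrt` is an assumption for illustration.

```python
import math


def signed_sqrt(x):
    """Square root of |x|, carrying over the sign of x."""
    # copysign attaches the sign of x to the non-negative root.
    return math.copysign(math.sqrt(abs(x)), x)
```

This keeps the transform defined for negative inputs, where a plain square root would fail.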

t

Sum

Adds together 3 or more numeric features

t

Truncated Svd All

Truncated SVD for all columns

t

Yeojohnson Transformer

Yeo-Johnson Power Transformer

t

H2O3 Dl Anomaly

Anomaly score for each row based on reconstruction error of a H2O-3 deep learning autoencoder

t

Quantile Winsorizer

Winsorizes (truncates) univariate outliers outside of a given quantile threshold

t

Twosigma Winsorizer

Winsorizes (truncates) univariate outliers outside of two standard deviations from the mean.
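The idea of clipping values to within two standard deviations of the mean can be sketched as follows. The function names, the use of the population standard deviation, and the split into a fit step and a clip step are all assumptions made for illustration.

```python
import statistics


def fit_two_sigma_bounds(train_values):
    """Return (lower, upper) bounds at mean -/+ 2 standard deviations.

    Uses the population standard deviation; this choice is an assumption.
    """
    mean = statistics.mean(train_values)
    std = statistics.pstdev(train_values)
    return mean - 2.0 * std, mean + 2.0 * std


def winsorize(values, lower, upper):
    """Clip every value into the closed interval [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]
```

Computing the bounds once on training data and reusing them at scoring time keeps the transformation consistent between the two.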

t

Expandingmean

CatBoost-style target encoding. See https://youtu.be/d6UMEmeXB6o?t=818 for short explanation
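One common form of expanding-mean target encoding replaces each category value with the mean target of the earlier rows that share that category, smoothed by a prior. The sketch below shows that general idea only; the exact smoothing used by the recipe is not stated in this catalog, and the function name and `prior` parameter are hypothetical.

```python
def expanding_mean_encode(categories, targets, prior):
    """Encode each row using only the targets of preceding rows.

    Row value = (sum of earlier targets in the same category + prior)
                / (count of earlier rows in the same category + 1).
    """
    sums = {}
    counts = {}
    encoded = []
    for cat, target in zip(categories, targets):
        seen_sum = sums.get(cat, 0.0)
        seen_count = counts.get(cat, 0)
        # The current row's own target is not used, which limits leakage.
        encoded.append((seen_sum + prior) / (seen_count + 1))
        sums[cat] = seen_sum + target
        counts[cat] = seen_count + 1
    return encoded
```

Because the result depends on row order, such encodings are usually computed on a shuffled or time-ordered copy of the training data.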

t

Leaky Mean Target Encoder

Example implementation of an out-of-fold target encoder (leaky, not recommended)

t

Continuous Texttransformer

🙄 Not available yet ...

t

Fuzzy Text Similarity Transformers

Row-by-row similarity between two text columns based on FuzzyWuzzy

t

Text Binary Count Transformer

Explainable Text transformer that uses binary counts of words using sklearn's CountVectorizer

t

Text Char Tfidf Count Transformers

Character level TFIDF and Count followed by Truncated SVD on text columns

t

Text Embedding Similarity Transformers

Row-by-row similarity between two text columns based on pretrained Deep Learning embedding space

t

Text Lang Detect Transformer

Detect the language for a text value using Google's 'langdetect' package

t

Text Meta Transformers

Extract common meta features from text

t

Text Named Entities Transformer

Extract the counts of different named entities in the text (e.g. Person, Organization, Location)

t

Text Pos Tagging Transformer

Extract the count of nouns, verbs, adjectives and adverbs in the text

t

Text Preprocessing Transformer

Preprocess the text column by stemming, lemmatization and stop word removal

t

Text Readability Transformers

Custom Recipe to extract Readability features from the text data

t

Text Sentiment Transformer

Extract sentiment from text using pretrained models from TextBlob

t

Text Similarity Transformers

Row-by-row similarity between two text columns based on common N-grams, Jaccard similarity, Dice similarity and edit distance.
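Two of the measures named above, Jaccard and Dice similarity, can be sketched on sets of whitespace-separated, lower-cased tokens. The tokenization and the function names are assumptions for illustration; the recipe may work on N-grams or use a different tokenizer.

```python
def token_set(text):
    """Lower-case the text and split it on whitespace into a set of tokens."""
    return set(text.lower().split())


def jaccard_similarity(a, b):
    """|A intersect B| / |A union B|, or 0.0 when both texts are empty."""
    set_a, set_b = token_set(a), token_set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def dice_similarity(a, b):
    """2 * |A intersect B| / (|A| + |B|), or 0.0 when both texts are empty."""
    set_a, set_b = token_set(a), token_set(b)
    total = len(set_a) + len(set_b)
    return 2 * len(set_a & set_b) / total if total else 0.0
```

Applied row by row to two text columns, each function yields one numeric feature per row.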

t

Text Spelling Correction Transformers

Correct the spelling of text column

t

Text Topic Modeling Transformer

Extract topics from text column using LDA

t

Text Url Summary Transformer

Extracts text from a URL and summarizes it

t

Vader Text Sentiment Transformer

Extract sentiment from text using lexicon and rule-based sentiment analysis tool called VADER

t

Count Missing Values Transformer

Count of missing values per row

t

Missing Flag Transformer

Returns 1 if a value is missing, or 0 otherwise

t

Specific Column Transformer

Example of a transformer that operates on the entire original frame, and hence on any column(s) desired.

t

Simple Grok Parser

Extract column data using grok patterns

t

Strlen Transformer

Returns the string length of categorical values

t

To String Transformer

Converts numbers to strings

t

User Agent Transformer

A best-effort transformer to determine browser device characteristics from a user-agent string

t

Signal Processing

This custom transformer processes signal files to create features used by Driverless AI to solve a regression problem

t

Geodesic

Calculates the distance in miles between two latitude/longitude points in space

t

Myhaversine

Computes miles between first two *_latitude and *_longitude named columns in the data set
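The distance between two latitude/longitude points is typically computed with the haversine formula. The sketch below assumes a mean Earth radius of 3958.8 miles and inputs in decimal degrees; the constant and the function name `haversine_miles` are assumptions, not values read from the recipe.

```python
import math

# Assumed mean Earth radius in miles.
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

One degree of longitude along the equator comes out at roughly 69 miles with this radius.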

t

Dummy Pretransformer

Dummy Pre-Transformer to use as a template for custom pre-transformer recipes. This transformer consumes all features at once, adds 'pre:' to the names and passes them down to transformer level and GA as-is.

t

H2O 3 Coxph Pretransformer

Pre-transformer that performs survival analysis modeling with the H2O-3 CoxPH (Cox proportional hazards) function. It adds the risk score produced by the CoxPH model and drops the stop_column feature used for survival modeling along with the actual target as event.

d

Group Aggregation

Aggregation features on numeric columns across multiple categorical columns

d

K Means Clustering

Data Recipe to perform KMeans Clustering on a dataset.

d

Airlines

Create airlines dataset

d

Airlines Joined Data Flights In Out

Create augmented airlines datasets

d

Airlines Joined Data Flights In Out Regression

Create augmented airlines datasets for regression

d

Airlines Multiple

Create airlines dataset

d

Audio To Image

Data recipe to transform input audio to Mel spectrograms.
This data recipe performs the following steps:
1. Reads the audio file
2. Converts the audio file to a Mel spectrogram
3. Saves the Mel spectrogram to a .png image
4. Uploads the image dataset to DAI
The recipe is based on the Kaggle Freesound Audio Tagging 2019 challenge: https://www.kaggle.com/c/freesound-audio-tagging-2019
To use the recipe, follow these steps:
1. Download a subsample of the audio dataset from here: http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/freesound_audio.zip
2. Unzip it and specify the path to the dataset in the DATA_DIR global variable
3. Upload the dataset into Driverless AI using the Add Data Recipe option
The transformed dataset is also available and can be uploaded directly to Driverless AI: http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/freesound_images.zip

d

Covidtracking Daily By States

Upload daily Covid Tracking (https://covidtracking.com) US States cases, hospitalization, recovery, test and death data

d

Create Transactional Data Or Convert To Iid

Example code to generate and convert transactional data to i.i.d. data.

d

Creditcard

Modify credit card dataset

d

Data Template

Custom data recipe base class

d

Feature Selection

🙄 Not available yet ...

d

Generate Random Int Columns

Data recipe to add one or more columns containing random integers.

d

Ieee Data Puddle

Data recipe to prepare data for Kaggle IEEE-CIS Fraud Detection https://www.kaggle.com/c/ieee-fraud-detection

d

Kaggle Bosch

Create Bosch competition datasets with leak

d

Kaggle Ieee Fraud

Data recipe to prepare data for Kaggle IEEE-CIS Fraud Detection https://www.kaggle.com/c/ieee-fraud-detection

d

Kaggle M5

Prepare data for m5 Kaggle Time-Series Forecast competition

d

Keywords Data

Check and match a list of words from a specific string column

d

Load Sas7Bdat

Data Recipe to load a single SAS file.
`__version__ = 0.1`
Authored by @mtanco (Michelle Tanco).
Required User Defined Inputs: name of file to load

d

Mnist

Prep and upload the MNIST dataset

d

Mozilla Deepspeech Wav2Txt

Speech to text using Mozilla's DeepSpeech.
Settings for this recipe:
Assign the MODEL_PATH global variable prior to usage.
Assign the WAV_COLNAME global variable with the proper column name from your dataset. This column should contain absolute paths to the .wav files that need to be converted to text.
General requirements for the .wav files: 1 channel (mono), 16 bit, 16000 frequency.

d

Nytimes Covid19 Cases Deaths By Counties

Upload daily COVID-19 cases and deaths in the US by county from the NY Times GitHub. Source: nytimes/covid-19-data, Coronavirus (Covid-19) Data in the United States, https://github.com/nytimes/covid-19-data

d

Nytimes Covid19 Cases Deaths By States

Upload daily COVID-19 cases and deaths in US by states from NY Times github

d

Nytimes Covid19 Cases Deaths Us

Upload daily COVID-19 cases and deaths in US total from NY Times github

d

Owid Covid19 Cases Deaths By Countries

Upload daily COVID-19 cases and deaths by country. Source: Our World in Data. It is updated daily and includes data on confirmed cases, deaths, and testing. https://ourworldindata.org/coronavirus-source-data

d

Seattle Rain Modify

Transpose the Monthly Seattle Rain Inches data set for Time Series use cases

d

Seattle Rain Upload

Upload Monthly Seattle Rain Inches data set from data provided by the City of Seattle

d

Ts Fill N Cluster

Data Recipe to fill missing values in TS data and then create new data sets from TS Clustering

d

Two Sigma Rental

🙄 Not available yet ...

d

Video To Image

Data recipe to transform input video to images.
This data recipe performs the following steps:
1. Reads the video file
2. Samples N uniform frames from the video file
3. Detects all faces in each frame
4. Crops the faces and saves them as images
The recipe is based on the Kaggle Deepfake Detection Challenge: https://www.kaggle.com/c/deepfake-detection-challenge
To use the recipe, follow these steps:
1. Download a small subsample of the video dataset from here: http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/deepfake.zip
2. Unzip it and specify the path to the dataset in the DATA_DIR global variable
3. Upload the dataset into Driverless AI using the Add Data Recipe option
The transformed dataset is also available and can be uploaded directly to Driverless AI: http://h2o-public-test-data.s3.amazonaws.com/bigdata/server/Image Data/deepfake_frames.zip

d

Wav2Txt

Speech to text using Azure Cognitive Services.
Settings for this recipe:
Assign the AZURE_SERVICE_KEY and AZURE_SERVICE_REGION global variables prior to usage.
Assign the WAV_COLNAME global variable with the proper column name from your dataset. This column should contain absolute paths to the .wav files that need to be converted to text.

d

Create Dataset From Mongodb Collection

Create dataset from MongoDB

d

Sentiment Score

Data recipe to get sentiment score using textblob

d

Sentiment Score Vader

Data recipe to get sentiment score using vader

d

Text Summarization

Data recipe to get summary of text using gensim

d

Tokenize Chinese

Chinese text tokenization using jieba package - https://github.com/fxsjy/jieba

d

Topic Modeling

Data recipe to perform topic modeling

d

Twitter Preprocessing Recipe

Preprocess the tweets by normalising usernames, removing unnecessary punctuation, and expanding the hashtags


Copyright © H2O.ai 2020.