recpkg package

Submodules

recpkg.evaluation module

recpkg.evaluation.evaluate_model(ModelConstructor, model_name, X_train, X_test, y_train, y_test, seed_val, configs, plot=False, nb=False)

Evaluate multiple model configurations.

Parameters
  • ModelConstructor (KerasRecommender) – The constructor of the model which is being evaluated.

  • model_name (String) – The name of the model which is being evaluated.

  • X_train (ndarray of shape (n_samples, 2)) – This is the train set. An array where each row consists of a user and an item.

  • X_test (ndarray of shape (n_samples, 2)) – This is the test set. An array where each row consists of a user and an item.

  • y_train (ndarray of shape (n_samples,)) – This is the train set. An array where each entry denotes interactions between the corresponding user and item.

  • y_test (ndarray of shape (n_samples,)) – This is the test set. An array where each entry denotes interactions between the corresponding user and item.

  • seed_val (int) – A random seed.

  • configs (List[Dict[String, Object]]) – A list of dictionaries of keyword arguments to be applied in the model’s constructor.

  • plot (bool) – Whether or not to make training plots.

  • nb (bool) – Whether or not the model is running in a Jupyter notebook.

Returns

The configs, trained models, history dataframes, training plots, and groupwise evaluations.

Return type

Dict[String, Dict]
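
Because configs is a list of keyword-argument dictionaries for the model's constructor, a grid of candidate settings can be expanded into that list with itertools.product. This is a minimal sketch; the helper name make_config_grid and the hyperparameter values are hypothetical, not part of recpkg.

```python
from itertools import product
from typing import Any, Dict, List


def make_config_grid(param_grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand a dict of candidate values into a list of constructor kwargs.

    Hypothetical helper: every combination of values becomes one config.
    """
    keys = list(param_grid)
    return [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]


# Hypothetical grid over two constructor arguments.
configs = make_config_grid({"n_factors": [8, 16], "epochs": [10, 20]})
print(len(configs))  # 4
print(configs[0])    # {'n_factors': 8, 'epochs': 10}
```

Each dictionary is meant to be unpacked into the constructor, i.e. ModelConstructor(**config).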

recpkg.evaluation.plot_metric_history(history_df, title='')

Plot each metric versus epochs.

Parameters
  • history_df (pandas.DataFrame) – A tidy dataframe with epoch, metric, and value columns.

  • title (String) – Text which will be prepended to the title of each graph.

Returns

The metric plots.

Return type

List[FacetGrid, ..]
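
The tidy dataframe expected here has one row per (epoch, metric) pair. A Keras History.history dict maps each metric name to a list of per-epoch values, so it can be flattened as sketched below. The helper name history_to_tidy and the 1-based epoch numbering are assumptions, and the input values are illustrative only.

```python
from typing import Dict, List, Union


def history_to_tidy(history: Dict[str, List[float]]) -> List[Dict[str, Union[int, float, str]]]:
    """Flatten a Keras-style history dict into tidy (epoch, metric, value) rows."""
    rows = []
    for metric, values in history.items():
        # Epochs are numbered from 1 here (an assumption).
        for epoch, value in enumerate(values, start=1):
            rows.append({"epoch": epoch, "metric": metric, "value": value})
    return rows


# Shaped like keras.callbacks.History.history; the values are illustrative.
history = {"loss": [0.69, 0.52], "val_loss": [0.70, 0.58]}
rows = history_to_tidy(history)
print(len(rows))  # 4
print(rows[0])    # {'epoch': 1, 'metric': 'loss', 'value': 0.69}
```

Passing the rows to pandas.DataFrame(rows) yields a frame with epoch, metric, and value columns.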

recpkg.explicit module

class recpkg.explicit.FunkSVD(users, items, latent_factors=100, epochs=10, learning_rate=0.005, regularization_term=0.02, verbose=False, nb=False)

Bases: recpkg.recommenders.Recommender

Recommender implementing Funk SVD.

Funk SVD without global baselines.

Parameters
  • users (ndarray) – An array of the users.

  • items (ndarray) – An array of the items.

  • latent_factors (int) – The number of latent factors.

  • epochs (int) – The number of epochs to train the model.

  • learning_rate (float) – The learning rate of the model.

  • regularization_term (float) – The regularization term of the model.

  • verbose (bool) – Whether or not to print verbose output.

  • nb (bool) – Whether or not the model is running in a Jupyter notebook.

create_latent_factor_matrices()

Create matrices for the latent factors of the users and items.

Creates the matrices which represent the factorization of the user-item matrix. In the user latent factor matrix, the rows are the users and the columns are the latent factors. In the item latent factor matrix, the rows are latent factors and the columns are the items.

Returns

The latent factor matrices for users and items respectively.

Return type

Tuple[ndarray, ndarray]
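
The layout described above (users as rows of one matrix, items as columns of the other) means their product has one entry per user-item pair. A minimal sketch follows; the standalone signature with explicit sizes and the normal(0, 0.1) initialization are assumptions, since the method itself takes no arguments.

```python
import numpy as np


def create_latent_factor_matrices(n_users, n_items, latent_factors=100, seed=None):
    """Return (P, Q) with P of shape (n_users, k) and Q of shape (k, n_items)."""
    rng = np.random.default_rng(seed)
    # Small random values break symmetry; the normal(0, 0.1) scale is an assumption.
    P = rng.normal(0.0, 0.1, size=(n_users, latent_factors))
    Q = rng.normal(0.0, 0.1, size=(latent_factors, n_items))
    return P, Q


P, Q = create_latent_factor_matrices(n_users=3, n_items=5, latent_factors=4, seed=0)
print(P.shape, Q.shape)  # (3, 4) (4, 5)
print((P @ Q).shape)     # (3, 5): one estimated rating per user-item pair
```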

fit(X=None, y=None)

Fit the recommender from the training dataset.

Parameters
  • X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

  • y (ndarray of shape (n_samples,)) – An array where each entry denotes interactions between the corresponding user and item.

predict(X=None)

Predict the scores for the provided data.

Parameters

X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

Returns

The predicted scores for each data sample.

Return type

ndarray of shape (n_samples,)

predict_rating(user_i, item_i)

Predict the rating for an item by the given user.

Parameters
  • user_i (int) – The user index.

  • item_i (int) – The item index.

Returns

The predicted rating of the item by the user.

Return type

float

process_users_items()

Create dictionaries mapping user and item ids to indexes.

Replicates the functionality provided by Keras’s IntegerLookup.
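
A minimal sketch of mapping raw ids to contiguous indexes, assuming sorted order; the helper name build_index is hypothetical. Unlike Keras's IntegerLookup, this sketch reserves no out-of-vocabulary index.

```python
import numpy as np


def build_index(ids):
    """Map each distinct id to a contiguous index 0..n-1 (sorted order is an assumption)."""
    return {id_: index for index, id_ in enumerate(np.unique(ids).tolist())}


X = np.array([[10, 7], [42, 7], [10, 3]])  # rows of (user id, item id)
user_index = build_index(X[:, 0])
item_index = build_index(X[:, 1])
print(user_index)  # {10: 0, 42: 1}
print(item_index)  # {3: 0, 7: 1}
```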

train_pair(user, item, actual_rating)

Train the model on a single user-item pair.

Parameters
  • user (int) – The user id.

  • item (int) – The item id.

  • actual_rating (float) – The rating of the item by the user.

Returns

The difference between the true and predicted ratings.

Return type

float
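
The textbook Funk SVD step (without global baselines, as noted above) predicts a rating as the dot product of the user and item latent vectors, then moves both vectors along the regularized error gradient. The sketch below assumes recpkg follows this standard update. Unlike the method, it takes indexes and the matrices explicitly, so its signature is hypothetical. The defaults mirror the class signature.

```python
import numpy as np


def train_pair(P, Q, user_i, item_i, actual_rating,
               learning_rate=0.005, regularization_term=0.02):
    """One regularized SGD step on a single rating; updates P and Q in place.

    P has users as rows; Q has items as columns.
    Returns the error (actual minus predicted rating) before the update.
    """
    p_u = P[user_i].copy()
    q_i = Q[:, item_i].copy()
    error = actual_rating - p_u @ q_i  # predicted rating is the dot product
    P[user_i] += learning_rate * (error * q_i - regularization_term * p_u)
    Q[:, item_i] += learning_rate * (error * p_u - regularization_term * q_i)
    return error


rng = np.random.default_rng(0)
P = rng.normal(0, 0.1, size=(2, 3))
Q = rng.normal(0, 0.1, size=(3, 4))
errors = [train_pair(P, Q, 0, 1, 4.0) for _ in range(2000)]
print(abs(errors[-1]) < abs(errors[0]))  # True
```

Both vectors are copied before updating so that each update uses the pre-step values of the other vector.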

class recpkg.explicit.MatrixFactorization(n_factors=100, epochs=10, optimizer=<tensorflow.python.keras.optimizer_v2.gradient_descent.SGD object>, loss=<tensorflow.python.keras.losses.MeanSquaredError object>, metrics=[<tensorflow.python.keras.metrics.RootMeanSquaredError object>, <tensorflow.python.keras.metrics.MeanAbsoluteError object>], seed=None, user_input=None, item_input=None, user_preprocessing_layers=None, item_preprocessing_layers=None)

Bases: recpkg.recommenders.KerasRecommender

Recommender implementing Funk SVD with a NN.

Parameters
  • n_factors (int) – The number of latent factors.

  • epochs (int) – The number of epochs to train the NN.

  • optimizer (keras.optimizers.Optimizer) – The model’s optimizer.

  • loss (keras.losses.Loss) – The loss function.

  • metrics (List[keras.metrics.Metric, ..]) – The metric functions.

  • seed (int) – A random seed.

  • user_input (keras.Input) – An input for the users.

  • item_input (keras.Input) – An input for the items.

  • user_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the users.

  • item_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the items.

static create_core_layers(n_factors, user_layers, item_layers)

Creates the core layers of the MF model.

Returns the hidden layers of the model, specifically those between the inputs and the visible output layer.

Parameters
  • n_factors (int) – The number of latent factors.

  • user_layers (keras.layers.Layer) – The input or preprocessing layers for the users.

  • item_layers (keras.layers.Layer) – The input or preprocessing layers for the items.

Returns

The core layers of the model.

Return type

keras.layers.Layer
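
With the standard preprocessing (one-hot encoding via CategoryEncoding), a bias-free dense layer acts as an embedding lookup, and the MF score is the dot product of the resulting user and item vectors. The NumPy sketch below illustrates this equivalence. It assumes the core layers follow this pattern; recpkg may instead use, for example, a Keras Dot layer. The sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 6, 3

# Kernels of two bias-free dense layers, one per side (hypothetical sizes).
user_kernel = rng.normal(size=(n_users, n_factors))
item_kernel = rng.normal(size=(n_items, n_factors))

# One-hot rows, as produced by one-hot preprocessing.
user_one_hot = np.eye(n_users)[[1, 3]]
item_one_hot = np.eye(n_items)[[0, 5]]

# Dense (no bias) on a one-hot vector selects one kernel row: an embedding lookup.
user_vectors = user_one_hot @ user_kernel
item_vectors = item_one_hot @ item_kernel
assert np.allclose(user_vectors, user_kernel[[1, 3]])

# The MF score for each (user, item) pair is the dot product of the two vectors.
scores = np.sum(user_vectors * item_vectors, axis=1)
print(scores.shape)  # (2,)
```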

create_model()

Creates a new MF model.

recpkg.implicit module

class recpkg.implicit.GeneralizedMatrixFactorization(n_factors=8, epochs=10, optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object>, loss=<tensorflow.python.keras.losses.BinaryCrossentropy object>, metrics=[<tensorflow.python.keras.metrics.BinaryAccuracy object>], seed=None, user_input=None, item_input=None, user_preprocessing_layers=None, item_preprocessing_layers=None)

Bases: recpkg.recommenders.KerasRecommender

Recommender implementing the GMF architecture.

Parameters
  • n_factors (int) – The number of latent factors.

  • epochs (int) – The number of epochs to train the NN.

  • optimizer (keras.optimizers.Optimizer) – The model’s optimizer.

  • loss (keras.losses.Loss) – The loss function.

  • metrics (List[keras.metrics.Metric, ..]) – The metric functions.

  • seed (int) – A random seed.

  • user_input (keras.Input) – An input for the users.

  • item_input (keras.Input) – An input for the items.

  • user_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the users.

  • item_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the items.

static create_core_layers(n_factors, user_layers, item_layers, user_dense_kwdargs={}, item_dense_kwdargs={})

Creates the core layers of the GMF model.

Returns the hidden layers of the model, specifically those between the inputs and the visible output layer.

Parameters
  • n_factors (int) – The number of latent factors.

  • user_layers (keras.layers.Layer) – The input or preprocessing layers for the users.

  • item_layers (keras.layers.Layer) – The input or preprocessing layers for the items.

  • user_dense_kwdargs (Dict) – The keyword arguments for the user dense layer.

  • item_dense_kwdargs (Dict) – The keyword arguments for the item dense layer.

Returns

The core layers of the model.

Return type

keras.layers.Layer
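
In the GMF architecture from the Neural Collaborative Filtering paper (He et al., 2017), the core representation is the element-wise product of the user and item latent vectors. The output layer applies a sigmoid to a weighted sum of that product. This sketch assumes recpkg follows that design. The kernel h and bias b stand in for the output weights, plausibly what get_output_weights returns. With h set to all ones, the score reduces to a plain MF dot product passed through a sigmoid.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gmf_forward(p_u, q_i, h, b=0.0):
    """GMF score: a sigmoid over a weighted element-wise product of latent vectors."""
    phi = p_u * q_i              # element-wise product: the core GMF representation
    return sigmoid(phi @ h + b)  # output layer with kernel h and bias b


p_u = np.array([0.5, -1.0, 2.0])
q_i = np.array([1.0, 0.5, 0.25])
h = np.ones(3)
print(gmf_forward(p_u, q_i, h))  # sigmoid(0.5 - 0.5 + 0.5) = sigmoid(0.5), about 0.6225
```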

create_model()

Creates a new GMF model.

get_core_layers_kwdargs()

Returns the appropriate kwdargs for pretraining core layers.

Returns

The keyword arguments for the user and item dense layers.

Return type

Tuple[Dict, Dict]

get_output_weights()

Returns the kernel and bias for the output layer of this model.

Returns

The kernel and bias.

Return type

List[ndarray, Optional[ndarray]]

class recpkg.implicit.ItemPopularity

Bases: sklearn.base.BaseEstimator

Recommender based solely on interactions per item.

fit(X=None, y=None)

Fit the recommender from the training dataset.

Parameters
  • X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

  • y (ndarray of shape (n_samples,)) – An array where each entry denotes interactions between the corresponding user and item.

predict(X=None)

Predict the scores for the provided data.

Parameters

X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

Returns

The predicted scores for each data sample.

Return type

ndarray of shape (n_samples,)
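
A popularity baseline scores every user-item pair by how often the item was interacted with in training, regardless of the user. This is a minimal sketch; the class name ItemPopularitySketch is hypothetical. Counting only rows with a positive label, and scoring unseen items as zero, are assumptions.

```python
import numpy as np


class ItemPopularitySketch:
    """Scores each item by its number of positive interactions in the training data."""

    def fit(self, X, y):
        items = X[:, 1]
        # Count interactions per item, ignoring rows with y == 0 (an assumption).
        positive_items = items[y > 0]
        unique, counts = np.unique(positive_items, return_counts=True)
        self.counts_ = dict(zip(unique.tolist(), counts.tolist()))
        return self

    def predict(self, X):
        # Unseen items get a score of zero.
        return np.array([self.counts_.get(item, 0) for item in X[:, 1].tolist()])


X = np.array([[1, 10], [2, 10], [3, 20], [1, 30]])
y = np.array([1, 1, 1, 0])
model = ItemPopularitySketch().fit(X, y)
print(model.predict(np.array([[5, 10], [5, 20], [5, 30]])))  # [2 1 0]
```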

class recpkg.implicit.MultiLayerPerceptron(n_factors=8, n_hidden_layers=4, epochs=10, optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object>, loss=<tensorflow.python.keras.losses.BinaryCrossentropy object>, metrics=[<tensorflow.python.keras.metrics.BinaryAccuracy object>], seed=None, user_input=None, item_input=None, user_preprocessing_layers=None, item_preprocessing_layers=None)

Bases: recpkg.recommenders.KerasRecommender

Recommender implementing the MLP architecture.

Parameters
  • n_factors (int) – The number of latent factors.

  • n_hidden_layers (int) – The number of hidden layers.

  • epochs (int) – The number of epochs to train the NN.

  • optimizer (keras.optimizers.Optimizer) – The model’s optimizer.

  • loss (keras.losses.Loss) – The loss function.

  • metrics (List[keras.metrics.Metric, ..]) – The metric functions.

  • seed (int) – A random seed.

  • user_input (keras.Input) – An input for the users.

  • item_input (keras.Input) – An input for the items.

  • user_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the users.

  • item_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the items.

static create_core_layers(n_factors, n_hidden_layers, user_layers, item_layers, hidden_layers_kwdargs=[])

Creates the core layers of the MLP model.

Returns the hidden layers of the model, specifically those between the inputs and the visible output layer.

Parameters
  • n_factors (int) – The number of latent factors.

  • n_hidden_layers (int) – The number of hidden layers.

  • user_layers (keras.layers.Layer) – The input or preprocessing layers for the users.

  • item_layers (keras.layers.Layer) – The input or preprocessing layers for the items.

  • hidden_layers_kwdargs (List[Dict, ..]) – The keyword arguments for each hidden layer.

Returns

The core layers of the model.

Return type

keras.layers.Layer
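
The NCF paper's MLP concatenates the user and item vectors and passes them through a tower of ReLU layers whose widths halve at each step. The sketch below assumes that pattern, with the last hidden layer as wide as n_factors; recpkg's hidden_layers_kwdargs may size the layers differently. The helper names tower_sizes and mlp_forward are hypothetical.

```python
import numpy as np


def tower_sizes(n_factors, n_hidden_layers):
    """Halving tower ending at n_factors, e.g. (8, 3) -> [32, 16, 8] (an assumption)."""
    return [n_factors * 2 ** (n_hidden_layers - 1 - i) for i in range(n_hidden_layers)]


def mlp_forward(user_vec, item_vec, weights):
    """Concatenate the user and item vectors, then apply ReLU dense layers."""
    x = np.concatenate([user_vec, item_vec])
    for kernel, bias in weights:
        x = np.maximum(0.0, x @ kernel + bias)
    return x


rng = np.random.default_rng(0)
sizes = tower_sizes(n_factors=8, n_hidden_layers=3)
print(sizes)  # [32, 16, 8]

# Hypothetical embeddings whose concatenation has width 32.
user_vec, item_vec = rng.normal(size=16), rng.normal(size=16)
in_dims = [32] + sizes[:-1]
weights = [(rng.normal(size=(d_in, d_out)), np.zeros(d_out)) for d_in, d_out in zip(in_dims, sizes)]
print(mlp_forward(user_vec, item_vec, weights).shape)  # (8,)
```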

create_model()

Creates a new MLP model.

get_core_layers_kwdargs()

Returns the appropriate kwdargs for pretraining core layers.

Returns

The keyword arguments for the hidden layers.

Return type

Dict[String, Object]

get_output_weights()

Returns the kernel and bias for the output layer of this model.

Returns

The kernel and bias.

Return type

List[ndarray, Optional[ndarray]]

class recpkg.implicit.NeuralMatrixFactorization(gmf_n_factors=8, mlp_n_factors=8, mlp_n_hidden_layers=4, gmf_trained=None, mlp_trained=None, alpha=0.5, epochs=10, optimizer=<tensorflow.python.keras.optimizer_v2.gradient_descent.SGD object>, loss=<tensorflow.python.keras.losses.BinaryCrossentropy object>, metrics=[<tensorflow.python.keras.metrics.BinaryAccuracy object>], seed=None, user_input=None, item_input=None, user_preprocessing_layers=None, item_preprocessing_layers=None)

Bases: recpkg.recommenders.KerasRecommender

Recommender implementing the NeuMF architecture, an ensemble of GMF/MLP.

Parameters
  • gmf_n_factors (int) – The number of latent factors for GMF.

  • mlp_n_factors (int) – The number of latent factors for MLP.

  • mlp_n_hidden_layers (int) – The number of hidden layers.

  • gmf_trained (GeneralizedMatrixFactorization) – A trained GMF model with the same number of factors as gmf_n_factors.

  • mlp_trained (MultiLayerPerceptron) – A trained MLP model with the same number of factors and hidden layers as mlp_n_factors and mlp_n_hidden_layers.

  • alpha (float) – The tradeoff between MLP and GMF.

  • epochs (int) – The number of epochs to train the NN.

  • optimizer (keras.optimizers.Optimizer) – The model’s optimizer.

  • loss (keras.losses.Loss) – The loss function.

  • metrics (List[keras.metrics.Metric, ..]) – The metric functions.

  • seed (int) – A random seed.

  • user_input (keras.Input) – An input for the users.

  • item_input (keras.Input) – An input for the items.

  • user_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the users.

  • item_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the items.

create_model()

Creates a new NeuMF model.

Returns

The NeuMF model. It will be pretrained if trained models are provided in the constructor.

Return type

keras.Model
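
In the NCF paper, a NeuMF model is pretrained by concatenating the output-layer kernels of the trained GMF and MLP models. The GMF kernel is weighted by alpha and the MLP kernel by 1 - alpha. A minimal sketch of that combination follows. It assumes create_model uses get_output_weights this way; the helper name combine_output_weights and the kernel values are hypothetical.

```python
import numpy as np


def combine_output_weights(gmf_kernel, mlp_kernel, alpha=0.5):
    """Weight the pretrained GMF kernel by alpha and the MLP kernel by 1 - alpha."""
    return np.concatenate([alpha * gmf_kernel, (1.0 - alpha) * mlp_kernel], axis=0)


gmf_kernel = np.ones((8, 1))        # output kernel of a GMF with 8 factors
mlp_kernel = np.full((8, 1), 2.0)   # output kernel of an MLP whose last layer has 8 units
kernel = combine_output_weights(gmf_kernel, mlp_kernel, alpha=0.5)
print(kernel.shape)                  # (16, 1)
print(kernel[0, 0], kernel[-1, 0])   # 0.5 1.0
```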

recpkg.metrics module

recpkg.metrics.dcg_score(items)

Calculate the discounted cumulative gain.

Parameters

items (List[float, ..]) – The relevance scores of the items, in ranked order.

Returns

The DCG score.

Return type

float
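
One common definition of DCG uses linear gain, where rel_i is the relevance of the item at 1-based rank i and k is the list length. Whether dcg_score uses this form or the exponential gain 2^{rel_i} - 1 is an assumption. A small worked example:

```latex
\mathrm{DCG} = \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i + 1)},
\qquad
\mathrm{rel} = (1, 0, 1) \;\Rightarrow\; \frac{1}{\log_2 2} + \frac{0}{\log_2 3} + \frac{1}{\log_2 4} = 1 + 0 + 0.5 = 1.5
```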

recpkg.metrics.ndcg_score(items)

Calculate the normalized discounted cumulative gain.

Parameters

items (List[float, ..]) – The relevance scores of the items, in ranked order.

Returns

The NDCG score.

Return type

float
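
NDCG divides the DCG of the given ranking by the DCG of the ideal ranking (relevances sorted in descending order), so a perfect ordering scores 1.0. This is a minimal sketch assuming linear gain. Returning 0.0 when no item is relevant is also an assumption. The function names mirror recpkg's, but the bodies are not taken from it.

```python
import math
from typing import Sequence


def dcg_score(items: Sequence[float]) -> float:
    """Linear-gain DCG: the item at 1-based rank i contributes rel / log2(i + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(items, start=1))


def ndcg_score(items: Sequence[float]) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if that is zero."""
    ideal = dcg_score(sorted(items, reverse=True))
    return dcg_score(items) / ideal if ideal > 0 else 0.0


print(dcg_score([1, 0, 1]))   # 1.5
print(ndcg_score([1, 0, 1]))  # 1.5 / (1 + 1/log2(3)), about 0.9197
```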

recpkg.metrics.perform_groupwise_evaluation(X_test, y_test, y_pred)

Calculate HR@10 and NDCG@10 by user.

Parameters
  • X_test (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

  • y_test (ndarray of shape (n_samples,)) – An array where each entry denotes interactions between the corresponding user and item.

  • y_pred (ndarray of shape (n_samples,)) – An array where each entry is the predicted score for the corresponding user and item.

Returns

The HR@10 and NDCG@10.

Return type

Dict[str, float]
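
Evaluating "by user" usually means the following: rank each user's test items by predicted score, then record a hit if a positive item appears in the top k. NDCG@k credits that positive by its position. With one positive per user, as in the NCF leave-one-out protocol, NDCG@k reduces to 1 / log2(rank + 2) for a 0-based rank. This is a minimal sketch that averages the per-user values. The helper name groupwise_hr_ndcg and the dictionary keys "hr" and "ndcg" are hypothetical, and recpkg's actual keys may differ.

```python
import math
import numpy as np


def groupwise_hr_ndcg(X_test, y_test, y_pred, k=10):
    """Average per-user HR@k and NDCG@k, ranking each user's items by y_pred."""
    hits, ndcgs = [], []
    for user in np.unique(X_test[:, 0]):
        mask = X_test[:, 0] == user
        order = np.argsort(-y_pred[mask], kind="stable")  # best-scored first
        relevance = y_test[mask][order][:k]  # binary labels of the top-k items
        hits.append(float(relevance.any()))
        dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevance))
        ideal = sum(1 / math.log2(rank + 2) for rank in range(min(k, int(y_test[mask].sum()))))
        ndcgs.append(dcg / ideal if ideal > 0 else 0.0)
    return {"hr": float(np.mean(hits)), "ndcg": float(np.mean(ndcgs))}


X_test = np.array([[1, 100], [1, 101], [1, 102], [2, 100], [2, 103]])
y_test = np.array([0, 1, 0, 1, 0])
y_pred = np.array([0.9, 0.8, 0.1, 0.7, 0.2])
print(groupwise_hr_ndcg(X_test, y_test, y_pred, k=1))  # {'hr': 0.5, 'ndcg': 0.5}
```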

recpkg.model_selection module

recpkg.model_selection.LeaveMembersOut(*lists, groups=None, n_val=1, n_test=1, seed=None)

Returns indices to split data into train, val, and test sets.

Returns indices of the train, validation, and test sets based on the given number of validation and test members per group. The function accepts a variable number of lists for consistency with similar scikit-learn functions; the length of the lists determines the number of indices. If a list of groups is specified, the specified number of members of each group is placed in the validation and test sets.

Parameters
  • *lists (List[List, ..]) – One or more lists from which to leave members out. They should be the same length.

  • groups (List) – A list by which the indices may be grouped (e.g. user ids). This should be the same length as the provided lists.

  • n_val (int) – The number of members to be left out for the val set.

  • n_test (int) – The number of members to be left out for the test set.

  • seed (int) – A random seed.

Returns

Lists of indices for the train, val, and test sets, respectively.

Return type

Tuple[List, List, List]
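
The per-group split can be sketched by bucketing indices by group, shuffling each bucket, and taking the first n_test members for test, the next n_val for validation, and the rest for training. This sketch takes the number of indices n directly instead of *lists, so the name and signature are hypothetical. Treating the whole dataset as one group when groups is None is an assumption.

```python
import random
from collections import defaultdict


def leave_members_out(n, groups=None, n_val=1, n_test=1, seed=None):
    """Split indices 0..n-1 into train, val, and test lists, per group if given."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index in range(n):
        # Without groups, every index falls into a single bucket.
        by_group[groups[index] if groups is not None else None].append(index)
    train, val, test = [], [], []
    for members in by_group.values():
        rng.shuffle(members)
        test.extend(members[:n_test])
        val.extend(members[n_test:n_test + n_val])
        train.extend(members[n_test + n_val:])
    return sorted(train), sorted(val), sorted(test)


users = [1, 1, 1, 2, 2, 2, 2]
train, val, test = leave_members_out(len(users), groups=users, seed=0)
print(len(train), len(val), len(test))  # 3 2 2
```

A group with fewer than n_val + n_test members contributes nothing to the train set in this sketch.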

recpkg.model_selection.sample_n_non_interactions(X, y, user_id, n=100, seed=None)

Samples non-interactions for a given user.

Returns X (user, item) and y (zeros) NumPy arrays for n (100 by default) items that the user has not interacted with.

Parameters
  • X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

  • y (ndarray of shape (n_samples,)) – An array where each entry denotes interactions between the corresponding user and item.

  • user_id (int) – The unique identifier of a user.

  • n (int) – The number of non-interactions to sample.

  • seed (int) – A random seed.

Returns

The X and y arrays of non-interactions.

Return type

Tuple[ndarray of shape (n, 2), ndarray of shape (n,)]
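
A minimal NumPy sketch of the sampling: collect the items the user has interacted with, take the set difference against all known items, and draw n of the remaining items without replacement. Drawing candidates only from items present in X, and counting only positive labels as interactions, are assumptions. rng.choice raises a ValueError if fewer than n candidates remain.

```python
import numpy as np


def sample_n_non_interactions(X, y, user_id, n=100, seed=None):
    """Sample n items the user has no positive interaction with; labels are zeros."""
    rng = np.random.default_rng(seed)
    # Items with a positive label for this user count as interactions (an assumption).
    interacted = X[(X[:, 0] == user_id) & (y > 0), 1]
    candidates = np.setdiff1d(np.unique(X[:, 1]), interacted)
    items = rng.choice(candidates, size=n, replace=False)
    X_neg = np.column_stack([np.full(n, user_id), items])
    return X_neg, np.zeros(n)


X = np.array([[1, 10], [1, 11], [2, 12], [2, 13], [3, 14]])
y = np.ones(5)
X_neg, y_neg = sample_n_non_interactions(X, y, user_id=1, n=2, seed=0)
print(X_neg.shape, y_neg.shape)  # (2, 2) (2,)
```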

recpkg.preprocessing module

recpkg.preprocessing.get_standard_layers(values, name=None)

Returns input layer and standard preprocessing layers for given values.

Returns the input and preprocessing layers for the given integer values. The preprocessing consists of IntegerLookup and one-hot encoding via CategoryEncoding.

Parameters
  • values (ndarray) – The integer values of the desired input.

  • name (String) – The name of the values.

Returns

The input and preprocessing layers.

Return type

Tuple[Layer, Layer]

recpkg.recommenders module

class recpkg.recommenders.KerasRecommender(epochs=10, optimizer=<tensorflow.python.keras.optimizer_v2.adam.Adam object>, loss=<tensorflow.python.keras.losses.BinaryCrossentropy object>, metrics=[<tensorflow.python.keras.metrics.BinaryAccuracy object>], seed=None, user_input=None, item_input=None, user_preprocessing_layers=None, item_preprocessing_layers=None)

Bases: recpkg.recommenders.Recommender

Abstract class for recommenders built with Keras models.

Parameters
  • epochs (int) – The number of epochs to train the NN.

  • optimizer (keras.optimizers.Optimizer) – The model’s optimizer.

  • loss (keras.losses.Loss) – The loss function.

  • metrics (List[keras.metrics.Metric, ..]) – The metric functions.

  • seed (int) – A random seed.

  • user_input (keras.Input) – An input for the users.

  • item_input (keras.Input) – An input for the items.

  • user_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the users.

  • item_preprocessing_layers (keras.layers.Layer) – Preprocessing layers for the items.

create_model()

Creates a new Keras model.

fit(X=None, y=None)

Fit the recommender from the training dataset.

Parameters
  • X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

  • y (ndarray of shape (n_samples,)) – An array where each entry denotes interactions between the corresponding user and item.

predict(X=None)

Predict the scores for the provided data.

Parameters

X (ndarray of shape (n_samples, 2)) – An array where each row consists of a user and an item.

Returns

The predicted scores for each data sample.

Return type

ndarray of shape (n_samples,)

class recpkg.recommenders.Recommender

Bases: sklearn.base.BaseEstimator

Abstract class for recommenders.

Module contents