Pre-Processing Exporter Module

def any_in(seq_a, seq_b)

Checks whether two sequences have any elements in common

Parameters:
  • seq_a (list) – A list of items
  • seq_b (list) – A list of items
Returns:

True if any item of seq_a belongs to seq_b or vice versa, otherwise False.

Return type:

bool
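
The check described above can be sketched in a few lines. This is an illustrative re-implementation based only on the description, not the library's own source.

```python
def any_in(seq_a, seq_b):
    """Return True if the two sequences share at least one item."""
    # Membership is symmetric: an item of seq_a found in seq_b is also
    # an item of seq_b found in seq_a, so one pass is enough.
    return any(item in seq_b for item in seq_a)


shared = any_in(["age", "income"], ["income", "zip"])
disjoint = any_in(["age"], ["zip"])
print(shared, disjoint)
```

Because the test is symmetric, the argument order does not affect the result.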

def binarizer(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s Binarizer

Parameters:
  • trfm – Scikit-Learn’s Binarizer preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to Binarizer preprocessing.

Return type:

dictionary
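
As background, Scikit-Learn’s Binarizer maps every value strictly greater than its threshold to 1 and every other value to 0. The sketch below shows that rule in plain Python. The helper name binarize is hypothetical, and the keys of the exported dictionary are not documented here, so they are not shown.

```python
def binarize(value, threshold=0.0):
    """Apply the Binarizer rule to a single value (hypothetical helper)."""
    # Values strictly greater than the threshold become 1, all others 0.
    return 1 if value > threshold else 0


row = [0.2, 0.5, 0.9]
print([binarize(v, threshold=0.5) for v in row])
```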

def cat_imputer(trfm, col_names)

Generates pre-processing elements for sklearn-pandas’ CategoricalImputer

Parameters:
  • trfm – sklearn-pandas’ CategoricalImputer preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to CategoricalImputer preprocessing.

Return type:

dictionary

def count_vectorizer(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s CountVectorizer

Parameters:
  • trfm – Scikit-Learn’s CountVectorizer preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to CountVectorizer preprocessing.

Return type:

dictionary

def get_class_name(cls)

Provides the class name for the given instance

Parameters:
  • cls – Scikit-Learn’s preprocessing instance.
Returns:

The class name of the preprocessing object.

Return type:

str
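
A minimal sketch of how a class name can be read from an instance in Python. DummyScaler is a hypothetical stand-in for a real transformer class, and the function body is an assumption based on the description rather than the library's source.

```python
class DummyScaler:
    """Hypothetical stand-in for a preprocessing transformer."""


def get_class_name(obj):
    """Return the name of the class that obj is an instance of."""
    return obj.__class__.__name__


print(get_class_name(DummyScaler()))
```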

def get_derived_colnames(trfm_name, col_names, *args)

Generates derived column names for a given transformer

Parameters:
  • trfm_name (str) – Name of the derived field to be assigned after preprocessing.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pml_pp – Returns a list that contains names of the preprocessed features.

Return type:

list
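
One way to picture derived column names is to wrap each input column in the transformer name. The name(column) pattern below is an assumed naming scheme chosen for illustration, the helper derive_colnames is hypothetical, and the role of *args is not documented here, so the sketch leaves it out.

```python
def derive_colnames(trfm_name, col_names):
    """Build one derived name per input column (assumed naming pattern)."""
    return [f"{trfm_name}({col})" for col in col_names]


derived = derive_colnames("standardScaler", ["age", "income"])
print(derived)
```

Deriving names from the input names keeps each output traceable to its source column when several transformers are chained.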

def get_pml_derived_flds(trfm, col_names, **kwargs)

Generates elements related to pre-processing for a given transformer object

Parameters:
  • trfm – Scikit-Learn’s preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pml_pp – Returns a dictionary that contains attributes related to any preprocessing function.

Return type:

dictionary
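
Since this function accepts any supported transformer, it plausibly selects one of the transformer-specific functions in this module based on the transformer's class. The sketch below shows that dispatch pattern. The stand-in classes, the handler bodies, the HANDLERS table, and the dictionary keys are all hypothetical and are not taken from the library.

```python
class Binarizer:
    """Hypothetical stand-in for a transformer class."""


class MinMaxScaler:
    """Hypothetical stand-in for a transformer class."""


def binarizer(trfm, col_names):
    # Hypothetical handler; the real dictionary keys are not documented here.
    return {"transformer": "Binarizer", "columns": list(col_names)}


def min_max_scaler(trfm, col_names):
    # Hypothetical handler with the same assumed shape.
    return {"transformer": "MinMaxScaler", "columns": list(col_names)}


# Map each class name to the function that knows how to export it.
HANDLERS = {"Binarizer": binarizer, "MinMaxScaler": min_max_scaler}


def derive_fields(trfm, col_names):
    """Look up the handler for trfm's class and delegate to it."""
    name = type(trfm).__name__
    try:
        handler = HANDLERS[name]
    except KeyError:
        raise NotImplementedError(f"No exporter registered for {name}") from None
    return handler(trfm, col_names)


print(derive_fields(Binarizer(), ["age"]))
```

A lookup table keeps the dispatcher unchanged when support for a new transformer is added; only a new handler and table entry are needed.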

def get_preprocess_val(ppln_sans_predictor, initial_colnames, model)

Generates elements related to pre-processing

Parameters:
  • ppln_sans_predictor – An instance of a Scikit-Learn Pipeline.
  • initial_colnames (list) – List of feature/column names.
  • model – An instance of a Scikit-Learn model.
Returns:

pml_pp – Returns a dictionary that contains data related to pre-processing

Return type:

dictionary

def imputer(trfm, col_names, **kwargs)

Generates pre-processing elements for Scikit-Learn’s Imputer

Parameters:
  • trfm – Scikit-Learn’s Imputer preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to Imputer preprocessing.

Return type:

dictionary

def lag(trfm, col_names)

Generates pre-processing elements for Nyoka’s Lag

Parameters:
  • trfm – Nyoka’s Lag instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to Lag preprocessing.

Return type:

dictionary

def lbl_binarizer(trfm, col_names, **kwargs)

Generates pre-processing elements for Scikit-Learn’s LabelBinarizer

Parameters:
  • trfm – Scikit-Learn’s LabelBinarizer preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to LabelBinarizer preprocessing.

Return type:

dictionary

def lbl_encoder(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s LabelEncoder

Parameters:
  • trfm – Scikit-Learn’s LabelEncoder preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to LabelEncoder preprocessing.

Return type:

dictionary
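
For context, Scikit-Learn’s LabelEncoder assigns each distinct label an integer index according to the sorted order of the labels. The plain-Python sketch below reproduces that mapping. The helper fit_label_mapping is hypothetical, and how the exported dictionary stores the mapping is not documented here.

```python
def fit_label_mapping(labels):
    """Map each distinct label to its index in sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


mapping = fit_label_mapping(["red", "green", "blue", "green"])
encoded = [mapping[label] for label in ["blue", "red"]]
print(mapping, encoded)
```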

def max_abs_scaler(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s MaxAbsScaler

Parameters:
  • trfm – Scikit-Learn’s MaxAbsScaler preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to MaxAbsScaler preprocessing.

Return type:

dictionary

def min_max_scaler(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s MinMaxScaler

Parameters:
  • trfm – Scikit-Learn’s MinMaxScaler preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to MinMaxScaler preprocessing.

Return type:

dictionary
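
As background, a fitted MinMaxScaler transforms each value as x * scale_ + min_, where both attributes are derived from the observed minimum and maximum of the column and the target range. The sketch below reproduces that arithmetic for a single column in plain Python. The helper names are hypothetical, and which of these values the exported dictionary carries is not documented here.

```python
def min_max_params(data_min, data_max, feature_range=(0.0, 1.0)):
    """Compute the per-column scale and offset of a min-max transform."""
    low, high = feature_range
    scale = (high - low) / (data_max - data_min)
    offset = low - data_min * scale
    return scale, offset


def min_max_apply(value, scale, offset):
    """Apply the linear rescaling to one value."""
    return value * scale + offset


scale, offset = min_max_params(data_min=2.0, data_max=6.0)
print(min_max_apply(4.0, scale, offset))
```

Expressing the transform as a scale and an offset makes it a single linear expression per column.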

def one_hot_encoder(trfm, col_names, **kwargs)

Generates pre-processing elements for Scikit-Learn’s OneHotEncoder

Parameters:
  • trfm – Scikit-Learn’s OneHotEncoder preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to OneHotEncoder preprocessing.

Return type:

dictionary

def pca(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s PCA

Parameters:
  • trfm – Scikit-Learn’s PCA preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to PCA preprocessing.

Return type:

dictionary

def polynomial_features(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s PolynomialFeatures

Parameters:
  • trfm – Scikit-Learn’s PolynomialFeatures preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to PolynomialFeatures preprocessing.

Return type:

dictionary

def rbst_scaler(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s RobustScaler

Parameters:
  • trfm – Scikit-Learn’s RobustScaler preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to RobustScaler preprocessing.

Return type:

dictionary

def std_scaler(trfm, col_names, **kwargs)

Generates pre-processing elements for Scikit-Learn’s StandardScaler

Parameters:
  • trfm – Scikit-Learn’s StandardScaler preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to StandardScaler preprocessing.

Return type:

dictionary
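
For reference, a fitted StandardScaler with default settings transforms each value as (x - mean_) / scale_, using the per-column mean and standard deviation learned during fitting. The sketch below shows that formula for one value. The helper standardize is hypothetical, and the handling of **kwargs is not documented here.

```python
def standardize(value, mean, scale):
    """Center a value on the column mean and divide by the column scale."""
    return (value - mean) / scale


print(standardize(12.0, mean=10.0, scale=4.0))
```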

def tfidf_vectorizer(trfm, col_names)

Generates pre-processing elements for Scikit-Learn’s TfidfVectorizer

Parameters:
  • trfm – Scikit-Learn’s TfidfVectorizer preprocessing instance.
  • col_names (list) – List of feature/column names. These may be the names of preprocessed attributes.
Returns:

pp_dict – Returns a dictionary that contains attributes related to TfidfVectorizer preprocessing.

Return type:

dictionary