Normalize continuous values in accordance with the PMML element NormContinuous.

xform_min_max(wrap_object, xform_info = NA, map_missing_to = NA, ...)

Arguments

wrap_object

Output of xform_wrap or another transformation function.

xform_info

Specification of details of the transformation.

map_missing_to

Value to be given to the transformed variable if the value of the input variable is missing.

...

Further arguments passed to or from other methods.

Value

R object containing the raw data, the transformed data and data statistics.

Details

Given input data in xform_wrap format, normalize the data values to lie between the provided limits.
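The rescaling itself can be sketched in base R. This is a minimal illustration, not the pmml implementation: the helper name min_max_rescale is hypothetical, and it assumes that the observed minimum and maximum of the input are mapped linearly onto the low and high limits.

```r
# Hypothetical helper, not part of the pmml package: linear min-max rescaling.
# Assumes x is numeric and has at least two distinct non-missing values.
min_max_rescale <- function(x, low_limit = 0, high_limit = 1,
                            map_missing_to = NA) {
  x_min <- min(x, na.rm = TRUE)
  x_max <- max(x, na.rm = TRUE)
  # Map x_min to low_limit and x_max to high_limit, linearly in between.
  scaled <- low_limit +
    (x - x_min) / (x_max - x_min) * (high_limit - low_limit)
  # Missing inputs receive the replacement value (NA if none was given).
  scaled[is.na(x)] <- map_missing_to
  scaled
}

x <- c(0, 5, NA, 10)
print(min_max_rescale(x, low_limit = 2, high_limit = 3, map_missing_to = 2.5))
```

The smallest input lands on low_limit and the largest on high_limit. A constant input would divide by zero in this sketch, so a real implementation would need to handle that case separately.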

Suppose the input variable is named InputVar and the transformed variable is to be named OutputVar. Let low_limit and high_limit be the desired minimum and maximum values of the transformed variable, and let missingVal be the value the transformed variable should take when the input value is missing. The xform_min_max arguments, with all optional parameters included, then take the form:

xform_info = "InputVar -> OutputVar[low_limit,high_limit]"
map_missing_to = "missingVal"

There are two ways to refer to variables. The first is to use the variable's column number, that is, the position at which the variable appears in the data attribute of the wrap_object. This is written in the format "column#", for example "column1". The second is to refer to the variable by its name.

The name of the transformed variable is optional; if it is not provided, the transformed variable is named "derived_" followed by the original variable name. The low and high limits are likewise optional and default to 0 and 1 respectively. missingVal is optional as well; it is the value given to the derived variable when the input value is missing.

If no input variable names are provided, by default all numeric variables are transformed. Note that in this case a replacement value for missing input values cannot be specified; the same applies to the low_limit and high_limit parameters.
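To preview which columns this default would cover and what the derived variables would be called, the numeric columns can be listed in base R. This is only an illustration of the selection and naming rules described above; it does not call pmml, and the variable names numeric_cols and derived_names are introduced here for the example.

```r
# Illustration only: list the numeric columns of iris and the default
# "derived_" names described above. Nothing here calls the pmml package.
data(iris)

# Species is a factor, so it is not selected.
numeric_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]
derived_names <- paste0("derived_", numeric_cols)

print(numeric_cols)
print(derived_names)
```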

See also

Author

Tridivesh Jena

Examples

# Load the standard iris dataset:
data(iris)

# First wrap the data:
iris_box <- xform_wrap(iris)

# Normalize all numeric variables of the loaded iris dataset to lie
# between 0 and 1. This normalizes "Sepal.Length", "Sepal.Width",
# "Petal.Length" and "Petal.Width" into 4 new derived variables named
# derived_Sepal.Length, derived_Sepal.Width, derived_Petal.Length and
# derived_Petal.Width.
iris_box_1 <- xform_min_max(iris_box)

# Normalize the 1st column values of the dataset (Sepal.Length) to lie
# between 0 and 1 and give the derived variable the name "dsl".
iris_box_2 <- xform_min_max(iris_box, xform_info = "column1 -> dsl")

# Repeat the above operation, this time storing the result back in
# iris_box so that the new transformed variable is added to that object.
iris_box <- xform_min_max(iris_box, xform_info = "column1 -> dsl")

# Transform Sepal.Width (the 2nd column).
# The new transformed variable will be given the default name
# "derived_Sepal.Width".
iris_box_3 <- xform_min_max(iris_box, xform_info = "column2")

# Repeat the same operation as above, this time using the variable name.
iris_box_4 <- xform_min_max(iris_box, xform_info = "Sepal.Width")

# Repeat the same operation as above, now assigning the transformed
# variable, "derived_Sepal.Width", the value 0.5 if the input value of
# the "Sepal.Width" variable is missing. map_missing_to is passed as a
# named argument.
iris_box_5 <- xform_min_max(iris_box, xform_info = "Sepal.Width", map_missing_to = 0.5)

# Transform Sepal.Width (the 2nd column) to lie between 2 and 3.
# The new transformed variable will be given the default name
# "derived_Sepal.Width".
iris_box_6 <- xform_min_max(iris_box, xform_info = "column2->[2,3]")

# Repeat the above transformation, this time with the transformed variable
# lying between 0 and 10. The omitted low limit takes its default value of 0.
iris_box_7 <- xform_min_max(iris_box, xform_info = "column2->[,10]")