3 Writing a New Method
3.1 Methods and Generic Functions
Many functions are built as "generic" functions. The idea is that the function checks
what kind of object it is required to act upon, and then "dispatches" the appropriate
"method" (uses the right algorithm or function). If you peek inside functions like
plot or scale, you will see that they consist of a call to the function UseMethod!
scale
function (x, center = TRUE, scale = TRUE)
UseMethod("scale")
<bytecode: 0x0000022fdcb18ca0>
<environment: namespace:base>
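To see the dispatch mechanism in isolation, here is a minimal sketch using a made-up generic. The names describe, describe.default, and describe.data.frame are hypothetical and exist only for this illustration; they are not part of base R.

```r
# Hypothetical generic: UseMethod() picks a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

# Fallback, used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# Method that is chosen automatically for data frames
describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(iris)     # dispatches to describe.data.frame
describe(letters)  # no character method, so describe.default is used
```

The method name is simply the generic's name, a dot, and the class name. That naming pattern is what the rest of this chapter relies on.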
However, being "generic" does not necessarily mean a function will work for all kinds
of objects you might want. For instance, scale has a default algorithm, but does
not handle all data.frames, only those that are entirely numeric.
methods(scale)
[1] scale.default
see '?methods' for accessing help and source code
head(scale(mtcars))
mpg cyl disp hp drat
Mazda RX4 0.1508848 -0.1049878 -0.57061982 -0.5350928 0.5675137
Mazda RX4 Wag 0.1508848 -0.1049878 -0.57061982 -0.5350928 0.5675137
Datsun 710 0.4495434 -1.2248578 -0.99018209 -0.7830405 0.4739996
Hornet 4 Drive 0.2172534 -0.1049878 0.22009369 -0.5350928 -0.9661175
Hornet Sportabout -0.2307345 1.0148821 1.04308123 0.4129422 -0.8351978
Valiant -0.3302874 -0.1049878 -0.04616698 -0.6080186 -1.5646078
wt qsec vs am gear
Mazda RX4 -0.610399567 -0.7771651 -0.8680278 1.1899014 0.4235542
Mazda RX4 Wag -0.349785269 -0.4637808 -0.8680278 1.1899014 0.4235542
Datsun 710 -0.917004624 0.4260068 1.1160357 1.1899014 0.4235542
Hornet 4 Drive -0.002299538 0.8904872 1.1160357 -0.8141431 -0.9318192
Hornet Sportabout 0.227654255 -0.4637808 -0.8680278 -0.8141431 -0.9318192
Valiant 0.248094592 1.3269868 1.1160357 -0.8141431 -0.9318192
carb
Mazda RX4 0.7352031
Mazda RX4 Wag 0.7352031
Datsun 710 -1.1221521
Hornet 4 Drive -1.1221521
Hornet Sportabout -0.5030337
Valiant -1.1221521
scale(iris)
## Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
The problem here is that data set iris contains a factor variable. I would like the
scale function to just act on the numeric vectors within any data frame, ignoring
factors, character vectors, and logical vectors. Because scale is a generic function,
this is easy to do!
I'll go through these typical steps to write a function to scale (center/standardize) all the numeric variables in a data frame:
- Write an example that works
- Turn that into a function
- Test and refine: bomb-proofing, make into method
I'll want my function to return the whole data frame, with just the appropriate variables
(re)scaled. So my final step within the function will be to return a data.frame.
3.1.1 Step 1: A working example
Make a copy of the data frame and figure out which columns are scalable. Use scale()
on those columns (the default method coerces them into a matrix), returning a matrix.
Use the matrix to write back to the data frame.
# we'll be given a data frame
x <- iris
cols <- sapply(iris, is.numeric)
scaledvars <- scale(iris[, cols])
x[, cols] <- scaledvars
# we'll return(x)
Check your results.
head(x)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
str(x)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num -0.898 -1.139 -1.381 -1.501 -1.018 ...
$ Sepal.Width : num 1.0156 -0.1315 0.3273 0.0979 1.245 ...
$ Petal.Length: num -1.34 -1.34 -1.39 -1.28 -1.34 ...
$ Petal.Width : num -1.31 -1.31 -1.31 -1.31 -1.31 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
3.1.2 Step 2: Edit into a function
Here we just replace iris with the parameter dfr.
In RStudio you can use the menus Code -- Extract Function.
scale_df <- function(dfr) {
x <- dfr
cols <- sapply(dfr, is.numeric)
scaledvars <- scale(dfr[, cols])
x[, cols] <- scaledvars
return(x) # or just "x"
}
3.1.3 Step 3: Test it on something
Preferably a couple of things you expect to work, and a couple of things you expect to fail.
z <- scale_df(iris)
head(z)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
3.2 Refinement one: testing the input
Make sure dfr is a data frame!
scale_df2 <- function(dfr) {
if (!is.data.frame(dfr)) {stop("dfr must be a data frame")}
x <- dfr
cols <- sapply(dfr, is.numeric)
scaledvars <- scale(dfr[, cols])
x[, cols] <- scaledvars
return(x)
}
z <- scale_df2(iris)
head(z)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
scale_df2(iris$Sepal.Length)
Error in scale_df2(iris$Sepal.Length): dfr must be a data frame
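Checks like these can also be written down so they rerun every time the function changes. The following is only a sketch: fails() is a hypothetical helper I made up for this example, and scale_df2 is repeated (slightly condensed) so the block runs on its own.

```r
# Same logic as scale_df2 above, repeated so this block is self-contained
scale_df2 <- function(dfr) {
  if (!is.data.frame(dfr)) stop("dfr must be a data frame")
  x <- dfr
  cols <- sapply(dfr, is.numeric)
  x[, cols] <- scale(dfr[, cols])
  x
}

# Hypothetical helper: TRUE if evaluating the expression raises an error
fails <- function(expr) {
  inherits(try(expr, silent = TRUE), "try-error")
}

# Things we expect to work
z <- scale_df2(iris)
stopifnot(!fails(scale_df2(iris)))
stopifnot(all(abs(colMeans(z[, 1:4])) < 1e-10))  # numeric columns are centered
stopifnot(identical(z$Species, iris$Species))    # the factor is left alone

# Things we expect to fail
stopifnot(fails(scale_df2(iris$Sepal.Length)))
stopifnot(fails(scale_df2("not a data frame")))
```

If every stopifnot() passes, the block runs silently; any surprise stops with an error that names the failed check.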
3.3 Refinement two: make it a method
We'll choose a function name that makes this a "method" of the generic function, scale().
scale.data.frame <- function(dfr) {
if (!is.data.frame(dfr)) {stop("dfr must be a data frame")}
x <- dfr
cols <- sapply(dfr, is.numeric)
scaledvars <- scale.default(dfr[, cols]) # otherwise we get a recursive loop
x[, cols] <- scaledvars
return(x)
}
z <- scale.data.frame(iris)
head(z)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
scale.data.frame(iris$Sepal.Length)
Error in scale.data.frame(iris$Sepal.Length): dfr must be a data frame
3.3.1 Here is the magic!
z <- scale(iris)
head(z)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
str(z)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num -0.898 -1.139 -1.381 -1.501 -1.018 ...
$ Sepal.Width : num 1.0156 -0.1315 0.3273 0.0979 1.245 ...
$ Petal.Length: num -1.34 -1.34 -1.39 -1.28 -1.34 ...
$ Petal.Width : num -1.31 -1.31 -1.31 -1.31 -1.31 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(scale(iris$Sepal.Length)) # this now works by the default method
[,1]
[1,] -0.8976739
[2,] -1.1392005
[3,] -1.3807271
[4,] -1.5014904
[5,] -1.0184372
[6,] -0.5353840
attributes(scale(iris$Sepal.Length)) # notice the attributes at the end
$dim
[1] 150 1
$`scaled:center`
[1] 5.843333
$`scaled:scale`
[1] 0.8280661
3.4 Refinement three: better return
Better error message, keep attributes.
scale.data.frame <- function(dfr) {
if (!is.data.frame(dfr)) {stop(paste(deparse(substitute(dfr)), "must be a data frame"))}
x <- dfr
cols <- sapply(dfr, is.numeric)
scaledvars <- scale.default(dfr[, cols]) # otherwise we get a recursive loop
x[, cols] <- scaledvars
attr(x, "scaled:center") <- attr(scaledvars, "scaled:center")
attr(x, "scaled:scale") <- attr(scaledvars, "scaled:scale")
return(x)
}
z <- scale(iris)
head(z)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 -0.8976739 1.01560199 -1.335752 -1.311052 setosa
2 -1.1392005 -0.13153881 -1.335752 -1.311052 setosa
3 -1.3807271 0.32731751 -1.392399 -1.311052 setosa
4 -1.5014904 0.09788935 -1.279104 -1.311052 setosa
5 -1.0184372 1.24503015 -1.335752 -1.311052 setosa
6 -0.5353840 1.93331463 -1.165809 -1.048667 setosa
str(z)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num -0.898 -1.139 -1.381 -1.501 -1.018 ...
$ Sepal.Width : num 1.0156 -0.1315 0.3273 0.0979 1.245 ...
$ Petal.Length: num -1.34 -1.34 -1.39 -1.28 -1.34 ...
$ Petal.Width : num -1.31 -1.31 -1.31 -1.31 -1.31 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
- attr(*, "scaled:scale")= Named num [1:4] 0.828 0.436 1.765 0.762
..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
The error message will only be used if someone tries to bypass scale() and use
scale.data.frame() directly.
scale.data.frame(iris$Sepal.Length)
Error in scale.data.frame(iris$Sepal.Length): iris$Sepal.Length must be a data frame
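One reason to keep the attributes is that they let you undo the scaling later. The sketch below repeats the method from this section so it runs on its own, then uses sweep() to get back to the original units. The object name restored is just illustrative.

```r
# Method from this section, repeated so the block is self-contained
scale.data.frame <- function(dfr) {
  if (!is.data.frame(dfr)) {
    stop(paste(deparse(substitute(dfr)), "must be a data frame"))
  }
  x <- dfr
  cols <- sapply(dfr, is.numeric)
  scaledvars <- scale.default(dfr[, cols])
  x[, cols] <- scaledvars
  attr(x, "scaled:center") <- attr(scaledvars, "scaled:center")
  attr(x, "scaled:scale") <- attr(scaledvars, "scaled:scale")
  x
}

z <- scale(iris)
ctr <- attr(z, "scaled:center")  # column means, named by column
scl <- attr(z, "scaled:scale")   # column standard deviations

# Reverse the transformation: multiply by the scale, then add the center
restored <- sweep(as.matrix(z[, names(ctr)]), 2, scl, "*")
restored <- sweep(restored, 2, ctr, "+")

# Compare with the original numeric columns, allowing for rounding error
all.equal(restored, as.matrix(iris[, names(ctr)]), check.attributes = FALSE)
```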
3.5 Refinement four: passing parameters
Pass the scale and center options through to scale.default(), using the ... argument.
scale.data.frame <- function(dfr, ...) {
if (!is.data.frame(dfr)) {stop(paste(deparse(substitute(dfr)), "must be a data frame"))}
x <- dfr
cols <- sapply(dfr, is.numeric)
scaledvars <- scale.default(dfr[, cols], ...) # otherwise we get a recursive loop
x[, cols] <- scaledvars
attr(x, "scaled:center") <- attr(scaledvars, "scaled:center")
attr(x, "scaled:scale") <- attr(scaledvars, "scaled:scale")
return(x)
}
z <- scale(iris, scale=FALSE)
str(z)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num -0.743 -0.943 -1.143 -1.243 -0.843 ...
$ Sepal.Width : num 0.4427 -0.0573 0.1427 0.0427 0.5427 ...
$ Petal.Length: num -2.36 -2.36 -2.46 -2.26 -2.36 ...
$ Petal.Width : num -0.999 -0.999 -0.999 -0.999 -0.999 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "scaled:center")= Named num [1:4] 5.84 3.06 3.76 1.2
..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
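The version above works when the data frame is passed by position. If this method were headed for a package, R's convention is that a method should have the same arguments as its generic, which for scale() are x, center, and scale. Here is one possible sketch along those lines. It is an alternative under that assumption, not a replacement for the steps above, and the use of vapply() and drop = FALSE is my own choice.

```r
# Alternative sketch: argument names match the generic scale(x, center, scale)
scale.data.frame <- function(x, center = TRUE, scale = TRUE) {
  if (!is.data.frame(x)) {
    stop(paste(deparse(substitute(x)), "must be a data frame"))
  }
  cols <- vapply(x, is.numeric, logical(1))
  # drop = FALSE keeps a data frame even if only one column is numeric
  scaledvars <- scale.default(x[, cols, drop = FALSE],
                              center = center, scale = scale)
  x[, cols] <- scaledvars
  attr(x, "scaled:center") <- attr(scaledvars, "scaled:center")
  attr(x, "scaled:scale") <- attr(scaledvars, "scaled:scale")
  x
}

# The data frame can now be passed by name as well as by position
z <- scale(x = iris, scale = FALSE)
colMeans(z[, 1:4])  # effectively zero: the columns were centered only
```

Inside the function, scale refers to the argument, but the call to scale.default() is unaffected because it uses a different name.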
3.6 Exercises
- plot does not have specific methods for logical or character vectors. For logical vectors, it coerces them to numeric type and then plots; for character vectors it just gives up. Write two functions that create bar charts for these types of vectors, and make them plot methods.
- Write a mean method for data frames.
Last revised: 12/26/2014