6 R Language Elements

To write R scripts, it helps to be able to read the R documentation. We will begin with some of the jargon used to describe the R language and then look at the rules for writing R commands that the computer will be able to interpret.

Then we will examine some of the standard elements of the help pages, and look at how new commands are added in the form of packages.

The fundamental unit of work in R is the expression or statement. R evaluates statements.

Expressions are composed of data objects, functions, and special characters.

One of the most basic expressions is assigning data values to a name. Typical style would put one statement per line.

x <- rnorm(10, mean = 5)
y <- rnorm(12, mean = 7)

Let’s dig into the details.

  • x is the name of a data object

  • <- is the assignment operator. Like most operators, it has a left-hand side (here, the name) and a right-hand side (here, the value).

  • rnorm() is a function, including the parentheses

  • 10 and mean = 5 are function arguments, or parameters. mean is an argument name. Inside a function call, the = pairs an argument name with a value. 5 is the value given for the mean argument.
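As a small illustration of argument names, the sketch below calls rnorm() once with the mean named and once with the value supplied by position. It relies on mean being the second argument of rnorm(). The calls to set.seed() are only there so that both calls draw the same random numbers, and the object names a and b are made up for this example.

```r
# Fix the random seed so both calls draw the same numbers
set.seed(42)
a <- rnorm(10, mean = 5)   # value matched to the argument by name

set.seed(42)
b <- rnorm(10, 5)          # value matched by position (mean is the second argument)

identical(a, b)            # TRUE: both calls ask for the same thing
```

Naming the argument is usually the clearer choice, because the reader does not have to remember the order of the arguments.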

You can think of each piece of an expression as a word, or token. A token is generally a name (of a data object, a function, or an argument), an operator (like <- or +), or another special character like parentheses, brackets, and braces.
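If you are curious, you can ask R to show the tokens it sees in a statement. The sketch below is one possible way to do this, using parse() and getParseData() from the utils package; the object names parsed, pieces, and tokens are made up for this example. The statement is only parsed, not run, so no x is created.

```r
# Parse (but do not evaluate) one statement, keeping the source text
parsed <- parse(text = "x <- rnorm(10, mean = 5)", keep.source = TRUE)

# getParseData() returns one row per piece of the parsed statement
pieces <- utils::getParseData(parsed)

# Keep only the terminal rows: these are the individual tokens
tokens <- pieces[pieces$terminal, c("token", "text")]
print(tokens)
```

The text column lists the tokens in the order they appear: the name x, the operator <-, the function name rnorm, and so on through the closing parenthesis.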

If we want to keep the result of a function, we must assign it to a name. Otherwise the result is typically printed in the Console and then discarded.
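A minimal sketch of the difference, using sort() as an arbitrary example function and x_sorted as a made-up name:

```r
x <- c(3, 1, 2)      # a small vector to work with

sort(x)              # the sorted values are printed, then discarded
x                    # x itself is unchanged: 3 1 2

x_sorted <- sort(x)  # assigning keeps the result under a new name
x_sorted             # 1 2 3
```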

6.1 Comments

We use comments in our code to write notes for humans to read, and to disable sections of code (perhaps temporarily).

The # symbol is the comment token. Any text on a line after a # character is ignored by R, unless the # appears inside a quoted string.

Try this example, which contains two comments:

x <- rnorm(25, mean = 5)
y <- rnorm(20, mean = 7)
# a two-sample t-test
t.test(x, y, var.equal = TRUE) # classic t-test

    Two Sample t-test

data:  x and y
t = -9.0367, df = 43, p-value = 1.702e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.850001 -1.810033
sample estimates:
mean of x mean of y 
 4.830499  7.160516 
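Comments can also switch a line of code off without deleting it. The following is a small sketch of that use; the names values and label are made up for this example.

```r
values <- c(2, 4, 6)   # text after the # is a note for human readers

# The next line is "commented out", so R skips it entirely
# values <- c(100, 200, 300)

mean(values)           # 4, because the reassignment above never ran

label <- "sample #1"   # a # inside quotes is part of the text, not a comment
nchar(label)           # 9
```

Removing the # from the start of the disabled line turns it back into code.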

6.2 Capitalization

Capitalization matters. Try:

X <- rnorm(3, mean = 3)
x <- Rnorm(3, mean = 3)
Error in Rnorm(3, mean = 3): could not find function "Rnorm"
x <- rnorm(3, Mean = 3)
Error in rnorm(3, Mean = 3): unused argument (Mean = 3)

In the first statement, we get a new vector, X capitalized. Be careful! While this is valid code, it might have been a typo!

In the second statement, we get an error about an unrecognized function. The function name should have been lower case: rnorm.

In the third statement, we get an error about an unrecognized argument. The argument name should have been lower case: mean.

If you decide to use capitalization when you name objects, try to do so in a consistent style.
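To see that names differing only in capitalization are unrelated, consider this short sketch. It uses exists(), which reports whether a name is currently defined, and it assumes you have not defined anything called Rnorm yourself.

```r
x <- 1     # lower-case x
X <- 100   # upper-case X is a separate object

x + X      # 101: R treats the two names as unrelated

exists("rnorm")   # TRUE: the function rnorm is defined
exists("Rnorm")   # FALSE, unless you have defined Rnorm yourself
```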

6.3 White Space

White space between tokens does not matter, except for line breaks. White space used well makes your code much easier for humans to read and understand.

Try:

x<-rnorm(10,mean=5)
x <- rnorm ( 10 , mean = 5 )
x <- rnorm(10, mean = 5)

All three statements are valid code. In the first statement, there is no white space at all. In the second statement, there is white space between every single token. The third statement balances the two: some white space for readability, but not so much as to take up unnecessary space. Wherever one space is allowed, any number of spaces is allowed.

Again, using white space will make your code easier for humans to read and understand, especially if you use it in a consistent way.
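One way to convince yourself that R reads differently spaced statements identically is to capture them with quote(), which stores a statement without running it, and compare the results. This is only a sketch; the names tight and loose are made up. The second half shows the one thing to watch for: white space inside a token, as opposed to between tokens, does change the meaning.

```r
# quote() captures a statement without running it
tight <- quote(x<-rnorm(10,mean=5))
loose <- quote(x <- rnorm ( 10 , mean = 5 ))

identical(tight, loose)   # TRUE: R reads both as the same statement

# White space inside a token is another matter: "<-" split in two
# becomes "<" followed by "-"
x <- 5
x < - 3                   # FALSE: asks whether x is less than -3
x                         # still 5
```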

6.4 Line Breaks

An R statement may extend over more than one line. As long as an expression is incomplete at the end of a line, R will continue reading the next line before evaluating the statement.

Try this example:

x <-
  rnorm(5, mean = 3)

This is valid code. In fact, if you highlight and run just the first line, the RStudio Console presents you with a + prompt, indicating that the expression is still incomplete. (If you use Ctrl-Enter instead, RStudio reads both lines!)

A little caution is required with the placement of parentheses and operators: an open parenthesis or an operator may come at the end of a line, just before the line break, but not at the start of the next line.

Compare these examples:

y <- 3 + 4
z <- 3
  + 4
[1] 4

The first line is a complete statement, assigning the value 7 to y.

Written as above, the second line is also a complete statement, assigning the value 3 to z. Then the third line is simply a request to print the value 4.
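Two ways to split the addition across lines so that R still reads it as one statement are sketched below. The name w is made up for this example.

```r
# Ending the line with the operator tells R the statement continues
z <- 3 +
  4
z        # 7

# An open parenthesis also keeps the expression incomplete
# until the matching close parenthesis
w <- (3
  + 4)
w        # 7
```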

6.5 Style

Try to write your code in a consistent and conventional manner. White space around operators makes them easier to spot. White space between function arguments makes them easier to distinguish. White space to indent blocks of code that run together makes it easier to see the flow of processing in a script.

Consistency makes your code easier to debug, and easier for people (your future self, colleagues, consultants) to read. You may find it helpful to consult an established style guide, such as The tidyverse style guide.

6.6 Exercises

Adjust the capitalization, white space, and line breaks in this code so that it runs.

a 
<- 
runif(15)

b <
- rnorm(10
, 1)

t.Test (A, b)