# 6 Logical

In R, logical values are stored as a distinct data type. Logical
values are used both in statistical
modeling and in data management. In modeling, logical vectors are
often called *indicator* or *dummy* variables. In data management,
logical values also serve as *conditions*, for example to select or recode observations.

## 6.1 Logical Values

There are three logical values:

- `TRUE`
- `FALSE`
- `NA` (“missing”, or “unknown value”)

The names `T` and `F` are used by default as aliases for `TRUE` and
`FALSE`, but be aware that you can redefine `T` and `F`. Do not do this
accidentally! The *names* `T` and `F` print the *values* `TRUE` and `FALSE`.

`T`

`[1] TRUE`

`F`

`[1] FALSE`

See `help(logical)` and `help(NA)`.
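To see why redefining `T` is risky, here is a short sketch meant to run at the top level of a fresh session (the value `0` is an arbitrary choice). `TRUE` and `FALSE` are reserved words and cannot be reassigned, but `T` and `F` are ordinary variables that can be masked.

```r
T <- 0         # accidentally masks the built-in alias T
isTRUE(T)      # FALSE: T now refers to the number 0
rm(T)          # delete our copy; R finds the original T again
isTRUE(T)      # TRUE
```

Writing `TRUE` and `FALSE` in full avoids the problem entirely.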

## 6.2 Logical Operators

R has the typical binary comparison operators (see `help(Comparison)`).
These take data of arbitrary type as inputs (`x` and `y`) and return logical
values.

- Greater than, `x > y`, or greater than or equal to, `x >= y`
- Less than, `x < y`, or less than or equal to, `x <= y`
- Equal to, `x == y`
- Not equal to, `x != y`

R also has the typical Boolean operators for creating compound logical expressions. These take logical values as inputs (x and y) and return logical values.

- And, `x & y`
- Or, `x | y`
- Not, `!x`
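A minimal sketch of all three Boolean operators, applied to two hypothetical logical vectors that together cover every combination of inputs (a truth table):

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)
x & y   # TRUE FALSE FALSE FALSE: TRUE only when both are TRUE
x | y   # TRUE TRUE TRUE FALSE: TRUE when at least one is TRUE
!x      # FALSE FALSE TRUE TRUE: reverses each value
```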

### 6.2.1 Making Comparisons

As with the mathematical operators, logical operators work pairwise with the elements of two vectors, returning a vector of comparisons. Where one vector is shorter than the other, recycling occurs.

In this first example, we set up an arbitrary numeric vector, \(a\), and ask whether each element of \(a\) is equal to 3.

```
a <- c(1.1, 3, 5.3, 2) # a numeric vector
f <- (a == 3) # a vector of comparisons
f
```

`[1] FALSE TRUE FALSE FALSE`

We can make comparisons with character values, too, but be aware that the result depends on the alphabetical (collation) order R uses for your language setting, or *locale*!

```
A <- c("a", "b", "e")
A > "d"
```

`[1] FALSE FALSE TRUE`
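The locale dependence can be made concrete with a sketch that temporarily switches string collation to the `"C"` locale, which compares strings by their underlying character codes (uppercase letters come before lowercase ones). The result in your own locale may differ, which is exactly the point.

```r
old_collate <- Sys.getlocale("LC_COLLATE")           # remember the current setting
invisible(Sys.setlocale("LC_COLLATE", "C"))          # compare by character code
in_c_locale <- "a" < "B"                             # FALSE: "B" has a lower code than "a"
invisible(Sys.setlocale("LC_COLLATE", old_collate))  # restore the original setting
in_c_locale
"a" < "B"   # often TRUE in English-language locales, which sort letters before case
```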

Comparison of two vectors is done element by
element. (The term “vectorized” is sometimes used for
this sort of operation.) In this example, each element
of \(a\) is compared to *one* corresponding integer in 1 to 4.

`a > 1:4`

`[1] TRUE TRUE TRUE FALSE`

If the two vectors being compared are of different lengths, recycling occurs just as we saw with numeric operators.

```
b <- c(1.1, 2)
a != b # silent recycling
```

`[1] FALSE TRUE TRUE FALSE`

`a > 2:4 # noisy recycling`

`Warning in a > 2:4: longer object length is not a multiple of shorter object length`

`[1] FALSE FALSE TRUE FALSE`

A somewhat different kind of comparison is the value match. Here
we ask whether values in the left-hand vector are elements of the *set*
represented by the right-hand vector. Despite the use of two
vectors, these are no longer
pairwise comparisons and there is no recycling. The length of the output
equals the length of the vector on the left-hand side. In the first
example, `2 %in% a`, `2` has length one, so the output has one element.
In the second example, `1:4 %in% a`, `1:4` has length four, so the output
has four elements.

`2 %in% a`

`[1] TRUE`

`1:4 %in% a`

`[1] FALSE TRUE TRUE FALSE`
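Because `%in%` asks about set membership, swapping the two sides changes the question and the length of the answer. The sketch below recreates the vector `a` from above. The `match()` line shows the equivalent lookup, which is how `help("%in%")` describes the operator.

```r
a <- c(1.1, 3, 5.3, 2)
a %in% 1:4                       # FALSE TRUE FALSE TRUE: one answer per element of a
match(1:4, a, nomatch = 0) > 0   # FALSE TRUE TRUE FALSE: same as 1:4 %in% a
```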

The `==` and `%in%` operators serve different purposes. To illustrate, first create a hypothetical dataset where two columns, `state2020` and `state2021`, contain the states where an individual lived in 2020 and 2021, respectively.

```
n <- 10
set.seed(455)
dat <-
data.frame(id = 1:n,
state2020 = sample(state.name[1:5], n, replace = T),
state2021 = sample(state.name[1:5], n, replace = T))
```

Use `==` for elementwise comparisons. The variable `same_state` answers the question “Did this individual live in the same state in both years?”

`dat$same_state <- dat$state2020 == dat$state2021`

Use `%in%` to find cases whose value is found in some set. The variable `AL_AK_2021` answers the question “Did this individual live in either Arizona or Alaska in 2021?”

`dat$AL_AK_2021 <- dat$state2021 %in% c("Arizona", "Alaska")`

For illustration purposes, we will also create a variable that uses `==` to compare a column to a vector with more than one value (a set). This code will recycle our vector `c("Arizona", "Alaska")`, comparing observation one to “Arizona”, observation two to “Alaska”, observation three to “Arizona”, and so on. This is decidedly a bad idea; results will vary if we sort, add, or remove observations.

```
# DO NOT DO THIS - USE %in% INSTEAD
dat$AL_AK_2021_v2 <- dat$state2021 == c("Arizona", "Alaska")
```

Now, print the dataframe to see the `TRUE`s and `FALSE`s in our new variables.

`dat`

```
id state2020 state2021 same_state AL_AK_2021 AL_AK_2021_v2
1 1 Arkansas Alabama FALSE FALSE FALSE
2 2 California Alaska FALSE TRUE TRUE
3 3 Alabama California FALSE FALSE FALSE
4 4 California Arizona FALSE TRUE FALSE
5 5 Arkansas California FALSE FALSE FALSE
6 6 Alaska Alaska TRUE TRUE TRUE
7 7 Arizona Arkansas FALSE FALSE FALSE
8 8 Alabama Alabama TRUE FALSE FALSE
9 9 Alabama California FALSE FALSE FALSE
10 10 Alaska Alabama FALSE FALSE FALSE
```

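Different operators are useful for different research questions. Compare rows 2 and 4: both individuals lived in one of the two states in 2021, and `AL_AK_2021` is `TRUE` for both, but `AL_AK_2021_v2` is `FALSE` for row 4 because `==` compared “Arizona” to the recycled value “Alaska”.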
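To see how fragile the recycled `==` comparison is, here is a small hypothetical vector of states (not the `dat` data above), compared to the same set before and after reversing its order. Counting the `TRUE`s with `sum()`, `%in%` finds the same three matches both times, while `==` finds none and then three.

```r
states <- c("Alaska", "Arizona", "Alabama", "Arizona")   # hypothetical data
target <- c("Arizona", "Alaska")

sum(states %in% target)        # 3
sum(rev(states) %in% target)   # 3: order does not matter for %in%

sum(states == target)          # 0: compared to "Arizona", "Alaska", "Arizona", "Alaska"
sum(rev(states) == target)     # 3: same data, different order, different answer
```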

### 6.2.2 Boolean Algebra

We also have the usual operators (“and”, “or”, “not”) for combining logical inputs to produce a logical outcome.

```
# &, "and" - satisfy both conditions
(a == 2) & (a < 5)
```

`[1] FALSE FALSE FALSE TRUE`

```
# |, "or" - satisfy at least one condition
(a == 2) | (a < 5)
```

`[1] TRUE TRUE FALSE TRUE`
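The examples above do not use “not”. A short sketch with the same vector `a` shows `!` reversing a comparison; the parentheses make clear which condition it applies to.

```r
a <- c(1.1, 3, 5.3, 2)
!(a < 5)              # FALSE FALSE TRUE FALSE: "not less than 5"
!(a == 2) & (a < 5)   # TRUE TRUE FALSE FALSE: "not 2, and less than 5"
```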

### 6.2.3 Missing Values

The logical status of missing values is treated somewhat differently in R than in some other statistical software (Stata, SAS, SPSS). Where in some languages the result of a comparison is either true or false, in R a comparison may produce a “missing” or “unknown” result.

```
b <- c(1:4, NA)
b > 3 # in Stata the final value is "true"
```

`[1] FALSE FALSE FALSE TRUE NA`

`b == 3 # in Stata and SAS the final value is "false"`

`[1] FALSE FALSE TRUE FALSE NA`

`b < 3 # in SAS the final value is "true"`

`[1] TRUE TRUE FALSE FALSE NA`

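Likewise, Boolean operations on missing values usually produce missing results. The exception is when the result does not depend on the missing value: `NA & FALSE` is `FALSE`, and `NA | TRUE` is `TRUE`.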
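R's Boolean operators follow a three-valued logic: the result is `NA` only when it depends on the unknown value. A minimal sketch:

```r
NA & TRUE    # NA: could be TRUE or FALSE depending on the missing value
NA & FALSE   # FALSE: "and" fails whatever the missing value is
NA | FALSE   # NA
NA | TRUE    # TRUE: "or" succeeds whatever the missing value is
!NA          # NA
```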

When *checking* for missing values, a common mistake is to
use a comparison. In R, use the testing
function `is.na()` instead.

`b == NA # not useful, but doesn't produce an error!`

`[1] NA NA NA NA NA`

`is.na(b) # the proper way to check`

`[1] FALSE FALSE FALSE FALSE TRUE`
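`%in%` behaves differently from `==` here: it never returns `NA`, because the underlying `match()` treats `NA` as a value that can itself be matched. A short sketch with the same `b`:

```r
b <- c(1:4, NA)
b == 3            # FALSE FALSE TRUE FALSE NA
b %in% 3          # FALSE FALSE TRUE FALSE FALSE
b %in% c(3, NA)   # FALSE FALSE TRUE FALSE TRUE: NA matches NA
```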

## 6.3 Functions with Logical Vectors

`ifelse()` is a function we will use a lot to recode and modify values in First Steps with Dataframes. To use `ifelse()`, give it three arguments (named `test`, `yes`, and `no`): a logical test, a value to return if the test evaluates to `TRUE`, and a value to return if the test evaluates to `FALSE`.

Make a vector of the numbers one to five.

`x <- 1:5`

The first way to use `ifelse()` is to recode all values to make a binary variable:

`ifelse(x >= 3, "three_or_more", "less_than_three")`

`[1] "less_than_three" "less_than_three" "three_or_more" "three_or_more" "three_or_more" `

We can also use `ifelse()` to change some values and leave others as they are. To do so, supply a replacement value in either the second or third argument of `ifelse()`, and put the name of the vector in the other.

For example, we could change one specific value. Wherever `x` is 3, make it missing, and return its original value otherwise:

`ifelse(x == 3, NA, x)`

`[1] 1 2 NA 4 5`

We could return the corresponding element from another vector. Here, the test evaluates to `TRUE` in the third position, so the third element of `11:15` (13) is returned:

`ifelse(x == 3, 11:15, x)`

`[1] 1 2 13 4 5`

Other applications include top-coding or bottom-coding data:

`ifelse(x > 4, 4, x) # top-code at 4`

`[1] 1 2 3 4 4`

`ifelse(x < 2, 2, x) # bottom-code at 2`

`[1] 2 2 3 4 5`

We can compare a vector to a set with `%in%`:

`ifelse(x %in% c(1, 3, 5), "odd", x)`

`[1] "odd" "2" "odd" "4" "odd"`

And of course, complex expressions with Boolean operators are allowed:

`ifelse(x < 3 | x > 3, "not_three", x)`

`[1] "not_three" "not_three" "3" "not_three" "not_three"`
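Two behaviors of `ifelse()` are worth checking before relying on it for recoding. First, all results share one type, so mixing text and numbers (as in the “odd” example) gives a character vector. Second, a missing test value gives a missing result. The vector `y` below is a hypothetical example.

```r
x <- 1:5
class(ifelse(x %in% c(1, 3, 5), "odd", x))   # "character": the numbers became text

y <- c(1, NA, 3)
ifelse(y > 2, "big", "small")                # "small" NA "big"
```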

### 6.3.1 Coercion

A *generic function* is a function that uses different
*methods* (implements different algorithms) depending
on the class and type of the input data. (Recall the discussion
in the chapter on Data Class.) Only a few generic
functions have specific methods for logical vectors, while
most functions will coerce logical vectors to either a
numeric vector or a factor.

`summary(f) # produces counts, but also notes mode`

```
Mode FALSE TRUE
logical 3 1
```
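`table()` is one function that coerces a logical vector to a factor before counting, so its result resembles the `summary()` output above without the mode. A sketch, recreating `f` from the first example:

```r
f <- c(1.1, 3, 5.3, 2) == 3   # FALSE TRUE FALSE FALSE, as above
levels(factor(f))             # "FALSE" "TRUE"
table(f)                      # counts: 3 FALSE, 1 TRUE
```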

If you have worked with other statistical software, you won’t be surprised that very often logical values are automatically coerced to the integers 0 and 1.

`mean(f) # coerced to numeric, a proportion`

`[1] 0.25`

`f + 1 # coercion in binary operators, too`

`[1] 1 2 1 1`

You may also be aware that where numeric values are coerced into logical values, 0 is FALSE and anything else is TRUE (unless it is missing). (Recall Exercise 3 from Data Types.)

`as.logical(-1:2)`

`[1] TRUE FALSE TRUE TRUE`

The fact that we can coerce logical values into zeroes and ones is extremely useful in data wrangling when it is used in combination with `sum()` or `mean()`.

The `cyl` column of `mtcars` takes on several values:

`mtcars$cyl`

` [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4`

A logical comparison can tell us which values are equal to 4:

`mtcars$cyl == 4`

```
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[29] FALSE FALSE FALSE TRUE
```

Taking the sum of this comparison coerces the `TRUE`s to 1 and the `FALSE`s to 0, so the sum equals the number of 4s in `mtcars$cyl`. The mean, then, is the proportion of values equal to 4.

`sum(mtcars$cyl == 4)`

`[1] 11`

`mean(mtcars$cyl == 4)`

`[1] 0.34375`
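The same idea works with `%in%`. As a sketch, counting the cars with either 4 or 6 cylinders:

```r
sum(mtcars$cyl %in% c(4, 6))    # 18: the 11 four-cylinder cars plus 7 six-cylinder cars
mean(mtcars$cyl %in% c(4, 6))   # 0.5625, i.e. 18 of 32 cars
```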

We can also count the cars that have a 1 in both `vs` and `am`. Recall that comparisons are made elementwise. The `&` operator checks whether the first pair of `vs` and `am` values are both `TRUE`, whether the second pair are both `TRUE`, and so on. Summing the result gives the number of cases in `mtcars` that have a 1 for both. Looking through the numeric values for these two variables, it looks like they are both 1 in seven positions.

`mtcars$vs`

` [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1`

`mtcars$am`

` [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1`

`mtcars$vs == 1`

```
[1] FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
[15] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
[29] FALSE FALSE FALSE TRUE
```

`mtcars$am == 1`

```
[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
[29] TRUE TRUE TRUE TRUE
```

`mtcars$vs == 1 & mtcars$am == 1`

```
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[15] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE
[29] FALSE FALSE FALSE TRUE
```

`sum(mtcars$vs == 1 & mtcars$am == 1)`

`[1] 7`
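To confirm the seven positions found by eye, `which()` returns the indices where a logical vector is `TRUE`; indexing the row names with those positions names the cars.

```r
both <- mtcars$vs == 1 & mtcars$am == 1
which(both)                  # 3 18 19 20 26 28 32
rownames(mtcars)[which(both)]
```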

Later, in Quantify Missing Data, we will use this property of logical values to count missing values.

## 6.4 Testing Equality

There is one logical comparison that is particularly problematic
when made by a computer: *equality*. Checking the equality of
logical values, character values, and integer values is straightforward,
but numeric values with decimal precision (stored as “doubles”) are
often imprecise. Think of the decimal representation of \(1/3\),
0.3333…, which must be truncated at some point: the digits cannot continue *forever*.

(A computer works with binary representations, but the problem is conceptually the same.)

Even simple mathematical operations can introduce numerical deviations.

```
a <- 0.5 - 0.2 # 0.3
b <- 0.4 - 0.1 # 0.3
a == b # Probably not what you expected!
```

`[1] FALSE`

`a - b # a small difference, but not exactly zero`

`[1] -5.551115e-17`

We have two general approaches for handling this imprecision
when comparing numeric vectors. In the special case
where we want to know if all elements of two vectors are
equivalent, we have the summary function `all.equal()`. In
the more general case, we test that the differences between
two vectors are less than a numerical *tolerance*.

```
# An example with vectors
x <- seq(0, 0.5, by = 0.1)
y <- seq(0.1, 0.6, by=0.1)-0.1
x == y # Not what you hoped for? (the third element ....)
```

`[1] TRUE TRUE FALSE TRUE TRUE TRUE`

`x - y`

`[1] 0.000000e+00 0.000000e+00 -2.775558e-17 0.000000e+00 0.000000e+00 0.000000e+00`

`all.equal(x,y) # checking all are equal`

`[1] TRUE`
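One caution: `all.equal()` does not return `FALSE` when its inputs differ; it returns a character string describing the difference. Wrapping it in `isTRUE()` gives a plain `TRUE` or `FALSE`, which is what `if()` needs. A sketch:

```r
all.equal(1, 1.1)                          # a description such as "Mean relative difference: 0.1"
isTRUE(all.equal(1, 1.1))                  # FALSE
isTRUE(all.equal(0.5 - 0.2, 0.4 - 0.1))    # TRUE, even though == gave FALSE
```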

The machine precision, the smallest positive number `x` for which `1 + x != 1` on your computer, is given by

`.Machine$double.eps`

`[1] 2.220446e-16`

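We commonly take our maximum imprecision to be the square root
of that value (this is also the default tolerance of `all.equal()`). So if we check for *numerical equivalence*, testing whether the absolute difference is below this tolerance, we get
the result we expected earlier with `==`.

```
tol <- sqrt(.Machine$double.eps)
abs(x - y) < tol # compare the size of each difference to the tolerance
```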

`[1] TRUE TRUE TRUE TRUE TRUE TRUE`
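A tolerance check has to compare the *absolute* size of the difference; otherwise any negative difference, however large, passes. A sketch with two hypothetical vectors:

```r
tol <- sqrt(.Machine$double.eps)
x <- c(1, 2)
y <- c(1, 5)
x - y < tol        # TRUE TRUE: wrongly accepts a difference of -3
abs(x - y) < tol   # TRUE FALSE
```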

## 6.5 Exercises

1. A typical use of logical comparison is to create an indicator variable. Create an object called `high_mpg` that indicates whether a given value of `mtcars$mpg` is greater than the mean of `mtcars$mpg`. What proportion of values in `mtcars$mpg` are greater than the mean?

## 6.6 Advanced Exercises

1. Another use of logical vectors is to select observations from another vector. Use the `high_mpg` object created above to select values of `mtcars$disp` associated with high gas mileage. Calculate the mean of this vector.
2. We have seen how logical values are often coerced to numeric, and numeric to logical. However, important differences remain between the two types. Consider this example, which seems like it should produce the same results in two different ways. Why are two different vectors returned?

   ```
   v <- 1:4
   v[c(TRUE, FALSE, TRUE, FALSE)]
   ```

   `[1] 1 3`

   `v[c(1, 0, 1, 0)]`

   `[1] 1 1`