# Stata for Students: Correlations

A correlation is a measure of how strongly related two quantitative variables are. It only fully captures linear relationships, but a linear relationship will serve as a first approximation to many other kinds of relationships. You can calculate correlations for categorical variables, and the results will sometimes point you in the right direction, but there are better ways to describe relationships involving categorical variables.

Correlation coefficients range from -1 to 1. A positive correlation coefficient means the two variables tend to move together: an observation which has a high value for one variable is likely to have a high value for the other, and vice versa. A negative correlation coefficient means they tend to move in opposite directions: observations with a high value for one variable are likely to have a low value for the other. In either case, the larger the absolute value of the coefficient (the closer it is to -1 or 1), the stronger the relationship. Variables which are independent will have a correlation of zero, but variables which are related but not in a linear way can also have a correlation of zero.
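To see that last point in action, here is a minimal sketch using made-up data rather than the GSS sample. The variable names x and y are invented for illustration. Note that clear removes whatever data is currently in memory, so run this separately from your cor.do work.

```stata
// Hypothetical data, not part of the GSS sample
clear
set obs 101
generate x = (_n - 51) / 50    // evenly spaced values from -1 to 1
generate y = x^2               // y is completely determined by x
correlate x y
```

Even though y can be predicted perfectly from x, the relationship is a symmetric curve rather than a line, so the reported correlation should be essentially zero.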

## Setting Up

If you plan to carry out the examples in this article, make sure you've downloaded the GSS sample to your U:\SFS folder as described in Managing Stata Files. Then create a do file called cor.do in that folder that loads the GSS sample as described in Doing Your Work Using Do Files. If you plan on applying what you learn directly to your homework, create a similar do file but have it load the data set used for your assignment.

## Calculating Correlations

The correlate command, often abbreviated cor, calculates correlations. List the variables you want correlations for after the command.

cor sei10 educ height weight

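This gives you the correlations between the respondent's socioeconomic status, years of education, height, and weight. They are given in the form of a matrix, but only half of the matrix is shown because it is symmetric. The first line of the output, (obs=114), is the number of observations used: correlate only uses observations that have non-missing values for all of the variables listed.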

```
(obs=114)

             |    sei10     educ   height   weight
-------------+------------------------------------
       sei10 |   1.0000
        educ |   0.6205   1.0000
      height |   0.2466   0.1868   1.0000
      weight |   0.1048  -0.0224   0.5282   1.0000
```

This shows that the correlation between socioeconomic status and education is .6205, which is fairly high. The correlation between socioeconomic status and height, .2466, is weaker, but it's interesting that it's positive at all. Keep in mind that correlation does not imply causation. We cannot tell from these results whether high socioeconomic status causes people to grow taller, or being tall causes people to have higher socioeconomic status (both can be true, and there's evidence for both theories), or something else causes people to both grow taller and have higher socioeconomic status.

The correlation between weight and education is essentially zero, but the negative sign indicates that people with more education tend to weigh slightly less. It's just a very weak association. On the other hand, given that education and height are positively correlated and height and weight are strongly positively correlated, it is possible that education and weight would show a stronger negative relationship if we could control for height. Multivariate regression allows us to explore that possibility.
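As a hedged sketch of that next step (this is only one possible specification, not a recommended model, and interpreting regression output is beyond the scope of this article), the regress command takes the outcome variable first, followed by the predictors:

```stata
// Assumes the GSS sample loaded by cor.do is still in memory.
// Regress weight on education while controlling for height.
regress weight educ height
```

The coefficient on educ then describes the relationship between education and weight among respondents of the same height.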

## Calculating Covariances

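To get covariances rather than correlations, add the covariance option, which can be abbreviated cov:

cor sei10 educ height weight, cov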

```
(obs=114)

             |    sei10     educ   height   weight
-------------+------------------------------------
       sei10 |  510.103
        educ |  43.4237  9.59983
      height |  22.7511  2.36376  16.6884
      weight |  99.2858 -2.91236  90.4648  1757.94
```

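Unlike correlations, covariances are not restricted to the range of -1 to 1. They depend on both how strongly the variables move together and how much each variable varies on its own, so their size changes with the units of measurement. The signs are interpreted the same way, however: a positive covariance means the variables tend to move together, and a negative covariance means they tend to move in opposite directions. The diagonal of the matrix gives you the variance of each variable, which is its standard deviation squared.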
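The two matrices are linked: a correlation is the covariance divided by the product of the two standard deviations. As a quick check, you can recover the correlation between sei10 and educ by hand using Stata's display command as a calculator. The numbers below are copied from the covariance output above.

```stata
// covariance of sei10 and educ, divided by the square root of
// (variance of sei10 * variance of educ)
display 43.4237 / sqrt(510.103 * 9.59983)
```

The result should come out very close to the 0.6205 reported by cor without the cov option, with any small difference due to rounding in the displayed covariances.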

## Complete Do File

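// Close any log left open by an earlier run, then start a new one
capture log close
log using cor.log, replace

// Start with a clean session and keep output from pausing
clear all
set more off

// Load the GSS sample
use gss_sample

// Correlation matrix, then covariance matrix, for the same variables
cor sei10 educ height weight
cor sei10 educ height weight, cov

log close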

Last Revised: 11/17/2016