Statistics: Correlation

Introduction

Correlation is an association between two variables, it does not determine causal relationships.The relationship between one smoking cigarettes and the risk of lung cancer would be an example of a correlation. There is a present connection between these two variables however, this relationship alone does not determine causation.

CORREL is the function you can use in Excel to calculate the correlation coefficient of two cell ranges in order to determine the relationship between two variables. Correlation coefficients fall between -1 and 1.

A positive (+) value (coefficient > 0) indicates a positive relationship , meaning that if the value of one array increases, the values in the other array also increases.

A negative (-) value (coefficient < 0) indicates a negative relationship between the arrays, meaning that as one of the array values increases, the other decreases.

The closer the correlation coefficient is to 0, the weaker the correlation, and if the correlation coefficient is 0, there is no relationship between values. Please see the table below for guidance on interpreting the correlation coefficient.

Functions

To calculate whether data in Excel are correlated, you can use the CORREL function:

=CORREL(array1, array2)

Both array1, a range of cell values, and array 2, a second range of cell values, are required to complete this function.

 

Some common errors that occur when using CORREL are:

  • If the selected cell ranges referenced in the CORREL function contain text, logical values, or empty cells, these values are ignored. However, cells that contain zero values are included in the calculation
  • You will receive a #N/A error if array1 and array2 have a different number of data points
  • You will receive a #DIV/0! error if either array1 or array2 is empty [or if s (the standard deviation) of their values equals zero]

Practice

Practice using the CORREL function by answering the questions below in the highlighted cell.

  1. What is the correlation coefficient between access to electricity (% of population) and ‚Äč‚Äčcommercial service exports (current US$)?
    Answer

    Correlation: 0.200685264

  2. Just from this information, can we say that there is any causal relationship between the two?
    Answer

    No, the CORREL function only describes the correlation in the data, not whether one variable is affecting another variable.

Conclusion

Being able to calculate the correlation of two arrays of data is helpful across professions to better understand whether or not there is a relationship between two variables, and if so, what that relationship is. Understanding the correlation between different phenomena helps us in many fields of research and in life because it allows us to draw inferences on how X is likely to change if Y changes based on their relationship. It is important to remember however that correlation does not imply causation.

  • In finance, correlation is used to compare stock or share prices to the market index value to better comprehend its true value
  • In the business world, correlation can lead to further understanding of sales trends, for example, it allows a company to compare their sales of air conditioners in the winter versus the summer
  • In research, correlation is used to understand if there is a relationship between two variables, for example, height and prevalence of back injuries, or age and income earned
css.php