# Data Visualization: Scatter plots

## Introduction

Scatter plots show whether there is a relationship between two variables: an X variable plotted on the horizontal axis and a Y variable plotted on the vertical axis.

They can also reveal patterns in large data sets, such as linear or non-linear trends, clusters, and outliers. Adding a trend line makes these trends easier to see and helps show whether the two variables are correlated.
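A trend line shows correlation visually. To put a number on it, you can compute the Pearson correlation coefficient r, which is the statistic Excel's CORREL function returns. The plain-Python sketch below is illustrative only. The function name `pearson_r` and the sample values are hypothetical and are not part of the original material. Values of r near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations (covariance numerator)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # prints 0.775
```

Here r is about 0.77, so the points trend upward with a moderately strong linear relationship.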

Scatter plots are often a good first step in exploring data before applying more rigorous statistical tests, and they are commonly used in economics, econometrics, finance, and the media.

## Uses

To create a scatter plot in Excel:

- Start with a data table that has the X values in the left column and the Y values in the right column.
- Select your data table, then go to Insert > X Y Scatter and, under Scatter, pick a chart type.

### Chart formatting tips

To modify the format of the scatter plot:

- Click any element of the chart, such as the data points, an axis, or the title, and a Format pane appears on the right. Use it to change colors, line width, the position of titles, and more.
- Under the Chart Design tab, you can also edit details such as the title, axis labels, and legend. To switch the data displayed on the x-axis and y-axis, click Switch Row/Column.
- You can also add a line of best fit, called a trendline in Excel, by going to Add Chart Element, selecting Trendline, and choosing the type of line you want to show. The trendline summarizes the overall direction of the relationship between the two variables.
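Excel's linear trendline is an ordinary least-squares fit. It is the same line you get from the SLOPE and INTERCEPT worksheet functions, and the R-squared value the chart can display matches the RSQ function. The sketch below reproduces that calculation in plain Python, assuming a simple list of X values and a matching list of Y values. The function name `linear_trendline` and the sample data are hypothetical, for illustration only. The other trendline types (exponential, logarithmic, polynomial, and so on) use different formulas and are not covered here.

```python
def linear_trendline(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R-squared."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variation in y explained by the line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # R-squared is undefined (NaN) when every y value is the same
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical example data, for illustration only
slope, intercept, r2 = linear_trendline([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.2f}")
# prints: y = 0.60x + 2.20, R^2 = 0.60
```

For a straight-line fit, R-squared equals the square of the Pearson correlation coefficient. This is why a trendline's R-squared is a quick summary of how strongly the two variables are linearly related.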

## Practice

Using the World Bank Data, visualize each country’s GINI coefficient on a scatter plot. Which countries have the highest and lowest values? What trends do you see?
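If you export the data to a CSV file, a short script can answer the first question. The following is a minimal sketch. It assumes a hypothetical layout with one row per country and the column names `country` and `gini`; the actual World Bank download may use different headers or a different layout, so adjust the names to match your file. Rows with a blank GINI value are skipped because many countries lack an estimate for a given year. The sample rows are placeholders, not real World Bank figures.

```python
import csv
import io


def gini_extremes(csv_file):
    """Return (lowest, highest) as (country, value) pairs from a CSV file.

    Assumes hypothetical column names 'country' and 'gini'.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        value = row["gini"].strip()
        if value:  # skip countries with no GINI estimate
            rows.append((row["country"], float(value)))
    if not rows:
        raise ValueError("no GINI values found")
    lowest = min(rows, key=lambda item: item[1])
    highest = max(rows, key=lambda item: item[1])
    return lowest, highest


# Placeholder data for illustration only; NOT real World Bank figures
sample = io.StringIO("country,gini\nCountry A,30.1\nCountry B,\nCountry C,45.7\n")
print(gini_extremes(sample))
# prints: (('Country A', 30.1), ('Country C', 45.7))
```

To open a real file, pass it in with `open("your_file.csv", newline="")` in place of the `io.StringIO` object. The file name here is a placeholder.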

## Conclusion

When you want to show whether there is a relationship between two variables, a scatter plot can be a useful visual aid. Scatter plots also reveal patterns in large data sets, and adding a trend line makes it easier to see trends and judge whether the variables are correlated. Although they are not used as often as bar and line charts, scatter plots can be powerful visuals and are often a good first step in exploring data before applying more rigorous statistical tests.