This post follows directly on from my last, “A First Look at Visualising Data With R and ggplot2”, so if you are new to ggplot2 check that one out first!
Today we will continue exploring the Palmer Archipelago penguin dataset, this time by creating scatter plots.
As before, the dataset can be downloaded in CSV format from Kaggle or loaded straight into R with the palmerpenguins package. Note that the palmerpenguins version names the culmen columns bill_length_mm and bill_depth_mm, so the code in this post assumes the Kaggle CSV.

Artwork by @allison_horst.
Importing the Data
As before, we need the ggplot2 and ggthemes packages. This time we will also use the dplyr package for data manipulation and the ggExtra package for plotting marginal distributions.
library(dplyr)
library(ggplot2)
library(ggthemes)
library(ggExtra)
penguins <- read.csv("penguins_size.csv")
Next we convert the ‘species’, ‘island’ and ‘sex’ variables to factors. We then drop rows with missing values, capitalise those three column names and remove any rows where sex is recorded as ‘.’.
penguins[c('species', 'island', 'sex')] <-
  lapply(penguins[c('species', 'island', 'sex')], as.factor)
penguins <- penguins |>
  na.omit() |>
  rename('Island' = 'island', 'Species' = 'species', 'Sex' = 'sex') |>
  filter(!(Sex == "."))
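One side effect worth knowing about: because ‘sex’ is converted to a factor before the ‘.’ rows are filtered out, the factor still remembers ‘.’ as an empty level. A minimal base R sketch, using a made-up vector rather than the real data, shows the effect and how droplevels() clears it:

```r
# Made-up values for illustration only; not taken from the penguins data.
sex <- factor(c("MALE", "FEMALE", ".", "MALE"))

# Filtering removes the "." observation but keeps "." as an unused level.
kept <- sex[sex != "."]
levels(kept)

# droplevels() discards any level that no longer appears in the data.
cleaned <- droplevels(kept)
levels(cleaned)
```

In the pipeline above this would probably mean adding droplevels() as a final step after filter(). ggplot2 legends hide unused levels by default, so this is optional tidying rather than a required fix.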
Simple Scatter
To create a basic scatter plot we can use geom_point().
penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm)) +
  geom_point()

Making Improvements
We can improve this plot by adding labels, colour and a theme.
penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm)) +
  geom_point(shape = 16, colour = "#FF4F00") +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth in Penguins in the Palmer Archipelago") +
  theme_hc() +
  geom_rangeframe()

Looking at our plot, there appear to be some distinct clusters. Before investigating these, let’s add a line of best fit with geom_smooth(method = "lm") and calculate the overall correlation between the two variables.
penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm)) +
  geom_point(shape = 16, colour = "#FF4F00") +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth in Penguins in the Palmer Archipelago") +
  geom_smooth(method = "lm") +
  theme_economist_white() +
  geom_rangeframe()
print(cor(penguins$culmen_depth_mm, penguins$culmen_length_mm))

Output:
-0.2286256
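The negative sign of the correlation matches the downward slope of the fitted line. geom_smooth(method = "lm") fits an ordinary least-squares line (y ~ x by default), and for a single predictor the slope equals r × sd(y) / sd(x). A rough sketch with made-up numbers, where the vectors x and y are hypothetical and not penguin measurements:

```r
# Hypothetical values chosen only to illustrate the relationship.
x <- c(35, 38, 41, 44, 47, 50, 53)
y <- c(19.5, 18.0, 18.8, 17.1, 17.9, 16.2, 16.8)

fit   <- lm(y ~ x)               # the same kind of model geom_smooth(method = "lm") fits
slope <- unname(coef(fit)["x"])  # fitted slope of the line
r     <- cor(x, y)               # Pearson correlation

# For a one-predictor least-squares fit these two quantities agree.
slope
r * sd(y) / sd(x)
```

So a correlation close to zero means a nearly flat line, whatever the units on the axes.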
Marginal Plots
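We can add marginal plots with ggMarginal() from the ggExtra package to show the distribution of each variable along its axis. The size argument sets how large the main plot is relative to the marginal plots, so larger values give thinner margins.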
plot <- penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm)) +
  geom_point(shape = 16, colour = "#FF4F00") +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth in Penguins in the Palmer Archipelago") +
  geom_smooth(method = "lm") +
  theme_calc()
ggMarginal(plot, type = "histogram", fill = "#FF4F00", size = 5, bins = 12)

ggMarginal(plot, type="boxplot", fill = "#FF4F00", size=15)

ggMarginal(plot, type="density", fill = "#FF4F00", size=10)
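ggMarginal() can also split the marginal distributions by group when the scatter plot maps colour to a variable. The sketch below relies on ggExtra's groupColour and groupFill arguments and uses a small simulated data frame (toy, with invented values and placeholder species labels) so that it runs on its own:

```r
library(ggplot2)
library(ggExtra)

# Simulated stand-in data; the numbers are invented for illustration.
set.seed(1)
toy <- data.frame(
  culmen_length_mm = c(rnorm(30, mean = 39, sd = 2), rnorm(30, mean = 47, sd = 2)),
  culmen_depth_mm  = c(rnorm(30, mean = 18, sd = 1), rnorm(30, mean = 15, sd = 1)),
  Species          = rep(c("A", "B"), each = 30)
)

# colour must be mapped inside aes() for the grouping options to take effect.
p <- ggplot(toy, aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Species)) +
  geom_point() +
  theme(legend.position = "bottom")

# One density curve per group on each margin.
grouped <- ggMarginal(p, type = "density", groupColour = TRUE, groupFill = TRUE)
grouped
```

Moving the legend to the bottom keeps it from competing with the right-hand marginal plot for space.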

Investigating Clustering
To investigate the clusters we can add colour = Species to aes(), which gives each species its own colour and its own line of best fit.
penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Species)) +
  geom_point(shape = 16) +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth by Species") +
  geom_smooth(method = "lm") +
  scale_colour_few() +
  theme_few()
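The difference from the earlier plots is where colour appears: outside aes() it sets one fixed colour for every point, while inside aes() it maps a column to the colour scale. A small sketch on an invented data frame (toy and its values are hypothetical), using layer_data() to inspect the colours ggplot2 actually assigns:

```r
library(ggplot2)

# Invented values for illustration only.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 3, 1, 5, 4, 6),
  Species = rep(c("A", "B", "C"), times = 2)
)

# Setting: a constant supplied outside aes() applies to every point.
fixed <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(colour = "#FF4F00")

# Mapping: a column supplied inside aes() gets one colour per level.
mapped <- ggplot(toy, aes(x = x, y = y, colour = Species)) +
  geom_point()

# Count the distinct point colours in each plot's first layer.
n_fixed  <- length(unique(layer_data(fixed)$colour))
n_mapped <- length(unique(layer_data(mapped)$colour))
c(fixed = n_fixed, mapped = n_mapped)
```

Because the mapped aesthetic here is colour, it is the colour scales (scale_colour_*) rather than the fill scales that change the palette.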

correlation <- penguins |>
  group_by(Species) |>
  summarise(correlation = cor(culmen_length_mm, culmen_depth_mm))
print(correlation)
Output:
Species correlation
Adelie 0.3858132
Chinstrap 0.6535362
Gentoo 0.6540233
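Notice that every species shows a positive correlation even though the pooled figure earlier was negative. This pattern, often called Simpson's paradox, can arise when the groups sit at different positions on both axes. A minimal base R sketch with invented numbers, where group, x and y are hypothetical names:

```r
# Invented values: two groups with the same upward trend, offset from each other.
toy <- data.frame(
  group = rep(c("A", "B"), each = 5),
  x = c(1, 2, 3, 4, 5, 11, 12, 13, 14, 15),
  y = c(10, 11, 10.5, 12, 12.5, 1, 2, 1.5, 3, 3.5)
)

# Pooled correlation ignores the grouping.
pooled <- cor(toy$x, toy$y)

# Within-group correlations, one per level of group.
by_group <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))

pooled    # negative
by_group  # positive for both groups
```

This is why a single line of best fit through all the penguins can be misleading, and why colouring by species is worth doing before drawing conclusions.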
We can colour by the other categorical variables in the same way, starting with Island.
penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Island)) +
  geom_point(shape = 16) +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth by Island") +
  geom_smooth(method = "lm") +
  theme_foundation()

penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Sex)) +
  geom_point() +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth by Sex") +
  geom_smooth(method = "lm") +
  theme_solarized()

penguins |>
  ggplot(aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Sex, shape = Species)) +
  geom_point() +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs. Depth by Sex and Species") +
  geom_smooth(method = "lm") +
  theme_excel_new()

Facet Plots
Faceting with facet_grid() splits the plot into a grid of panels, here with one row per island and one column per species.
ggplot(penguins, aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(Island ~ Species, scales = "free", space = "free_x") +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs Depth by Species and Island") +
  theme_base()

ggplot(penguins, aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Sex)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_grid(Sex ~ Species, scales = "free", space = "free_x") +
  labs(x = "Culmen Length (mm)", y = "Culmen Depth (mm)",
       title = "Culmen Length vs Depth by Species and Sex") +
  theme_stata()
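In this dataset not every species is found on every island, so a species-by-island facet_grid() leaves some panels empty. One possible alternative, sketched here on a small invented data frame, is facet_wrap(), which only draws the combinations that actually occur:

```r
library(ggplot2)

# Invented measurements; only the species/island pairings mirror the real data.
toy <- data.frame(
  Species = c("Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo"),
  Island  = c("Biscoe", "Dream", "Torgersen", "Dream", "Biscoe"),
  culmen_length_mm = c(39, 38, 40, 49, 47),
  culmen_depth_mm  = c(18, 18.5, 19, 18, 15)
)

scatter <- ggplot(toy, aes(x = culmen_length_mm, y = culmen_depth_mm, colour = Species)) +
  geom_point()

# facet_grid() lays out every Island x Species cell, including empty ones.
grid_plot <- scatter + facet_grid(Island ~ Species)

# facet_wrap() creates a panel only for combinations present in the data.
wrap_plot <- scatter + facet_wrap(vars(Island, Species))

# Number of panels that facet_wrap() draws.
n_wrap_panels <- length(unique(layer_data(wrap_plot)$PANEL))
n_wrap_panels
```

The grid layout makes it easier to compare along rows and columns, whereas the wrapped layout wastes less space, so the better choice depends on what the plot is meant to show.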

Conclusion
I hope that this post has been a useful introduction to scatter plots with ggplot2. Why not try investigating the relationships between some of the other numeric variables for this data such as body mass or flipper length?