R and its ggplot2 package are wonderful tools for visualising data. In this post, we will explore some of the basics of plotting with ggplot2 by creating bar charts using the well-known Palmer Archipelago penguins dataset.

Artwork by @allison_horst.
The dataset can be downloaded in CSV format from Kaggle or loaded straight into R with the palmerpenguins package.
Importing the Data
Move the dataset to the folder containing the R script you will be plotting in, or refer to it with a relative path. For R to find your file you may need to set your working directory. You can do this in RStudio by clicking Session > Set Working Directory > To Source File Location in the menu bar, or by running the setwd("/your_path_here") command.
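Before importing, it can help to confirm where R is actually looking. The following is a minimal sketch, assuming the file is called penguins_size.csv as in the import step; csv_path is just an illustrative variable name.

```r
# Show the directory R currently treats as its working directory
print(getwd())

# Check whether the CSV is visible from there before trying to read it
csv_path <- "penguins_size.csv"  # assumed file name, matching the import step
if (file.exists(csv_path)) {
  message("Found ", csv_path)
} else {
  message("Could not find ", csv_path, " in ", getwd(),
          "; set the working directory or use a full path.")
}
```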
You can import the penguins dataset using the read.csv() function, built into R.
# Import the penguins dataset using the read.csv() function
penguins <- read.csv("penguins_size.csv")
# View the first few entries of the dataframe
print(penguins[1:5,])
Output:
species island culmen_length_mm culmen_depth_mm flipper_length_mm
1 Adelie Torgersen 39.1 18.7 181
2 Adelie Torgersen 39.5 17.4 186
3 Adelie Torgersen 40.3 18.0 195
4 Adelie Torgersen NA NA NA
5 Adelie Torgersen 36.7 19.3 193
body_mass_g sex
1 3750 MALE
2 3800 FEMALE
3 3250 FEMALE
4 NA <NA>
5 3450 FEMALE
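If you would rather skip the CSV, the palmerpenguins route mentioned earlier could look something like the sketch below. Note that the package names its columns bill_length_mm and bill_depth_mm rather than culmen_*, and stores sex in lower case, so this sketch renames things to match the Kaggle file. The name penguins_pkg is my own choice, and the block only runs if the package is installed.

```r
# Sketch: load the data from the palmerpenguins package instead of the CSV
# install.packages("palmerpenguins")
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  penguins_pkg <- as.data.frame(palmerpenguins::penguins)

  # Rename bill_* columns to the culmen_* names used by the Kaggle CSV
  names(penguins_pkg) <- sub("^bill_", "culmen_", names(penguins_pkg))

  # Convert sex to upper case so it matches the CSV's MALE / FEMALE values
  penguins_pkg$sex <- toupper(as.character(penguins_pkg$sex))

  print(head(penguins_pkg, 5))
}
```

The package version also carries an extra year column, which the plots in this post do not use.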
Cleaning Up
Printing the first few rows has revealed our first issue: the data has several NA values. A great way to visualise how much data is missing from a dataframe in R is the vis_miss() function from the naniar library. You may need to install it first by running install.packages('naniar').
# install.packages('naniar')
library(naniar)
vis_miss(penguins)
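If you want numbers to go with the picture, base R can count the missing values per column. This is a small sketch on a made-up data frame called toy (the values are invented for illustration); the same two calls should work on penguins.

```r
# Toy data frame standing in for the penguins data (values are made up)
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, NA, 5000, 3500),
  sex = c("MALE", NA, "FEMALE", NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Percentage of missing values in each column
na_percent <- round(100 * colMeans(is.na(toy)), 1)
print(na_percent)
```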

Reassuringly, the dataset has very few missing values. The easiest way to deal with them is to exclude them using na.omit(), which removes every row of a dataframe that contains any NA value.
penguins <- na.omit(penguins)
vis_miss(penguins)
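It is worth checking how many rows na.omit() throws away, since a row is dropped even if only one of its columns is missing. A minimal sketch on a made-up data frame (toy and its values are invented for illustration):

```r
# Toy data frame: rows 2 and 4 each contain at least one NA
toy <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, NA, 5000, 3500),
  sex = c("MALE", NA, "FEMALE", NA)
)

rows_before <- nrow(toy)
toy_clean <- na.omit(toy)
rows_after <- nrow(toy_clean)

message(rows_before - rows_after, " of ", rows_before, " rows removed")

# complete.cases() flags the rows that na.omit() keeps
kept <- complete.cases(toy)
print(kept)
```

With only a handful of incomplete rows, as in the penguins data, losing them is a reasonable trade for simplicity.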

Another good idea when working with a new dataset is to make sure that any categorical variables are treated as factors in R. This can be done with as.factor(col) and makes sure that plots of categorical variables work correctly.
penguins$sex <- as.factor(penguins$sex)
penguins$island <- as.factor(penguins$island)
penguins$species <- as.factor(penguins$species)
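With more than a few categorical columns, repeating as.factor() line by line gets tedious. One possible shortcut, sketched here on a made-up data frame (toy and cat_cols are illustrative names), is to convert a whole set of columns with lapply():

```r
# Toy data frame with three text columns and one numeric column
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Adelie"),
  island = c("Torgersen", "Biscoe", "Dream"),
  sex = c("MALE", "FEMALE", "FEMALE"),
  body_mass_g = c(3750, 5000, 3450),
  stringsAsFactors = FALSE
)

# Columns we want R to treat as categorical
cat_cols <- c("species", "island", "sex")

# Convert each listed column to a factor, leaving the others untouched
toy[cat_cols] <- lapply(toy[cat_cols], as.factor)

str(toy)
```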
We can also change the names of any columns. Below I have changed the names of the variables we will be plotting to be capitalised so that they will look a little nicer in legends.
names(penguins)[names(penguins) == 'island'] <- 'Island'
names(penguins)[names(penguins) == 'species'] <- 'Species'
names(penguins)[names(penguins) == 'sex'] <- 'Sex'
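The same renaming can be written once with a lookup vector, which scales a little better if you have many columns to rename. This is only a sketch in base R on a made-up one-row data frame; new_names is an illustrative name.

```r
# Toy data frame with the original lower-case column names
toy <- data.frame(island = "Dream", species = "Adelie", sex = "MALE",
                  body_mass_g = 3750)

# Named vector: old name -> new name
new_names <- c(island = "Island", species = "Species", sex = "Sex")

# Rename only the columns listed in new_names, leaving the rest untouched
hits <- names(toy) %in% names(new_names)
names(toy)[hits] <- unname(new_names[names(toy)[hits]])

print(names(toy))
```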
There is one more issue with the dataset in its current form. The sex for one observation is missing, instead containing just a full stop. A helpful side effect of converting our categorical variables to factors is that we can see this easily by printing the levels of each factor variable.
print(levels(penguins$Sex))
Output:
[1] "." "FEMALE" "MALE"
print(levels(penguins$Island))
Output:
[1] "Biscoe" "Dream" "Torgersen"
print(levels(penguins$Species))
Output:
[1] "Adelie" "Chinstrap" "Gentoo"
To handle this we can use the filter() function from the dplyr library. The ! before (Sex == ".") negates the condition, so rather than returning only the rows where the sex of the penguin is "." the function does the opposite and keeps all rows where the sex does not equal ".".
library(dplyr)
penguins <- filter(penguins, !(Sex == "."))
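One detail to be aware of: removing rows does not remove the corresponding level from a factor, so "." would still appear in levels() after filtering. ggplot2 drops unused levels from discrete axes by default, so the plots in this post are unaffected, but functions such as table() will still list the empty level. The sketch below uses base R subsetting on a made-up data frame to show the effect and how droplevels() tidies it up.

```r
# Toy data frame with a placeholder "." in the Sex column
toy <- data.frame(
  Sex = factor(c("MALE", "FEMALE", ".", "FEMALE")),
  body_mass_g = c(3750, 3800, 4875, 3450)
)

# Remove the row with the placeholder value (base R equivalent of the filter)
toy <- toy[toy$Sex != ".", ]

# "." is still listed as a level, even though no rows use it
print(levels(toy$Sex))

# Drop factor levels that no longer occur in the data
toy <- droplevels(toy)
print(levels(toy$Sex))
```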
We are now ready to start plotting. For this first look at ggplot2 we will focus on bar plots.
Creating Plots
To create any plot with ggplot2 we first need to create the plot area with the ggplot() function. For all plots we will need to specify the data being used and any aesthetics we wish to pass through to the graphs we will be plotting.
library(ggplot2)
ggplot(data = penguins, aes(x = Species)) +
  geom_bar()

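Intuitively, we add new elements to a plot with +. For this tutorial we use geom_bar() for a bar plot, but other geoms include geom_point() for a scatter plot, geom_col() for a bar plot whose heights are values already in the data, and geom_line() for a line plot. We could even add several layers to the same axes. In the next plot we also map fill to Species so that each bar is split by colour, and we pass the data in with the native pipe |>, which requires R 4.1 or later.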
penguins |>
  ggplot(aes(x = Island, fill = Species)) +
  geom_bar()
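To make the difference between geom_bar() and geom_col() concrete, here is a small sketch on made-up data (toy and counts are illustrative names). geom_bar() counts the rows for us, while geom_col() expects the counts to be a column in the data already; both should draw the same bars.

```r
library(ggplot2)

# Toy data: one row per penguin (values are made up)
toy <- data.frame(
  Species = c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie")
)

# geom_bar() counts the rows in each group for us
p_bar <- ggplot(toy, aes(x = Species)) +
  geom_bar()

# geom_col() expects the bar heights to already be in the data,
# so we tabulate the counts first
counts <- as.data.frame(table(Species = toy$Species))
p_col <- ggplot(counts, aes(x = Species, y = Freq)) +
  geom_col()

print(p_bar)
print(p_col)
```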

We can enhance our plots with labs(), which adds a title and labels for the x-axis and y-axis.
penguins |>
  ggplot(aes(x = Species, fill = Species)) +
  geom_bar() +
  labs(title = "Penguins in the Palmer Archipelago",
       x = "Species",
       y = "Penguin Count")

penguins |>
  ggplot(aes(x = Island, fill = Species)) +
  geom_bar(position = "dodge2") +
  labs(title = "Penguins in the Palmer Archipelago",
       x = "Island",
       y = "Penguin Count")
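The position argument used above controls how bars that share an x value are arranged. As a rough sketch on made-up data (toy and the p_* names are illustrative), here are three common choices side by side: the default stacking, "dodge2" for bars next to each other, and "fill" for proportions within each island.

```r
library(ggplot2)

# Toy data: one row per penguin (values are made up)
toy <- data.frame(
  Island = c("Biscoe", "Biscoe", "Biscoe", "Dream", "Dream", "Torgersen"),
  Species = c("Adelie", "Gentoo", "Gentoo", "Adelie", "Chinstrap", "Adelie")
)

base <- ggplot(toy, aes(x = Island, fill = Species))

# Default: species are stacked on top of each other within each island
p_stack <- base + geom_bar()

# Bars for each species are placed side by side
p_dodge <- base + geom_bar(position = "dodge2")

# Bars are stacked and rescaled so each island sums to 1 (proportions)
p_fill <- base + geom_bar(position = "fill")

print(p_dodge)
```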

Themes allow us to customise our plots further. There are many built into ggplot2; however, my favourite easy-to-implement themes are those in the ggthemes package.
library(ggthemes)
penguins |>
  ggplot(aes(x = Species, fill = Species)) +
  geom_bar() +
  labs(title = "Penguins in the Palmer Archipelago",
       x = "Species",
       y = "Penguin Count") +
  geom_rangeframe() +
  theme_hc() +
  scale_fill_hc() +
  theme(legend.position = "none")
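If you do not want an extra package, the themes that ship with ggplot2 work in the same way. The sketch below uses made-up data (toy, p and the p_* names are illustrative) and two built-in themes. Note the order: a complete theme such as theme_minimal() replaces any earlier theme() settings, so tweaks like hiding the legend go after it.

```r
library(ggplot2)

# Toy data: one row per penguin (values are made up)
toy <- data.frame(Species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"))

p <- ggplot(toy, aes(x = Species, fill = Species)) +
  geom_bar() +
  labs(title = "Penguins in the Palmer Archipelago",
       x = "Species",
       y = "Penguin Count")

# Apply a complete built-in theme first, then adjust individual settings
p_minimal <- p + theme_minimal() + theme(legend.position = "none")
p_classic <- p + theme_classic() + theme(legend.position = "none")

print(p_minimal)
```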

penguins |>
  ggplot(aes(x = Island, fill = Species)) +
  geom_bar(position = "dodge2") +
  labs(title = "Penguins in the Palmer Archipelago",
       x = "Island",
       y = "Penguin Count",
       fill = "Species") +
  geom_rangeframe() +
  scale_fill_economist() +
  theme_economist()

penguins |>
  ggplot(aes(x = Sex, fill = Species)) +
  geom_bar(position = "dodge2") +
  labs(title = "Penguins in the Palmer Archipelago",
       x = "Sex",
       y = "Penguin Count",
       fill = "Species") +
  geom_rangeframe() +
  scale_fill_few() +
  theme_calc()

Combining Plots
We can use a facet grid to combine all of the information from our plots so far into a single, easy-to-read plot.
library(reshape2)  # provides melt()
penguin_2 <- melt(penguins, id = c('culmen_length_mm', 'culmen_depth_mm',
                                   'flipper_length_mm', 'body_mass_g',
                                   'Species', 'Sex'))
sex.labs <- c("Male", "Female")
names(sex.labs) <- c("MALE", "FEMALE")
ggplot(penguin_2, aes(x = value, fill = Species)) +
  geom_bar(position = "dodge2") +
  facet_grid(Sex ~ variable,
             scales = "free",
             space = "free_x",
             labeller = labeller(Sex = sex.labs)) +
  labs(x = "",
       y = "Penguin Count",
       title = "Penguins in the Palmer Archipelago") +
  theme_hc() +
  scale_fill_manual(values = c("#FF8100", "#C25ECA", "#067476"))
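Because Island is the only column being melted here, a similar figure can probably be drawn without reshaping at all, by faceting on Sex directly. This is a sketch of that alternative on made-up data, not a replacement for the approach above; toy is an illustrative name, and the result lacks the column strip that the melted version shows.

```r
library(ggplot2)

# Toy data: one row per penguin (values are made up)
toy <- data.frame(
  Island = c("Biscoe", "Biscoe", "Dream", "Dream", "Torgersen", "Biscoe"),
  Species = c("Adelie", "Gentoo", "Chinstrap", "Adelie", "Adelie", "Gentoo"),
  Sex = factor(c("MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "MALE"))
)

# Display labels for the facet strips, keyed by the values in the data
sex.labs <- c(MALE = "Male", FEMALE = "Female")

# One row of panels per sex, islands on the x-axis, species as fill colour
p <- ggplot(toy, aes(x = Island, fill = Species)) +
  geom_bar(position = "dodge2") +
  facet_grid(Sex ~ ., labeller = labeller(Sex = sex.labs)) +
  labs(x = "",
       y = "Penguin Count",
       title = "Penguins in the Palmer Archipelago")

print(p)
```

The melt() approach earns its keep once you want several variables as facet columns; for a single variable the direct version is shorter.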

Conclusion
This final plot shows us the distribution of penguins across each island, for each species and for both sexes.
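The counts behind a plot like this can also be tabulated directly, which is handy for checking that the bars say what you think they say. A minimal base R sketch on made-up data (toy and counts are illustrative names):

```r
# Toy data: one row per penguin (values are made up)
toy <- data.frame(
  Island = c("Biscoe", "Biscoe", "Dream", "Dream", "Torgersen", "Biscoe"),
  Species = c("Adelie", "Gentoo", "Chinstrap", "Adelie", "Adelie", "Gentoo"),
  Sex = c("MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "MALE")
)

# Cross-tabulate island, species and sex, then flatten to one row per combination
counts <- as.data.frame(table(toy), responseName = "Count")

# Keep only the combinations that actually occur
counts <- counts[counts$Count > 0, ]
print(counts)
```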
In the next post we will begin looking at the other variables in the dataset, such as body mass and flipper length, and examine whether they vary with sex, island or species.