How to Create Stacked Histograms in R

A stacked histogram shows the distributions of two or more groups on a single plot, with the bars for each group placed on top of each other. This makes it easier to compare distributions and spot patterns across groups. In R, you can create stacked histograms with the built-in hist() function or with ggplot() from the ggplot2 package. This guide explains both approaches.

Using the hist() Function to Create a Stacked Histogram in R

Use the hist() function when you have two separate variables that you want to plot on the same graph. hist() is a built-in R function that takes a numeric vector as input and draws a histogram.

To plot two variables, call hist() twice, once per variable. In the second call, set the add argument to TRUE. This draws the second histogram on the existing plot instead of starting a new one. Note that the bars are overlaid rather than summed, and the axes are fixed by the first call.

Syntax

hist(x, col = NULL, add = TRUE)

Arguments

  • x = The numeric vector of data to plot.
  • col = The fill color of the bars.
  • add = If TRUE, the histogram is drawn on the current plot instead of a new one.

Example

In this example, we will create a stacked histogram from two different variables.

# create the first variable
x <- c(50, 40, 45, 20, 25)
# create the second variable
y <- c(15, 5, 40, 10, 45)
# draw the histogram of x
hist(x, col = "green")
# draw the histogram of y on top of the existing plot
hist(y, col = "red", add = TRUE)

Output

[Figure: Two variables stacked on a single plot.]

In the above example, we created a histogram of variable x and then drew the histogram of variable y on top of it.
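Because the second hist() call reuses the axes of the first and opaque bars can hide each other, it may help to give both calls the same bin edges and semi-transparent colors. The following is a minimal sketch under those assumptions; the bin width of 10, the 0 to 50 range, and the names breaks, hx, hy, and stacked are illustrative choices, not part of the original example. It also shows one possible way to get bars that are truly summed, by passing the bin counts to barplot() instead of hist().

```r
x <- c(50, 40, 45, 20, 25)
y <- c(15, 5, 40, 10, 45)

# Shared bin edges that cover both vectors (bin width of 10 is an assumption)
breaks <- seq(0, 50, by = 10)

# Overlay: semi-transparent fills keep overlapping bars visible
hx <- hist(x, breaks = breaks, col = rgb(0, 1, 0, 0.5),
           main = "Overlaid histograms of x and y", xlab = "value")
hy <- hist(y, breaks = breaks, col = rgb(1, 0, 0, 0.5), add = TRUE)

# Stacked alternative: put the per-bin counts in a matrix, one row per variable
stacked <- rbind(x = hx$counts, y = hy$counts)
bin_labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

# barplot() stacks the rows of a matrix by default (beside = FALSE)
barplot(stacked, names.arg = bin_labels, col = c("green", "red"),
        legend.text = TRUE, xlab = "value", ylab = "Frequency")
```

With shared breaks, the bars of both variables line up bin for bin, which makes the overlay easier to read and lets the counts be added directly in the barplot() version.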

Using the ggplot() Function to Create a Stacked Histogram in R

While hist() is suited to two separate variables, ggplot() is suited to a single variable that is split into categories. ggplot() is not part of base R; it comes from the ggplot2 package.

Syntax of ggplot()

ggplot(data=, aes(x=, fill=)) + geom_histogram()

Arguments

  • data = It is the data frame you want to use to create the plot.
  • aes() = It maps variables in the data to visual properties, such as the x-axis and the fill color.
  • x = The variable to map on the x-axis.
  • fill = It defines the categorical variable based on which you want to color the histogram.
  • geom_histogram() = It specifies that you want to create a histogram.

Installing ggplot2

ggplot() comes from the ggplot2 package. To install and load the package, run the following code:

# install ggplot2 package
install.packages('ggplot2')
# load ggplot2 package
library('ggplot2')

Example

To demonstrate how to use ggplot() to create a stacked histogram, we will use the PlantGrowth data set, which is included with the base R installation. It contains the results of an experiment comparing plant yields, measured as dried weight, under a control and two treatment conditions. It has two columns: weight and group.

In this example, we will plot a histogram of the weight column, colored by group, using the ggplot() function.
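Before plotting, it can be useful to check the structure of the data set. The sketch below uses only base R functions; the names group_counts and weight_range are illustrative.

```r
# PlantGrowth ships with R in the datasets package, so no download is needed
str(PlantGrowth)

# Number of observations in each category of the group column
group_counts <- table(PlantGrowth$group)
print(group_counts)

# Range of the weight column, useful when choosing a bin width later
weight_range <- range(PlantGrowth$weight)
print(weight_range)
```

The group column is a factor, which is what lets ggplot() assign one fill color per category.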

ggplot(data=PlantGrowth, aes(x=weight,fill=group)) + geom_histogram()

Output

[Figure: Stacked histogram using the ggplot() function.]

In the above example, we created a plot containing stacked histograms. We mapped the weight column to the x-axis and colored the bars according to the group column.
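When no bin settings are given, geom_histogram() falls back to 30 bins and prints a message suggesting a better value. The following sketch sets the bin width explicitly and spells out the stacking position; the bin width of 0.25, the black outline, and the name p are assumptions chosen for illustration, and the code assumes ggplot2 is already installed.

```r
library(ggplot2)

# binwidth = 0.25 is an illustrative choice for the weight column
# position = "stack" places the bars of each group on top of each other
p <- ggplot(data = PlantGrowth, aes(x = weight, fill = group)) +
  geom_histogram(binwidth = 0.25, position = "stack", color = "black")

print(p)
```

Storing the plot in a variable before printing makes it easy to add further layers, such as labels, without repeating the whole call.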
