sample() Function in R

The sample() function selects a random set of elements from a vector. This lets you perform statistical analysis on a random subset of a data set without having to pick the data points by hand. This guide explains how to use the sample() function in R.

Syntax

sample(x, size, replace = FALSE, prob = NULL)

Arguments

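  • x = a vector of elements from which you want to select a random sample. If x is a single number of at least 1, sample() draws from 1:x instead.
  • size = the number of elements you want to select. If omitted, it defaults to the number of elements, so all of them are returned.
  • replace = TRUE or FALSE. If TRUE, the same value can be selected more than once. The default is FALSE.
  • prob = an optional vector of probability weights, one per element of x. The default, NULL, gives every element the same chance of being selected.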
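A quick way to see the defaults in action is the sketch below. It relies on two documented behaviors: when size is omitted, every element is returned in random order, and when x is a single number n of at least 1, sample() draws from 1:n. The seed value 42 is arbitrary and only makes the run reproducible.

```r
# Fix the random number generator so the results are reproducible
set.seed(42)

# A single number n (>= 1) is treated as the vector 1:n
perm <- sample(5)
print(perm)

# size is omitted, so all 5 elements come back in random order
shuffled_letters <- sample(c("a", "b", "c", "d", "e"))
print(shuffled_letters)

# The same call style as the syntax line, with every argument named
picks <- sample(x = 1:10, size = 3, replace = FALSE, prob = NULL)
print(picks)
```

This shortcut is also a known pitfall: sample(c(10)) shuffles 1:10 rather than returning the single value 10, so take care when x might have length 1.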

Now that we’ve gone over the basics of the function, let’s look at some examples of how to use it.

Example 1: Creating a sample without replacement

We can use the sample() function to select a random sample of data points without replacement. This means that once a data point is selected, it will not be selected again. Let’s say we have a vector x that contains 100 data points. To select a random sample of 10 data points from this vector, we use the following code:

# Create a vector of the values 1 to 100
x <- 1:100
# Draw a sample of 10 values without replacement
sample(x, size = 10)

Output

[1] 10 62 56 97 64 9 26 87 52 41

The function returned a vector of ten values drawn at random from x. No value appears more than once because the replace argument defaults to FALSE. Because the selection is random, your output will differ from the one shown here each time you run the code.
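Two practical consequences of sampling without replacement are easy to check directly. The sketch below uses set.seed() to make a draw reproducible, confirms with anyDuplicated() that no value repeats, and shows that asking for more values than the vector holds raises an error. The seed value 123 is an arbitrary choice.

```r
x <- 1:100

# Same seed, same sample: useful when a result must be reproducible
set.seed(123)
s1 <- sample(x, size = 10)
set.seed(123)
s2 <- sample(x, size = 10)
print(identical(s1, s2))  # TRUE

# Without replacement, no value can appear twice
print(anyDuplicated(s1) == 0)  # TRUE

# Requesting more values than x contains is an error when replace = FALSE;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(
  sample(1:5, size = 10),
  error = function(e) conditionMessage(e)
)
print(result)
```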

Example 2: Creating a sample with replacement

Let’s use the same vector as in Example 1. To draw a random sample of 10 data points with replacement, set replace = TRUE:

# Create a vector of the values 1 to 100
x <- 1:100
# Draw a sample of 10 values with replacement
sample(x, size = 10, replace = TRUE)

Output

[1] 28 34 32 68 67 91 88 2 14 88

Again, the function returned ten randomly selected values from x, but this time values can repeat because replace is set to TRUE. In this output, 88 appears twice.
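With replacement, size can even exceed the length of the vector. The sketch below simulates 20 rolls of a hypothetical six-sided die and uses table() and duplicated() to find the repeated values. The seed is arbitrary and only makes the run reproducible.

```r
set.seed(1)

# 20 draws from 6 values: only possible because replace = TRUE
rolls <- sample(1:6, size = 20, replace = TRUE)
print(rolls)

# Count how often each face came up; counts above 1 mean repeated values
counts <- table(rolls)
print(counts)

# The distinct values that were drawn more than once
repeated <- unique(rolls[duplicated(rolls)])
print(repeated)
```

With 20 draws from only 6 possible values, at least one value is guaranteed to repeat.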

Example 3: Creating a sample with uneven probabilities

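In the following example, we assign a probability weight to each element of the vector. The weights must be non-negative, but they do not need to lie between 0 and 1 or sum to 1: R rescales them so that they sum to 1. For example, the weights below sum to 3.1, so the element with weight 0.8 is drawn with probability 0.8 / 3.1, or about 0.26, on each draw.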

# Create a vector of 5 values
x <- c(25, 50, 75, 100, 125)
# Draw 3 values with replacement, using one weight per element
sample(x, size = 3, replace = TRUE, prob = c(0.7, 0.3, 0.8, 0.6, 0.7))

Output

[1] 75 25 25

Here the prob argument gives each element its own weight. The value 75 has the largest weight (0.8), so it is the most likely value on any single draw, but a higher weight does not guarantee selection: in this run, 25 (weight 0.7) was drawn twice and 75 only once.
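To see how the weights translate into probabilities, the sketch below divides them by their sum and then compares those probabilities with the observed frequencies from 100,000 weighted draws. The seed and the number of draws are arbitrary choices for illustration; with this many draws, the observed frequencies should land close to the computed probabilities.

```r
x <- c(25, 50, 75, 100, 125)
w <- c(0.7, 0.3, 0.8, 0.6, 0.7)

# Weights are rescaled to sum to 1; these are the per-draw probabilities
p <- w / sum(w)
print(round(p, 3))  # [1] 0.226 0.097 0.258 0.194 0.226

# Draw many times with replacement and compute observed frequencies
set.seed(2024)
draws <- sample(x, size = 100000, replace = TRUE, prob = w)
observed <- as.vector(table(factor(draws, levels = x))) / length(draws)
print(round(observed, 3))
```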

Example 4: Reordering the elements inside a vector

If you call sample() without specifying size, it returns all of the elements of the vector in a random order (a random permutation). The result is shuffled, not sorted.

# Create a vector of 5 values
x <- c(25, 50, 75, 100, 125)
# Shuffle the elements into a random order
sample(x)

Output

[1] 25 50 100 125 75
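Because a call without size only reorders the elements, sorting the result recovers the original values. A common application, shown below with a small hypothetical data frame named df, is to shuffle its rows by sampling the row indices with sample(nrow(df)).

```r
x <- c(25, 50, 75, 100, 125)

set.seed(7)
shuffled <- sample(x)

# A permutation contains exactly the same values, only in a different order
print(identical(sort(shuffled), sort(x)))  # TRUE

# Shuffle the rows of a data frame by sampling its row indices
df <- data.frame(id = 1:5, value = x)
df_shuffled <- df[sample(nrow(df)), ]
print(df_shuffled)
```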
