Let’s look at one of the most frequently used functions in R, sample(). Taking samples is a routine step in data analysis. Studying a sample is often the most practical way to understand a dataset, especially when the data is large.
R provides the standard function sample() for drawing samples from vectors and datasets. Many business and data analysis problems require taking samples from the data. The sample is drawn at random, either with or without replacement, and both cases are illustrated in the sections below.
Let’s roll into the topic!!!
You may wonder, what is taking samples with replacement?
When you take samples from a vector or a list and specify replace=TRUE (or T), the function allows the same value to be drawn more than once.
The example below illustrates this case.
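Here is a minimal sketch of sampling with replacement. The vector `x` and the sample sizes are assumptions chosen only for illustration, and the values you see will differ from run to run.

```r
# A small vector to sample from (values chosen only for illustration)
x <- 1:10

# Draw 10 values with replacement: any value may appear more than once
with_repl <- sample(x, size = 10, replace = TRUE)
print(with_repl)

# With replacement, the sample may be larger than the vector itself
bigger <- sample(x, size = 15, replace = TRUE)
print(bigger)
```

Because 15 values are drawn from only 10 candidates, `bigger` is guaranteed to contain at least one repeated value.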
In this case, we are going to take samples without replacement. The whole concept is shown below.
When sampling without replacement, the argument replace=FALSE (or F) is used, which is also the default, so no value can be drawn more than once.
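A short sketch of sampling without replacement follows. The vector `x` and the sizes are illustrative assumptions. The second call is wrapped in `tryCatch()` only to show that asking for more values than the vector holds is an error in this mode.

```r
# A small vector to sample from (values chosen only for illustration)
x <- 1:10

# Draw 5 values without replacement: every value appears at most once
no_repl <- sample(x, size = 5, replace = FALSE)
print(no_repl)

# Asking for more values than x contains fails without replacement;
# tryCatch() captures the error message instead of stopping the script
too_many <- tryCatch(
  sample(x, size = 15, replace = FALSE),
  error = function(e) conditionMessage(e)
)
print(too_many)
```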
As you may have noticed, the samples are random and change each time you run the code. If you want the same sample every time, you can use the set.seed() function.
set.seed() - fixes the starting point of R’s random number generator, so the same sequence of random values is produced on each run.
This case is illustrated below. Execute the code to get the same random samples each time.
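The sketch below shows the idea. The seed value 5 and the range 1:15 are arbitrary assumptions; any integer seed works, as long as the same one is set before each call.

```r
# Set the seed, then draw a sample
set.seed(5)
first <- sample(1:15, size = 5)

# Resetting the same seed before sampling reproduces the same draw
set.seed(5)
second <- sample(1:15, size = 5)

print(first)
print(second)

# TRUE: both samples contain the same values in the same order
print(identical(first, second))
```

Note that the seed must be set again before each call you want to reproduce. Calling sample() twice after a single set.seed() gives two different samples, although that pair of samples is itself reproducible.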
In this section, we are going to generate samples from a dataset in RStudio.
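One common way to do this, sketched here under the assumption that you want whole rows, is to sample row indices and then subset the data frame. The variable names `rows` and `tooth_sample` are made up for this example.

```r
# ToothGrowth ships with R in the built-in datasets package
data("ToothGrowth")

# Pick 10 distinct row numbers at random from 1..nrow(ToothGrowth)
rows <- sample(nrow(ToothGrowth), size = 10)

# Keep only the selected rows, with all columns
tooth_sample <- ToothGrowth[rows, ]
print(tooth_sample)
```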
This code will take 10 rows as a sample from the ‘ToothGrowth’ dataset and display them. In this way, you can take samples of any required size from a dataset.
In this section, we are going to use the set.seed() function to take reproducible samples from the dataset.
Execute the code below to generate the samples from the dataset using set.seed().
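A minimal sketch, assuming the same ‘ToothGrowth’ dataset and an arbitrary seed of 10 (the names `rows_a` and `rows_b` are illustrative):

```r
# ToothGrowth ships with R in the built-in datasets package
data("ToothGrowth")

# Fix the seed, then sample 10 row numbers
set.seed(10)
rows_a <- sample(nrow(ToothGrowth), size = 10)

# Same seed, same call: the same row numbers come back
set.seed(10)
rows_b <- sample(nrow(ToothGrowth), size = 10)

# Display the sampled rows
print(ToothGrowth[rows_a, ])

# TRUE: the two draws selected identical rows
print(identical(rows_a, rows_b))
```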
You will get the same rows when you execute the code multiple times. The values won’t change as we have used the set.seed() function.
Let’s see how to pick a random item with sample() with the help of a problem.
Problem: A gift shop has decided to give a surprise gift to one of its customers. For this purpose, it has collected some names. The task is to choose one name at random from the list.
Hint: use the sample() function to generate random samples.
As you can see below, every time you run this code, it generates a random sample of participant names.
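A possible solution is sketched here. The customer names are hypothetical, invented only for this example, since the original list is not given.

```r
# Hypothetical customer names, invented for this example
customers <- c("Asha", "Ben", "Carla", "Dev", "Elena", "Farid")

# Draw exactly one name at random
winner <- sample(customers, size = 1)
print(winner)
```

To pick several distinct winners instead, increase `size`; since replace defaults to FALSE, no customer is chosen twice.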
With the help of the above examples and concepts, you have understood how you can generate random samples and extract specific data from a dataset.
It may be a relief to know that R also lets you set the probability of each outcome, which helps with many problems. Let’s see how it works with the help of a simple example.
Consider a company that manufactures 10 watches, 20% of which are found to be defective. Let’s illustrate this with the help of the code below.
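The watch example can be sketched with the `prob` argument of sample(). The labels "Good" and "Defective" are assumptions chosen for this illustration.

```r
# Each of the 10 watches is "Good" with probability 0.8
# and "Defective" with probability 0.2
watches <- sample(
  c("Good", "Defective"),
  size = 10,
  replace = TRUE,          # required: 10 draws from only 2 labels
  prob = c(0.8, 0.2)
)
print(watches)

# Count how many of each label were drawn
print(table(watches))
```

The probabilities apply to each draw independently, so a single run will not always contain exactly two defective watches; 20% is the long-run proportion.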
You can also try different probability settings, as shown below.
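A few variations are sketched here. The specific probabilities are arbitrary choices for illustration.

```r
outcomes <- c("Good", "Defective")

# 90% good, 10% defective
mostly_good <- sample(outcomes, size = 10, replace = TRUE, prob = c(0.9, 0.1))
print(mostly_good)

# Equal chance for both outcomes
even_split <- sample(outcomes, size = 10, replace = TRUE, prob = c(0.5, 0.5))
print(even_split)

# prob is treated as a vector of weights, so it need not sum to 1;
# weights of 3 and 1 behave like probabilities 0.75 and 0.25
weighted <- sample(outcomes, size = 10, replace = TRUE, prob = c(3, 1))
print(weighted)
```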
In this tutorial, you have learned how to generate samples from a dataset, a vector, and a list, with or without replacement. The set.seed() function is helpful when you need the same sequence of samples on every run.
Try taking samples from the various datasets available in R. You can also import CSV files and take samples with probability adjustments as shown.
More study: R documentation