Dplyr shrub volume data basics (solution)

Exercise

Dr. Morales is interested in studying the factors controlling the size and carbon storage of shrubs. She has conducted an experiment looking at the effect of three different treatments on shrub volume at five different locations. She has placed the data file on the web for you to download:

If the file shrub-volume-data.csv is not already in your working directory (it probably is if you're taking this class using Posit Cloud), download it into your working directory.

Import the data using read_csv() to get familiar with it, then use dplyr to complete the following tasks.

  1. Select the data from the length column (using select).
  2. Select the data from the site and experiment columns (using select).
  3. Add a new column named area containing the area of the shrub, which is the length times the width (using mutate).
  4. Sort the data by length (using arrange).
  5. Filter the data to include only plants with heights greater than 5 (using filter).
  6. Filter the data to include only plants with heights greater than 4 and widths greater than 2 (separating the two conditions with , or &).
  7. Filter the data to include only plants from Experiment 1 or Experiment 3 (using | for “or”).
  8. Remove rows with missing (NA) values in the height column (using drop_na from tidyr).
  9. Create a new data frame called shrub_volumes that includes all of the original data and a new column containing the volumes (length * width * height), and display it.
Output solution

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Rows: 15 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (5): site, experiment, length, width, height

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
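The code behind this output isn't shown; here is a minimal sketch of the import step. Because the downloaded file may not be at hand, it first writes the 15 rows printed in this solution to a temporary CSV (`csv_path` is a hypothetical name); with the real file you would call `read_csv("shrub-volume-data.csv")` directly. Loading dplyr is what prints the masking messages above, and `read_csv()` itself comes from readr.

```r
library(dplyr)
library(readr)

# Stand-in for the downloaded file: the same 15 rows shown in this solution,
# written to a temporary CSV so the sketch runs anywhere
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,experiment,length,width,height",
  "1,1,2.2,1.3,9.6", "1,2,2.1,2.2,7.6", "1,3,2.7,1.5,2.2",
  "2,1,3,4.5,1.5",   "2,2,3.1,3.1,4",   "2,3,2.5,2.8,3",
  "3,1,1.9,1.8,4.5", "3,2,1.1,0.5,2.3", "3,3,3.5,2,7.5",
  "4,1,2.9,2.7,3.2", "4,2,4.5,4.8,6.5", "4,3,1.2,1.8,2.7",
  "5,1,2.6,0.8,NA",  "5,2,1.8,NA,5.2",  "5,3,3.1,2.2,NA"
), csv_path)

# show_col_types = FALSE silences the column-specification message;
# "NA" cells are read as missing values by default
shrub_data <- read_csv(csv_path, show_col_types = FALSE)
shrub_data
```

Leave out `show_col_types = FALSE` if you want to see how readr guessed each column's type, as in the output above.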
  1. Select the data from the length column and print it out.
# A tibble: 15 × 1
   length
    <dbl>
 1    2.2
 2    2.1
 3    2.7
 4    3  
 5    3.1
 6    2.5
 7    1.9
 8    1.1
 9    3.5
10    2.9
11    4.5
12    1.2
13    2.6
14    1.8
15    3.1
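A minimal sketch of task 1. So that it runs without the CSV, the data frame is typed in from the printed values (an assumption standing in for `read_csv()`), and `length_data` is just an illustrative name.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# select() returns a tibble containing only the named column
length_data <- select(shrub_data, length)
length_data
```

Note that `select()` always returns a data frame; if you need the values as a plain vector, `pull(shrub_data, length)` does that instead.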
  2. Select the data from the site and experiment columns and print it out.
# A tibble: 15 × 2
    site experiment
   <dbl>      <dbl>
 1     1          1
 2     1          2
 3     1          3
 4     2          1
 5     2          2
 6     2          3
 7     3          1
 8     3          2
 9     3          3
10     4          1
11     4          2
12     4          3
13     5          1
14     5          2
15     5          3
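Task 2 follows the same pattern, listing more than one column. In this sketch the data frame is again rebuilt inline from the printed values as a stand-in for the CSV, and `site_experiment` is a hypothetical name.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Columns named in select() are kept, in the order they are listed
site_experiment <- select(shrub_data, site, experiment)
site_experiment
```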
  3. Add a new column named area containing the area of the shrub, which is the length times the width (using mutate).
# A tibble: 15 × 6
    site experiment length width height  area
   <dbl>      <dbl>  <dbl> <dbl>  <dbl> <dbl>
 1     1          1    2.2   1.3    9.6  2.86
 2     1          2    2.1   2.2    7.6  4.62
 3     1          3    2.7   1.5    2.2  4.05
 4     2          1    3     4.5    1.5 13.5 
 5     2          2    3.1   3.1    4    9.61
 6     2          3    2.5   2.8    3    7   
 7     3          1    1.9   1.8    4.5  3.42
 8     3          2    1.1   0.5    2.3  0.55
 9     3          3    3.5   2      7.5  7   
10     4          1    2.9   2.7    3.2  7.83
11     4          2    4.5   4.8    6.5 21.6 
12     4          3    1.2   1.8    2.7  2.16
13     5          1    2.6   0.8   NA    2.08
14     5          2    1.8  NA      5.2 NA   
15     5          3    3.1   2.2   NA    6.82
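A sketch of task 3, with the data typed in from the printed values as a stand-in for the CSV (`shrub_area` is an illustrative name). It shows why row 14 has an `NA` area: arithmetic with a missing width propagates the `NA`.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# mutate() returns a copy with the new column appended; shrub_data is unchanged
shrub_area <- mutate(shrub_data, area = length * width)
shrub_area
```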
  4. Sort the data by length (using arrange).
# A tibble: 15 × 5
    site experiment length width height
   <dbl>      <dbl>  <dbl> <dbl>  <dbl>
 1     3          2    1.1   0.5    2.3
 2     4          3    1.2   1.8    2.7
 3     5          2    1.8  NA      5.2
 4     3          1    1.9   1.8    4.5
 5     1          2    2.1   2.2    7.6
 6     1          1    2.2   1.3    9.6
 7     2          3    2.5   2.8    3  
 8     5          1    2.6   0.8   NA  
 9     1          3    2.7   1.5    2.2
10     4          1    2.9   2.7    3.2
11     2          1    3     4.5    1.5
12     2          2    3.1   3.1    4  
13     5          3    3.1   2.2   NA  
14     3          3    3.5   2      7.5
15     4          2    4.5   4.8    6.5
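A sketch of task 4, again using an inline copy of the printed values in place of the CSV (`sorted_by_length` is a hypothetical name). `arrange()` sorts in ascending order by default; wrapping the column in `desc()` reverses it.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Ascending sort on length (missing values, if there were any, go last)
sorted_by_length <- arrange(shrub_data, length)
sorted_by_length

# Longest shrubs first
arrange(shrub_data, desc(length))
```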
  5. Filter the data to include only plants with heights greater than 5 (using filter).
# A tibble: 5 × 5
   site experiment length width height
  <dbl>      <dbl>  <dbl> <dbl>  <dbl>
1     1          1    2.2   1.3    9.6
2     1          2    2.1   2.2    7.6
3     3          3    3.5   2      7.5
4     4          2    4.5   4.8    6.5
5     5          2    1.8  NA      5.2
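A sketch of task 5 using an inline stand-in for the CSV (`tall_shrubs` is a hypothetical name). It also explains why only 5 rows come back: `filter()` keeps rows where the condition is `TRUE`, so the two rows with a missing height (condition `NA`) are dropped along with the short ones.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Keep rows where height > 5 is TRUE; rows where it is FALSE or NA are dropped
tall_shrubs <- filter(shrub_data, height > 5)
tall_shrubs
```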
  6. Filter the data to include only plants with heights greater than 4 and widths greater than 2 (using filter).
# A tibble: 2 × 5
   site experiment length width height
  <dbl>      <dbl>  <dbl> <dbl>  <dbl>
1     1          2    2.1   2.2    7.6
2     4          2    4.5   4.8    6.5
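A sketch of task 6 with an inline stand-in for the CSV (the result names are hypothetical). Conditions separated by commas in `filter()` are combined with AND, so the comma form and the `&` form select the same rows.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Comma-separated conditions must all be TRUE...
tall_wide <- filter(shrub_data, height > 4, width > 2)
# ...which is the same as joining them with &
tall_wide_and <- filter(shrub_data, height > 4 & width > 2)
tall_wide
```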
  7. Filter the data to include only plants from Experiment 1 or Experiment 3.
# A tibble: 10 × 5
    site experiment length width height
   <dbl>      <dbl>  <dbl> <dbl>  <dbl>
 1     1          1    2.2   1.3    9.6
 2     1          3    2.7   1.5    2.2
 3     2          1    3     4.5    1.5
 4     2          3    2.5   2.8    3  
 5     3          1    1.9   1.8    4.5
 6     3          3    3.5   2      7.5
 7     4          1    2.9   2.7    3.2
 8     4          3    1.2   1.8    2.7
 9     5          1    2.6   0.8   NA  
10     5          3    3.1   2.2   NA  
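A sketch of task 7, with the data rebuilt inline from the printed values in place of the CSV (result names are hypothetical). `|` keeps a row if either comparison is `TRUE`; `%in%` is a more compact way to express the same test against a set of values.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Keep rows from experiment 1 OR experiment 3
exp_1_or_3 <- filter(shrub_data, experiment == 1 | experiment == 3)
exp_1_or_3

# Equivalent test using %in%
exp_1_or_3_in <- filter(shrub_data, experiment %in% c(1, 3))
```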
  8. Remove rows with missing (NA) values in the height column (using drop_na).
# A tibble: 13 × 5
    site experiment length width height
   <dbl>      <dbl>  <dbl> <dbl>  <dbl>
 1     1          1    2.2   1.3    9.6
 2     1          2    2.1   2.2    7.6
 3     1          3    2.7   1.5    2.2
 4     2          1    3     4.5    1.5
 5     2          2    3.1   3.1    4  
 6     2          3    2.5   2.8    3  
 7     3          1    1.9   1.8    4.5
 8     3          2    1.1   0.5    2.3
 9     3          3    3.5   2      7.5
10     4          1    2.9   2.7    3.2
11     4          2    4.5   4.8    6.5
12     4          3    1.2   1.8    2.7
13     5          2    1.8  NA      5.2
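A sketch of task 8. Note that `drop_na()` comes from tidyr rather than dplyr, so tidyr has to be loaded too. The data is typed in from the printed values as a stand-in for the CSV, and `height_known` is a hypothetical name.

```r
library(dplyr)
library(tidyr)  # drop_na() is a tidyr function

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Only rows with NA in height are removed; the row missing its width stays
height_known <- drop_na(shrub_data, height)
height_known
```

Calling `drop_na(shrub_data)` with no columns named instead drops rows with an `NA` in any column, which here would also remove the site 5, experiment 2 row and leave 12 rows.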
  9. Create a new data frame called shrub_volumes that includes all of the original data and a new column containing the volumes (length * width * height), and display it.
# A tibble: 15 × 6
    site experiment length width height volume
   <dbl>      <dbl>  <dbl> <dbl>  <dbl>  <dbl>
 1     1          1    2.2   1.3    9.6  27.5 
 2     1          2    2.1   2.2    7.6  35.1 
 3     1          3    2.7   1.5    2.2   8.91
 4     2          1    3     4.5    1.5  20.2 
 5     2          2    3.1   3.1    4    38.4 
 6     2          3    2.5   2.8    3    21   
 7     3          1    1.9   1.8    4.5  15.4 
 8     3          2    1.1   0.5    2.3   1.26
 9     3          3    3.5   2      7.5  52.5 
10     4          1    2.9   2.7    3.2  25.1 
11     4          2    4.5   4.8    6.5 140.  
12     4          3    1.2   1.8    2.7   5.83
13     5          1    2.6   0.8   NA    NA   
14     5          2    1.8  NA      5.2  NA   
15     5          3    3.1   2.2   NA    NA
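A sketch of task 9, using an inline copy of the printed values as a stand-in for the CSV. The key difference from task 3 is assigning the `mutate()` result to `shrub_volumes` so the new data frame persists; any missing length, width, or height yields an `NA` volume, as in the last three rows above.

```r
library(dplyr)

# Stand-in for read_csv("shrub-volume-data.csv"): the same 15 rows, typed inline
shrub_data <- tibble::tibble(
  site       = rep(c(1, 2, 3, 4, 5), each = 3),
  experiment = rep(c(1, 2, 3), times = 5),
  length = c(2.2, 2.1, 2.7, 3.0, 3.1, 2.5, 1.9, 1.1, 3.5, 2.9, 4.5, 1.2, 2.6, 1.8, 3.1),
  width  = c(1.3, 2.2, 1.5, 4.5, 3.1, 2.8, 1.8, 0.5, 2.0, 2.7, 4.8, 1.8, 0.8, NA, 2.2),
  height = c(9.6, 7.6, 2.2, 1.5, 4.0, 3.0, 4.5, 2.3, 7.5, 3.2, 6.5, 2.7, NA, 5.2, NA)
)

# Keep every original column and append volume; assign so the result is saved
shrub_volumes <- mutate(shrub_data, volume = length * width * height)
shrub_volumes
```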