Portal Data Iteration Without Loops (solution)

Exercise

This exercise covers iteration without loops in R using Portal data. You’ll practice vectorization, apply functions, and integration with dplyr using real ecological data from the Portal Project.

If surveys.csv, species.csv, and plots.csv are not in your working directory, download them.

Load the three data files using read_csv.

1. Create a vectorized function called estimate_metabolic_rate that takes weight as input and returns metabolic rate using the equation: metabolic_rate = 0.073 * weight ^ 0.75. Run it on the following vector:

weights <- c(15, 25, 35, 45, 20, 70, 72)

2. Use mutate() and estimate_metabolic_rate to create a version of the data in surveys with a column called metabolic_rate for all animals that have weight measurements. Remove the rows without metabolic rates. Select the year, species_id, and metabolic_rate columns.

3. Create a function called classify_by_weight that takes a single weight value and returns:

- “small” if weight < 20g
- “medium” if weight is 20-50g
- “large” if weight > 50g
- “unknown” if weight is missing (NA)

Use sapply to apply classify_by_weight to the weights vector from (1).

4. Use mutate, classify_by_weight, and the surveys table to produce a data frame with the columns year, plot_id, species_id, and weight_class (where weight_class is the output of classify_by_weight). Join this data frame with the plots table to add information on plot_type. Filter the result to include only rows where plot_type is “Control”.

5. Group the results of (4) based on plot_id and weight_class (using group_by) and count the number of individuals in each group (using summarize).

6. Create a function called energy_budget() that takes genus, species, and weight as inputs (you’ll need to join the surveys and species tables to get this data together). It should return the daily energy needs of each individual in surveys, based on the values of genus and species, using the following equations:

- If genus is “Dipodomys”: energy = 0.065 * weight ^ 0.75 * 24
- If genus is “Chaetodipus” and species is “penicillatus”: energy = 0.080 * weight ^ 0.75 * 24
- If genus is “Chaetodipus” and species is “baileyi”: energy = 0.26 * weight ^ 0.75 * 24
- All other species: energy = 0.073 * weight ^ 0.75 * 24

Run the function with mapply() and the following inputs:

- genus: c("Dipodomys", "Chaetodipus", "Neotoma")
- species: c("merriami", "penicillatus", "albigula")
- weight: c(45, 22, 156)

7. Use mutate and rowwise to calculate the energy budget for each individual in surveys. Drop rows with NA in the new energy_budget column. Group and summarize the data to get a total energy budget for each combination of year, month, and day by summing the values of energy_budget in each group.

Output solution


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Rows: 35549 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): species_id, sex
dbl (7): record_id, month, day, year, plot_id, hindfoot_length, weight

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 54 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): species_id, genus, species, taxa

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 24 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): plot_type
dbl (1): plot_id

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Create a vectorized function for metabolic rate:

[1] 0.5564054 0.8161648 1.0504464 1.2683299 0.6903914 1.7666332 1.8043560
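
One way to produce the values above is sketched below. It relies only on the fact that R's arithmetic operators work element-wise on vectors, so the function needs no explicit loop. The name metabolic_rates is an illustrative choice, not part of the exercise.

```r
# Vectorized: `*` and `^` operate element-wise on the whole input vector.
estimate_metabolic_rate <- function(weight) {
  0.073 * weight^0.75
}

weights <- c(15, 25, 35, 45, 20, 70, 72)

# `metabolic_rates` is a hypothetical name used only in this sketch.
metabolic_rates <- estimate_metabolic_rate(weights)
print(metabolic_rates)
```

Because the function body is plain arithmetic, it also passes NA through unchanged, which is convenient when it is later applied to a column with missing weights.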

Add metabolic rate column to surveys data:

# A tibble: 32,283 × 3
    year species_id metabolic_rate
   <dbl> <chr>               <dbl>
 1  1977 DM                  1.16 
 2  1977 DM                  1.33 
 3  1977 DM                  0.912
 4  1977 DM                  1.29 
 5  1977 DM                  1.07 
 6  1977 DO                  1.41 
 7  1977 PF                  0.347
 8  1977 OX                  0.742
 9  1977 DM                  1.05 
10  1977 PF                  0.314
# ℹ 32,273 more rows
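
The table above comes from the full surveys data. The sketch below shows one possible pipeline for this step on a tiny made-up stand-in, so that it runs without the CSV files. The table surveys_demo and its values are assumptions for illustration only; with the real data, surveys would take its place.

```r
library(dplyr)

estimate_metabolic_rate <- function(weight) {
  0.073 * weight^0.75
}

# Hypothetical stand-in for the surveys table (values are invented).
surveys_demo <- tibble(
  year = c(1977, 1977, 1977),
  species_id = c("DM", "DM", "PF"),
  weight = c(40, NA, 8)
)

surveys_metabolic <- surveys_demo %>%
  # The function is vectorized, so mutate() can call it on the whole column.
  mutate(metabolic_rate = estimate_metabolic_rate(weight)) %>%
  # Rows with a missing weight yield NA and are dropped here.
  filter(!is.na(metabolic_rate)) %>%
  select(year, species_id, metabolic_rate)

print(surveys_metabolic)
```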

Create classification function and test it:

[1] "small"  "medium" "medium" "medium" "medium" "large"  "large"
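
A minimal sketch of a function that gives the classification above follows. It uses if/else, which only handles one value at a time, and that is why sapply is needed to apply it to each element of the vector. The boundary handling (20 and 50 both count as “medium”) is an assumption read from the exercise wording, and weight_classes is an illustrative name.

```r
# Not vectorized: `if` expects a single TRUE/FALSE, so this works on one value.
classify_by_weight <- function(weight) {
  if (is.na(weight)) {
    "unknown"
  } else if (weight < 20) {
    "small"
  } else if (weight <= 50) {
    # Assumption: both 20 and 50 fall in the "medium" class.
    "medium"
  } else {
    "large"
  }
}

weights <- c(15, 25, 35, 45, 20, 70, 72)

# sapply() calls the function once per element and simplifies to a vector.
weight_classes <- sapply(weights, classify_by_weight)
print(weight_classes)
```

The NA check comes first because comparing NA with a number returns NA, which would make the later if conditions fail with an error.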

Join with plots data and filter for Control plots:

# A tibble: 15,660 × 5
# Rowwise: 
    year plot_id species_id weight_class plot_type
   <dbl>   <dbl> <chr>      <chr>        <chr>    
 1  1977       2 NL         unknown      Control  
 2  1977       2 DM         unknown      Control  
 3  1977       2 PE         unknown      Control  
 4  1977       8 DM         unknown      Control  
 5  1977       4 DM         unknown      Control  
 6  1977       2 PP         unknown      Control  
 7  1977       4 PF         unknown      Control  
 8  1977      11 DS         unknown      Control  
 9  1977      14 DM         unknown      Control  
10  1977      11 DM         unknown      Control  
# ℹ 15,650 more rows
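
The following sketch shows one way this step could be written, run on small made-up stand-ins rather than the real files. The tables surveys_demo and plots_demo and the name control_classes are assumptions for illustration. It uses rowwise() so that classify_by_weight receives one weight at a time; the “# Rowwise:” marker in the output above suggests the original solution did something similar without ungrouping afterwards.

```r
library(dplyr)

classify_by_weight <- function(weight) {
  if (is.na(weight)) {
    "unknown"
  } else if (weight < 20) {
    "small"
  } else if (weight <= 50) {
    "medium"
  } else {
    "large"
  }
}

# Hypothetical stand-ins for the surveys and plots tables (values are invented).
surveys_demo <- tibble(
  year = c(1977, 1977, 1978, 1978),
  plot_id = c(2, 3, 2, 4),
  species_id = c("DM", "PF", "DO", "DM"),
  weight = c(NA, 8, 52, 40)
)
plots_demo <- tibble(
  plot_id = c(2, 3, 4),
  plot_type = c("Control", "Long-term Krat Exclosure", "Control")
)

control_classes <- surveys_demo %>%
  rowwise() %>%                                   # evaluate mutate() one row at a time
  mutate(weight_class = classify_by_weight(weight)) %>%
  ungroup() %>%                                   # drop the rowwise marker before joining
  select(year, plot_id, species_id, weight_class) %>%
  inner_join(plots_demo, by = "plot_id") %>%
  filter(plot_type == "Control")

print(control_classes)
```

An alternative that avoids rowwise() is mutate(weight_class = sapply(weight, classify_by_weight)), which applies the function element by element inside a single mutate call.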

Group by plot_id and weight_class and count:

# A tibble: 32 × 3
   plot_id weight_class count
     <dbl> <chr>        <int>
 1       2 large          516
 2       2 medium        1271
 3       2 small          287
 4       2 unknown        120
 5       4 large          404
 6       4 medium        1191
 7       4 small          271
 8       4 unknown        103
 9       8 large          480
10       8 medium        1030
# ℹ 22 more rows
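
The counting step can be sketched as below. The input control_demo is a made-up stand-in for the filtered Control data from step 4, and weight_counts is an illustrative name. Passing .groups = "drop" is an assumption chosen because the printed result above shows no remaining grouping.

```r
library(dplyr)

# Hypothetical stand-in for the Control-plot data from step 4 (values are invented).
control_demo <- tibble(
  plot_id = c(2, 2, 2, 4, 4),
  weight_class = c("large", "medium", "medium", "small", "unknown")
)

weight_counts <- control_demo %>%
  group_by(plot_id, weight_class) %>%
  # n() returns the number of rows in the current group.
  summarize(count = n(), .groups = "drop")

print(weight_counts)
```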

Create energy budget function and test with mapply:

  Dipodomys Chaetodipus     Neotoma 
   27.10404    19.50376    77.33526
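
A sketch of one function that reproduces the values above follows. Since the equations differ only in their leading coefficient, it picks the coefficient with if/else and applies the shared formula once. The early return for missing inputs is an added assumption, not something the exercise asks for; it keeps the if conditions from failing on NA when the function is later run over the full data. The names coefficient and energy_needs are illustrative.

```r
energy_budget <- function(genus, species, weight) {
  # Assumption: any missing input gives NA instead of an error in `if`.
  if (is.na(genus) || is.na(species) || is.na(weight)) {
    return(NA_real_)
  }
  if (genus == "Dipodomys") {
    coefficient <- 0.065
  } else if (genus == "Chaetodipus" && species == "penicillatus") {
    coefficient <- 0.080
  } else if (genus == "Chaetodipus" && species == "baileyi") {
    coefficient <- 0.26
  } else {
    coefficient <- 0.073
  }
  coefficient * weight^0.75 * 24
}

# mapply() steps through the three vectors in parallel, one element from each.
energy_needs <- mapply(
  energy_budget,
  genus = c("Dipodomys", "Chaetodipus", "Neotoma"),
  species = c("merriami", "penicillatus", "albigula"),
  weight = c(45, 22, 156)
)
print(energy_needs)
```

The result is named after the genus values because mapply uses the first vector argument as names when it is a character vector.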

Use mutate and rowwise to calculate the energy budget for each individual in surveys, then sum by date:

`summarise()` has grouped output by 'year', 'month'. You can override using the
`.groups` argument.

# A tibble: 624 × 4
# Groups:   year, month [281]
    year month   day total_energy_budget
   <dbl> <dbl> <dbl>               <dbl>
 1  1977     8    19               294. 
 2  1977     8    20               555. 
 3  1977     8    21                45.3
 4  1977     9    11               465. 
 5  1977     9    12               411. 
 6  1977     9    13               248. 
 7  1977    10    16               756. 
 8  1977    10    17               618. 
 9  1977    10    18               205. 
10  1977    11    12               759. 
# ℹ 614 more rows
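
The final step can be sketched as follows, again on small made-up stand-ins so the code runs without the CSV files. The tables surveys_demo and species_demo, the name daily_energy, and the NA guard inside energy_budget are assumptions for illustration. summarize() is called without a .groups argument here, so it emits the same grouping message shown above.

```r
library(dplyr)

energy_budget <- function(genus, species, weight) {
  # Assumption: any missing input gives NA instead of an error in `if`.
  if (is.na(genus) || is.na(species) || is.na(weight)) {
    return(NA_real_)
  }
  if (genus == "Dipodomys") {
    coefficient <- 0.065
  } else if (genus == "Chaetodipus" && species == "penicillatus") {
    coefficient <- 0.080
  } else if (genus == "Chaetodipus" && species == "baileyi") {
    coefficient <- 0.26
  } else {
    coefficient <- 0.073
  }
  coefficient * weight^0.75 * 24
}

# Hypothetical stand-ins for the surveys and species tables (values are invented).
surveys_demo <- tibble(
  year = c(1977, 1977, 1977),
  month = c(8, 8, 8),
  day = c(19, 19, 20),
  species_id = c("DM", "PP", "DM"),
  weight = c(40, 20, NA)
)
species_demo <- tibble(
  species_id = c("DM", "PP"),
  genus = c("Dipodomys", "Chaetodipus"),
  species = c("merriami", "penicillatus")
)

daily_energy <- surveys_demo %>%
  inner_join(species_demo, by = "species_id") %>%
  rowwise() %>%                      # call energy_budget() once per individual
  mutate(energy_budget = energy_budget(genus, species, weight)) %>%
  ungroup() %>%
  filter(!is.na(energy_budget)) %>%  # drop individuals without a weight
  group_by(year, month, day) %>%
  summarize(total_energy_budget = sum(energy_budget))

print(daily_energy)
```

In this stand-in the third row has no weight, so only 19 August remains after filtering, and its total is the sum of the two individuals recorded that day.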