Portal data aggregation (solution)

Exercise

If the file surveys.csv is not already in your working directory, download it.

Load surveys.csv into R using read_csv().

  1. Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID.
  2. Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID in each year.
  3. Use the filter(), drop_na(), group_by(), and summarize() functions to get the mean mass of species DO in each year.
Output solution

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Rows: 35549 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): species_id, sex
dbl (7): record_id, month, day, year, plot_id, hindfoot_length, weight

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
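The messages above are what loading dplyr (which masks `filter()`, `lag()` and a few base set functions) and then calling readr's `read_csv()` print. Here is a minimal sketch of that loading step. It assumes the readr and dplyr packages are installed and that surveys.csv is in the working directory.

```r
# Load the packages used in this exercise
library(readr)  # read_csv(), spec()
library(dplyr)  # group_by(), summarize(), filter()

# Read the survey data; readr prints the column specification by default
surveys <- read_csv("surveys.csv")

# Re-display the column types readr guessed for each column
spec(surveys)
```

As the message itself suggests, passing `show_col_types = FALSE` to `read_csv()` suppresses the column-specification printout.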
  1. Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID.
# A tibble: 49 × 2
   species_id count
   <chr>      <int>
 1 AB           303
 2 AH           437
 3 AS             2
 4 BA            46
 5 CB            50
 6 CM            13
 7 CQ            16
 8 CS             1
 9 CT             1
10 CU             1
# ℹ 39 more rows
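A minimal sketch of code that should produce a tibble like the one above. It assumes readr and dplyr are installed and that surveys.csv is in the working directory. The name `species_counts` is illustrative, and the block reloads the data so that it runs on its own.

```r
library(readr)
library(dplyr)

# Reload the data quietly so this block is self-contained
surveys <- read_csv("surveys.csv", show_col_types = FALSE)

# One row per species_id, holding the number of records (individuals) in it
species_counts <- surveys %>%
  group_by(species_id) %>%
  summarize(count = n())

species_counts
```

dplyr's `count(surveys, species_id)` is a shortcut for the same grouping. It names the tally column `n` unless you pass `name = "count"`.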
  2. Use the group_by() and summarize() functions to get a count of the number of individuals in each species ID in each year.
`summarise()` has grouped output by 'species_id'. You can override using the
`.groups` argument.
# A tibble: 535 × 3
# Groups:   species_id [49]
   species_id  year count
   <chr>      <dbl> <int>
 1 AB          1980     5
 2 AB          1981     7
 3 AB          1982    34
 4 AB          1983    41
 5 AB          1984    12
 6 AB          1985    14
 7 AB          1986     5
 8 AB          1987    35
 9 AB          1988    39
10 AB          1989    31
# ℹ 525 more rows
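Grouping by two columns gives one row per species/year combination. A minimal sketch, under the same assumptions as the previous step (readr and dplyr installed, surveys.csv in the working directory, illustrative object name `species_year_counts`):

```r
library(readr)
library(dplyr)

# Reload the data quietly so this block is self-contained
surveys <- read_csv("surveys.csv", show_col_types = FALSE)

# One row per species_id/year pair. By default summarize() drops the last
# grouping level (year), so the result stays grouped by species_id
species_year_counts <- surveys %>%
  group_by(species_id, year) %>%
  summarize(count = n())

species_year_counts
```

Setting `.groups` explicitly inside `summarize()` silences the "has grouped output" message. `.groups = "drop"` returns an ungrouped tibble, and `.groups = "drop_last"` keeps the grouping shown above.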
  3. Use the filter(), drop_na(), group_by(), and summarize() functions to get the mean mass of species DO in each year.
# A tibble: 26 × 2
    year avg_mass
   <dbl>    <dbl>
 1  1977     42.7
 2  1978     45  
 3  1979     45.9
 4  1980     48.1
 5  1981     49.1
 6  1982     47.9
 7  1983     47.2
 8  1984     48.4
 9  1985     48.0
10  1986     49.4
# ℹ 16 more rows
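Mass is stored in the `weight` column, and `mean()` returns `NA` if any weight in a group is missing. For that reason, this sketch removes those rows before averaging. It assumes readr, dplyr and tidyr are installed and that surveys.csv is in the working directory, and `do_mass_by_year` is an illustrative name. Restricting `drop_na()` to `weight` is a choice made here. Called with no arguments, `drop_na()` drops rows that are missing a value in any column, which can change which records enter each mean.

```r
library(readr)
library(dplyr)
library(tidyr)  # drop_na()

# Reload the data quietly so this block is self-contained
surveys <- read_csv("surveys.csv", show_col_types = FALSE)

# Keep species DO, drop records with no weight, then average weight per year
do_mass_by_year <- surveys %>%
  filter(species_id == "DO") %>%
  drop_na(weight) %>%
  group_by(year) %>%
  summarize(avg_mass = mean(weight))

do_mass_by_year
```
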