[1] "DNA" "RNA" "UNKNOWN" "RNA" "RNA"
Making Choices: DNA or RNA Iteration (Solution)
Exercise
This is a follow-up to DNA or RNA.
Write a function, dna_or_rna(sequence), that determines whether a sequence of bases is DNA, RNA, or impossible to classify from the sequence provided. Since all the function will know about the material is the sequence, the only way to tell DNA from RNA is that RNA has the base uracil ("u") instead of the base thymine ("t"). A sequence containing neither "t" nor "u" therefore cannot be classified. Have the function return one of three outputs: "DNA", "RNA", or "UNKNOWN".
Copy and paste the following sequence data into your script:
sequences = c("ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg", "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau", "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc", "guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu", "gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc")
- Use the function you wrote and a for loop to create a vector of sequence types for the values in sequences
- Use the function and a for loop to create a data frame that includes a column of sequences and a column of their types
- Use the function and sapply to create a vector of sequence types for the values in sequences
- Use the function and dplyr to create a data frame that includes a column of sequences and a column of their types
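For the first task, one possible approach is to preallocate a character vector and fill it one element per iteration. This is a sketch, not necessarily the official solution; it repeats a compact version of dna_or_rna() so that it runs on its own.

```r
# Compact version of the classifier (assumes lower-case input)
dna_or_rna <- function(sequence) {
  if (grepl("t", sequence)) "DNA" else if (grepl("u", sequence)) "RNA" else "UNKNOWN"
}

sequences <- c(
  "ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg",
  "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau",
  "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc",
  "guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu",
  "gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc"
)

# Preallocate the result, then fill it by index inside the loop
sequence_types <- character(length(sequences))
for (i in seq_along(sequences)) {
  sequence_types[i] <- dna_or_rna(sequences[i])
}
sequence_types  # contains "DNA", "RNA", "UNKNOWN", "RNA", "RNA"
```

Preallocating with character() avoids growing the vector with c() on every iteration, which copies it each time.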
Optional: For a little extra challenge, make your function work with both upper- and lower-case letters, or even with strings of mixed capitalization.
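For the optional challenge, one approach is to normalize the input with base R's tolower() before testing. Passing ignore.case = TRUE to grepl() would be an equally valid alternative. A minimal sketch:

```r
# Case-insensitive sketch: lower-case the input first so "T"/"U" are caught too
dna_or_rna <- function(sequence) {
  sequence <- tolower(sequence)
  if (grepl("t", sequence)) {
    "DNA"
  } else if (grepl("u", sequence)) {
    "RNA"
  } else {
    "UNKNOWN"
  }
}

dna_or_rna("TTGAATGC")  # "DNA"
dna_or_rna("GaUuAuUc")  # "RNA"
dna_or_rna("GAAAGC")    # "UNKNOWN"
```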
Output solution
sequences
1 ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg
2 gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau
3 gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc
4 guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu
5 gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc
sequence_types
1 DNA
2 RNA
3 UNKNOWN
4 RNA
5 RNA
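A data frame with the sequences and sequence_types columns shown above could be built with a for loop along these lines. This is a sketch; it creates the data frame first and fills the type column row by row, which is only one of several reasonable ways to do it.

```r
dna_or_rna <- function(sequence) {
  if (grepl("t", sequence)) "DNA" else if (grepl("u", sequence)) "RNA" else "UNKNOWN"
}

sequences <- c(
  "ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg",
  "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau",
  "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc",
  "guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu",
  "gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc"
)

# Start with an empty (NA) type column, then fill one row per iteration
sequence_df <- data.frame(sequences = sequences,
                          sequence_types = NA_character_,
                          stringsAsFactors = FALSE)
for (i in seq_len(nrow(sequence_df))) {
  sequence_df$sequence_types[i] <- dna_or_rna(sequence_df$sequences[i])
}
sequence_df
```

stringsAsFactors = FALSE is the default since R 4.0; it is set explicitly here so the columns stay character in older versions too.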
ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg
"DNA"
gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau
"RNA"
gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc
"UNKNOWN"
guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu
"RNA"
gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc
"RNA"
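The named vector above is what sapply() returns when given a character vector: by default (USE.NAMES = TRUE) each result is named with the input string. A minimal sketch:

```r
dna_or_rna <- function(sequence) {
  if (grepl("t", sequence)) "DNA" else if (grepl("u", sequence)) "RNA" else "UNKNOWN"
}

sequences <- c(
  "ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg",
  "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau",
  "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc",
  "guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu",
  "gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc"
)

# Apply the classifier to every element; names of the result are the sequences
sequence_types <- sapply(sequences, dna_or_rna)
sequence_types
```

If a plain, unnamed vector is preferred, wrap the call in unname() or pass USE.NAMES = FALSE.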
# A tibble: 5 × 2
# Rowwise:
sequence sequence_types
<chr> <chr>
1 ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaa… DNA
2 gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuuc… RNA
3 gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc UNKNOWN
4 guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuua… RNA
5 gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucaugua… RNA
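The "# Rowwise:" line in the tibble above suggests a dplyr pipeline using rowwise(). A sketch of such a pipeline, assuming dplyr is installed: rowwise() makes mutate() call dna_or_rna() once per row, which matters because the function's if() needs a single TRUE/FALSE rather than a whole column of them.

```r
library(dplyr)

dna_or_rna <- function(sequence) {
  if (grepl("t", sequence)) "DNA" else if (grepl("u", sequence)) "RNA" else "UNKNOWN"
}

sequences <- c(
  "ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg",
  "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau",
  "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc",
  "guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu",
  "gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc"
)

# rowwise() groups by row, so mutate() passes one sequence at a time
sequence_df <- data.frame(sequence = sequences) %>%
  rowwise() %>%
  mutate(sequence_types = dna_or_rna(sequence))
sequence_df
```

A non-rowwise alternative is mutate(sequence_types = sapply(sequence, dna_or_rna)), which vectorizes the call itself instead of grouping by row.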