Write a function that determines if a sequence of base pairs is DNA, RNA, or if it is not possible to tell given the sequence provided. RNA has the base Uracil ("u") instead of the base Thymine ("t"), so sequences with u’s are RNA, sequences with t’s are DNA, and sequences with neither are unknown.
You can check if a string contains a character (or a longer substring) in R using the str_detect function from the stringr package: str_detect(string, substring), which will return TRUE if substring is present in string. So str_detect(sequence, "u") will check if the string in the sequence variable has the base u.
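As a quick illustration of the check described above, here is a minimal sketch. The short sequence is made up for this example and is not one of the exercise sequences. Note that stringr treats the second argument as a pattern (a regular expression by default), which behaves like a plain substring for single letters such as "u" or "t".

```r
library(stringr)

# Made-up example sequence (an assumption for illustration only)
sequence <- "gauuauucc"

# Does the sequence contain the base u?
has_u <- str_detect(sequence, "u")

# Does the sequence contain the base t?
has_t <- str_detect(sequence, "t")

print(has_u)
print(has_t)
```

Here has_u should be TRUE and has_t should be FALSE, since the example sequence contains u but no t.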
Name the function dna_or_rna() and have it take sequence as an argument. Have the function return one of three outputs: "DNA", "RNA", or "UNKNOWN". Call the function on each of the following sequences.
seq1 <- "ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg"
seq2 <- "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau"
seq3 <- "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc"
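One possible way to structure the function is sketched below; it is not the only valid solution. The exercise does not say what to do with a sequence that contains both u and t, so checking for u first (and therefore returning "RNA" in that case) is an assumption of this sketch. The short test strings are made up for illustration.

```r
library(stringr)

dna_or_rna <- function(sequence) {
  # Assumption: u is checked first, so a sequence containing
  # both u and t would be reported as RNA
  if (str_detect(sequence, "u")) {
    return("RNA")
  } else if (str_detect(sequence, "t")) {
    return("DNA")
  } else {
    return("UNKNOWN")
  }
}

# Made-up short sequences (lowercase only)
print(dna_or_rna("ttgaatgcc"))
print(dna_or_rna("gauuauucc"))
print(dna_or_rna("gaaagcaag"))
```

Calling the function on seq1, seq2, and seq3 in the same way should give "DNA", "RNA", and "UNKNOWN" respectively. This version only handles lowercase input.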
Challenge (optional): Figure out how to make your function work with both upper and lower case letters, or even strings with mixed capitalization.
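One way to approach the challenge, sketched here under the same assumptions as the basic task, is to convert the sequence to lowercase before checking it. The sketch uses str_to_lower() from stringr; base R's tolower() would work equally well. The mixed-case test strings are made up for illustration.

```r
library(stringr)

dna_or_rna <- function(sequence) {
  # Normalize capitalization so "U", "u", "T", and "t" are all handled
  sequence <- str_to_lower(sequence)

  # Assumption: u is checked first, so a sequence containing
  # both u and t would be reported as RNA
  if (str_detect(sequence, "u")) {
    return("RNA")
  } else if (str_detect(sequence, "t")) {
    return("DNA")
  } else {
    return("UNKNOWN")
  }
}

# Made-up sequences with upper, mixed, and lower case
print(dna_or_rna("TTGAATGCC"))
print(dna_or_rna("GaUuAuUcC"))
print(dna_or_rna("gaaagcaag"))
```

Lowercasing once at the top keeps the rest of the function unchanged, which is simpler than testing for each letter in both cases.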
[1] "DNA"
[1] "RNA"
[1] "UNKNOWN"