Tidy Data Rules
Make it a rectangle
- Only rows and columns, no additional structure
- One column for each type of information
- One row for each observation (i.e., data point)
One cell one value
- Every cell contains one piece of information
Don’t confuse the computer
- Don’t use colors, fonts, italics, or anything visual as data. It’s hard to tell the computer to treat yellow cells or bolded numbers differently.
- Avoid spaces in names. Computers use spaces to separate commands. Use
_
or CamelCase to include multiple words.
- Avoid special characters like @ * and ^. These often mean special things to computers, which can make data harder to work with.
Be clear and consistent
- Use short meaningful names.
- Use consistent names, abbreviations, and capitalizations
- Use good null values (not -999, blanks good, some prefer NA etc. but language specific)
- Write dates as YYYY-MM-DD or have separate Year, Month, and Day columns
Use one table for each category of data
- Avoid duplicated chunks of data using multiple tables
- Use one table for each category of data
Data Entry
Quality Assurance in Excel
- Access data validation tools :
Data -> Data Validation
- Limit data types entered using
Allow
- Restrict values entered using
Data
- Restrict values to a list of choices by selectiing
List
from Allow
and either entering a list of values in Sources
separated by commas or selecting a range of cells from another sheet in the spreadsheet. This will also give you a dropdown list of values for data entry.
Entering Dates
- To avoid dates being converted into a format than can cause issues
- Select the Date column
Home
-> Number
-> Dropdown -> Text
- This will ensure that dates will remain in the form you entered them and will export properly for analysis in R or other languages