Data Entry and Storage (reference card)

Tidy Data Rules

Make it a rectangle

  • Only rows and columns, no additional structure
  • One column for each type of information
  • One row for each observation (i.e., data point)

One cell one value

  • Every cell contains one piece of information

Don’t confuse the computer

  • Don’t use colors, fonts, italics, or anything visual as data. It’s hard to tell the computer to treat yellow cells or bolded numbers differently.
  • Avoid spaces in names. Computers use spaces to separate commands. Use _ or CamelCase to include multiple words.
  • Avoid special characters like @ * and ^. These often mean special things to computers, which can make data harder to work with.

Be clear and consistent

  • Use short meaningful names.
  • Use consistent names, abbreviations, and capitalizations
  • Use good null values (not -999, blanks good, some prefer NA etc. but language specific)
  • Write dates as YYYY-MM-DD or have separate Year, Month, and Day columns

Use one table for each category of data

  • Avoid duplicated chunks of data using multiple tables
  • Use one table for each category of data

Export data into easy to read formats

  • Save data in plain text files.

Data Entry

Quality Assurance in Excel

  • Access data validation tools : Data -> Data Validation
  • Limit data types entered using Allow
  • Restrict values entered using Data
  • Restrict values to a list of choices by selectiing List from Allow and either entering a list of values in Sources separated by commas or selecting a range of cells from another sheet in the spreadsheet. This will also give you a dropdown list of values for data entry.

Entering Dates

  • To avoid dates being converted into a format than can cause issues
  • Select the Date column
  • Home -> Number -> Dropdown -> Text
  • This will ensure that dates will remain in the form you entered them and will export properly for analysis in R or other languages