Prints a warning if any of the specified formatting rules fail (silent otherwise). The table-specific versions are convenience functions that call data_check_table() with appropriate defaults. data_check() is a wrapper that runs all three versions (cust, lic, sale) together.

data_check_table(df, df_name, primary_key, required_vars, allowed_values)

data_check_cust(df, df_name = "cust", primary_key = "cust_id",
  required_vars = c("cust_id", "sex", "birth_year"),
  allowed_values = list(
    sex = c(1, 2, NA),
    birth_year = c(1900:substr(Sys.Date(), 1, 4), NA)
  ))

data_check_lic(df, df_name = "lic", primary_key = "lic_id",
  required_vars = c("lic_id", "type", "duration"),
  allowed_values = list(
    type = c("fish", "hunt", "combo"),
    duration = 1:99
  ))

data_check_sale(df, df_name = "sale", primary_key = NULL,
  required_vars = c("cust_id", "lic_id", "year", "month", "res"),
  allowed_values = list(
    year = c(2000:substr(Sys.Date(), 1, 4)),
    month = 1:12,
    res = c(1, 0, NA)
  ))
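
The year defaults above depend on the current date: substr(Sys.Date(), 1, 4) returns the year as a character string, and the : operator coerces it to a number. The sketch below is only an illustration of how such a default evaluates; current_year and birth_year_allowed are names introduced here, not part of the package.

```r
# Current year as an integer (a more explicit spelling of substr(Sys.Date(), 1, 4))
current_year <- as.integer(format(Sys.Date(), "%Y"))

# Same set of values as the documented default:
# c(1900:substr(Sys.Date(), 1, 4), NA)
birth_year_allowed <- c(1900:current_year, NA)

# A value is acceptable if it is in the allowed set; %in% matches NA to NA,
# so missing birth years pass while a future year does not
in_range <- c(1985, 2100, NA) %in% birth_year_allowed
in_range
# TRUE FALSE TRUE
```

Because NA is listed explicitly for birth_year but not for year in the sale defaults, a missing year is reported as a disallowed value while a missing birth year is not.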

Arguments

df

data frame: table to check

df_name

character: name of relevant data table ("cust", "lic", or "sale")

primary_key

character: name of the variable that acts as the primary key, which should be unique and non-missing. NULL indicates that the table has no primary key.

required_vars

character: names of the variables that must be present in the table

allowed_values

list: named list of allowed values, with one element per variable to check (names are variable names)

Details

Developer note: data_check_table() is itself a wrapper for several internal functions (see data_internal).
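
To make the three kinds of rule concrete, here is a minimal base R sketch of a function with the same arguments. It is not the package's implementation: check_table_sketch() and its missing-key message are hypothetical, and the other warning texts only imitate the messages shown in the examples on this page. The real internal functions are documented under data_internal.

```r
# Hypothetical stand-in for data_check_table(); NOT the package's own code
check_table_sketch <- function(df, df_name, primary_key = NULL,
                               required_vars = NULL, allowed_values = NULL) {
  # Rule 1: every required variable must be present
  missing_vars <- setdiff(required_vars, names(df))
  if (length(missing_vars) > 0) {
    warning(df_name, ": ", length(missing_vars), " Missing variable(s): ",
            paste(missing_vars, collapse = ", "), call. = FALSE)
  }

  # Rule 2: the primary key (if any) must be non-missing and unique
  if (!is.null(primary_key) && primary_key %in% names(df)) {
    key <- df[[primary_key]]
    if (anyNA(key)) {
      warning(df_name, ": Primary key (", primary_key,
              ") contains missing values", call. = FALSE)
    }
    if (anyDuplicated(key) > 0) {
      warning(df_name, ": Primary key (", primary_key, ") not unique: ",
              length(unique(key)), " keys and ", nrow(df), " rows",
              call. = FALSE)
    }
  }

  # Rule 3: listed variables may only contain their allowed values
  for (var in intersect(names(allowed_values), names(df))) {
    bad <- setdiff(unique(df[[var]]), allowed_values[[var]])
    if (length(bad) > 0) {
      warning(df_name, "$", var, ": Contains values that aren't allowed: ",
              paste(bad, collapse = ", "), call. = FALSE)
    }
  }
  invisible(df)
}

# Toy data: duplicated key, no birth_year column, and a disallowed sex value
cust_demo <- data.frame(cust_id = c(1, 2, 2), sex = c(1, 2, 3))
check_table_sketch(cust_demo, "cust_demo", primary_key = "cust_id",
                   required_vars = c("cust_id", "sex", "birth_year"),
                   allowed_values = list(sex = c(1, 2, NA)))
```

Each rule issues its own warning rather than stopping at the first failure, so a single call on the toy table reports all three problems.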

See also

Other functions to check data format: data_check, data_foreign_key, data_internal, variable_allowed_values

Examples

library(dplyr)

# produce various format warnings
data(cust)
bind_rows(cust, cust) %>% data_check_cust()
#> Warning: cust: Primary key (cust_id) not unique: 30000 keys and 60000 rows

cust$birth_year[1] <- 2100
data_check_cust(cust)
#> Warning: cust$birth_year: Contains values that aren't allowed: 2100

data(lic)
select(lic, -duration) %>% data_check_lic()
#> Warning: lic: 1 Missing variable(s): duration

mutate(lic, duration = 0) %>% data_check_lic()
#> Warning: lic$duration: Contains values that aren't allowed: 0

data(sale)
sale$year[1] <- NA
data_check_sale(sale)
#> Warning: sale$year: Contains values that aren't allowed: NA
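
Because the checks signal problems only through warnings, a script that needs a pass/fail answer has to catch them itself. The following is one possible pattern using base R condition handling. Both check_stub() and passes_check() are hypothetical helpers written for this illustration; check_stub() merely stands in for a data_check_*() call so the example runs without the package data.

```r
# Hypothetical stand-in for a data_check_*() call: warns on a duplicated key
check_stub <- function(df) {
  if (anyDuplicated(df$cust_id) > 0) {
    warning("cust: Primary key (cust_id) not unique", call. = FALSE)
  }
  invisible(df)
}

# Evaluate a check and return TRUE if it stayed silent, FALSE if it warned.
# Note: tryCatch() exits at the first warning, so later checks in the same
# call are not run.
passes_check <- function(expr) {
  tryCatch({
    expr  # forces the lazily evaluated check inside the handler's scope
    TRUE
  }, warning = function(w) {
    message("Check failed: ", conditionMessage(w))
    FALSE
  })
}

dup_ok <- passes_check(check_stub(data.frame(cust_id = c(1, 1))))
clean_ok <- passes_check(check_stub(data.frame(cust_id = c(1, 2))))
```

To collect every warning instead of stopping at the first, withCallingHandlers() combined with invokeRestart("muffleWarning") is the usual alternative.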