Prints a warning if any of the specified formatting rules fail (and is silent otherwise).
Table-specific versions are convenience functions that call data_check_table() with appropriate defaults. The data_check function is a wrapper that calls all 3 versions (cust, lic, sale) together.
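The delegation described above (a generic checker plus table-specific wrappers that only supply defaults) can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's source: check_table_sketch() and check_lic_sketch() are hypothetical names, the generic checker covers only the required-variables rule, and the warning text merely imitates the format shown in the examples.

```r
# Hypothetical generic checker (NOT the package's data_check_table()):
# it warns once if any required variable is missing and is silent otherwise.
check_table_sketch <- function(df, df_name, required_vars) {
  missing_vars <- setdiff(required_vars, names(df))
  if (length(missing_vars) > 0) {
    warning(df_name, ": ", length(missing_vars), " Missing variable(s): ",
            paste(missing_vars, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# Hypothetical table-specific wrapper: it only fills in defaults for the
# "lic" table and forwards everything to the generic checker.
check_lic_sketch <- function(df, df_name = "lic",
                             required_vars = c("lic_id", "type", "duration")) {
  check_table_sketch(df, df_name, required_vars)
}

# Usage: a complete table passes silently; dropping a column triggers a warning.
lic_demo <- data.frame(lic_id = 1:2, type = c("fish", "hunt"), duration = c(1, 3))
check_lic_sketch(lic_demo)
check_lic_sketch(lic_demo[, c("lic_id", "type")])
# warns with message: lic: 1 Missing variable(s): duration
```

Keeping all rule logic in one generic function means the table-specific versions stay one-liners, so adding a new table only requires choosing its defaults.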
data_check_table(df, df_name, primary_key, required_vars, allowed_values)

data_check_cust(df, df_name = "cust", primary_key = "cust_id",
  required_vars = c("cust_id", "sex", "birth_year"),
  allowed_values = list(sex = c(1, 2, NA),
    birth_year = c(1900:substr(Sys.Date(), 1, 4), NA)))

data_check_lic(df, df_name = "lic", primary_key = "lic_id",
  required_vars = c("lic_id", "type", "duration"),
  allowed_values = list(type = c("fish", "hunt", "combo"), duration = 1:99))

data_check_sale(df, df_name = "sale", primary_key = NULL,
  required_vars = c("cust_id", "lic_id", "year", "month", "res"),
  allowed_values = list(year = c(2000:substr(Sys.Date(), 1, 4)),
    month = 1:12, res = c(1, 0, NA)))
Argument | Description |
---|---|
df | data frame: table to check |
df_name | character: name of the data table ("cust", "lic", or "sale"), used to label warnings |
primary_key | character: name of the variable that acts as the primary key, which should be unique and non-missing. NULL indicates that the table has no primary key. |
required_vars | character: names of variables that must be present |
allowed_values | list: named list of allowed values, one element per variable to check |
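The primary_key and allowed_values arguments imply two simple rules. Below is a minimal base-R sketch of how they could be checked; check_primary_key_sketch() and check_allowed_values_sketch() are hypothetical helpers, not the package's internal functions, and their warning text only mimics the format seen in the examples. One design point worth noting: NA counts as allowed only when it is listed, because setdiff() (via match()) treats NA as matching NA.

```r
# Hypothetical primary-key check: the key should be non-missing and unique.
# primary_key = NULL means the table has no primary key, so nothing is checked.
check_primary_key_sketch <- function(df, df_name, primary_key = NULL) {
  if (is.null(primary_key)) return(invisible(TRUE))
  key <- df[[primary_key]]
  ok <- TRUE
  if (anyNA(key)) {
    warning(df_name, ": Primary key (", primary_key, ") contains missing values",
            call. = FALSE)
    ok <- FALSE
  }
  n_keys <- length(unique(key))
  if (n_keys != nrow(df)) {
    warning(df_name, ": Primary key (", primary_key, ") not unique: ",
            n_keys, " keys and ", nrow(df), " rows", call. = FALSE)
    ok <- FALSE
  }
  invisible(ok)
}

# Hypothetical allowed-values check: names(allowed_values) are variables and
# each element lists the permitted values for that variable.
check_allowed_values_sketch <- function(df, df_name, allowed_values) {
  ok <- TRUE
  for (var in intersect(names(allowed_values), names(df))) {
    # values present in the data but absent from the allowed set
    bad <- setdiff(unique(df[[var]]), allowed_values[[var]])
    if (length(bad) > 0) {
      warning(df_name, "$", var, ": Contains values that aren't allowed: ",
              paste(bad, collapse = ", "), call. = FALSE)
      ok <- FALSE
    }
  }
  invisible(ok)
}

# Usage with a small made-up table
cust_demo <- data.frame(cust_id = c(1, 2, 2), sex = c(1, 2, NA))
check_primary_key_sketch(cust_demo, "cust", "cust_id")
# warns with message: cust: Primary key (cust_id) not unique: 2 keys and 3 rows
check_allowed_values_sketch(cust_demo, "cust", list(sex = c(1, 2, NA)))
# silent: NA is listed as an allowed value for sex
```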
Developer note: data_check_table() is itself a wrapper for several internal functions (see data_internal).
Other functions to check data format: data_check, data_foreign_key, data_internal, variable_allowed_values
library(dplyr) # produce various format warnings

data(cust)
bind_rows(cust, cust) %>% data_check_cust()
#> Warning: cust: Primary key (cust_id) not unique: 30000 keys and 60000 rows

cust$birth_year[1] <- 2100
data_check_cust(cust)
#> Warning: cust$birth_year: Contains values that aren't allowed: 2100
#> Warning: lic: 1 Missing variable(s): duration
#> Warning: lic$duration: Contains values that aren't allowed: 0

data(sale)
sale$year[1] <- NA
data_check_sale(sale)
#> Warning: sale$year: Contains values that aren't allowed: NA