Produces warnings if any checks fail (stays silent on success).
This function is simply a wrapper for several calls to data_check_table and data_foreign_key. Rules are designed to ensure:
- primary keys are unique and non-missing
- all required variables are present
- variables only contain prescribed values
- foreign keys are present in the relevant primary key table (e.g., all sale$cust_id can be found in cust$cust_id)
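The primary key and foreign key rules can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper names check_primary_key and missing_foreign_keys and the toy tables are hypothetical, chosen only to show what the two rules mean.

```r
# Hypothetical toy tables; only the key column named in the rules is included
cust <- data.frame(cust_id = c(1, 2, 3))
sale <- data.frame(cust_id = c(1, 2, 2, 4))

# Primary key rule: values must be unique and non-missing
# (hypothetical helper, not part of the package)
check_primary_key <- function(df, key) {
  values <- df[[key]]
  !anyNA(values) && anyDuplicated(values) == 0
}

# Foreign key rule: return the foreign key values that are absent
# from the primary key table (hypothetical helper)
missing_foreign_keys <- function(df_foreign, df_primary, key) {
  setdiff(df_foreign[[key]], df_primary[[key]])
}

pk_ok <- check_primary_key(cust, "cust_id")
orphans <- missing_foreign_keys(sale, cust, "cust_id")
```

In this toy data the primary key passes, while orphans holds the one sale$cust_id value (4) that has no match in cust$cust_id. A real check would presumably turn a non-empty result into a warning.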
data_check(cust, lic, sale)
Argument | Description |
---|---|
cust | data frame: customer table (primary key = "cust_id") |
lic | data frame: license types table (primary key = "lic_id") |
sale | data frame: transactions table (foreign keys = "lic_id", "cust_id") |
Other functions to check data format: data_check_table, data_foreign_key, data_internal, variable_allowed_values
library(dplyr)
data(cust, lic, sale)

# a successful check passes silently
data_check(cust, lic, sale)

# introduce some warnings
cust <- filter(cust, cust_id > 5)
cust <- bind_rows(cust, cust)
cust$res[1] <- "Canada"
lic$lic_id[1] <- NA
sale$month <- NULL
sale$year[1] <- "-2010"
sale$year[2] <- 0
data_check(cust, lic, sale)
#> Warning: cust: Primary key (cust_id) not unique: 29995 keys and 59990 rows
#> Warning: cust: Primary key (cust_id) is missing 5 value(s) present in the sale table
#> Warning: lic: Primary key (lic_id) contains missing values
#> Warning: lic: Primary key (lic_id) is missing 1 value(s) present in the sale table
#> Warning: sale: 1 Missing variable(s): month
#> Warning: sale$year: Contains values that aren't allowed: -2010, 0
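Because failures are reported as warnings rather than errors, a script may want to gather them for logging or for stopping a pipeline. The sketch below shows one way to do that in base R. collect_warnings and toy_check are hypothetical names; toy_check is only a stand-in that signals two warnings so the example runs without the package data.

```r
# Evaluate an expression and return the messages of any warnings it signals
# (hypothetical helper, not part of the package)
collect_warnings <- function(expr) {
  messages <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      # Record the message, then suppress the normal warning output
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  messages
}

# Stand-in for a data check that fails two rules (hypothetical)
toy_check <- function() {
  warning("sale: 1 Missing variable(s): month")
  warning("lic: Primary key (lic_id) contains missing values")
  invisible(NULL)
}

msgs <- collect_warnings(toy_check())
```

Assuming data_check signals standard R warnings, as the output above suggests, collect_warnings(data_check(cust, lic, sale)) should return one message per failed check and an empty character vector on success.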