ledc

Data Quality, lightning fast

rulebox

We have a new tool in the ledc framework! Rulebox, a toolbox for rules, combines the powers of many of the ledc repositories in one single place. Here are the main features at a glance:

  • It offers exploration of a dataset, from simple statistics to algorithms for finding outliers and generating rules. Currently, we support Postgres databases and CSV files.

  • There is support for encoding validations with regexes and grexes, sigma rules, and functional dependencies in a simple text-based file format (.rbx). Once you have such a file, you can immediately generate simple reports on where constraints are violated.

  • It can communicate with instances of ledc server to retrieve information about validation.

  • A wide range of repair engines to produce corrections of a dataset is at your fingertips.

  • There is support for a wide range of reasoning steps on constraints, including database normalization and Field Code Forest (FCF) variable elimination.
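
To make the idea of a violation report concrete, here is a minimal sketch in plain Python of what checking a functional dependency against CSV data amounts to. It does not use rulebox or the .rbx format; the function name fd_violations, the sample data, and the column names zip and city are all assumptions made up for illustration.

```python
import csv
import io
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return groups of rows that agree on `lhs` but disagree on `rhs`.

    Hypothetical helper for illustration only; not part of rulebox.
    """
    # Group rows by their values for the left-hand side attributes.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key].append(row)

    # A group violates lhs -> rhs if it contains more than one
    # distinct combination of right-hand side values.
    violations = {}
    for key, members in groups.items():
        rhs_values = {tuple(m[a] for a in rhs) for m in members}
        if len(rhs_values) > 1:
            violations[key] = members
    return violations


# Made-up sample data: the same zip code appears with two city spellings.
SAMPLE_CSV = """zip,city
9000,Ghent
9000,Gent
1000,Brussels
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
violations = fd_violations(rows, lhs=["zip"], rhs=["city"])
for key, members in violations.items():
    print(key, [m["city"] for m in members])
```

On this sample, the dependency zip -> city is violated by the two rows with zip 9000, since they carry different city values. Repair engines then have to decide which of the conflicting values to keep.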

Code, documentation and license information can be found in the rulebox repository on GitLab. A precompiled JAR file is available in the package registry.