Errors in data… You can be pretty sure they exist, but how do you locate them? And if you locate them, how do you make your data error-free in the best possible way?
These questions are not easy to answer, but we’ve added a module to ledc to help you with them. The new module is called sigma and introduces a generic type of rule called a sigma-rule. In simple terms, a sigma-rule provides a concise way to describe rows that are not permitted in your data. For such rules, there is an elegant and efficient method to find minimal repairs: corrections that make the data error-free while changing the original data as little as possible. The sigma module offers several implementations of repair engines and a wide range of cost models to encode specific error patterns that might be present in your data. In particular, the powerful parker engine lets you combine sigma-rules with key constraints.
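To make the idea of forbidden rows and minimal repairs concrete, here is a small Python sketch. It is independent of ledc and does not use the sigma module's API. The attribute names, domains, rules and the brute-force search are all illustrative assumptions, and the cost of a repair is simply the number of changed attributes, a crude stand-in for the cost models mentioned above.

```python
from itertools import combinations, product

# Hypothetical attribute domains for a toy table (not taken from ledc).
DOMAINS = {
    "age_group": ["child", "adult"],
    "marital_status": ["single", "married"],
    "employed": ["yes", "no"],
}

# Each rule describes rows that are NOT permitted.
FORBIDDEN = [
    lambda r: r["age_group"] == "child" and r["marital_status"] == "married",
    lambda r: r["age_group"] == "child" and r["employed"] == "yes",
]


def violates(row):
    """True if the row matches at least one forbidden pattern."""
    return any(rule(row) for rule in FORBIDDEN)


def minimal_repairs(row):
    """Return every valid row that differs from `row` in the fewest attributes.

    Brute force: try all ways of changing 1 attribute, then 2, and so on,
    and stop at the first size that yields a valid row.
    """
    if not violates(row):
        return [dict(row)]
    attrs = list(DOMAINS)
    for k in range(1, len(attrs) + 1):
        found = []
        for subset in combinations(attrs, k):
            # Only consider values that actually differ from the original.
            choices = [[v for v in DOMAINS[a] if v != row[a]] for a in subset]
            for values in product(*choices):
                candidate = {**row, **dict(zip(subset, values))}
                if not violates(candidate):
                    found.append(candidate)
        if found:
            return found
    return []


dirty = {"age_group": "child", "marital_status": "married", "employed": "yes"}
repairs = minimal_repairs(dirty)
print(repairs)
```

In this toy example the row violates both rules, and changing `age_group` alone resolves both violations, whereas fixing `marital_status` or `employed` individually leaves one rule violated. A real repair engine avoids this exhaustive enumeration, which grows exponentially with the number of attributes.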
Code, documentation, examples and license information can be found in the sigma repository on GitLab.