ledc

Data Quality, lightning fast

ledc-sqlmerge

Merge two rows in a database. A simple problem right?

In practice, it turns out this problem can be more complicated than you would think at first glance. When you dig deeper, the two main difficulties you will find are (a) avoiding loss of information and (b) maintaining consistency at all times.

The first problem is produced by the fact that you must try to represent information previously held by two rows in a single rows. This might not be possible (e.g., the rows you merge can differ too much) and choices must be made about which information you keep and which not. When those choices are made, it is not so difficult to turn them into an update for one row and a delete of the other row.

This leads to the second problem: doing updates and deletes make changes in databases and these potentially violate constraints that are present. Especially foreign key dependencies are troublesome in this regard and propagating the changes made by merging two rows is a tedious task.

The ledc framework offers some help in solving this problem. The new ledc module sqlmerge is a simple tool that helps you in producing a sql script that models a simple merge in a table as well as all propagated changes in other tables. It ensures “consistency-at-all-time” without make assumptions on constraint deferring. Code, documentation, examples and license information can be found in the sqlmerge repository on gitlab. If you just want to try sqlmerge on your database, you can download the executable in the registries section.