![]() I advocate a consolidation area (PSA) that is an exact copy of the source system with all unique versions of all records that have ever been retrieved. It is often easier to use PSA as the source of comparison, but if you don’t have a PSA then you may need to look for other options.ĭepending on your architecture, you likely have one of three types of layers between the source system and the target dimensional model. If your source system is being backed up regularly, then you may try to detect deletions by comparing prior versions of the source system to the current version of the source system. Of course, the trigger will be created on the source system so you will need adequate privileges. Personally, I don’t recommend it, but it is an option. This method may or may not be supported by your source system and has some known limitations. Use triggers to store deleted record details. Again, you will need source system authority to implement this feature. Check with your source system to see if this is an option. Unlike CDS, however, last change only persist the “last change” to each record. Like CDC, last change will automatically store record manipulation details. To implement this you will need to have adequate control over the source system. We can then query the CDC data to pull out deletes directly. Change data capture can records all record manipulations for you automatically as they occur. You will need to understand how logs function in the source system database to determine if this is an option. There is a chance that the deleted record still exist in the source system transaction log. Yes, this can be a very expensive process, but with some filters and fancy comparison logic we can often find ways to limit the comparison to only records with timestamps newer than the prior period comparison. Anything that exist in PSA, but does not exist in the source has been deleted. This means that the real question is, “what has been deleted since I last collected source records?” To determine this we can compare every record in our consolidation area (PSA) to what still exist in the source system. The idea is that you have collected records from the source system on a regular interval. Again, I’m not going into implementation details. So, if the record is gone, then how can we detect this deletion? Here are a few options. Hard Delete with No Trail – This scenario here is that the source system has hard deleted a record and left no trace of the record. You may need to do a little fishing, but in this case detecting a deleted record is just a matter of finding the history which may reside outside of the original location. Hard Delete with Audit Trail – Some source systems will delete records, but will also story this history in some way to support undo and audit trail type functionality. Detecting deletes means that you simply need to select the records where the record has been flagged as having been deleted. What is a soft delete? Instead of deleting a record, the record is marked as being deleted typically by a dedicated column. Source System Soft Deletes – If you are lucky enough to have a source system that soft deletes records, then I recommend buying a lottery ticket right away. I’ll consider focusing on individual methods in future posts. ![]() This post is not going to go into the implementation details of each method. Below is a list of some of the most common methods. There are a number of ways to detect deleted records in source systems. Should this data be deleted? How do we maintain referential integrity? Model Impact – Our dimensional model includes data that was deleted in the source system.How do we handle deleted records in the historical store? Consolidation – Between the source system and the target dimensional model should include a consolidation area in most designs.Detection – The record has been deleted from the source system. ![]() Dealing with deleted records can be broken down into three phases. Deletes, on the other hand, require special attention and are the focus of this post. Inserts and updates are pretty straight forward to detect and plan for. ![]() Three types of changes can occur Insert, Update, and Delete. Data warehousing has a clear set of objectives such as data persistence, single easy to navigate data model, fast query performance, etc… While it is not the role of the data warehouse to mimic the data in source systems, the data warehouse clearly must account for changes in the source system.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |