Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Mobile

Synchronizing Schema Changes and Managing Referential Integrity Violations


Dealing With Referential Integrity Violations

One of the consequences of making schema and other changes in an evolving research and development environment is the inevitable introduction of program and design errors, including referential integrity violations that occur only on the remote database. This can happen because the remote database holds a subset of the rows that exist on the consolidated; a critical row may be present on the consolidated but missing on the remote, resulting in a referential integrity or foreign key violation.

The MobiLink synchronization software is designed to be robust and fault tolerant in practice, and to be easy to administer without DBAs at remote sites. One of results of this design goal is that MobiLink, by default, silently handles referential integrity violations that occur on the remote database while data is being downloaded. It does this by deleting any rows that get in the way of referential integrity.

Listing 8 shows an example of this behavior. The "verbose diagnostic logging" option has been turned on to reveal that two referential integrity violations were detected on the remote database during the download, and that the offending rows in sbx_data_access and sbx_beeper were deleted.

I. 2006-07-30 11:16:08.
   Resolving referential integrity violations on table
      sbx_data_access, role FK_SBX_DATA_REF_SBX_CONT
I. 2006-07-30 11:16:08.
      delete from "DBA"."sbx_data_access" from "DBA"."sbx_data_access" ft
      where not exists ( select 1 from "DBA"."sbx_contact" pt
      where ( ft."contact_db_id" = pt."contact_db_id" )
      and ( ft."contact_id" = pt."contact_id" ) )
      and ( ft."contact_db_id" is not null
      and ft."contact_id" is not null )
I. 2006-07-30 11:16:08.
      1 rows deleted.
I. 2006-07-30 11:16:08.
   Resolving referential integrity violations on table
      sbx_beeper, role FK_SBX_BEEP_REF_SBX_FIEL
I. 2006-07-30 11:16:09.
      delete from "DBA"."sbx_beeper" from "DBA"."sbx_beeper" ft
      where not exists ( select 1 from "DBA"."sbx_field_case" pt
      where ( ft."field_case_id" = pt."field_case_id" ) )
      and ( ft."field_case_id" is not null )
I. 2006-07-30 11:16:09.

   1 rows deleted.
Listing 8: MOBILINK RESOLVES RI VIOLATIONS ON THE REMOTE DATABASE

Silently handling RI violations isn't always desirable. In fact, even logging the problem at the remote site might not be good enough when administration is being carried out centrally at the consolidated database. At Simbex, this challenge was met by implementing the RI violation tracking table called dbmlsync_ri_violation in Lisitng 9. MobiLink was allowed to continue resolving the violations, but the synchronization process was customized to record each violation in this tracking table and to upload that information to the consolidated database.

CREATE TABLE dbmlsync_ri_violation (
   ml_username VARCHAR ( 128 ) NOT NULL,
   happened_at TIMESTAMP NOT NULL DEFAULT timestamp,
   unique_id UNSIGNED BIGINT NOT NULL
   DEFAULT global autoincrement(1000000),
   publication_name VARCHAR ( 128 ) NOT NULL,
   foreign_key_table_name VARCHAR ( 128 ) NOT NULL,
   primary_key_table_name VARCHAR ( 128 ) NOT NULL,
   role_name VARCHAR ( 128 ) NOT NULL,
   script_version VARCHAR ( 128 ) NOT NULL,
   query_to_run_on_consolidated_database LONG VARCHAR NOT NULL,
   PRIMARY KEY ( ml_username, happened_at, unique_id ) );
Listing 9: THE RI VIOLATION TRACKING TABLE

Rows in this tracking table answer two important questions: Which RI constraints were violated, and which rows were the troublemakers? The first question is answered by the foreign_key_table_name, primary_key_table_name, and role_name columns. The second (and often most important) question is answered by the query_to_run_on_consolidated_database column which contains actual SELECT statements that can be run on the consolidated database to display the rows that caused problems on the remote database.

For the violations shown earlier in Listing 8, the values in query_to_run_on_consolidated_database look like this:

SELECT * FROM sbx_data_access
   WHERE STRING ( sbx_data_access.data_access_db_id, ', ', sbx_data_access.data_access_id ) = '2, 17'
SELECT * FROM sbx_beeper
   WHERE STRING ( sbx_beeper.beeper_id ) = '1'

Here are the other columns in the tracking table:

ml_username identifies the remote database.
happened_at is the date/time of the RI violation.
unique_id serves to guarantee uniqueness for the tracking table's primary key.
publication_name identifies which MobiLink publication is involved.
foreign_key_table_name names the child table.
primary_key_table_name names the parent.
role_name is the name of the foreign key constraint.
script_version identifies which MobiLink script version is in use.

Rows are inserted in the tracking table by another "hook" procedure, called sp_hook_dbmlsync_download_log_ri_violation, in Listing 10. When a procedure with this name exists in the remote database, the MobiLink client component dbmlsync.exe calls it whenever an RI violation is detected.

   CREATE PROCEDURE sp_hook_dbmlsync_download_log_ri_violation()
   BEGIN
   DECLARE @child_table_name VARCHAR ( 128 );
   DECLARE @parent_table_name VARCHAR ( 128 );
   DECLARE @role_name VARCHAR ( 128 );
   DECLARE @sql LONG VARCHAR;
   DECLARE @child_primary_key_column_name_list LONG VARCHAR;
   DECLARE LOCAL TEMPORARY TABLE #child_primary_key_column_value_list (
      child_primary_key_column_value_list LONG VARCHAR );
-- 1. Determine the offending foreign key relationship.
   SET @child_table_name =
      ( SELECT value FROM #hook_dict WHERE name = 'Foreign key table' );
   SET @parent_table_name =
      ( SELECT value FROM #hook_dict WHERE name = 'Primary key table' );
   SET @role_name =
      ( SELECT value FROM #hook_dict WHERE name = 'Role name' );
-- 2. Build the child primary key column name list: CT.
   CT_PCOL, ', ', CT.CT_PCOL,
   SELECT LIST ( STRING ( @child_table_name,
             '.',
             child_primary_key_column_name ),
      ', '', '', ' ORDER BY child_primary_key_column_order )
   INTO @child_primary_key_column_name_list
   FROM rroad_v_child_primary_key
WHERE parent_table_name = @parent_table_name
   AND child_table_name = @child_table_name
   AND role_name = @role_name;
-- 3. Build and run the INSERT SELECT to get one or more offending child
   -- primary key column value strings into the temporary table
   -- #child_primary_key_column_value_list.
   SELECT STRING ( 'INSERT #child_primary_key_column_value_list SELECT STRING ( ',
      @child_primary_key_column_name_list,

      ' ) \x0d\x0a FROM ',
      @child_table_name,
      ' WHERE NOT EXISTS ( SELECT * FROM ',
      @parent_table_name,
      '\x0d\x0a WHERE ',
         LIST ( STRING ( @parent_table_name,
         '.',
             parent_primary_key_column_name,
             ' = ',
             @child_table_name,
             '.',
             child_foreign_key_column_name ),
         ' AND ' ORDER BY parent_primary_key_column_order ),
       '\x0d\x0a AND ',
       LIST ( STRING ( @child_table_name,
            '.',
            child_foreign_key_column_name,
            ' IS NOT NULL ' ),
       'AND ' ORDER BY parent_primary_key_column_order ),
       ' ) ' )
     INTO @sql
     FROM rroad_v_foreign_key
   WHERE parent_table_name = @parent_table_name
     AND child_table_name = @child_table_name
     AND role_name = @role_name
   GROUP BY parent_table_name,
        child_table_name,
        role_name;
   EXECUTE IMMEDIATE @sql;
-- 4. Insert one row in dbmlsync_ri_violation for each offending child row.
INSERT dbmlsync_ri_violation (
   ml_username,
   publication_name,
   foreign_key_table_name,
   primary_key_table_name,
   role_name,
   script_version,
   query_to_run_on_consolidated_database )
SELECT ( SELECT value FROM #hook_dict WHERE name = 'MobiLink user' ),
   ( SELECT value FROM #hook_dict WHERE name = 'publication_0' ),
   @child_table_name,
   @parent_table_name,
   @role_name,
   ( SELECT value FROM #hook_dict WHERE name = 'script version' ),
   STRING ( 'SELECT * FROM ',
      @child_table_name,
      ' WHERE STRING ( ',
      @child_primary_key_column_name_list,
      ' ) = ''',
      REPLACE ( child_primary_key_column_value_list, '''', '''''' ),
      '''' )
   FROM #child_primary_key_column_value_list;
END; -- sp_hook_dbmlsync_download_log_ri_violation

Listing 10: "HOOK" PROCEDURE TO RECORD RI VIOLATIONS

The procedure in Listing 10 is constructed to be schema-independent; in other words, it does not depend on any special knowledge of the Simbex database design, so it should not be affected by changes to that design. The processing is done step-by-step:

Step 1 handles that fact that MobiLink passes parameters to the hook procedures indirectly, via a special temporary table called #hook_dict. Rows in this table contain parameter name and value pairs, with the parameter names depending on which hook procedure is being executed. In this case, the parameters of interest are the parent and child table names and the foreign key role or constraint name involved in the RI violation.

Step 2 uses the LIST aggregate function and the STRING scalar function to construct a specially-formatted string of primary key column names from the child table, for use in later processing. The LIST function works like other aggregate function such as SUM in that it processes a group of rows to return a single value. Unlike other aggregate functions, however, LIST preserves information gathered from the individual rows, and returns that information in a list. Here is the syntax: LIST ( element, separator ORDER BY order_by ) where one or more element expressions are gathered into an ordered list with separators between each element.

The STRING function converts each argument to a string and concatenates them all into one string return value. This function is extremely valuable for building strings in SQL, especially strings that contain SQL commands that are to be passed to EXECUTE IMMEDIATE. In this case STRING is used to build each LIST element as a child primary key column name dot-qualified with the table name.

Step 2 also uses a custom-written view called rroad_v_child_primary to get the child primary key column names involved in the RI violation by selecting information from the SQL Anywhere system catalog tables:

CREATE VIEW rroad_v_child_primary_key AS
SELECT parent_systable.table_name AS parent_table_name,
   child_systable.table_name AS child_table_name,
   child_pkey_syscolumn.column_name AS child_primary_key_column_name,
   child_pkey_syscolumn.column_id AS child_primary_key_column_order,
   sysforeignkey.role AS role_name
FROM SYS.SYSTABLE AS parent_systable
       INNER JOIN SYS.SYSFOREIGNKEY AS sysforeignkey
          ON parent_systable.table_id = sysforeignkey.primary_table_id
     INNER JOIN SYS.SYSTABLE AS child_systable
         ON child_systable.table_id = sysforeignkey.foreign_table_id
     INNER JOIN SYS.SYSCOLUMN AS child_pkey_syscolumn
        ON child_systable.table_id = child_pkey_syscolumn.table_id
WHERE child_pkey_syscolumn.pkey = 'Y';

Step 3 uses the string from Step 2, plus more calls to LIST and STRING, to fill the string variable @sql with an INSERT statement that looks like this template:

INSERT #child_primary_key_column_value_list
SELECT STRING ( CT.CT_PCOL, ', ', CT.CT_PCOL )
FROM CT
WHERE NOT EXISTS ( SELECT *
   FROM PT
WHERE PT.PT_PCOL = CT.CT_FCOL
         AND PT.PT_PCOL = CT.CT_FCOL
         AND CT.CT_FCOL IS NOT NULL
         AND CT.CT_FCOL IS NOT NULL );

The template above contains some abbreviations to represent actual values in the generated string: CT stands for child table, PT for parent, PCOL means primary key column, and FCOL means foreign key column. The template shows a two-column compound key; in reality, Simbex tables contain a varying number of columns in their primary keys.

The generated NOT EXISTS subquery looks for violations of the foreign key constraint, and the call to STRING gathers all the child primary key values into one string that will be used in the final query uploaded to the consolidated database. This gathering of key values into one string is done to simplify handling compound primary keys by treating them as if they consisted of a single column.

The EXECUTE IMMEDIATE statement in Step 3 executes the generated INSERT, with the result being a single-column temporary table called #child_primary_key_column_value_list that is used in the next step. This table is defined in the DECLARE LOCAL TEMPORARY TABLE statement near the top of Listing 10.

Step 3 also uses another custom-written view called rroad_v_foreign_key to get all of the table and column names involved in the RI violation:

CREATE VIEW rroad_v_foreign_key AS
SELECT parent_systable.table_name AS parent_table_name,
   parent_pkey_syscolumn.column_name AS parent_primary_key_column_name,
   parent_pkey_syscolumn.column_id AS parent_primary_key_column_order,
   child_systable.table_name AS child_table_name,
   child_fkey_syscolumn.column_name AS child_foreign_key_column_name,
   sysforeignkey.role AS role_name
FROM SYS.SYSTABLE AS parent_systable
   INNER JOIN SYS.SYSFOREIGNKEY AS sysforeignkey
        ON parent_systable.table_id = sysforeignkey.primary_table_id
   INNER JOIN SYS.SYSTABLE AS child_systable
        ON child_systable.table_id = sysforeignkey.foreign_table_id
   INNER JOIN SYS.SYSFKCOL AS sysfkcol
        ON sysforeignkey.foreign_table_id = sysfkcol.foreign_table_id
  and sysforeignkey.foreign_key_id = sysfkcol.foreign_key_id
   INNER JOIN SYS.SYSCOLUMN AS parent_pkey_syscolumn
        ON parent_systable.table_id = parent_pkey_syscolumn.table_id
     AND sysfkcol.primary_column_id = parent_pkey_syscolumn.column_id
   INNER JOIN SYS.SYSCOLUMN AS child_fkey_syscolumn
        ON child_systable.table_id = child_fkey_syscolumn.table_id
     AND sysfkcol.foreign_column_id = child_fkey_syscolumn.column_id;

Step 4 builds the actual row in the tracking table dbmlsync_ri_violation. It uses the string from Step 2 and the temporary table from Step 3 to build the final query that looks like this template:

SELECT * FROM CT
    WHERE STRING ( CT.CT_PCOL, ', ', CT.CT_PCOL ) = 'xxx'

Figure 5 shows the final result of all the work performed by the procedure. The statements uploaded to the consolidated database that can be used to display the child table rows that caused the referential integrity violations on the remote database.

[Click image to view at full size]
Figure 5: Using dbmlsync_ri_violation to investigate a problem.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.