Synchronizing Schema Changes and Managing Referential Integrity Violations

Synchronization at the database row level, rather than application message level, is very helpful in reducing application complexity


October 25, 2006
URL:http://www.drdobbs.com/mobile/synchronizing-schema-changes-and-managin/193402348

As a research and development company that is continually modifying its data collection and analysis techniques, Simbex faced several database synchronization challenges as the software continued to change and evolve after its deployment into customer environments. In this article, I discuss two of those challenges:

Several features of the SQL Anywhere database and MobiLink synchronization software were used to solve these challenges, including:

The Simbex HIT System

On April 28, 2002, at a NASCAR race in Fontana California, Dale Earnhardt Jr. hit a concrete wall at 130 miles per hour. If he had been wearing a Simbex-equipped helmet, Earnhardt might not have been able to hide the severity of the head injury he received that day. Instead, Earnhardt suffered in secrecy and he did not reveal the full extent of his injury until several months later.

Simbex is a New Hampshire-based research and development company specializing in biomechanical feedback systems. Their premier product is the Head Impact Telemetry System, a real-time hardware and software system designed to measure and record blows to the head. Initially developed for use in college and high-school football, the HIT System lets sideline staff monitor the impact history for all players simultaneously, as well as providing pager alerts whenever a player receives a serious impact or series of impacts.

Impacts are measured by accelerometers in each helmet, visible as the small round knobs in Figure 1.

Figure 1: HIT System hardware mounted in a football helmet.

This data is sent wirelessly to the HIT System sideline controller in Figure 2.

Figure 2: HIT System sideline controller at a Virginia Tech football game.

A laptop computer is used to store, analyze and display the data locally; Figure 3 shows one of the displays where the impact history for one player is represented graphically.

Figure 3: HIT Impact Analysis screen display.

The HIT System sideline computer stores the impact data in a SQL Anywhere relational database system from Sybase's iAnywhere. As well as being analyzed and displayed out on the field, the data is also uploaded to a central SQL Anywhere database at Simbex headquarters for further analysis and study. The upload is handled by the MobiLink synchronization software that ships with SQL Anywhere, and I am proud to say I played a small role in the development of the HIT System by providing the MobiLink synchronization setup.

Synchronizing Schema Changes

Being a true research and development company, Simbex violates one of the simplest rules-of-thumb in database synchronization: If you don't want trouble, don't change the schema. A schema change is hard enough with one database, but with two or 10 or 100 databases in active use, it can be an administrative nightmare. In a research environment, it's also a common occurrence: nothing stays the same for long.

To help reduce the administrative pain, central control of schema changes was built into the HIT System synchronization process. Using a new table called dbmlsync_sql_download in Listing 1 (with "dbmlsync" referring to the MobiLink client component dbmlsync.exe), schema changes can be synchronized out to remote databases along at the same time that changes to the data are synchronized.

CREATE TABLE dbmlsync_sql_download (
   sql_download_id UNSIGNED BIGINT NOT NULL DEFAULT autoincrement,
   db_id INTEGER NOT NULL,
   sql LONG VARCHAR NOT NULL,
   modification_date TIMESTAMP NOT NULL DEFAULT timestamp,
   PRIMARY KEY ( sql_download_id ) );
Listing 1: THE SQL DOWNLOAD TABLE

Rows in dbmlsync_sql_download are inserted into the central or "consolidated" database at Simbex and downloaded to the remote sideline controller databases. The columns are filled as follows:

sql_download_id is automatically incremented to uniquely identify each row, and it also specifies the processing order if two SQL scripts in this table apply to the same remote database.

db_id identifies which remote sideline controller database this SQL script applies to; this column is specific to Simbex, but most MobiLink synchronization setups have a similar "remote database identifier" column.

sql contains the SQL script to be downloaded.

modification_date is automatically set when the SQL script is inserted and updated when the script is changed, and it is used to determine which rows contain fresh SQL that must be downloaded.

A two-step process is used to create rows in dbmlsync_sql_download. First, a text file is coded containing the SQL script, and second, that text file is loaded into a new row in dbmlsync_sql_download.

Listing 2 is an example of a SQL script used to add a new column to a table on the remote database; in this example, the filename is alter_remote_version_v2m.sql.

ALTER TABLE DBA.sbx_activity
   ADD type INTEGER NOT NULL DEFAULT 0;
--)RUN

ALTER PUBLICATION DBA.PC MODIFY
   TABLE DBA.sbx_activity ( activity,
                            description,
                            creation_date,
                            modified_by,
                            created_by,
                            deleted,
                            type );
--)RUN
ALTER SYNCHRONIZATION SUBSCRIPTION TO DBA."PC"
   MODIFY OPTION ScriptVersion='PC_v2m';
--)RUN
Listing 2: A SAMPLE SQL SCRIPT TO BE DOWNLOADED

The first ALTER TABLE command in Listing 2 adds the new "type" column to the end of each row the sbx_activity table, and initializes that column to 0 for all existing rows.

The next two commands are MobiLink overhead; they are necessary to keep the MobiLink synchronization setup to date. The ALTER PUBLICATION command tells MobiLink to include the new type column in the next synchronization. The ALTER SYNCHRONIZATION SUBSCRIPTION command tells MobiLink that a different set of scripts on the consolidated database, identified as version "PC_v2m", must now be used when synchronizing with this remote database.

The special --)RUN comment lines in Listing 2 serve to identify commands that are to be executed separately the remote database; this process will be handled by code shown later in this article.

Testing schema changes ahead of time is extremely important in a synchronizing environment; even the smallest mistake can bring the synchronization process to a halt when the software detects a mismatch or when the database software detects an integrity violation. Gathering separate commands into a single SQL script makes coding and testing a schema change easier, as opposed to storing each command as a separate entry.

Once the SQL script file is created, it is loaded into the consolidated database using the INSERT in Listing 3. In this example, the db_id of 2 identifies which sideline controller database the SQL is going to, and the SQL Anywhere function xp_read_file is called to convert the script file alter_remote_version_v2m.sql into a LONG VARCHAR string.

INSERT dbmlsync_sql_download ( db_id, sql )
   VALUES ( 2, xp_read_file ( 'alter_remote_version_v2m.sql' ) )
Listing 3: LOADING THE SQL SCRIPT INTO THE DOWNLOAD TABLE

Listing 4 shows the download_cursor script that the MobiLink server runs on the consolidated database to determine which rows in dbmlsync_sql_download are to be downloaded to which remote database.

CALL ml_add_table_script ( 'PC_v2', 'dbmlsync_sql_download', 'download_cursor',
'SELECT sql_download_id,
      db_id,
      sql
   FROM dbmlsync_sql_download
WHERE modification_date > ? -- last_download
   AND db_id = CAST ( ? AS INTEGER ) -- ml_username' );
Listing 4: DOWNLOAD_CURSOR SCRIPT FOR THE SQL DOWNLOAD TABLE

The WHERE clause contains two "?" placeholders which are filled in by MobiLink at run time. The first "?" is replaced by the date and time of the previous successful download, and that is used to choose rows for this download that have been inserted or updated since that point.

The second "?" is the "MobiLink user name" which serves to uniquely identify the remote database that is being synchronized. In the case of Simbex, the db_id values 1, 2, 3 are also used as MobiLink user names for simplicity.

Listing 5 shows the procedure that executes the SQL scripts when they reach the remote database. It is a user-written procedure, but because it has been given the special name sp_hook_dbmlsync_schema_upgrade the MobiLink client component dbmlsync.exe will automatically call this procedure as soon as all upload and download processing is complete.

MobiLink recognizes several different sp_hook procedure names like sp_hook_dbmlsync_schema_upgrade, and if you code them they will be called. This one is designed for processing schema changes like our example, and it offers several advantages over a solution you might create to run outside the MobiLink synchronization process:

  1. The sp_hook_dbmlsync_schema_upgrade procedure is called automatically; you don't have to build any special logic on the remote database side to determine when a schema change needs to be applied.
  2. It is OK to ALTER a table involved in synchronization in this procedure without first removing the table from the synchronization definition. If you do it outside this procedure, you have to take that extra step first.
  3. The procedure is called after synchronization is complete so all changes made to the remote have already been uploaded. This is important because you can't ALTER a table involved in synchronization if any updates using the old schema are still waiting to be uploaded.

The procedure in Listing 5 is driven by a cursor fetch loop using the streamlined FOR statement supported by SQL Anywhere. This cursor retrieves all rows in the dbmlsync_sql_download table that have not yet been processed, and that fact is determined by a NOT EXISTS subquery on the dbmlsync_sql_download_ack table.

   CREATE PROCEDURE sp_hook_dbmlsync_schema_upgrade()
   BEGIN
   DECLARE @special_marker VARCHAR ( 100 );
   DECLARE @special_marker_length BIGINT;
   DECLARE @sql_part_pos BIGINT;
   DECLARE @special_marker_pos BIGINT;
   DECLARE @sql_part_length BIGINT;
   DECLARE @sql_part LONG VARCHAR;
   DECLARE @sqlstate VARCHAR ( 5 );
   DECLARE @errormsg VARCHAR ( 32767 );
   DECLARE @sql_part_number UNSIGNED INTEGER;
-- Process one or more rows of downloaded sql that have not been processed before, either in whole or in part.
   FOR f_fetch AS c_fetch NO SCROLL CURSOR FOR
      SELECT sql_download_id AS @sql_download_id,
          db_id     AS @db_id,
          sql       AS @downloaded_sql
      FROM dbmlsync_sql_download
      WHERE NOT EXISTS ( SELECT *
                  FROM dbmlsync_sql_download_ack
                  WHERE dbmlsync_sql_download_ack.sql_download_id
                      = dbmlsync_sql_download.sql_download_id )
      ORDER BY sql_download_id
      FOR READ ONLY
      DO
-- Process each part of @downloaded_sql that is delimited by the special comment marker.
   SET @special_marker = STRING ( '--', ')RUN' );
   SET @special_marker_length = LENGTH ( @special_marker );
   SET @sql_part_number = 1;
   SET @sql_part_pos = 1;
   SET @special_marker_pos = LOCATE ( @downloaded_sql, @special_marker, @sql_part_pos );
   
    WHILE @special_marker_pos > 0 LOOP
    -- 1. Extract the SQL part from the full string.
      SET @sql_part_length = @special_marker_pos - @sql_part_pos + @special_marker_length;
      SET @sql_part = SUBSTR ( @downloaded_sql, @sql_part_pos, @sql_part_length );
    -- 2. Execute the SQL part in a "try/catch" block
         BEGIN
            SET @sqlstate = '00000'; -- no error yet
            SET @errormsg = '';
            EXECUTE IMMEDIATE @sql_part;
            EXCEPTION
              WHEN OTHERS THEN

              -- Record the error but don't stop.
              SELECT SQLSTATE, ERRORMSG() INTO @sqlstate, @errormsg;
         END;
    -- 3. Record the result.
         INSERT dbmlsync_sql_download_ack VALUES (
            @sql_download_id, @sql_part_number, @db_id,
            @sql_part, @sqlstate, @errormsg, DEFAULT );
    -- 4. Look for the next SQL part.
            SET @sql_part_number = @sql_part_number + 1;
            SET @sql_part_pos = @special_marker_pos + @special_marker_length;
            SET @special_marker_pos = LOCATE ( @downloaded_sql, @special_marker, @sql_part_pos );
         END LOOP;
END FOR;
END; -- sp_hook_dbmlsync_schema_upgrade
Listing 5: "HOOK" PROCEDURE TO RUN THE SQL SCRIPT

Inside the cursor fetch FOR loop in Listing 5, a nested WHILE loop breaks the SQL script into the individual parts separated by the special RUN comment marker, and inside that WHILE loop each SQL part is processed step-by-step:

Step 1 extracts the SQL part from the larger string.

Step 2 uses EXECUTE IMMEDIATE to run the individual SQL part against the remote database. The EXCEPTION clause implements "try/catch" processing for the EXECUTE IMMEDIATE: any error is recorded in the two local variables @sqlstate and @sqlcode, but processing then continues with no further diagnostics or error handling.

Step 3 records everything about what happened with the EXECUTE IMMEDIATE in the dbmlsync_sql_download_ack table; rows in this table are uploaded to the consolidated database as "acknowledgements" of the various SQL commands that were received and processed. LOCATE searches forward in a string for the first occurrence of another string after a specified start position, returning zero when that substring is not found.

Listing 6 shows the dbmlsync_sql_download_ack table that is used in Step 3 described above.

CREATE TABLE dbmlsync_sql_download_ack (
   sql_download_id UNSIGNED BIGINT NOT NULL,
   sql_part_number UNSIGNED INT NOT NULL,
   db_id INTEGER NOT NULL,
   sql_part LONG VARCHAR NOT NULL,
   sql_part_sqlstate VARCHAR ( 5 ) NOT NULL,
   sql_part_errormsg VARCHAR ( 32767 ) NOT NULL DEFAULT '',
   creation_date TIMESTAMP NULL DEFAULT current timestamp,
   PRIMARY KEY ( sql_download_id, sql_part_number ) );
Listing 6: THE SQL DOWNLOAD ACKNOWLEDGEMENT TABLE

The dbmlsync_sql_download_ack table contains the sql_download_id and db_id values from the corresponding rows in dbmlsync_sql_download, plus these extra columns:

Listing 7 shows the upload_insert script that the MobiLink server runs on the consolidated database whenever a new row in dbmlsync_sql_download_ack is uploaded. This upload is a very important part of the schema change process because it allows changes to be tracked and checked centrally, with no DBA present at the remote location.

CALL ml_add_table_script ( 'PC_v2', 'dbmlsync_sql_download_ack', 'upload_insert',
'INSERT dbmlsync_sql_download_ack (
   sql_download_id,
   sql_part_number,
   db_id,
   sql_part,
   sql_part_sqlstate,
   sql_part_errormsg,
   creation_date )
VALUES ( ?, ?, ?, ?, ?, ?, ? )' );
Listing 7: UPLOAD_INSERT SCRIPT FOR THE ACKNOWLEDGMENT TABLE

Figure 4 shows what the dbmlsync_sql_download_ack table looks like on the consolidated database after the SQL script in Listing 8 has been processed on a remote database and the acknowledgement rows uploaded in return. In this case everything worked, no error messages.

[Click image to view at full size]
Figure 4: Confirming the changes with dbmlsync_sql_download_ack.

Dealing With Referential Integrity Violations

One of the consequences of making schema and other changes in an evolving research and development environment is the inevitable introduction of program and design errors, including referential integrity violations that occur only on the remote database. This can happen because the remote database holds a subset of the rows that exist on the consolidated; a critical row may be present on the consolidated but missing on the remote, resulting in a referential integrity or foreign key violation.

The MobiLink synchronization software is designed to be robust and fault tolerant in practice, and to be easy to administer without DBAs at remote sites. One of results of this design goal is that MobiLink, by default, silently handles referential integrity violations that occur on the remote database while data is being downloaded. It does this by deleting any rows that get in the way of referential integrity.

Listing 8 shows an example of this behavior. The "verbose diagnostic logging" option has been turned on to reveal that two referential integrity violations were detected on the remote database during the download, and that the offending rows in sbx_data_access and sbx_beeper were deleted.

I. 2006-07-30 11:16:08.
   Resolving referential integrity violations on table
      sbx_data_access, role FK_SBX_DATA_REF_SBX_CONT
I. 2006-07-30 11:16:08.
      delete from "DBA"."sbx_data_access" from "DBA"."sbx_data_access" ft
      where not exists ( select 1 from "DBA"."sbx_contact" pt
      where ( ft."contact_db_id" = pt."contact_db_id" )
      and ( ft."contact_id" = pt."contact_id" ) )
      and ( ft."contact_db_id" is not null
      and ft."contact_id" is not null )
I. 2006-07-30 11:16:08.
      1 rows deleted.
I. 2006-07-30 11:16:08.
   Resolving referential integrity violations on table
      sbx_beeper, role FK_SBX_BEEP_REF_SBX_FIEL
I. 2006-07-30 11:16:09.
      delete from "DBA"."sbx_beeper" from "DBA"."sbx_beeper" ft
      where not exists ( select 1 from "DBA"."sbx_field_case" pt
      where ( ft."field_case_id" = pt."field_case_id" ) )
      and ( ft."field_case_id" is not null )
I. 2006-07-30 11:16:09.

   1 rows deleted.
Listing 8: MOBILINK RESOLVES RI VIOLATIONS ON THE REMOTE DATABASE

Silently handling RI violations isn't always desirable. In fact, even logging the problem at the remote site might not be good enough when administration is being carried out centrally at the consolidated database. At Simbex, this challenge was met by implementing the RI violation tracking table called dbmlsync_ri_violation in Lisitng 9. MobiLink was allowed to continue resolving the violations, but the synchronization process was customized to record each violation in this tracking table and to upload that information to the consolidated database.

CREATE TABLE dbmlsync_ri_violation (
   ml_username VARCHAR ( 128 ) NOT NULL,
   happened_at TIMESTAMP NOT NULL DEFAULT timestamp,
   unique_id UNSIGNED BIGINT NOT NULL
   DEFAULT global autoincrement(1000000),
   publication_name VARCHAR ( 128 ) NOT NULL,
   foreign_key_table_name VARCHAR ( 128 ) NOT NULL,
   primary_key_table_name VARCHAR ( 128 ) NOT NULL,
   role_name VARCHAR ( 128 ) NOT NULL,
   script_version VARCHAR ( 128 ) NOT NULL,
   query_to_run_on_consolidated_database LONG VARCHAR NOT NULL,
   PRIMARY KEY ( ml_username, happened_at, unique_id ) );
Listing 9: THE RI VIOLATION TRACKING TABLE

Rows in this tracking table answer two important questions: Which RI constraints were violated, and which rows were the troublemakers? The first question is answered by the foreign_key_table_name, primary_key_table_name, and role_name columns. The second (and often most important) question is answered by the query_to_run_on_consolidated_database column which contains actual SELECT statements that can be run on the consolidated database to display the rows that caused problems on the remote database.

For the violations shown earlier in Listing 8, the values in query_to_run_on_consolidated_database look like this:

SELECT * FROM sbx_data_access
   WHERE STRING ( sbx_data_access.data_access_db_id, ', ', sbx_data_access.data_access_id ) = '2, 17'
SELECT * FROM sbx_beeper
   WHERE STRING ( sbx_beeper.beeper_id ) = '1'

Here are the other columns in the tracking table:

ml_username identifies the remote database.
happened_at is the date/time of the RI violation.
unique_id serves to guarantee uniqueness for the tracking table's primary key.
publication_name identifies which MobiLink publication is involved.
foreign_key_table_name names the child table.
primary_key_table_name names the parent.
role_name is the name of the foreign key constraint.
script_version identifies which MobiLink script version is in use.

Rows are inserted in the tracking table by another "hook" procedure, called sp_hook_dbmlsync_download_log_ri_violation, in Listing 10. When a procedure with this name exists in the remote database, the MobiLink client component dbmlsync.exe calls it whenever an RI violation is detected.

   CREATE PROCEDURE sp_hook_dbmlsync_download_log_ri_violation()
   BEGIN
   DECLARE @child_table_name VARCHAR ( 128 );
   DECLARE @parent_table_name VARCHAR ( 128 );
   DECLARE @role_name VARCHAR ( 128 );
   DECLARE @sql LONG VARCHAR;
   DECLARE @child_primary_key_column_name_list LONG VARCHAR;
   DECLARE LOCAL TEMPORARY TABLE #child_primary_key_column_value_list (
      child_primary_key_column_value_list LONG VARCHAR );
-- 1. Determine the offending foreign key relationship.
   SET @child_table_name =
      ( SELECT value FROM #hook_dict WHERE name = 'Foreign key table' );
   SET @parent_table_name =
      ( SELECT value FROM #hook_dict WHERE name = 'Primary key table' );
   SET @role_name =
      ( SELECT value FROM #hook_dict WHERE name = 'Role name' );
-- 2. Build the child primary key column name list: CT.
   CT_PCOL, ', ', CT.CT_PCOL,
   SELECT LIST ( STRING ( @child_table_name,
             '.',
             child_primary_key_column_name ),
      ', '', '', ' ORDER BY child_primary_key_column_order )
   INTO @child_primary_key_column_name_list
   FROM rroad_v_child_primary_key
WHERE parent_table_name = @parent_table_name
   AND child_table_name = @child_table_name
   AND role_name = @role_name;
-- 3. Build and run the INSERT SELECT to get one or more offending child
   -- primary key column value strings into the temporary table
   -- #child_primary_key_column_value_list.
   SELECT STRING ( 'INSERT #child_primary_key_column_value_list SELECT STRING ( ',
      @child_primary_key_column_name_list,

      ' ) \x0d\x0a FROM ',
      @child_table_name,
      ' WHERE NOT EXISTS ( SELECT * FROM ',
      @parent_table_name,
      '\x0d\x0a WHERE ',
         LIST ( STRING ( @parent_table_name,
         '.',
             parent_primary_key_column_name,
             ' = ',
             @child_table_name,
             '.',
             child_foreign_key_column_name ),
         ' AND ' ORDER BY parent_primary_key_column_order ),
       '\x0d\x0a AND ',
       LIST ( STRING ( @child_table_name,
            '.',
            child_foreign_key_column_name,
            ' IS NOT NULL ' ),
       'AND ' ORDER BY parent_primary_key_column_order ),
       ' ) ' )
     INTO @sql
     FROM rroad_v_foreign_key
   WHERE parent_table_name = @parent_table_name
     AND child_table_name = @child_table_name
     AND role_name = @role_name
   GROUP BY parent_table_name,
        child_table_name,
        role_name;
   EXECUTE IMMEDIATE @sql;
-- 4. Insert one row in dbmlsync_ri_violation for each offending child row.
INSERT dbmlsync_ri_violation (
   ml_username,
   publication_name,
   foreign_key_table_name,
   primary_key_table_name,
   role_name,
   script_version,
   query_to_run_on_consolidated_database )
SELECT ( SELECT value FROM #hook_dict WHERE name = 'MobiLink user' ),
   ( SELECT value FROM #hook_dict WHERE name = 'publication_0' ),
   @child_table_name,
   @parent_table_name,
   @role_name,
   ( SELECT value FROM #hook_dict WHERE name = 'script version' ),
   STRING ( 'SELECT * FROM ',
      @child_table_name,
      ' WHERE STRING ( ',
      @child_primary_key_column_name_list,
      ' ) = ''',
      REPLACE ( child_primary_key_column_value_list, '''', '''''' ),
      '''' )
   FROM #child_primary_key_column_value_list;
END; -- sp_hook_dbmlsync_download_log_ri_violation

Listing 10: "HOOK" PROCEDURE TO RECORD RI VIOLATIONS

The procedure in Listing 10 is constructed to be schema-independent; in other words, it does not depend on any special knowledge of the Simbex database design, so it should not be affected by changes to that design. The processing is done step-by-step:

Step 1 handles that fact that MobiLink passes parameters to the hook procedures indirectly, via a special temporary table called #hook_dict. Rows in this table contain parameter name and value pairs, with the parameter names depending on which hook procedure is being executed. In this case, the parameters of interest are the parent and child table names and the foreign key role or constraint name involved in the RI violation.

Step 2 uses the LIST aggregate function and the STRING scalar function to construct a specially-formatted string of primary key column names from the child table, for use in later processing. The LIST function works like other aggregate function such as SUM in that it processes a group of rows to return a single value. Unlike other aggregate functions, however, LIST preserves information gathered from the individual rows, and returns that information in a list. Here is the syntax: LIST ( element, separator ORDER BY order_by ) where one or more element expressions are gathered into an ordered list with separators between each element.

The STRING function converts each argument to a string and concatenates them all into one string return value. This function is extremely valuable for building strings in SQL, especially strings that contain SQL commands that are to be passed to EXECUTE IMMEDIATE. In this case STRING is used to build each LIST element as a child primary key column name dot-qualified with the table name.

Step 2 also uses a custom-written view called rroad_v_child_primary to get the child primary key column names involved in the RI violation by selecting information from the SQL Anywhere system catalog tables:

CREATE VIEW rroad_v_child_primary_key AS
SELECT parent_systable.table_name AS parent_table_name,
   child_systable.table_name AS child_table_name,
   child_pkey_syscolumn.column_name AS child_primary_key_column_name,
   child_pkey_syscolumn.column_id AS child_primary_key_column_order,
   sysforeignkey.role AS role_name
FROM SYS.SYSTABLE AS parent_systable
       INNER JOIN SYS.SYSFOREIGNKEY AS sysforeignkey
          ON parent_systable.table_id = sysforeignkey.primary_table_id
     INNER JOIN SYS.SYSTABLE AS child_systable
         ON child_systable.table_id = sysforeignkey.foreign_table_id
     INNER JOIN SYS.SYSCOLUMN AS child_pkey_syscolumn
        ON child_systable.table_id = child_pkey_syscolumn.table_id
WHERE child_pkey_syscolumn.pkey = 'Y';

Step 3 uses the string from Step 2, plus more calls to LIST and STRING, to fill the string variable @sql with an INSERT statement that looks like this template:

INSERT #child_primary_key_column_value_list
SELECT STRING ( CT.CT_PCOL, ', ', CT.CT_PCOL )
FROM CT
WHERE NOT EXISTS ( SELECT *
   FROM PT
WHERE PT.PT_PCOL = CT.CT_FCOL
         AND PT.PT_PCOL = CT.CT_FCOL
         AND CT.CT_FCOL IS NOT NULL
         AND CT.CT_FCOL IS NOT NULL );

The template above contains some abbreviations to represent actual values in the generated string: CT stands for child table, PT for parent, PCOL means primary key column, and FCOL means foreign key column. The template shows a two-column compound key; in reality, Simbex tables contain a varying number of columns in their primary keys.

The generated NOT EXISTS subquery looks for violations of the foreign key constraint, and the call to STRING gathers all the child primary key values into one string that will be used in the final query uploaded to the consolidated database. This gathering of key values into one string is done to simplify handling compound primary keys by treating them as if they consisted of a single column.

The EXECUTE IMMEDIATE statement in Step 3 executes the generated INSERT, with the result being a single-column temporary table called #child_primary_key_column_value_list that is used in the next step. This table is defined in the DECLARE LOCAL TEMPORARY TABLE statement near the top of Listing 10.

Step 3 also uses another custom-written view called rroad_v_foreign_key to get all of the table and column names involved in the RI violation:

CREATE VIEW rroad_v_foreign_key AS
SELECT parent_systable.table_name AS parent_table_name,
   parent_pkey_syscolumn.column_name AS parent_primary_key_column_name,
   parent_pkey_syscolumn.column_id AS parent_primary_key_column_order,
   child_systable.table_name AS child_table_name,
   child_fkey_syscolumn.column_name AS child_foreign_key_column_name,
   sysforeignkey.role AS role_name
FROM SYS.SYSTABLE AS parent_systable
   INNER JOIN SYS.SYSFOREIGNKEY AS sysforeignkey
        ON parent_systable.table_id = sysforeignkey.primary_table_id
   INNER JOIN SYS.SYSTABLE AS child_systable
        ON child_systable.table_id = sysforeignkey.foreign_table_id
   INNER JOIN SYS.SYSFKCOL AS sysfkcol
        ON sysforeignkey.foreign_table_id = sysfkcol.foreign_table_id
  and sysforeignkey.foreign_key_id = sysfkcol.foreign_key_id
   INNER JOIN SYS.SYSCOLUMN AS parent_pkey_syscolumn
        ON parent_systable.table_id = parent_pkey_syscolumn.table_id
     AND sysfkcol.primary_column_id = parent_pkey_syscolumn.column_id
   INNER JOIN SYS.SYSCOLUMN AS child_fkey_syscolumn
        ON child_systable.table_id = child_fkey_syscolumn.table_id
     AND sysfkcol.foreign_column_id = child_fkey_syscolumn.column_id;

Step 4 builds the actual row in the tracking table dbmlsync_ri_violation. It uses the string from Step 2 and the temporary table from Step 3 to build the final query that looks like this template:

SELECT * FROM CT
    WHERE STRING ( CT.CT_PCOL, ', ', CT.CT_PCOL ) = 'xxx'

Figure 5 shows the final result of all the work performed by the procedure. The statements uploaded to the consolidated database that can be used to display the child table rows that caused the referential integrity violations on the remote database.

[Click image to view at full size]
Figure 5: Using dbmlsync_ri_violation to investigate a problem.

Conclusion

My experience on the Simbex HIT System project has reinforced my conviction that central administration of the database synchronization process is critically important, simply because there will never be enough database administrators for all the remote sites where databases are deployed. This will remain true even though the HIT System is emerging from its R&D origins as a commercial product, and is being marketed by sports equipment giant Riddell as the "Riddell Sideline Response System".

Synchronization at the database row level, rather than application message level, is very helpful in reducing application complexity and the errors that result. Nevertheless, stuff happens, even in production, and features which support centralized control over diagnosis and repair are invaluable.


Breck Carter is principal consultant at RisingRoad Professional Services, providing consulting and support for SQL Anywhere databases and MobiLink synchronization with Oracle, DB2, SQL Server, and SQL Anywhere. He is author of SQL Anywhere Studio 9 Developer's Guide, now available in English, Japanese and Chinese. Breck can be reached at [email protected].

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.