Dan, an engineer with InterSystems, previously worked at the Laboratory of Computer Science at Massachusetts General Hospital, the birthplace of the M language.
As the modern day successor to MUMPS, M is an ANSI-standard application-development environment incorporating a programming and data-manipulation language and integrated database. M was traditionally used to complement SQL by providing an extra dimension of data-access control that can prove especially useful in select, high-volume, transaction-processing applications. M's direct-access approach to data lets you exert greater control over performance issues, thereby letting you manage update concurrency at a finer level of granularity. For example, M database structures can be designed to support field-level locking and the ability to frame a transaction (TSTART ... TCOMMIT) to include only those database structures involved in processing the current transaction. You can also control the physical proximity of related data, such as grouping header and line-item data together in processing invoice or order data.
Conversely, SQL abstracts data structures from the programmer, so that the function of database administration (DBA) is more centralized. By optimizing the physical data model, the database administrator strives to optimize performance across a breadth of programs. This philosophical center-of-control difference is one key reason why M is more commonly used by application developers than by end-user organizations and why M applications exhibit very good performance, operational, and cost-of-ownership characteristics. Whether deployed by an application developer or end-user organization, performance is, to a large degree, a function of direct program control and M system processes--not DBA activity. DBA functions still exist but are not dominated by database-tuning activities.
M: Past and Present
Starting in the mid-1970s, MUMPS gained widespread use in health-care applications. M evolved from the need for mainframe-like transaction-processing applications for autonomous departmental computing. M is an ANSI/FIPS/ISO/JIS (Japan Industrial Standard) applications environment and is accepted by the ANSI SQL Standards committee as a host language for embedded SQL (ESQL). M integrates a programming language and database into a single application-development environment. It is common to access the same data with both SQL and M. Indeed, M and SQL data accesses can even be mixed within the same programs. As a result of the integrated programming/database standard, M-based programs are highly portable across operating platforms. M is most commonly used on UNIX and Windows, but it can also be found in DOS, VMS, and other platforms. At the same time, M is an environment particularly suited to high-volume, online transaction-processing applications.
M database structures are implemented as highly optimized B-tree structures on disk. To the application programmer, the database presents itself as user-designed, multidimensional arrays ("globals" in M parlance). All database designs (hierarchical, network, relational, and so on) are supported, and the design is under the control of the application developer. The ANSI/MDC X11.1-1990 specification of the M language component describes 21 commands, 17 intrinsic functions, and 7 intrinsic variables and reserves a "namespace" for vendor-specific extensions to the ANSI specifications. (The next M ANSI standard specification is now in the formal acceptance stage.) Additionally, there is a full complement of unary, arithmetic binary, numeric- and string-relational, pattern-matching, and concatenation operators. The ANSI standard also describes extrinsic functions and user-authored variables. M data typing is variant. A type is applied based on the context in which the data is referenced. Both private routine and database structures are dynamically allocated, supporting both scalar and array structures. All structures are of variable length. Arrays support numeric and string subscripting, and allocation of these structures is sparse--only those nodes defined by application logic exist.
A complete description of M is beyond the scope of this article. However, Table 1 presents a summary of ANSI M's operators, commands, and intrinsic functions and variables. (You can obtain a complete copy of the ANSI specification by contacting the M Technology Association.)
M is implemented by a variety of commercial vendors as well as academia. Many facilities are associated with each implementation of M, including networking (two- and three-tier client/server and peer-to-peer), system-management facilities (bulletproof database, concurrent read/ write incremental backup), application libraries and generators, debuggers, relational/SQL implementations describing M databases (supporting such APIs as the ODBC), and a host of evolving technologies (object orientation, for instance).
M Benefits
The major benefit of M for the programmer is increased direct control over data structures. In M, you can define data in terms of multidimensional arrays that map to an intended transaction process. It is easy to represent data in a hierarchical, network, or relational model. Because M provides direct control over logical data structures, a "flat relational structure" can be implemented. In any case, the M relational data dictionary provides a superimposed relational view of this same data. In this regard, M is identical to most relational DBMS products. After describing a developer-designed M data structure in the Open M relational data dictionary, the application developer will see in the M environment all characteristics expected of relational DBMS products.
The relational model is optimized for change. It excels at serving unanticipated information requirements, such as the ad hoc query. It is ideal for decision support systems (DSSs), executive information systems (EISs), and data-warehousing applications. However, the relational model can present significant challenges to high-volume transaction processing of complex data structures. M gives you an alternative, one that increases programmer control over key performance considerations such as concurrency, dependency, and data proximity.
Extending Visual Basic
To contrast SQL with M in a complex client/server transaction environment, consider a banking database containing banking products (checking, loans, mortgages, safe deposit, savings, credit cards, and investments) and customers (who individually or jointly may use several of the bank's products). Each product has multiple accounts. Each account may have multiple owners and many transactions a month. There are complex relationships between products, customers, transactions, and accounts. Aggregate transaction rates and data volumes can be very high, and security and accuracy are critical.
Representing an integrated banking model as a set of database tables is not trivial. Achieving acceptable performance can be even more daunting--both for online processing, where customers conduct transactions at a teller or an ATM, and for overnight batch runs, where voluminous data must be processed in a narrow time window. Figure 1 is a metadescription of the tables involved in the scenarios demonstrating Open M and SQL.
Listing One is a portion of a Visual Basic front-end application. The code shows the typical method of using SQL statements to retrieve data. Listing Two is the same program using M. As the user enters characters into a combo box, customer names starting with those characters populate the combo box. During the Form_Load event, Listing One retrieves all rows into the combo box. By setting the Sort property of the combo box to True, Visual Basic automatically sorts the list of names. SQL provides a perfectly acceptable scripting mechanism for this, and performance should not be an issue as long as the number of rows being retrieved is reasonable for the application configuration (available memory, network bandwidth, server performance, and so on). As the number of rows grows larger, however, it becomes more difficult to get the database to load in one shot, and repetitive queries against the server are required.
Note in Listing Two that having the data structures under the control of the application developer allows logical partitioning of data to suit the application. This also permits clustering of tables to any depth by using multidimensional array characteristic of M globals. The number and performance of JOINS are a function of the data structure. Inherent'Natural' ASCII collation of data in Open M globals (for English-speaking markets; other collation schemes are available for other markets) provides ordering of rows, and newly inserted data is automatically collated. This also allows for finer granularity in locking, transaction-processing control, and distribution of data. This improves database throughput by, for example, allowing data to serve concurrently as both a primary and index key.
I've taken liberties with the M code in Listing Two to assist those unfamiliar with M programming. The code is logically and syntactically correct but deoptimized for comprehensibility. Many commands and programming techniques that could have been used for this exercise were avoided for this reason.
Table 2 provides a quick description of the M commands used in Listing Two, while Table 3 describes the data structures (restated in Open M globals). In this database structure, the numeric and string-collation characteristics of Open M globals facilitate access to the upright data through its primary key or through a foreign key. The structures used here are simple: Each record is described as a collection of columns whose names are self descriptive. In this simple structure, data for a record is treated as a string with a caret (^) delimiting each column. The design of the global structure is analogous to defining a table in another DBMS environment. Listing Two is actually implemented in InterSystems' Visual-M, in which the Open M language and database are directly accessible to Visual Basic. With Visual-M, M and Basic are colleague languages. VB controls are registered with M and are programmatically introduced by an underscore character.
When to Use M
You can access the same data with either M or SQL. M provides both the "physical layer" and its own "intermediate logical representation layer." M also efficiently implements SQL tables, which are considered the "logical layer." Therefore, you are not forced to make a systemic choice between access alternatives.
Although, relative to SQL, M provides access that appears closer to the physical layer, data in M is still accessed at a very high level of programming--far above the level of disk pages, B-trees, and the like. M data is accessed using multidimensional arrays that provide associative memory and flexible, convenient data organization, where memory and disk allocation are dynamic and automatic.
SQL's strengths clearly lie in data analysis, decision support, and ad hoc queries. For very complex data structures and demanding transaction-processing requirements (high volumes, large databases, or critical response times), M's level of control provides generally better performance.
Many programming requirements will lie between the two polarities for which SQL and M are each optimized. In this middle ground, a wide variety of factors affect the choice.
Programmer preference is one obvious factor--experienced SQL programmers may prefer to stick to what they know best. On the other hand, data complexity and performance demands may lead you to choose M over SQL as the data-manipulation language. Because they are interchangeable, both M and SQL can be tried, and empirical evidence (user-specific benchmarks) can guide the final decision.
For many organizations, the interoperability and portability of M's development environment are significant. Standard SQL is available from numerous vendor sources, many of whom emphasize their own proprietary client/server program development environment to build SQL systems. In contrast, client/server architecture is embedded within the ANSI M language itself, which avoids proprietary programming requirements. Additionally, certain M vendors have embraced emerging de facto client/server standards, especially Visual Basic, as key development platforms. M extensions to Visual Basic can add a high-performance, transaction-processing dimension to this widely used GUI system-building tool. The growing market for commercially available Visual Basic custom controls adds acquired component software to the program-development arsenal.
Conclusion
M's historical focus on transaction processing with complex data in distributed implementations has lead to the development of several key characteristics. First, the programmer is given more direct control of data structures in which to optimize performance. Second, data can be organized to optimally suit program-performance considerations. And, third, requirements for systems and database administration, a key attribute in distributed systems, are minimized compared to traditional DBMS systems.
For More Information
M Technology Association
1738 Elton Road, Suite 205
Silver Spring, MD 20903-1725
301-431-4070
Table 1: Summary of ANSI M: (a) selected ANSI M commands; (b) selected M intrinsic functions; (c) selected intrinsic variables; (d) operators.
Command Description <b>(a)</b> Break Programmatically interrupt routine execution. Close Deallocate and release ownership of a device (terminal, file, tape, and so on). Do Inititate execution of a program, subroutine, extrinsic function. Else Introduce lines of code to execute if IF condition is not true. For Initiate looping. Goto Generalize transfer of control. Halt Terminate process. Hang Suspend for a specified period. If Introduce condition which, if true, causes rest of line to be run. Job Programmatically spawn another M process. Kill Remove data structure (scalar or array). Lock Programmatically interlock M structures among M processes. New Initiate variable scoping. Open Obtain ownership of a device. Quit Terminate execution of routines, subroutines, and For loops; unscope variables scoped by New, and so on. Read Take input from the current device (terminal, tape, file). Set Assign value to variable. Use Specify a device as the current device. View Peek into memory. Write Direct output to the current device. Xecute Execute the results of expression evaluation as M code. <b>(b)</b> $Ascii Return the integer ASCII value of the character argument. $Character Return the ASCII character of the integer argument. $Data Return integer values denoting the argument's state of definition. $Extract Positionally assign or extract a substring from a string. $Find Locate the position of a substring within a string. $FNumber Apply format mask to data. $Get Return NULL or data depending on argument's existence. $Justify Apply justification mask to argument. $Length Return character length of argument. $Order Return the next subscript value at the current subscript level of the argument. $Piece Extract or assign a delimited substring from/to a string. $Query Return the next full local- or global-array reference. $Random Return a random number. $Select Return the evaluated expression of the first of many conditions that evaluates to true. $Text Return the source code of a program at the reference that is the argument of the function. $TRanslate Translate positionally one character for another in a string. $View Peek into memory. <b>(c)</b> $Horolog System date and time. $Io Current device. $Job Current process identifier. $Storage Amount of private process memory available to the M process. $Test Contains truth value of last executed condition and other operations. $X & $Y Return the current x- and y-coordinates on a cursor-positionable device with some restrictions. <b>(d)</b> Arithmetic +, -, /, \ (integer divide), # (modulo), *, ** (exponentiation) Unary +, -, '(not) Relational =, >, <, ] (follows), [ (contains), ]] (sorts after) Pattern Match ?<argument> Logical & (and), ! (or) Concatenation _____LINEEND____
Table 2: M commands used in Listing Two.
Command Function Semicolon (;) Introduces a comment. SET M's assignment command. $$upper Extrinsic function to convert characters to upper case. $LENGTH M intrinsic function returning string length. FOR Looping structure. $ORDER Traverses a specified branch of the B-tree presented logically as the M global structure until programmatically terminated. QUIT Terminates the For loop and the DO command. DO Calls an external routine, executes a subroutine or extrinsic function, and executes a "structured" block of code introduced by one or more periods denoting the nesting level of the block. $PIECE Intrinsic function which assists in extracting or assigning data from/to a specified delimited substring of the argument string. In this example, a caret is used as the delimiter: The first ^ piece of ^people("owners",pkey) is "name." $GET Intrinsic function that tests for the existence of the argument and returns either the data (exists), if true, or NULL (""), if not. $EXTRACT Extracts characters from a string from position N through optional position M. $ASCII Returns the ASCII value of a character position in the argument string. $CHARACTER Returns the character for the ASCII value.
Table 3: Data structures (restated in Open M globals). (a) People 'Table'; (b) Account 'Table'; (c) Service_Type 'Table'.
<b>(a)</b> ^people("owners",pkey)=name^date_of_birth^social_security_number ^people("by_name",<uppercase_name>,pkey)=<null> (index) ^people("owners",pkey,"accts",account_number)=<null> (index) <b>(b)</b> ^accounts(account_number)=service_type^balance^open_date^close_date ^accounts(account_number,"owners",pkey)=<null> <b>(c)</b> ^srvtypes(type_number)=type_name
Figure 1: Metadescription of the tables used in a typical banking application.
Tables: People Fields: pkey (number,10,unique) name (text,25) date_of_birth (date,8) social_security_number (text,11) Primary Key: pkey Index: name (uppercase transformation applied) Accounts Fields: account_number (numeric, 10, unique) service_type (foreign key -> Service_Types) balance (float,10,2) open_date (date,8) close_date (date,8) Primary Key; account_number Acct_Owners Fields: account_number (foreign key -> Accounts) owner_id (foreign key -> People) Primary Key: account_number, owner_id Index: owner_id Service_Types Fields: type_number (number, 10, unique) type_name (text, 25) Primary Key: type_number
Listing One
<Begin Combo1 KeyPress Event> ' ============================================================ ' Bind to database via DSN in data control ' ============================================================ server$="ODBC;DSN=Server;UID=Username;PWD=Password" Data1.Connect = server$ ' ============================================================ ' Set up query to find all rows where name 'starts with' character ' input by user perhaps under the KeyPress event for Combo1 ' ============================================================ query$="Select Acct_Owners.Owner_ID, Acct_Owners.Account_Number," query$=query$ & " People.Name,Service_Types.Type_Name query$=query$ & " From Acct_Owners, People, Accounts, Service_Types" query$=query$ & " Where People.Name Like '" & ucase$(combo1.text) & "%'" query$=query$ & " And People.Pkey = Acct_Owners.Owner_Id" query$=query$ & " And Account.Account_Number=Acct_Owners.Account_Number" query$=query$ & " And Service_Types.Type_Number = Account.Service_Type" query$=query$ & " Order by 3, 2" ' ============================================================ ' Go to server to get rows ' ============================================================ Data1.Recordsource=query$ Data1.Refresh ' ============================================================ ' Populate Combo Box ' ============================================================ Combo1.Clear Do while not Data1.Recordset.EOF name$=Data1.Recordset.Name account=Data1.Recordset.Account srvtype$=Data1.RecordSet.Type_Name pkey=Data1.Recordset.Primary_Key Combo1.Additem name$ & " " & account & " " & srvtype$ Combo1.ItemData(Combo1.NewIndex)= pkey Data1.Recordset.MoveNext Loop ' ============================================================ ' If the text area of the combo box is touched with new input, reexecute query ' and do loop to recover rows from the database which satisfy the new input. ' If the end-user has double-clicked an item; issue new query to recover all ' elements of row selected for presentation/manipulation by end-user ' ============================================================ <End Combo1 KeyPress Event> <Begin Combo1 Double-Click Event> ' ============================================================ ' detect which item in the combo box was selected ' ============================================================ selected=Combo1.ListIndex ' ============================================================ ' get the row id in the people table for the item selected ' ============================================================ pkey=Combo1.ItemData(selected) ' ============================================================ ' recover the info in the combo box to reuse it. piece is a user-written Basic ' module which returns the portion of a string (1st arg) found between the ' instances (specified by the 3rd and 4th args) of the delimiter-string ' (2nd arg) ' ============================================================ x$=Combo1.List(selected) name$=piece(x," ",1,1) account=piece(x," ",2,2) srvtype$=piece(x," ",3,3) ' ============================================================ ' issue a query to recover all information to be displayed on the form ' ============================================================ query$="Select People.Date_of_Birth, People.Social_Security_Number," query$=query$ & " Accounts.Open_Date, Accounts.Close_Date, Accounts.Balance query$=query$ & " From People, Accounts query$=query$ & " Where People.Pkey = " & pkey query$=query$ & " And Accounts.Account_Number = " & account Data1.RecordSource=query$ Data1.Refresh ' ============================================================ ' assign recovered information to appropriate controls on the form ' ============================================================ Pname.Text=name$ Pdob.Text=Data1.RecordSet.Date_of_Birth Pssn.Text=Date1.RecordSet.Social_Security_Number Pacct_num.Text=account Pacct_odate.Text= Data1.RecordSet.Open_Date Pacct_cdate.Text= Data1.RecordSet.Close_Date Pacct_balance.Text=Data1.RecordSet.Balance Pacct_srv_type.Text=srvtype$ <End Combo1 Double-Click Event>
Listing Two
; ============================================================ ; get user input from combo box perhaps under the KeyPress event, transform ; to uppercase and get Length ; $$upper is an extrinsic funtion which performs the work of UCASE$ ; in Listing 1 ; ============================================================ set input=$$upper(_combo1.text) set inlng=$length(input) ; ============================================================ ; initialize scanning variable to start at the string collating just before ; user input, e.g., if combo1.text="smith", then input="SMITH" and ; name="SMITG~" after the following code executes. It sets up to perform ; the function of the LIKE operator in Listing One's SQL SELECT ; ============================================================ set name=input set name=$extract(name,1,inlng-1)_$character($ascii(name,inlng)-1)_"~" ; ============================================================ ; find all names in the data structure which start with characters input. ; In Direct Database Access, this code actually does the SELECT, WHERE, ; and ORDER BY work described in Listing One by using Open M's ascii ; collation, easily clustered tables, and $order function ; ============================================================ for set name=$order(^people("by_name",name)) quit:$extract(name,1,inplng)'=input set pkey="" do . For set pkey=$order(^people("by_name",name,pkey)) quit:pkey="" set account="" do . . for set account=$order(^people("by_name",name,pkey,"accts",account)) quit:account="" do . . . set name=$piece(^people("owners",pkey),"^") . . . set acctype=$p($get(^accounts(account),"^")) . . . set srvtype=^srvtypes(acctype) . . . ; ============================================================ . . . ; The listbox portion of the combo box will hold peoples' names, account . . . ; numbers and service type. The DO command is used to execute the . . . ; Additem method to place these rows in the combo box. . . . ; The innermost FOR loop (above) corresponds in a sense to the DO WHILE . . . ; loop in Listing One. . . . ; ============================================================ . . . do _combo1.additem(name_" "_account_" "_srvtype) . . . set _combo1.ItemAdd(_combo1.newindex)=pkey . . . quit . . quit . Quit quit ; ============================================================ ; As a reminder, the above code is placed in the KeyPress event of combo box. ; If text area is touched with new input, data fetch and processing loop will ; execute again to recover rows from the database which satisfy the new input. ; This processing logic is that of VB and is independent of the ; scripting/database language: BASIC/SQL vs. M ; ============================================================ <End of the KeyPress Event> <Begin the Double-Click Event> ; ============================================================ ; The following Visual-M code is placed under double-click event of combo box. ; If the end-user has double-clicked an item; a new query is issued to recover ; and display all elements of the row selected. ; First, detect which item in the combo box was selected just as in ; Listing One but using Visual-M ; ============================================================ set selected=_combo1.list ; ============================================================ ; Now, get the row id from ItemData of the row selected. Again, using ; Visual-M ; ============================================================ set pkey=_Combo1.ItemData(selected) ; ============================================================ ; recover the info in the combo box to reuse it. This code uses the ; $PIECE function emulated in VB by the 'piece' module ; ============================================================ set x=_Combo1.List(selected) set name=$piece(x," ") set account=$piece(x," ",2) set srvtype=$piece(x," ",3) ; ============================================================ ; issue M code to recover all information to be displayed on the form and ; display on the form. ; $$date is an extrinsic function to convert internal date format to ; external format ; $$number is an extrinsic function to convert numbers to a desired ; external format ; These might be defined in an Open M Include file and would ; take arguments to define the output format. ; ============================================================ set _Pname.Text=name ; ============================================================ ; Open M Direct Database Access is used to recover rows from the database ; for the people 'table' and the account 'table'. The logic is to recover a ; row into a private process variable (here 'x') and the assign the data to ; the appropriate VB control. This code snippet and the one below combine the ; function of the SELECT query in the Double-Click event of the combo box and ; the assignments of the resulting data to the VB controls. ; ============================================================ set x=$get(^people("owners",pkey)) set _Pdob.Text=$$date($piece(x,"^",2),"dd/mm/yy") set _Pssn.Text=$piece(x,"^",3) set _Pacct_num.Text=account ; ============================================================ ; recover 'account' row into local M variable and assign to VB controls. ; ============================================================ set x-$get(^account(account)) set _Pacct_odate.Text=$$date($piece(x,"^",3),"dd/mm/yy") set _Pacct_cdate.Text=$$date($piece(x,"^",4),"dd/mm/yy") set _Pacct_balance.Text=$$number($piece(x,"^",2),"$n.n") set _Pacct_srvtype.Text=srvtype quit <End Combo Box Double-Click Event) End Listings