Teaching a new programming language is difficult enough without confusing the very concepts we are trying to teach. The problem of concept confusion is compounded because the art of software development is a building process. That is, the concepts introduced early in the learning process serve as the foundation for understanding more complex concepts that are introduced later on.
Define vs. Declare
One early point of confusion comes from the belief that data definitions and data declarations are the same thing. They are not. Perpetuating this confusion dooms the students' understanding of subsequent, more complex topics. As stated by Brian W. Kernighan and Dennis M. Ritchie in The C Programming Language:
It is important to distinguish between the declaration of an external variable and its definition. A declaration announces the properties of a variable (its type, size, etc.); a definition also causes storage to be allocated.
If you really understand the difference between these two concepts, some nettlesome programming concepts become duck soup to understand. More importantly, understanding the distinction between define and declare applies to any language, although I concentrate on C#, Visual Basic, and Java here.
Suppose we have the following statement:
int k; // In C, C++, C#, Java
Dim k As Integer ' Visual Basic
Ask yourself: What does the compiler do with such a simple statement? First, the compiler checks to see if the syntax is correct, which it is in this example. (Okay, I take some liberties with the actual internal workings of the compiler. However, when teaching beginning students, such abstractions and simplifications often help the students to understand what otherwise might be obscured by unnecessary details.) Next, the compiler scans its symbol table to see if a variable named k has already been defined. Table 1 is a (greatly simplified) symbol table. In the table, a variable named i has already been defined elsewhere in the program. Table 1 shows the state of the symbol table after variable i has been defined, but before variable k has been entered into the table.
Looking through the symbol table in Table 1, the compiler does not find another variable named k at the same scope level. Therefore, the compiler fills in the attribute list for the new variable as it currently understands the variable named k. The new state of the symbol table is in Table 2.
Note in Table 2 that we have not filled in the lvalue for k. What is an lvalue? For our purposes, the lvalue of a variable is the memory location where we can find that variable stored in memory. (The term "lvalue" originally described an expression that can appear on the left side of an assignment; it is often read as "location value" or "locator value", that is, the memory address of a variable.) At this point, Table 2 shows us that we know quite a bit about the variable named k, but we do not know where it is stored yet.
The compiler now issues a request to the operating system's memory manager and asks for 4 bytes (the storage requirements for an int data type, taken from column 3 in Table 2) of storage. The memory manager looks for 4 contiguous bytes of storage and, assuming the request can be fulfilled, the memory manager passes back to the compiler the memory address of those 4 bytes of storage. For illustration, we assume the memory manager passes back address 910,000. Table 2 then changes state to look like the symbol table in Table 3.
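The walk-through above can be sketched in C. This is a toy model only: the Symbol structure, its field names, and the helper functions are my own assumptions for illustration, not the layout of any real compiler's symbol table, and an address of 0 stands in for an empty lvalue column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, greatly simplified symbol-table entry. */
typedef struct {
    const char   *name;    /* identifier, e.g. "k" */
    const char   *type;    /* data type, e.g. "int" */
    size_t        size;    /* storage requirement in bytes */
    int           scope;   /* scope level */
    unsigned long lvalue;  /* memory address; 0 means "not filled in" */
} Symbol;

#define MAX_SYMBOLS 16

typedef struct {
    Symbol entries[MAX_SYMBOLS];
    size_t count;
} SymbolTable;

/* Return the entry with this name at this scope level, or NULL. */
Symbol *lookup(SymbolTable *t, const char *name, int scope) {
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].scope == scope &&
            strcmp(t->entries[i].name, name) == 0) {
            return &t->entries[i];
        }
    }
    return NULL;
}

/* Record the attributes only: the entry is in the "declaration" state.
   Returns NULL if the name already exists at this scope or the table is full. */
Symbol *declare_symbol(SymbolTable *t, const char *name, const char *type,
                       size_t size, int scope) {
    if (lookup(t, name, scope) != NULL || t->count == MAX_SYMBOLS) {
        return NULL;
    }
    Symbol *s = &t->entries[t->count++];
    s->name = name;
    s->type = type;
    s->size = size;
    s->scope = scope;
    s->lvalue = 0;  /* nothing allocated yet */
    return s;
}

/* Fill in the address: the entry now describes a "definition". */
void assign_storage(Symbol *s, unsigned long address) {
    assert(s != NULL && address != 0);
    s->lvalue = address;
}

/* An entry counts as defined once its lvalue has been filled in. */
int is_defined(const Symbol *s) {
    return s->lvalue != 0;
}
```

Calling declare_symbol(&table, "k", "int", 4, 0) leaves k in the state of Table 2; a later assign_storage(k, 910000UL) moves it to the state of Table 3.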
Now recall the K&R statement:
A declaration announces the properties of a variable (its type, size, etc.); a definition also causes storage to be allocated.
The first four columns in Table 3 describe the basic properties of our variables, but it is column 5, the lvalue column, that has to be filled in for us to have a data definition. Notice that a data definition includes a data declaration while a data declaration does not include a data definition. Indeed, the state of variable k in Table 2 is a data declaration for variable k. On the other hand, Table 3 completes the entry and forms a data definition for variable k because the variable has been allocated storage (i.e., its lvalue is assigned a memory address).
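To see the "storage has been allocated" half of a definition from inside a program, here is a minimal sketch in C. The function names are hypothetical helpers I made up for the illustration, and the actual size and address of k depend on the platform, so nothing here should be read as a guaranteed value.

```c
#include <assert.h>
#include <stddef.h>

int k;  /* a definition: storage is reserved for k */

/* Bytes reserved for k. Often 4, but the C standard does not promise that. */
size_t storage_size_of_k(void) {
    return sizeof k;
}

/* Where k lives in memory; this plays the role of the lvalue column. */
void *address_of_k(void) {
    return &k;
}

/* Because k has a home in memory, a value can be stored and read back. */
int store_and_fetch(int value) {
    assert(address_of_k() != NULL);
    k = value;
    return k;
}
```

A variable that had only been declared, with no definition anywhere in the program, would let these functions compile but would fail at link time, because there would be no address to fill in.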
Now, contrast the definition of variable k we just discussed with the C statement:
extern int j;
After checking the syntax for this statement, the compiler fills in the attribute list for the variable j. The symbol table now looks like Table 4.
In C, the keyword extern means that the variable is defined in another source file, but we would like to be able to use that variable in this source file. Because of the way extern variables work, the definition of j is processed when that other source file is compiled. (The fact that j is given a home in memory elsewhere is why we have the Scope column set to a different value.) Therefore, there is no need to allocate memory for variable j here. (It is the linker's responsibility to sort out where j actually lives in memory when all of the compiled source files are brought together.) Because no storage is allocated for j when the current source file is compiled, the lvalue column for j is not filled in. Therefore, the statement:
extern int j;
is a data declaration because no storage (i.e., no lvalue) is allocated for j, even though j is used in the current source file.
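The declaration and the definition of j can be put side by side in a short sketch. In a real program the definition would normally sit in a different source file; I place it at the bottom of the same file here only so the example links on its own. The function name round_trip_j and the initial value 42 are assumptions made for the illustration.

```c
#include <assert.h>

extern int j;  /* declaration: announces the type, allocates nothing */

/* This compiles because the declaration tells the compiler that j is an int,
   even though no storage for j has been seen yet. */
int read_j(void) {
    return j;
}

/* Writes through the declared name and reads the value back. */
int round_trip_j(int value) {
    j = value;
    assert(j == value);
    return j;
}

/* Definition: storage is allocated here. In a multi-file program this
   line would live in the other source file. */
int j = 42;
```

Removing the last line leaves a file that still compiles, but the linker will complain that j is undefined, which is exactly the declare/define distinction at work.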
Another common example illustrating the declare/define distinction was created when ANSI (through its X3J11 committee) standardized the C programming language and allowed function prototypes. For example:
int myFunction(int a, double b);
The purpose of function prototypes is to allow the compiler to perform type-checking on the function parameters and return type. The symbol table might again change to something similar to that in Table 5.
Once again, no memory is allocated for myFunction(), as indicated by the empty lvalue column in Table 5. Only the attributes of the function are recorded in the symbol table so the compiler can perform type-checking when the function is used. (The Attributes in column 6 of the table might represent the number of arguments for the function; column 7 might be the byte count for the first argument, etc. Symbol tables can be quite complex and tables with dozens of columns are not uncommon.) Constructs similar to function prototypes are found in other languages, like interfaces in C# and Java and pure abstract classes in C++.
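A short sketch shows the prototype doing its job. The body I give myFunction() and the helper callEarly() are hypothetical; the article's prototype says nothing about what the function actually computes.

```c
#include <assert.h>

int myFunction(int a, double b);  /* prototype: a declaration only */

/* The call compiles before the function body has been seen. Because of
   the prototype, the int argument 2 is converted to the double 2.0. */
int callEarly(void) {
    int result = myFunction(3, 2);
    assert(result == 23);
    return result;
}

/* Definition: the body is supplied, so code is generated for the function.
   The arithmetic is an arbitrary choice made for this illustration. */
int myFunction(int a, double b) {
    return a + (int)(b * 10.0);
}
```

If the call passed a string as the second argument, the compiler could reject it using nothing but the prototype's entry in the symbol table.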
As K&R point out, the critical distinction between data definitions and data declarations is that data definitions cause storage to be allocated (i.e., there is an lvalue entry in the symbol table) while data declarations do not. Usually, data declarations appear in a source file for information purposes (e.g., to permit type-checking or enforce signature rules).
Sadly, programmers blur the distinction between data definitions and data declarations all the time. Indeed, most textbooks seem to be oblivious to the distinction. Microsoft's documentation uses the term "declaration" everywhere, and the term is used incorrectly most of the time. This not only makes learning programming more difficult, it robs the student of some useful learning techniques.