Statistics In Java


Generalization in General

Before leaving Example 1, let's make a few observations. The tallyFrequencyModes method can handle all types of values because it forces every value into a Double rounding bin. That requires calling the roundData method each time it tallies a DataPoint, even if no rounding is required. Also, using the Double class to store all data is wasteful. For example, some of the ordinal values could easily be stored in a byte. Generalization very often comes at some cost in machine efficiency; it is usually just a question of how big a price we are paying. We need to balance machine efficiency against programmer efficiency and weigh each case separately.
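To make that cost concrete, here is a minimal sketch of tallying values into Double rounding bins. The RoundingBinTally class, the roundToBin helper, and the binWidth parameter are hypothetical stand-ins invented for illustration; they are not the tallyFrequencyModes or roundData code from Example 1.

```java
import java.util.Map;
import java.util.TreeMap;

class RoundingBinTally {
    // Hypothetical helper: snap a value to the nearest multiple of binWidth.
    static double roundToBin(double value, double binWidth) {
        return Math.round(value / binWidth) * binWidth;
    }

    // Count how many values land in each bin. Every key is boxed as a
    // Double, even when the input is a small ordinal that would fit in a byte.
    static Map<Double, Integer> tally(double[] values, double binWidth) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (double v : values) {
            // The rounding call happens for every value, needed or not.
            counts.merge(roundToBin(v, binWidth), 1, Integer::sum);
        }
        return counts;
    }
}
```

With a bin width of 1.0, whole-number ordinal values pass through roundToBin unchanged, yet each one still pays for the call and for the boxed Double key. That is the generality-versus-efficiency trade described above.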

The tallyFrequencyModes method ignores any DataPoint that is invalid. The DataPoint class sets a valid flag to true only if it is instantiated with valid numeric data. This mechanism gives us a trash bucket. If your program is passed the value "Donnie Boy" as a number, the valid flag is set to false. Even if it is passed that value a thousand times, it will not skew your modal counts. Always provide a trash bucket for quantitative variables.
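The valid flag idea can be sketched as follows. SketchDataPoint and TrashBucketTally are hypothetical names, and the parsing rules (including treating NaN and infinities as trash) are assumptions; the article's own DataPoint class may validate differently.

```java
import java.util.List;

// Hypothetical stand-in for the article's DataPoint class.
class SketchDataPoint {
    private double value;
    private boolean valid;

    SketchDataPoint(String raw) {
        try {
            value = Double.parseDouble(raw.trim());
            // Assumption: NaN and infinite values also count as trash.
            valid = !Double.isNaN(value) && !Double.isInfinite(value);
        } catch (NumberFormatException | NullPointerException e) {
            // Non-numeric or missing input goes to the trash bucket.
            valid = false;
        }
    }

    boolean isValid() { return valid; }
    double getValue() { return value; }
}

class TrashBucketTally {
    // Returns a two-element array: {valid count, trash count}.
    static int[] count(List<String> rawValues) {
        int validCount = 0;
        int trashCount = 0;
        for (String raw : rawValues) {
            if (new SketchDataPoint(raw).isValid()) {
                ++validCount;
            } else {
                ++trashCount;
            }
        }
        return new int[] {validCount, trashCount};
    }
}
```

Passing "1", "2.5", "Donnie Boy", "Donnie Boy", and "0" yields three valid points and two trash entries. The genuine zero is counted as valid data, while the repeated "Donnie Boy" never reaches the modal counts.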

A trash bucket is a specific instance of an all-else bucket. An all-else bucket may be one of the most important software devices in frequency distributions. Let me give you a concrete example.

I took a music composition class as an undergraduate. We were given the opportunity to receive feedback from a seasoned composer on our current attempts at music composition. In the last week of class we were given a teacher evaluation survey. It asked, "My teacher always prepares a well thought out lecture: (A) strongly agree (B) agree (C) disagree (D) strongly disagree." A valid survey required a response to each and every question. The students were forced to state that the professor was either prepared or unprepared for a lecture that was never scheduled to happen. The result was junk data and alienated professors. Always review your ordinal and qualitative choices to make sure you are covering all possibilities.
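One way to picture an all-else bucket for this survey is the sketch below. The SurveyTally class, the ALL_ELSE constant, and the extra answer code "E" are hypothetical; the survey described above offered only A through D.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

class SurveyTally {
    enum Response { STRONGLY_AGREE, AGREE, DISAGREE, STRONGLY_DISAGREE, ALL_ELSE }

    // Map a raw answer code to a Response. The hypothetical code "E"
    // ("does not apply"), blanks, and anything unrecognized all land
    // in the all-else bucket instead of being forced into A through D.
    static Response parse(String code) {
        if (code == null) {
            return Response.ALL_ELSE;
        }
        switch (code.trim().toUpperCase(Locale.ROOT)) {
            case "A": return Response.STRONGLY_AGREE;
            case "B": return Response.AGREE;
            case "C": return Response.DISAGREE;
            case "D": return Response.STRONGLY_DISAGREE;
            default:  return Response.ALL_ELSE;
        }
    }

    // Count responses per category; categories with no answers are absent.
    static Map<Response, Integer> tally(String[] codes) {
        Map<Response, Integer> counts = new EnumMap<>(Response.class);
        for (String code : codes) {
            counts.merge(parse(code), 1, Integer::sum);
        }
        return counts;
    }
}
```

Because the all-else answers are counted separately, they can be reported or excluded without distorting the four agreement categories.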

Calculating the Median

The median is the middle value of a sorted population with an odd number of items. It is the average of the two middle values of a sorted population with an even number of items. The calculateMedian method in Example 2 uses a method, getNipCount, that will be completely explained later. For now, all you need to know is that getNipCount returns the number of valid data points. The total count of items in the population is returned by the method getFullCount. So, in Example 2 I am calculating the median based solely on valid data items.


void calculateMedian() {
    // Requires: import java.util.Arrays;
    int elementsInList = getList().size();
    double[] tempArray = new double[elementsInList];
    int validCount = 0;
    // Copy only the valid (nip) data points into the front of tempArray
    for (int i = 0; i < elementsInList; i++) {
        if (getList().get(i).GetValid()) {
            tempArray[validCount] = roundData(getList().get(i));
            ++validCount;
        }
    }
    // Set the median to 0 and return if there are no valid data points
    if (validCount == 0) {
        setMedian(0);
        return;
    }
    // Sort only the filled portion so unused slots cannot act as zeros
    Arrays.sort(tempArray, 0, validCount);
    // validCount and getNipCount() are the same value at this point.
    // Using getNipCount() to underscore that this is a count of only
    // valid data
    int midPoint = (int) getNipCount() / 2;
    if (getNipCount() % 2 == 0) {
        // Take the average of the two middle values
        // when we have an even number of valid values
        setMedian((tempArray[midPoint] + tempArray[midPoint - 1]) / 2);
    } else {
        setMedian(tempArray[midPoint]);
    }
}

Example 2

Nip or Nil the Null

Let's look more closely at trash handling with the DataPoint class. We will always receive some level of trash. Trash is usually translated into a null at some point in our processing. We have two choices when faced with trash that has been converted to a null. We can nip the null by throwing it out and pretending it does not exist. Or we can nil the null by assuming it is functionally equivalent to zero and processing it as if it were any other zero. Validating the data in the DataPoint class and keeping separate counts of valid and invalid data points allows me to distinguish between a valid data point that is actually zero and an invalid nil data point that is trash translated into zero.

The calculateMedian method always ignores bad data. DataPoint.GetValid returns true only if the data is valid. Invalid values are not stored in tempArray. So, (1, 2, 3, null, null) would return a median of 2 because calculateMedian throws away the two null values. If it translated the nulls to zero instead, the sorted population would be (0, 0, 1, 2, 3) and it would return a median of 1. We will revisit this discussion of nulls when calculating the mean in Example 3, which calculates two different mean values.
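The difference between the two policies can be checked with a small standalone sketch. NipNilMedian and its methods are hypothetical names, and representing trash as a null Double is an assumption made for illustration; this is not the article's DataPoint-based code.

```java
import java.util.Arrays;

class NipNilMedian {
    // Median of an already sorted array; returns 0 for an empty array,
    // mirroring the convention used in Example 2.
    static double medianOfSorted(double[] sorted) {
        int n = sorted.length;
        if (n == 0) {
            return 0;
        }
        int mid = n / 2;
        return (n % 2 == 0) ? (sorted[mid] + sorted[mid - 1]) / 2 : sorted[mid];
    }

    // Nip the null: drop nulls before computing the median.
    static double nipMedian(Double[] data) {
        double[] kept = Arrays.stream(data)
                .filter(d -> d != null)
                .mapToDouble(Double::doubleValue)
                .sorted()
                .toArray();
        return medianOfSorted(kept);
    }

    // Nil the null: treat each null as zero and keep it in the population.
    static double nilMedian(Double[] data) {
        double[] all = Arrays.stream(data)
                .mapToDouble(d -> d == null ? 0.0 : d)
                .sorted()
                .toArray();
        return medianOfSorted(all);
    }
}
```

For the population (1, 2, 3, null, null), nipMedian returns 2.0 and nilMedian returns 1.0, matching the two results described above.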

