### Generalization in General

Before leaving Example 1, let's make a few observations. The **tallyFrequencyModes** method can handle all types of values because it forces every value into a **Double** rounding bin. That requires calling the **roundData** method each time it tallies a **DataPoint**, even if no rounding is required. Also, using the **Double** class to store all data is wasteful. For example, some of the ordinal values could easily be stored in a byte. Generalizations very often come at some cost in machine efficiency. It is usually just a question of how big a price we are paying. We need to balance machine efficiency against programmer efficiency and weigh each case separately.

The **tallyFrequencyModes** method ignores a **DataPoint** that is invalid. The **DataPoint** class sets a valid flag to true if it is instantiated with valid numeric data. This mechanism allows us to have a trash bucket. If your program is passed the value, "Donnie Boy", as a number then the valid flag is set to false. If it is passed that value a thousand times it will not skew your modal counts. Always provide a trash bucket for quantitative variables.
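The valid flag described above can be sketched in a few lines. This is a minimal illustration, not the article's **DataPoint** class: `SketchDataPoint`, `countValid`, and `countTrash` are hypothetical names, and the assumption here is that the raw input arrives as a string that either parses to a finite number or does not.

```java
import java.util.Arrays;
import java.util.List;

class TrashBucketSketch {

    /** Hypothetical stand-in for the article's DataPoint class. */
    static final class SketchDataPoint {
        private final double value;
        private final boolean valid;

        SketchDataPoint(String raw) {
            double parsed = 0.0;
            boolean ok;
            try {
                parsed = Double.parseDouble(raw);
                // "NaN" and "Infinity" parse without error but are not usable numbers.
                ok = !Double.isNaN(parsed) && !Double.isInfinite(parsed);
            } catch (NumberFormatException | NullPointerException e) {
                // Non-numeric text or a null reference goes to the trash bucket.
                ok = false;
            }
            this.valid = ok;
            this.value = ok ? parsed : 0.0;
        }

        boolean isValid() { return valid; }
        double getValue() { return value; }
    }

    /** Number of raw values that produced a valid data point. */
    static int countValid(List<String> rawValues) {
        int valid = 0;
        for (String raw : rawValues) {
            if (new SketchDataPoint(raw).isValid()) {
                valid++;
            }
        }
        return valid;
    }

    /** Number of raw values that landed in the trash bucket. */
    static int countTrash(List<String> rawValues) {
        return rawValues.size() - countValid(rawValues);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("3", "Donnie Boy", "0", "Donnie Boy");
        System.out.println("valid: " + countValid(raw) + ", trash: " + countTrash(raw));
    }
}
```

Note that a genuine "0" stays valid while "Donnie Boy" is flagged, so the two are never confused even though both end up holding a stored value of zero.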

A trash bucket is a specific instance of an all-else bucket. An all-else bucket may be one of the most important software devices in frequency distributions. Let me give you a concrete example.

I took a music composition class in undergrad school. We were given the opportunity to receive feedback from a seasoned composer on our current attempts at music composition. In the last week of class we were given a teacher evaluation survey. It asked, "My teacher always prepares a well thought out lecture: (A) strongly agree (B) agree (C) disagree (D) strongly disagree." A valid survey required a response to each and every question. With no all-else choice available, the students were forced to state that the professor was either prepared or unprepared for a lecture that was never scheduled to happen. This left the survey's owners with junk data and alienated professors. Always review your ordinal and qualitative choices to make sure you are covering all possibilities.
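An all-else bucket for a qualitative variable like the survey question can be sketched as follows. This is only an illustration under assumptions: the class name, the `tally` method, and the `ALL_ELSE` label are hypothetical and do not come from the examples in this article.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AllElseBucketSketch {

    /** Hypothetical label for responses that match none of the listed choices. */
    static final String ALL_ELSE = "ALL_ELSE";

    /**
     * Tallies responses against the allowed choices. Anything unexpected,
     * including null, is counted in the all-else bucket instead of being
     * forced into one of the listed choices.
     */
    static Map<String, Integer> tally(List<String> responses, List<String> choices) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String choice : choices) {
            counts.put(choice, 0);
        }
        counts.put(ALL_ELSE, 0);

        for (String response : responses) {
            boolean known = response != null && choices.contains(response);
            counts.merge(known ? response : ALL_ELSE, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> choices = Arrays.asList("A", "B", "C", "D");
        List<String> responses = Arrays.asList("A", "B", "not applicable", null, "A");
        System.out.println(tally(responses, choices));
    }
}
```

Keeping the all-else count visible in the result makes it easy to spot a question whose choices do not cover what respondents actually want to say.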

### Calculating the Median

The median is the middle value of a sorted population with an odd number of items. For a population with an even number of items, it is the average of the two middle values. The **calculateMedian** method in Example 2 uses a method, **getNipCount**, that will be explained fully later. For now, all you need to know is that **getNipCount** returns the number of valid data points. The total count of items in the population, valid or not, is returned by the method **getFullCount**. So, I am calculating the median based solely on valid data items in Example 2.

```java
// Requires: import java.util.Arrays;
void calculateMedian() {
    int elementsInList = getList().size();
    // Sized for the worst case; only the first validCount slots get filled
    double[] tempArray = new double[elementsInList];
    int validCount = 0;

    // We are only going to count nip median and mode, so copy only the
    // valid data points and pack them at the front of the array
    for (int i = 0; i < elementsInList; i++) {
        if (getList().get(i).GetValid()) {
            tempArray[validCount++] = roundData(getList().get(i));
        }
    }

    // Set median at 0 and return if no valid data points
    if (validCount == 0) {
        setMedian(0);
        return;
    }

    // Sort only the filled portion so unused slots cannot act as zeros
    Arrays.sort(tempArray, 0, validCount);

    // Actually validCount and getNipCount are the same value at this point.
    // Using getNipCount() to underscore this is a count of only valid data
    int midPoint = (int) getNipCount() / 2;
    if (getNipCount() % 2 == 0) {
        // Take the average of the two middle values
        // when we have an even number of valid values
        setMedian((tempArray[midPoint] + tempArray[midPoint - 1]) / 2);
    } else {
        setMedian(tempArray[midPoint]);
    }
}
```

### Nip or Nil the Null

Let's look closer at trash handling with the **DataPoint** class. We will always receive some level of trash. Trash is usually translated into a null at some point in our processing. We have two choices when faced with trash that has been converted to a null. We can nip the null by throwing it out and pretending it does not exist. Or, we can nil the null by assuming it is functionally equivalent to zero and processing it as if it were any other zero. Validating the data in the **DataPoint** class and keeping separate counts of valid and invalid data points allows me to distinguish between a valid data point that is actually zero and an invalid nil data point that is trash translated into zero.

The **calculateMedian** method always ignores bad data. The validity check on **DataPoint** (**GetValid** in Example 2) returns true only if the data is valid, and invalid values are not stored in **tempArray**. So (1, 2, 3, null, null) returns a median of 2, because **calculateMedian** throws away the two null values. If it translated the nulls to zero instead, the sorted values would be (0, 0, 1, 2, 3) and the median would be 1. We will revisit this discussion of nulls when calculating the mean in Example 3, which calculates two different mean values.
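The difference between nipping and nilling can be checked with a small standalone sketch. It does not use the article's classes: `NipNilMedianSketch`, `nipMedian`, and `nilMedian` are hypothetical names, and a null entry in the list stands in for an invalid data point.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class NipNilMedianSketch {

    /** Median of the given values; returns 0 for an empty array, as Example 2 does. */
    static double median(double[] values) {
        if (values.length == 0) {
            return 0.0;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    /** Nip: throw the nulls away before computing the median. */
    static double nipMedian(List<Double> data) {
        return median(data.stream()
                .filter(Objects::nonNull)
                .mapToDouble(Double::doubleValue)
                .toArray());
    }

    /** Nil: treat each null as zero and keep it in the population. */
    static double nilMedian(List<Double> data) {
        return median(data.stream()
                .mapToDouble(d -> d == null ? 0.0 : d)
                .toArray());
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(1.0, 2.0, 3.0, null, null);
        System.out.println("nip median: " + nipMedian(data)); // 2.0
        System.out.println("nil median: " + nilMedian(data)); // 1.0
    }
}
```

The same five inputs give two different answers, which is why it matters to decide, and to document, which policy a given statistic uses.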