# Statistics In Java

### Generalization in General

Before leaving Example 1, let's make a few observations. The tallyFrequencyModes method can handle all types of values because it forces all values into Double rounding bins. That requires calling the roundData method each time it tallies a DataPoint, even if no rounding is required. Also, using the Double class to store all data is wasteful. For example, some of the ordinal values could easily be stored in a byte. Generalization very often comes at some cost in machine efficiency; it is usually just a question of how big a price we are paying. We need to balance machine efficiency against programmer efficiency and weigh each case separately.

The tallyFrequencyModes method ignores any DataPoint that is invalid. The DataPoint class sets a valid flag to true only if it is instantiated with valid numeric data. This mechanism gives us a trash bucket. If your program is passed the value "Donnie Boy" as a number, the valid flag is set to false. Even if it is passed that value a thousand times, it will not skew your modal counts. Always provide a trash bucket for quantitative variables.
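The article's DataPoint class is not reproduced here, so the following is only a minimal sketch of how such a valid flag might be set. The class name SketchDataPoint, its String constructor, and the isValid and getValue accessors are assumptions made for illustration, not the author's actual API.

```java
// Hypothetical sketch only; the real DataPoint class may differ.
final class SketchDataPoint {
    private final double value;
    private final boolean valid;

    SketchDataPoint(String raw) {
        double parsed = 0.0;
        boolean ok = false;
        if (raw != null) {
            try {
                parsed = Double.parseDouble(raw.trim());
                // Assumption: NaN and infinities also count as trash.
                ok = !Double.isNaN(parsed) && !Double.isInfinite(parsed);
            } catch (NumberFormatException e) {
                ok = false; // e.g. "Donnie Boy" lands in the trash bucket
            }
        }
        this.value = ok ? parsed : 0.0;
        this.valid = ok;
    }

    // True only when the raw input was valid numeric data.
    boolean isValid() { return valid; }

    // Meaningful only when isValid() is true.
    double getValue() { return value; }
}
```

A tally loop can then skip any data point whose isValid() returns false, while still counting it separately if a trash total is wanted.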

A trash bucket is a specific instance of an all-else bucket. An all-else bucket may be one of the most important software devices in frequency distributions. Let me give you a concrete example.

I took a music composition class as an undergraduate. We were given the opportunity to receive feedback from a seasoned composer on our current attempts at music composition. In the last week of class we were given a teacher evaluation survey. It asked, "My teacher always prepares a well thought out lecture: (A) strongly agree (B) agree (C) disagree (D) strongly disagree." A valid survey required a response to each and every question. The students were forced to state that the professor was either prepared or unprepared for a lecture that was never scheduled to happen. This produced junk data and alienated professors. Always review your ordinal and qualitative choices to make sure you are covering all possibilities.
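As a rough illustration of an all-else bucket in a frequency distribution, the sketch below tallies survey responses and routes anything outside the listed choices (including a missing answer) into a separate bucket. The SurveyTally class, the tally method, and the ALL_ELSE label are hypothetical names chosen for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a frequency tally with an all-else bucket.
final class SurveyTally {
    static final String ALL_ELSE = "ALL_ELSE";

    static Map<String, Integer> tally(List<String> responses, List<String> choices) {
        // LinkedHashMap keeps the buckets in the order the choices were listed.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String choice : choices) {
            counts.put(choice, 0);
        }
        counts.put(ALL_ELSE, 0);
        for (String response : responses) {
            // Anything that is not a listed choice goes to the all-else bucket.
            boolean known = response != null && choices.contains(response);
            counts.merge(known ? response : ALL_ELSE, 1, Integer::sum);
        }
        return counts;
    }
}
```

With choices A through D, a response such as "N/A" or a blank answer is counted under ALL_ELSE instead of being forced into one of the four listed buckets.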

### Calculating the Median

The median is the middle value of a sorted population with an odd number of items. For a population with an even number of items, it is the average of the two middle values. The calculateMedian method in Example 2 uses a method, getNipCount, that will be completely explained later. For now, all you need to know is that getNipCount returns the number of valid data points. The total count of items in the population is returned by the method getFullCount. So, I am calculating the median based solely on valid data items in Example 2.

```
// Requires: import java.util.Arrays;
void calculateMedian() {
    int elementsInArray = getList().size();
    double[] tempArray = new double[elementsInArray];
    int validCount = 0;
    // We are only going to count nip median and mode, so copy
    // only the valid data points into the front of tempArray
    for (int i = 0; i < elementsInArray; ++i) {
        if (getList().get(i).GetValid()) {
            tempArray[validCount] = roundData(getList().get(i));
            ++validCount;
        }
    }
    // set median at 0 and return if no valid datapoints
    if (validCount == 0) {
        setMedian(0);
        return;
    }
    // Sort only the filled portion; the unused trailing slots
    // would otherwise be sorted in as zeros
    Arrays.sort(tempArray, 0, validCount);
    // validCount and getNipCount() should be the same value at this point
    // Using getNipCount() to underscore this is a count of only valid
    // data
    int midPoint = (int) getNipCount() / 2;
    if (getNipCount() % 2 == 0) {
        // take the average of the two middle values
        // when we have an even number of valid values
        setMedian((tempArray[midPoint] + tempArray[midPoint - 1]) / 2);
    } else {
        setMedian(tempArray[midPoint]);
    }
}
```
Example 2

### Nip or Nil the Null

Let's look closer at trash handling with the DataPoint class. We will always receive some level of trash. Trash is usually translated into a null at some point in our processing. We have two choices when faced with trash that has been converted to a null. We can nip the null by throwing it out and pretending it does not exist. Or, we can nil the null by assuming it is functionally equivalent to zero and processing it as if it were any other zero. Validating the data in the DataPoint class and keeping separate counts of valid and invalid data points allows me to distinguish between a valid data point that is actually zero and an invalid nil data point that is trash translated into zero.

The calculateMedian method always ignores bad data. DataPoint.isValid returns true only if the data is valid. Invalid values are not stored in tempArray. So, (1, 2, 3, null, null) would return a median of 2 because calculateMedian threw away the two null values. If it instead translated the nulls to zero, the sorted data would be (0, 0, 1, 2, 3) and the median would be 1. We will revisit this discussion of nulls when calculating the mean in Example 3, which calculates two different mean values.
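To make the two outcomes concrete, here is a small self-contained sketch that computes both medians for an array that may contain nulls. The NipOrNilMedian class and its nipMedian and nilMedian methods are hypothetical names used only for this illustration; they are not part of the article's class, and the empty-data result of 0 simply mirrors the convention in Example 2.

```java
import java.util.Arrays;

// Hypothetical sketch contrasting the nip and nil treatments of nulls.
final class NipOrNilMedian {

    // Median of the first n entries of an already sorted array.
    private static double medianOfSorted(double[] sorted, int n) {
        if (n == 0) {
            return 0.0; // same convention as Example 2 for no valid data
        }
        int mid = n / 2;
        return (n % 2 == 0) ? (sorted[mid - 1] + sorted[mid]) / 2.0 : sorted[mid];
    }

    // Nip: throw the nulls away and use only the remaining values.
    static double nipMedian(Double[] data) {
        double[] kept = new double[data.length];
        int n = 0;
        for (Double d : data) {
            if (d != null) {
                kept[n++] = d;
            }
        }
        Arrays.sort(kept, 0, n); // sort only the filled portion
        return medianOfSorted(kept, n);
    }

    // Nil: treat every null as an ordinary zero.
    static double nilMedian(Double[] data) {
        double[] all = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            all[i] = (data[i] == null) ? 0.0 : data[i];
        }
        Arrays.sort(all);
        return medianOfSorted(all, all.length);
    }
}
```

For the data (1, 2, 3, null, null), nipMedian returns 2.0 and nilMedian returns 1.0, matching the two results described above.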
