Programming 8. Design a Salary Analyzer Java Program to Process Salary Files

In this project, we are to design a Java program that processes data files. Specifically, we are to design and implement a Java command line program called SalaryAnalyzer with which a user can analyze employee salary files. The program should support four major functionalities:

  1. Get salary statistics
  2. Mark each earner according to average salary over all earners
  3. Get earners whose salaries are greater or equal to a threshold
  4. Plot histogram

The program is a command line application that expects at least two arguments on the command line: a command that selects one of the four functionalities and an input salary file. The program should display a help message when no command arguments are given, as in the example below:

    $ java SalaryAnalyzer
    Usage: SalaryAnalyzer Command Input_File ...
    

In the following, we discuss the salary file structure and the four functionalities.

Structure of Salary File

In this project, salary files are the input files to the program. A salary file is a text file where each line is a record for one employee. A record has 4 fields: first name, last name, the rank of the employee, and the salary in dollars and cents. The fields are separated by a single space. The following is an example of such a salary file that contains data for 3 employees.

John Doe Assistant 51394.86
Jane Doe Associate 62135.43
Amy Doe Full 12349.99

The program should be able to handle salary files of any size permitted by Java arrays.

Functionalities

Get salary statistics

To perform this functionality, a user should provide GetStats as the 1st command line argument and a salary file as the 2nd command line argument. The following example exhibits the functionality:

$ java SalaryAnalyzer GetStats
Usage: SalaryAnalyzer Command Input_File ...

$ java SalaryAnalyzer GetStats Salary.txt
    Min Salary: $50009.26
    Max Salary: $129975.83
Average Salary: $84768.71
 Median Salary: $80535.39

The program should compute the minimum (smallest), the maximum (greatest), the average, and the median salary using the data in the input file, and display them in the format exhibited in the example above. All calculations should be rounded to the nearest cent.
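The rounding and the output layout can be sketched as follows. This is only an illustration: it assumes salaries are held as int cents (as Exercise 8.3 does later), and the helper names averageInCents and formatStat are hypothetical, not part of the required method list. The 14-column label width is inferred from the example output above.

```java
class SalaryStatsSketch {
    // Average of salaries held in cents, rounded to the nearest cent.
    // A long accumulator avoids int overflow for large files.
    static int averageInCents(int[] salaryList) {
        long sum = 0;
        for (int cents : salaryList) {
            sum += cents;
        }
        return (int) Math.round(sum / (double) salaryList.length);
    }

    // One output line: label right-aligned in 14 columns (assumed width),
    // then the amount printed as dollars.cents. Assumes cents >= 0.
    static String formatStat(String label, int cents) {
        return String.format("%14s: $%d.%02d", label, cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        int[] salaries = {5000055, 8000055, 11000055};
        System.out.println(formatStat("Average Salary", averageInCents(salaries)));
    }
}
```

Keeping the amounts in integer cents means the only rounding step is the division in the average; the minimum and maximum need no rounding at all.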

The median of a list of numbers may be a new concept. To compute the median of a list of values, we can proceed as follows:

  1. Sort the list of values.
  2. The median is the middle value or the average of the two middle values. For this, we consider two cases:
    1. If the number of values is odd, there is a single middle value. The median is that middle value.
    2. If the number of values is even, there are two middle values. The median is the average of the two.

Mark each earner according to average salary over all earners

To perform this functionality, a user should provide MarkAverage as the 1st command line argument, a salary file as the 2nd command line argument, and an output file as the 3rd argument. The following example exhibits the functionality:

$ java SalaryAnalyzer MarkAverage Salary.txt
Usage: SalaryAnalyzer MarkAverage Input_File Output_File

$ java SalaryAnalyzer MarkAverage Salary.txt MarkedSalary.txt
Wrote MarkedSalary.txt

The program shall compute the average salary from the data in the input file, and mark each employee with “+”, “-”, or “=”. If an employee’s salary is greater than the average, mark the employee with “+”; if less, with “-”; if equal, with “=”. The program shall save the results in the output file (MarkedSalary.txt in the above example). An example of such an output file is as follows:

John Doe Assistant 50000.55 -
Jane Doe Associate 80000.55 =
Amy Doe Full 110000.55 +

Get earners whose salaries are greater or equal to a threshold

To request this functionality, a user shall provide GetTopEarners, an input salary file, a salary threshold in dollars and cents, and an output file as the command line arguments. The following is a running example:

$ java SalaryAnalyzer GetTopEarners Salary.txt
Usage: SalaryAnalyzer GetTopEarners Input_File Salary_Threshold_in_Dollars Output_File

$ java SalaryAnalyzer GetTopEarners Salary.txt 50000.00 TopEarners100KDollars.txt
Wrote TopEarners100KDollars.txt

This example shows that the top earners, i.e., the employees who earn as much as or more than $50000.00 will be in the output file TopEarners100KDollars.txt.

The program must write the top earners in descending order of salary, i.e., the employee who earns the most comes first, the employee who earns the second most comes second, and so on. Each employee’s record in the output file should have the same format as in the input file. The following is an excerpt of such an output file:

Amy Doe Full 110000.55
Jane Doe Associate 80000.55

Plot histogram

To request this functionality, a user shall provide PlotHistogram, the input file along with several other command line arguments. The program should print a histogram for the data in the input file. The following is a running example:

$ java SalaryAnalyzer PlotHistogram Salary.txt
Usage: SalaryAnalyzer PlotHistogram Input_File Begin_Salary End_Salary Num_Bins Bin_Unit

$ java SalaryAnalyzer PlotHistogram Salary.txt 50000 130000 10 10
[ 5000000 -  5800000): *******
[ 5800000 -  6600000): ***********
[ 6600000 -  7400000): ***************
[ 7400000 -  8200000): *******************
[ 8200000 -  9000000): **********
[ 9000000 -  9800000): **********
[ 9800000 - 10600000): ***********
[10600000 - 11400000): *******
[11400000 - 12200000): *****
[12200000 - 13000000): *****

The histogram is defined by several parameters that the user shall provide on the command line. To plot a histogram, we divide a given interval, indicated for this project by Begin_Salary and End_Salary, into bins. Num_Bins specifies how many bins are requested. In the above example, Begin_Salary, End_Salary, and Num_Bins are $50,000, $130,000, and 10, respectively. The program shall divide the interval [$50,000, $130,000) into 10 bins, each covering a half-open interval of equal width. Since ($130,000 - $50,000)/10 = $8,000, the 10 intervals for the 10 bins are [50000 - 58000), [58000 - 66000), and so on. In the above example, the intervals are shown in cents.

The next step is to count the number of employees who fall in each bin. If the input file is big, the number of employees who fall in a bin, i.e., the frequency of the bin, can be big. To keep the histogram compact, we provide a command line argument called Bin_Unit: the program plots one * for every Bin_Unit employees in a bin. Since the bin frequency is unlikely to be an exact multiple of Bin_Unit, you should round the frequency to the nearest multiple of Bin_Unit.

Divide-and-Conquer

We follow the divide-and-conquer approach as introduced in class. The project is thus divided into multiple exercises and a subproject. You should complete all the exercises first and then combine their methods in the subproject, for which you may want to write additional methods. In the subproject, you shall use these methods to complete the design of the program.

Exercise 8.1 Getting Number of Records in Salary File

To read the salary file into an array, we need to know the number of records in the file so that we can allocate an array of the right size. In this exercise, we are to implement the following method:

public static int getNumberOfRecords(String filePath) throws FileNotFoundException

It scans the file located by filePath to determine the number of records in the file, and returns the number to the caller.

Exercise 8.2 Reading Records in Salary File to String Array

We are to implement the following method:

public static String[] readFile(String filePath) throws FileNotFoundException

The method uses the getNumberOfRecords(String filePath) method to determine the number of records in the file located by filePath, allocates a String array of that size, reads the employee records in the file into the array, and returns the array to the caller.
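Exercises 8.1 and 8.2 amount to a two-pass read, which might look like the sketch below. It is one possible approach, not the required solution. It assumes that every non-blank line is one record and that blank lines are skipped; the class name SalaryFileReaderSketch is made up for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

class SalaryFileReaderSketch {
    // Pass 1: count the records. Assumption: every non-blank line is one record.
    public static int getNumberOfRecords(String filePath) throws FileNotFoundException {
        int count = 0;
        try (Scanner in = new Scanner(new File(filePath))) {
            while (in.hasNextLine()) {
                if (!in.nextLine().trim().isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    // Pass 2: allocate an array of exactly that size and fill it,
    // one record per element, in file order.
    public static String[] readFile(String filePath) throws FileNotFoundException {
        String[] records = new String[getNumberOfRecords(filePath)];
        try (Scanner in = new Scanner(new File(filePath))) {
            int i = 0;
            while (in.hasNextLine() && i < records.length) {
                String line = in.nextLine().trim();
                if (!line.isEmpty()) {
                    records[i++] = line;
                }
            }
        }
        return records;
    }
}
```

Reading the file twice is what lets a plain array be sized exactly, which is why the counting method comes first.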

Exercise 8.3 Retrieving Salary Array from Record Array

Now we write the following method:

public static int[] getSalaryList(String[] recordList)

An employee record is a String in the recordList array. The record contains 4 fields separated by a single space, as discussed before. The last field is the employee’s salary in dollars and cents. This method allocates an int array, gets the salary for each record in recordList, converts it to cents, saves the cents in the int array, and returns the int array to the caller.
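A minimal sketch of this conversion is shown below. It uses BigDecimal to avoid floating-point surprises when turning a value such as 51394.86 into cents; that choice is an assumption of this sketch, and rounding Double.parseDouble(...) * 100 with Math.round would be another option. The class name is made up for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class SalaryListSketch {
    public static int[] getSalaryList(String[] recordList) {
        int[] salaryList = new int[recordList.length];
        for (int i = 0; i < recordList.length; i++) {
            // The salary is the last space-separated field of the record.
            String[] fields = recordList[i].trim().split(" ");
            String salaryText = fields[fields.length - 1];
            // Shift the decimal point two places to get cents, then round
            // in case the text carries more than two decimal digits.
            salaryList[i] = new BigDecimal(salaryText)
                    .movePointRight(2)
                    .setScale(0, RoundingMode.HALF_UP)
                    .intValueExact();
        }
        return salaryList;
    }
}
```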

Exercise 8.4 Sorting Parallel Arrays

The method we are to implement shall have the following header:

public static void parallelSelectionSort(String[] recordList, int[] salaryList)

The two parameters recordList and salaryList are parallel arrays, i.e., the elements of the two arrays at the same index belong to a single employee. The method should sort the parallel arrays according to salaryList in ascending order. Consider the following example:

recordList                     salaryList
Amy Doe Assistant 84639.18     8463918
John Doe Assistant 53630.93    5363093
Jane Doe Assistant 63630.46    6363046

After we invoke the parallelSelectionSort method on the two parallel arrays, the two arrays should become:

recordList                     salaryList
John Doe Assistant 53630.93    5363093
Jane Doe Assistant 63630.46    6363046
Amy Doe Assistant 84639.18     8463918
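One way to keep the two arrays in step is to perform every swap on both of them, as in the selection sort sketched here. The class name is made up for illustration; the method header is the one given above.

```java
class ParallelSortSketch {
    public static void parallelSelectionSort(String[] recordList, int[] salaryList) {
        for (int i = 0; i < salaryList.length - 1; i++) {
            // Find the smallest salary in the unsorted part [i, length).
            int minIndex = i;
            for (int j = i + 1; j < salaryList.length; j++) {
                if (salaryList[j] < salaryList[minIndex]) {
                    minIndex = j;
                }
            }
            if (minIndex != i) {
                // Swap in BOTH arrays so each record stays with its salary.
                int tmpSalary = salaryList[i];
                salaryList[i] = salaryList[minIndex];
                salaryList[minIndex] = tmpSalary;

                String tmpRecord = recordList[i];
                recordList[i] = recordList[minIndex];
                recordList[minIndex] = tmpRecord;
            }
        }
    }
}
```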

Exercise 8.5 Computing Median Salary

We shall implement the following method:

public static int computeMedianSalary(int[] salaryList)

As discussed before, the median is the middle salary of all employees. If the length of salaryList is odd, there is a single middle element; if it is even, there are two. Assuming salaryList is sorted, in the former case the median is the value of the middle element, and in the latter case it is the average of the two middle elements, rounded to the nearest integer.
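The two cases can be sketched as follows, assuming salaryList is already sorted and contains at least one element (the sketch does not handle an empty array). The class name is made up for illustration.

```java
class MedianSketch {
    public static int computeMedianSalary(int[] salaryList) {
        int n = salaryList.length;
        int mid = n / 2;
        if (n % 2 == 1) {
            // Odd length: a single middle element.
            return salaryList[mid];
        }
        // Even length: average the two middle elements. The sum is taken
        // as a long so that two large salaries cannot overflow an int.
        long sum = (long) salaryList[mid - 1] + salaryList[mid];
        return (int) Math.round(sum / 2.0);
    }
}
```

For example, the sorted list {10, 21} has the two middle values 10 and 21, whose average 15.5 rounds to 16.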

Exercise 8.6 Binary Search of Sorted Arrays

Given that arr is a sorted array, the following method searches for the key in the array using binary search.

public static int binarySearch(int[] arr, int key)

The method returns the index of the key in the array. If the key is not found, it returns -(insertion point + 1), where the insertion point is the index at which the key would be inserted to keep the array sorted.
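A sketch of an iterative binary search with this return convention is given below. When the loop ends without a match, low is the insertion point. Note that if the key occurs several times, this sketch returns the index of one of the occurrences, not necessarily the first. The class name is made up for illustration.

```java
class BinarySearchSketch {
    public static int binarySearch(int[] arr, int key) {
        int low = 0;
        int high = arr.length - 1;
        while (low <= high) {
            // Written this way to avoid overflow of low + high.
            int mid = low + (high - low) / 2;
            if (arr[mid] < key) {
                low = mid + 1;      // key can only be in the right half
            } else if (arr[mid] > key) {
                high = mid - 1;     // key can only be in the left half
            } else {
                return mid;         // found
            }
        }
        // Not found: low is where the key would have to be inserted.
        return -(low + 1);
    }
}
```

With arr = {10, 20, 30, 40}, searching for 30 returns 2, and searching for 25 returns -3 because the insertion point is index 2.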

Exercise 8.7 Getting Top Earners

The header of the method we are to implement is as follows:

public static String[] getTopEarners(String[] recordList, int[] salaryList, int salaryThresholdInCents)

The precondition for this method is that the two parallel arrays recordList and salaryList are already sorted according to salaryList in ascending order. Salaries in salaryList are already in cents. This method is to find the employee records in recordList whose salaries are equal to or greater than salaryThresholdInCents, and these records shall be returned in a String array to the caller. To complete this method, you must use the binarySearch method. In the returned array, the records must be arranged in descending order according to employee salaries.
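The sketch below shows one way to turn the binarySearch result into the index of the first qualifying record and then copy the tail of recordList in reverse. It is an illustration under stated assumptions, not the required solution: a compact copy of binarySearch is included only to keep the example self-contained, and the step that walks left over equal salaries is this sketch's way of handling duplicates of the threshold value.

```java
class TopEarnersSketch {
    // Same contract as Exercise 8.6: index if found, else -(insertion point + 1).
    static int binarySearch(int[] arr, int key) {
        int low = 0, high = arr.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (arr[mid] < key) low = mid + 1;
            else if (arr[mid] > key) high = mid - 1;
            else return mid;
        }
        return -(low + 1);
    }

    public static String[] getTopEarners(String[] recordList, int[] salaryList,
                                         int salaryThresholdInCents) {
        int pos = binarySearch(salaryList, salaryThresholdInCents);
        int first;
        if (pos >= 0) {
            // Found: move left over any equal salaries to reach the first one.
            first = pos;
            while (first > 0 && salaryList[first - 1] == salaryThresholdInCents) {
                first--;
            }
        } else {
            // Not found: recover the insertion point, which is the index of
            // the first salary greater than the threshold.
            first = -(pos + 1);
        }
        // Copy records first..end in reverse to get descending order.
        String[] top = new String[recordList.length - first];
        for (int i = 0; i < top.length; i++) {
            top[i] = recordList[recordList.length - 1 - i];
        }
        return top;
    }
}
```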

Exercises 8.8 - 8.10 Plotting Salary Histograms

We shall develop three methods to compute and plot a histogram for the salary data.

Exercise 8.8 Making Bins

This method computes the start and end values of the histogram bins. All bins have the same width, which is rounded to the nearest integer. It returns an int array of length nBins + 1. Suppose the returned array is named bins. The start and end values of the i-th bin, where i begins at 0, are bins[i] and bins[i+1].

public static int[] makeBins(int begin, int end, int nBins)
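One reading of this description is sketched below: compute a single rounded width and place edge i at begin + i * width. Under this reading the last edge equals end only when the width divides the interval evenly, which is an assumption worth checking against the expected output. The class name is made up for illustration.

```java
class MakeBinsSketch {
    public static int[] makeBins(int begin, int end, int nBins) {
        // Common bin width, rounded to the nearest integer.
        int width = (int) Math.round((end - begin) / (double) nBins);
        // nBins bins need nBins + 1 edges.
        int[] bins = new int[nBins + 1];
        for (int i = 0; i <= nBins; i++) {
            bins[i] = begin + i * width;
        }
        return bins;
    }
}
```

With begin = 5000000, end = 13000000, and nBins = 10 (the running example, in cents), the width is 800000 and the edges run from 5000000 to 13000000.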

Exercise 8.9 Counting Array Elements in Bins

The following method counts the number of employees whose salaries are within each bin.

public static int[] countFrequencies(int[] arr, int[] bins)

Suppose that array arr holds all employee salaries and that the start and end values of a bin are binStart and binEnd. An element of arr belongs to the bin if the element’s value is equal to or greater than binStart but less than binEnd. The method returns an int array with the counts (or frequencies) of the array elements in each bin.
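The half-open membership test can be sketched as a straightforward double loop, shown below. It assumes bins holds at least two edges in ascending order, as produced in Exercise 8.8; values outside every bin are simply not counted. The class name is made up for illustration.

```java
class CountFrequenciesSketch {
    public static int[] countFrequencies(int[] arr, int[] bins) {
        // bins has one more entry than there are bins.
        int[] freqs = new int[bins.length - 1];
        for (int value : arr) {
            for (int b = 0; b < freqs.length; b++) {
                // Half-open interval: binStart <= value < binEnd.
                if (value >= bins[b] && value < bins[b + 1]) {
                    freqs[b]++;
                    break; // a value can fall in at most one bin
                }
            }
        }
        return freqs;
    }
}
```

With edges {0, 10, 20, 30}, the value 10 is counted in the second bin and the value 30 is not counted at all, because every interval excludes its end value.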

Exercise 8.10 Plotting Histogram

The following method plots a histogram specified by the given parameters.

public static String[] plotTextHistogram(int[] bins, int[] binFreqs, int barHeightUnit)

The histogram should have the format given in the example below:

[ 5000000 -  5800000): *******
[ 5800000 -  6600000): ***********
[ 6600000 -  7400000): ***************
[ 7400000 -  8200000): *******************
[ 8200000 -  9000000): **********
[ 9000000 -  9800000): **********
[ 9800000 - 10600000): ***********
[10600000 - 11400000): *******
[11400000 - 12200000): *****
[12200000 - 13000000): *****

For each bin, the method prints out the interval of the bin (i.e., the start value and the end value in cents). For every barHeightUnit in the frequency count, the method prints out a *, i.e., the number of * should be binFreqs[i]/barHeightUnit rounded to the nearest integer for the i-th bin.
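A possible shape for this method is sketched below. Since the header returns a String array, the sketch builds one line per bin and leaves the printing to the caller; that split is an assumption. The field width of 8 for each bin edge is inferred from the example output above and is also an assumption.

```java
class TextHistogramSketch {
    public static String[] plotTextHistogram(int[] bins, int[] binFreqs, int barHeightUnit) {
        String[] lines = new String[binFreqs.length];
        for (int i = 0; i < binFreqs.length; i++) {
            // One * per barHeightUnit employees, rounded to the nearest integer.
            long stars = Math.round(binFreqs[i] / (double) barHeightUnit);
            // Each edge is right-aligned in 8 columns (width assumed from the example).
            StringBuilder line = new StringBuilder(
                    String.format("[%8d - %8d): ", bins[i], bins[i + 1]));
            for (long s = 0; s < stars; s++) {
                line.append('*');
            }
            lines[i] = line.toString();
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] bins = {5000000, 5800000, 6600000};
        int[] freqs = {68, 114};
        for (String line : plotTextHistogram(bins, freqs, 10)) {
            System.out.println(line);
        }
    }
}
```

In the usage above, a frequency of 68 with a unit of 10 gives 6.8, which rounds to 7 stars, and 114 gives 11.4, which rounds to 11 stars.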

Exercise 8.11 Writing Record Array to File

The following method writes the recordList to the file located by outFilePath. It writes one record per line, in a format identical to that of the salary file.

public static void writeRecordListFile(String[] recordList, String outFilePath) throws FileNotFoundException
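A minimal sketch using java.io.PrintWriter follows. PrintWriter is one reasonable choice here because its String constructor throws FileNotFoundException, matching the header above; other writers would work as well. The class name is made up for illustration.

```java
import java.io.FileNotFoundException;
import java.io.PrintWriter;

class RecordWriterSketch {
    public static void writeRecordListFile(String[] recordList, String outFilePath)
            throws FileNotFoundException {
        // try-with-resources closes (and flushes) the writer even on error.
        try (PrintWriter out = new PrintWriter(outFilePath)) {
            for (String record : recordList) {
                out.println(record); // one record per line
            }
        }
    }
}
```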

Exercise 8.12 Writing Record Array to File With “Marks” According to a Threshold

The following method writes the records in the recordList array, one record per line, to the file whose path is outFilePath.

public static void writeFileMarkedWithThreshold(String[] recordList, int thresholdInCents, String outFilePath)

For each record, if the salary in the record is greater than the threshold, the method appends a “+” to the line; if less, a “-”; and if equal, a “=”. An example of such an output file, given a threshold salary of 8000055 cents, is as follows:

John Doe Assistant 50000.55 -
Jane Doe Associate 80000.55 =
Amy Doe Full 110000.55 +
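The marking rule for a single line can be isolated in a small helper, sketched below. The helper markRecord is hypothetical (it is not one of the required methods); the required method could call something like it for each record before writing the line. The salary is parsed to cents with BigDecimal, which is a choice made for this sketch.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class MarkSketch {
    // Hypothetical helper: returns the record followed by " +", " -", or " =".
    static String markRecord(String record, int thresholdInCents) {
        // The salary is the last space-separated field; convert it to cents.
        String[] fields = record.trim().split(" ");
        int cents = new BigDecimal(fields[fields.length - 1])
                .movePointRight(2)
                .setScale(0, RoundingMode.HALF_UP)
                .intValueExact();
        String mark;
        if (cents > thresholdInCents) {
            mark = "+";
        } else if (cents < thresholdInCents) {
            mark = "-";
        } else {
            mark = "=";
        }
        return record + " " + mark;
    }
}
```

Comparing integer cents, not floating-point dollars, keeps the “=” case reliable.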

Subproject 8.1. The SalaryAnalyzer Application

Now, combining the methods above, we are to complete the SalaryAnalyzer program. The following test runs exhibit its functionalities.

$ java SalaryAnalyzer GetStats
Usage: SalaryAnalyzer Command Input_File ...

$ java SalaryAnalyzer GetStats Salary.txt
    Min Salary: $50009.26
    Max Salary: $129975.83
Average Salary: $84768.71
 Median Salary: $80535.39

$ java SalaryAnalyzer MarkAverage Salary.txt
Usage: SalaryAnalyzer MarkAverage Input_File Output_File

$ java SalaryAnalyzer MarkAverage Salary.txt MarkedSalary.txt
Wrote MarkedSalary.txt

$ cat MarkedSalary.txt
John Doe Assistant 50000.55 -
Jane Doe Associate 80000.55 =
Amy Doe Full 110000.55 +

$ java SalaryAnalyzer GetTopEarners Salary.txt
Usage: SalaryAnalyzer GetTopEarners Input_File Salary_Threshold_in_Dollars Output_File

$ java SalaryAnalyzer GetTopEarners Salary.txt 50000.00 TopEarners100KDollars.txt
Wrote TopEarners100KDollars.txt

$ cat TopEarners100KDollars.txt
Amy Doe Full 110000.55
Jane Doe Associate 80000.55

$ java SalaryAnalyzer PlotHistogram Salary.txt
Usage: SalaryAnalyzer PlotHistogram Input_File Begin_Salary End_Salary Num_Bins Bin_Unit

$ java SalaryAnalyzer PlotHistogram Salary.txt 50000 130000 10 10
[ 5000000 -  5800000): *******
[ 5800000 -  6600000): ***********
[ 6600000 -  7400000): ***************
[ 7400000 -  8200000): *******************
[ 8200000 -  9000000): **********
[ 9000000 -  9800000): **********
[ 9800000 - 10600000): ***********
[10600000 - 11400000): *******
[11400000 - 12200000): *****
[12200000 - 13000000): *****