Programming 8. Design a Salary Analyzer Java Program to Process Salary Files
In this project, we are to design a Java program that processes data files. Specifically, we are to design and implement a Java command line program called SalaryAnalyzer with which a user can analyze employee salary files. The program should support four major functionalities:
- Get salary statistics
- Mark each earner according to average salary over all earners
- Get earners whose salaries are greater than or equal to a threshold
- Plot histogram
The program is a command line application that expects at least two arguments on the command
line, a command for each of the four functionalities and an input salary file. The program
should display a help message when no command arguments are given, as in the example below:
$ java SalaryAnalyzer
Usage: SalaryAnalyzer Command Input_File ...
In the following, we discuss the salary file structure and the four functionalities.
Structure of Salary File
In this project, salary files are the input files to the program. A salary file is a text file where each line is a record for one employee. Each record has 4 fields: first name, last name, the rank of the employee, and the salary in dollars and cents. The fields are separated by a single space. The following is an example of such a salary file that contains data for 3 employees.
John Doe Assistant 51394.86
Jane Doe Associate 62135.43
Amy Doe Full 12349.99
The program should be able to handle salary files of any size permitted by Java arrays.
Functionalities
Get salary statistics
To perform this functionality, a user should provide GetStats as the 1st command line argument and a salary file as the 2nd command line argument. The following example exhibits the functionality:
$ java SalaryAnalyzer GetStats
Usage: SalaryAnalyzer Command Input_File ...
$ java SalaryAnalyzer GetStats Salary.txt
Min Salary: $50009.26
Max Salary: $129975.83
Average Salary: $84768.71
Median Salary: $80535.39
The program should compute the minimum (smallest), the maximum (greatest), the average, and the median salary using the data in the input file, and display them in the format exhibited in the example above. All calculations should be rounded to the nearest cent.
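The rounding and the output format can be sketched as follows. This is only a minimal illustration, assuming the salaries are already held as whole cents in an int array and are not negative; the class name StatsSketch and the helpers averageCents and formatCents are hypothetical names that do not appear in the assignment.

```java
class StatsSketch {
    // Mean of the salaries (in cents), rounded to the nearest cent.
    static int averageCents(int[] salaryList) {
        long sum = 0; // long, so that adding many salaries cannot overflow an int
        for (int s : salaryList) {
            sum += s;
        }
        return (int) Math.round((double) sum / salaryList.length);
    }

    // Formats non-negative cents as dollars with two decimals, e.g. 5000926 -> "$50009.26".
    static String formatCents(int cents) {
        return String.format("$%d.%02d", cents / 100, cents % 100);
    }

    public static void main(String[] args) {
        int[] cents = {5000055, 8000055, 11000055};
        // Prints: Average Salary: $80000.55
        System.out.println("Average Salary: " + formatCents(averageCents(cents)));
    }
}
```

Working in integer cents keeps the comparison and the display exact; only the average needs an explicit rounding step.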
The median of a list of numbers may be a new concept. To compute the median of a list of values, we can proceed as follows:
- sort the list of values
- the median is the middle value or the average of the two middle values. We consider two cases:
  - If the number of values is odd, there is a single middle value, and the median is that value.
  - If the number of values is even, there are two middle values, and the median is the average of the two.
Mark each earner according to average salary over all earners
To perform this functionality, a user should provide MarkAverage as the 1st command line argument, a salary file as the 2nd command line argument, and an output file as the 3rd argument. The following example exhibits the functionality:
$ java SalaryAnalyzer MarkAverage Salary.txt
Usage: SalaryAnalyzer MarkAverage Input_File Output_File
$ java SalaryAnalyzer MarkAverage Salary.txt MarkedSalary.txt
Wrote MarkedSalary.txt
The program shall compute the average salary from the data in the input file and mark each employee with “+”, “-”, or “=”.
If an employee’s salary is greater than the average, the employee is marked with “+”; if less, with “-”; if equal, with “=”.
The program shall save the results in the output file (MarkedSalary.txt in the above example). An example of such an output file is as follows:
John Doe Assistant 50000.55 -
Jane Doe Associate 80000.55 =
Amy Doe Full 110000.55 +
Get earners whose salaries are greater than or equal to a threshold
To request this functionality, a user shall provide GetTopEarners, an input salary file, a salary threshold in dollars and cents, and an output file as the command line arguments. The following is a running example:
$ java SalaryAnalyzer GetTopEarners Salary.txt
Usage: SalaryAnalyzer GetTopEarners Input_File Salary_Threshold_in_Dollars Output_File
$ java SalaryAnalyzer GetTopEarners Salary.txt 50000.00 TopEarners100KDollars.txt
Wrote TopEarners100KDollars.txt
This example shows that the top earners, i.e., the employees who earn $50000.00 or more, will be written to the output file TopEarners100KDollars.txt.
The program is required to write the top earners in descending order of salary, i.e., the employee who earns the most comes first, the one who earns the second most comes second, and so on. Each employee’s record in the output file should have the same format as in the input file. The following is an excerpt of such an output file:
Amy Doe Full 110000.55
Jane Doe Associate 80000.55
Plot histogram
To request this functionality, a user shall provide PlotHistogram, the input file, and several other command line arguments.
The program should print a histogram for the data in the input file. The following is a running example:
$ java SalaryAnalyzer.java PlotHistogram Salary.txt
Usage: SalaryAnalyzer PlotHistogram Input_File Begin_Salary End_Salary Num_Bins Bin_Unit
$ java SalaryAnalyzer.java PlotHistogram Salary.txt 50000 130000 10 10
[ 5000000 - 5800000): *******
[ 5800000 - 6600000): ***********
[ 6600000 - 7400000): ***************
[ 7400000 - 8200000): *******************
[ 8200000 - 9000000): **********
[ 9000000 - 9800000): **********
[ 9800000 - 10600000): ***********
[10600000 - 11400000): *******
[11400000 - 12200000): *****
[12200000 - 13000000): *****
The histogram is defined by several parameters that the user shall provide on the command line. To plot a histogram, we divide a given interval, indicated in this project by Begin_Salary and End_Salary, into bins. Num_Bins specifies how many bins are requested. In the above example, Begin_Salary, End_Salary, and Num_Bins are $50,000, $130,000, and 10. The program shall divide the interval [$50,000, $130,000) into 10 bins, each given by an interval. Since ($130000 - $50000)/10 = $8000, the 10 intervals for the 10 bins are [ 50000 - 58000), [ 58000 - 66000), and so on. In the above example, the intervals are shown in cents.
The next step is to count the number of employees who fall in each bin. If the input file is big, the number of employees who fall in a bin, i.e., the frequency of the bin, can be big. To make the histogram compact, we provide a command line argument called Bin_Unit. For every Bin_Unit employees in a bin, we plot one *. Since it is unlikely that the bin frequency is exactly a multiple of Bin_Unit, you should round the frequency to the nearest multiple of Bin_Unit.
Divide-and-Conquer
We follow the divide-and-conquer approach as introduced in class. The project is thus divided into multiple exercises and a subproject. You should complete all the exercises first and then combine their methods in the subproject, for which you may want to write additional methods. In the subproject, you shall use these methods to complete the design of the program.
Exercise 8.1 Getting Number of Records in Salary File
To be able to read the salary file to an array, we need to know the number of records in the file in order to allocate an array for the records in the file. In this exercise, we are to implement the following method:
public static int getNumberOfRecords(String filePath) throws FileNotFoundException
It scans the file located by filePath to determine the number of records in the file, and returns the number to the caller.
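One possible shape for this method is sketched below. It assumes that every non-blank line is one record and that blank lines (for example a trailing empty line) should not be counted; the assignment does not say how blank lines are to be treated, and the class name RecordCounterSketch is hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

class RecordCounterSketch {
    public static int getNumberOfRecords(String filePath) throws FileNotFoundException {
        int count = 0;
        try (Scanner in = new Scanner(new File(filePath))) {
            while (in.hasNextLine()) {
                // Assumption: a blank line is not a record, so it is skipped.
                if (!in.nextLine().trim().isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Demo on a temporary file holding two records.
        Path tmp = Files.createTempFile("salary", ".txt");
        Files.write(tmp, Arrays.asList("John Doe Assistant 51394.86", "Jane Doe Associate 62135.43"));
        System.out.println(getNumberOfRecords(tmp.toString()));
    }
}
```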
Exercise 8.2 Reading Records in Salary File to String Array
We are to implement the following method:
public static String[] readFile(String filePath) throws FileNotFoundException
The method uses the getNumberOfRecords(String filePath) method to determine the number of records in the file located by filePath, allocates a String array, reads the employee records in the file into the String array, and returns the array to the caller.
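The two-pass approach described above (count first, then read) might look like the following sketch. It repeats a version of getNumberOfRecords so that the block compiles on its own, and it assumes blank lines are skipped in both passes; the class name ReadFileSketch is hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

class ReadFileSketch {
    // First pass: count the non-blank lines (see Exercise 8.1).
    public static int getNumberOfRecords(String filePath) throws FileNotFoundException {
        int count = 0;
        try (Scanner in = new Scanner(new File(filePath))) {
            while (in.hasNextLine()) {
                if (!in.nextLine().trim().isEmpty()) count++;
            }
        }
        return count;
    }

    // Second pass: allocate the array, then fill it one record per element.
    public static String[] readFile(String filePath) throws FileNotFoundException {
        String[] records = new String[getNumberOfRecords(filePath)];
        try (Scanner in = new Scanner(new File(filePath))) {
            int i = 0;
            while (in.hasNextLine() && i < records.length) {
                String line = in.nextLine().trim();
                if (!line.isEmpty()) {
                    records[i++] = line;
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("salary", ".txt");
        Files.write(tmp, Arrays.asList("John Doe Assistant 51394.86", "Amy Doe Full 12349.99"));
        for (String record : readFile(tmp.toString())) {
            System.out.println(record);
        }
    }
}
```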
Exercise 8.3 Retrieving Salary Array from Record Array
Now we write the following method:
public static int[] getSalaryList(String[] recordList)
An employee record is a String in the recordList array. The record contains 4 fields separated by a single white space, as discussed before. The last field is the employee’s salary in dollars and cents. This method allocates an int array, gets the salary for each record in recordList, converts it to cents, saves the cents in the int array, and returns the int array to the caller.
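A minimal sketch of the conversion is given below. It assumes every record is well formed and that the salary is the last space-separated field. Parsing the salary as a double and rounding after multiplying by 100 is one of several possible ways to reach whole cents; the class name SalaryListSketch is hypothetical.

```java
import java.util.Arrays;

class SalaryListSketch {
    public static int[] getSalaryList(String[] recordList) {
        int[] salaryList = new int[recordList.length];
        for (int i = 0; i < recordList.length; i++) {
            String[] fields = recordList[i].split(" ");
            double dollars = Double.parseDouble(fields[fields.length - 1]);
            // Math.round guards against values such as 5139485.999999 that
            // binary floating point can produce for 51394.86 * 100.
            salaryList[i] = (int) Math.round(dollars * 100);
        }
        return salaryList;
    }

    public static void main(String[] args) {
        String[] records = {"John Doe Assistant 51394.86", "Amy Doe Full 12349.99"};
        // Prints: [5139486, 1234999]
        System.out.println(Arrays.toString(getSalaryList(records)));
    }
}
```

Truncating with a plain (int) cast instead of rounding could lose a cent for some inputs, which is why the sketch rounds.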
Exercise 8.4 Sorting Parallel Arrays
The method we are to implement shall have the following header:
public static void parallelSelectionSort(String[] recordList, int[] salaryList)
The two parameters recordList and salaryList are two parallel arrays, i.e., the two elements at the same index in the two arrays belong to a single employee. The method should sort the parallel arrays according to salaryList in ascending order. Consider the following example:
recordList | salaryList |
---|---|
Amy Doe Assistant 84639.18 | 8463918 |
John Doe Assistant 53630.93 | 5363093 |
Jane Doe Assistant 63630.46 | 6363046 |
After we invoke the parallelSelectionSort method on the two parallel arrays, the two arrays should become:
recordList | salaryList |
---|---|
John Doe Assistant 53630.93 | 5363093 |
Jane Doe Assistant 63630.46 | 6363046 |
Amy Doe Assistant 84639.18 | 8463918 |
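The key point of a parallel sort is that every swap in salaryList is mirrored in recordList. The following is a sketch of selection sort written that way, assuming both arrays have the same length; the class name ParallelSortSketch is hypothetical.

```java
import java.util.Arrays;

class ParallelSortSketch {
    public static void parallelSelectionSort(String[] recordList, int[] salaryList) {
        for (int i = 0; i < salaryList.length - 1; i++) {
            // Find the smallest salary in the unsorted tail [i, length).
            int minIdx = i;
            for (int j = i + 1; j < salaryList.length; j++) {
                if (salaryList[j] < salaryList[minIdx]) {
                    minIdx = j;
                }
            }
            if (minIdx != i) {
                // Swap in BOTH arrays so that each index still describes one employee.
                int tmpSalary = salaryList[i];
                salaryList[i] = salaryList[minIdx];
                salaryList[minIdx] = tmpSalary;
                String tmpRecord = recordList[i];
                recordList[i] = recordList[minIdx];
                recordList[minIdx] = tmpRecord;
            }
        }
    }

    public static void main(String[] args) {
        String[] records = {"Amy", "John", "Jane"};
        int[] salaries = {8463918, 5363093, 6363046};
        parallelSelectionSort(records, salaries);
        // Prints: [John, Jane, Amy] [5363093, 6363046, 8463918]
        System.out.println(Arrays.toString(records) + " " + Arrays.toString(salaries));
    }
}
```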
Exercise 8.5 Computing Median Salary
We shall implement the following method:
public static int computeMedianSalary(int[] salaryList)
As discussed before, the median is the middle salary of all employees. If the length of salaryList is odd, there is a single middle element; if it is even, there are two. Assuming that salaryList is sorted, in the former case the median is the value of the middle element, and in the latter case it is the average of the two middle elements, rounded to the nearest integer.
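The two cases translate into code roughly as follows. The sketch assumes salaryList is non-empty and already sorted in ascending order, and it rounds a half cent upward (the behavior of Math.round), which is one reasonable reading of "nearest integer"; the class name MedianSketch is hypothetical.

```java
class MedianSketch {
    public static int computeMedianSalary(int[] salaryList) {
        int n = salaryList.length;
        if (n % 2 == 1) {
            // Odd length: a single middle element.
            return salaryList[n / 2];
        }
        // Even length: average the two middle elements; long avoids int overflow.
        long sum = (long) salaryList[n / 2 - 1] + salaryList[n / 2];
        return (int) Math.round(sum / 2.0);
    }

    public static void main(String[] args) {
        System.out.println(computeMedianSalary(new int[]{5363093, 6363046, 8463918})); // 6363046
        System.out.println(computeMedianSalary(new int[]{100, 200}));                  // 150
    }
}
```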
Exercise 8.6 Binary Search of Sorted Arrays
Given that arr is a sorted array, the following method searches for the key in the array using binary search.
public static int binarySearch(int[] arr, int key)
The method returns the index of the key in the array. If the key is not found, it returns - (insertion point + 1), where insertion point is the index at which the key would have to be inserted to keep the array sorted.
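An iterative binary search with this return convention can be sketched as below. It assumes arr is sorted in ascending order; when the array holds duplicates of the key, the sketch returns the index of one of them, not necessarily the first. The class name BinarySearchSketch is hypothetical.

```java
class BinarySearchSketch {
    public static int binarySearch(int[] arr, int key) {
        int low = 0;
        int high = arr.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // written this way to avoid overflow of low + high
            if (arr[mid] < key) {
                low = mid + 1;
            } else if (arr[mid] > key) {
                high = mid - 1;
            } else {
                return mid; // key found
            }
        }
        // Not found: low is now the insertion point.
        return -(low + 1);
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 30, 40};
        System.out.println(binarySearch(arr, 30)); // 2
        System.out.println(binarySearch(arr, 25)); // -3, i.e. insertion point 2
    }
}
```

Returning -(insertion point + 1) rather than -(insertion point) keeps the result negative even when the key would be inserted at index 0.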
Exercise 8.7 Getting Top Earners
The header of the method we are to implement is as follows:
public static String[] getTopEarners(String[] recordList, int[] salaryList, int salaryThresholdInCents)
The precondition for this method is that the two parallel arrays recordList and salaryList are already sorted according to salaryList in ascending order. Salaries in salaryList are already in cents. This method is to find the employee records in recordList whose salaries are equal to or greater than salaryThresholdInCents, and these records shall be returned in a String array to the caller. To complete this method, you must use the binarySearch method. In the returned array, the records must be arranged in descending order according to employee salaries.
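One way to combine binarySearch with the descending-order requirement is sketched below. The sketch includes its own copy of binarySearch so that it compiles alone. It assumes that when several employees earn exactly the threshold, binarySearch may land on any of them, so it steps left to the first one; the class name TopEarnersSketch is hypothetical.

```java
import java.util.Arrays;

class TopEarnersSketch {
    // Same contract as in Exercise 8.6: index if found, else -(insertion point + 1).
    static int binarySearch(int[] arr, int key) {
        int low = 0, high = arr.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (arr[mid] < key) low = mid + 1;
            else if (arr[mid] > key) high = mid - 1;
            else return mid;
        }
        return -(low + 1);
    }

    public static String[] getTopEarners(String[] recordList, int[] salaryList,
                                         int salaryThresholdInCents) {
        int idx = binarySearch(salaryList, salaryThresholdInCents);
        int first; // index of the first salary >= threshold
        if (idx < 0) {
            first = -idx - 1; // recover the insertion point
        } else {
            first = idx;
            // Step left over any duplicates of the threshold value.
            while (first > 0 && salaryList[first - 1] == salaryThresholdInCents) {
                first--;
            }
        }
        // Copy from the end of the sorted array so the result is in descending order.
        String[] top = new String[recordList.length - first];
        for (int i = 0; i < top.length; i++) {
            top[i] = recordList[recordList.length - 1 - i];
        }
        return top;
    }

    public static void main(String[] args) {
        String[] records = {"John", "Jane", "Amy"};
        int[] salaries = {5000055, 8000055, 11000055};
        // Prints: [Amy, Jane]
        System.out.println(Arrays.toString(getTopEarners(records, salaries, 8000055)));
    }
}
```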
Exercises 8.8 - 8.10 Plotting Salary Histograms
We shall develop three methods to complete and to plot a histogram for the salary data.
Exercise 8.8 Making Bins
This method is to compute the start and end values of the histogram bins. All bins have the same width, which is rounded to the nearest integer. It returns an int array of length nBins + 1. Suppose the returned bins are in array bins. The start and end values of the i-th bin, where i begins at 0, are bins[i] and bins[i+1].
public static int[] makeBins(int begin, int end, int nBins)
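A possible reading of this description is sketched below: the common bin width is computed once, rounded to the nearest integer, and each edge is begin plus a multiple of that width. This is an assumption; when (end - begin) is not divisible by nBins, the last edge produced this way may differ slightly from end. The sketch also assumes begin and end are given in cents, and the class name MakeBinsSketch is hypothetical.

```java
import java.util.Arrays;

class MakeBinsSketch {
    public static int[] makeBins(int begin, int end, int nBins) {
        // Common bin width, rounded to the nearest integer.
        int width = (int) Math.round((double) (end - begin) / nBins);
        int[] bins = new int[nBins + 1];
        for (int i = 0; i <= nBins; i++) {
            bins[i] = begin + i * width; // bins[i] starts bin i and ends bin i - 1
        }
        return bins;
    }

    public static void main(String[] args) {
        // $50,000 to $130,000 in cents, 10 bins of 800000 cents each.
        System.out.println(Arrays.toString(makeBins(5000000, 13000000, 10)));
    }
}
```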
Exercise 8.9 Counting Array Elements in Bins
The following method is to count the number of employees whose salaries are within each bin.
public static int[] countFrequencies(int[] arr, int[] bins)
Suppose that array arr has all employee salaries, and the start and the end values of a bin are binStart and binEnd. An element of array arr belongs to the bin if the element’s value is equal to or greater than binStart, but less than binEnd. The method returns an int array with the counts (or frequencies) of the array elements in each bin.
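The half-open interval test can be sketched as follows. The sketch assumes bins holds the bin edges in ascending order, as returned by makeBins, and that elements outside every bin are simply not counted; the class name CountFrequenciesSketch is hypothetical.

```java
import java.util.Arrays;

class CountFrequenciesSketch {
    public static int[] countFrequencies(int[] arr, int[] bins) {
        int[] freqs = new int[bins.length - 1]; // one counter per bin
        for (int value : arr) {
            for (int b = 0; b < freqs.length; b++) {
                // Bin b is the half-open interval [bins[b], bins[b + 1]).
                if (value >= bins[b] && value < bins[b + 1]) {
                    freqs[b]++;
                    break; // a value can belong to at most one bin
                }
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        int[] salaries = {5, 10, 15, 20, 25};
        int[] bins = {0, 10, 20};
        // 5 falls in [0, 10); 10 and 15 fall in [10, 20); 20 and 25 fall in no bin.
        System.out.println(Arrays.toString(countFrequencies(salaries, bins))); // [1, 2]
    }
}
```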
Exercise 8.10 Plotting Histogram
The following method is to plot a histogram specified by the given parameters.
public static String[] plotTextHistogram(int[] bins, int[] binFreqs, int barHeightUnit)
The histogram should have the format given in the example below:
[ 5000000 - 5800000): *******
[ 5800000 - 6600000): ***********
[ 6600000 - 7400000): ***************
[ 7400000 - 8200000): *******************
[ 8200000 - 9000000): **********
[ 9000000 - 9800000): **********
[ 9800000 - 10600000): ***********
[10600000 - 11400000): *******
[11400000 - 12200000): *****
[12200000 - 13000000): *****
For each bin, the method prints out the interval of the bin (i.e., the start value and the end value in cents). For every barHeightUnit in the frequency count, the method prints out a *, i.e., the number of * for the i-th bin should be binFreqs[i]/barHeightUnit rounded to the nearest integer.
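The sketch below builds one line of text per bin and returns the lines, leaving the printing to the caller, which is one way to reconcile the String[] return type with the description. The field width of 8 for both bin edges is an assumption inferred from the sample output and may not match the expected spacing exactly; the class name HistogramSketch is hypothetical.

```java
class HistogramSketch {
    public static String[] plotTextHistogram(int[] bins, int[] binFreqs, int barHeightUnit) {
        String[] lines = new String[binFreqs.length];
        for (int i = 0; i < binFreqs.length; i++) {
            // Number of stars: frequency divided by the unit, rounded to the nearest integer.
            int stars = (int) Math.round((double) binFreqs[i] / barHeightUnit);
            // Assumption: both edges are right-aligned in 8 columns.
            StringBuilder sb = new StringBuilder(
                    String.format("[%8d - %8d): ", bins[i], bins[i + 1]));
            for (int s = 0; s < stars; s++) {
                sb.append('*');
            }
            lines[i] = sb.toString();
        }
        return lines;
    }

    public static void main(String[] args) {
        int[] bins = {5000000, 5800000, 6600000};
        int[] freqs = {74, 26}; // 74 / 10 rounds to 7 stars, 26 / 10 rounds to 3 stars
        for (String line : plotTextHistogram(bins, freqs, 10)) {
            System.out.println(line);
        }
    }
}
```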
Exercise 8.11 Writing Record Array to File
The following method writes the recordList to the file located by outFilePath. It writes one record per line, in the same format as the salary file.
public static void writeRecordListFile(String[] recordList, String outFilePath) throws FileNotFoundException
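A short sketch of this method using java.io.PrintWriter follows; PrintWriter is one of several classes that could be used here, and the assignment does not prescribe it. The class name WriteRecordsSketch is hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteRecordsSketch {
    public static void writeRecordListFile(String[] recordList, String outFilePath)
            throws FileNotFoundException {
        // try-with-resources closes (and therefore flushes) the writer automatically.
        try (PrintWriter out = new PrintWriter(outFilePath)) {
            for (String record : recordList) {
                out.println(record); // one record per line
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("top", ".txt");
        writeRecordListFile(new String[]{"Amy Doe Full 110000.55", "Jane Doe Associate 80000.55"},
                tmp.toString());
        System.out.println("Wrote " + tmp);
    }
}
```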
Exercise 8.12 Writing Record Array to File With “Marks” According to a Threshold
The following method writes the records in the recordList array, one record per line, to the file whose path is outFilePath.
public static void writeFileMarkedWithThreshold(String[] recordList, int thresholdInCents, String outFilePath)
For each record, if the salary in the record is greater than the threshold, it appends a “+” to the line; if less, a “-” to the line; and if equal, a “=” to the line. An example of such an output file, given that the threshold salary in cents is 8000055, is as follows:
John Doe Assistant 50000.55 -
Jane Doe Associate 80000.55 =
Amy Doe Full 110000.55 +
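The marking rule can be sketched as below. Two things in the sketch are assumptions: it adds a throws FileNotFoundException clause, which the header above does not show, so that PrintWriter can be used directly; and it re-parses the salary from the last field of each record through a hypothetical helper salaryInCents. The class name MarkedWriterSketch is also hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

class MarkedWriterSketch {
    // Hypothetical helper: salary (last field) converted to whole cents.
    static int salaryInCents(String record) {
        String[] fields = record.split(" ");
        return (int) Math.round(Double.parseDouble(fields[fields.length - 1]) * 100);
    }

    public static void writeFileMarkedWithThreshold(String[] recordList, int thresholdInCents,
                                                    String outFilePath)
            throws FileNotFoundException { // assumption: not in the original header
        try (PrintWriter out = new PrintWriter(outFilePath)) {
            for (String record : recordList) {
                int cents = salaryInCents(record);
                String mark = cents > thresholdInCents ? "+"
                            : cents < thresholdInCents ? "-" : "=";
                out.println(record + " " + mark);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("marked", ".txt");
        String[] records = {"John Doe Assistant 50000.55", "Jane Doe Associate 80000.55",
                            "Amy Doe Full 110000.55"};
        writeFileMarkedWithThreshold(records, 8000055, tmp.toString());
        for (String line : Files.readAllLines(tmp)) {
            System.out.println(line);
        }
    }
}
```

Comparing integer cents rather than floating-point dollars is what makes the “=” case reliable.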
Subproject 8.1. The SalaryAnalyzer Application
Now, combining the methods above, we are to complete the SalaryAnalyzer program. The following test runs exhibit its functionalities.
$ java SalaryAnalyzer GetStats
Usage: SalaryAnalyzer Command Input_File ...
$ java SalaryAnalyzer GetStats Salary.txt
Min Salary: $50009.26
Max Salary: $129975.83
Average Salary: $84768.71
Median Salary: $80535.39
$ java SalaryAnalyzer MarkAverage Salary.txt
Usage: SalaryAnalyzer MarkAverage Input_File Output_File
$ java SalaryAnalyzer MarkAverage Salary.txt MarkedSalary.txt
Wrote MarkedSalary.txt
$ cat MarkedSalary.txt
John Doe Assistant 50000.55 -
Jane Doe Associate 80000.55 =
Amy Doe Full 110000.55 +
$ java SalaryAnalyzer GetTopEarners Salary.txt
Usage: SalaryAnalyzer GetTopEarners Input_File Salary_Threshold_in_Dollars Output_File
$ java SalaryAnalyzer GetTopEarners Salary.txt 50000.00 TopEarners100KDollars.txt
Wrote TopEarners100KDollars.txt
$ cat TopEarners100KDollars.txt
Amy Doe Full 110000.55
Jane Doe Associate 80000.55
$ java SalaryAnalyzer.java PlotHistogram Salary.txt
Usage: SalaryAnalyzer PlotHistogram Input_File Begin_Salary End_Salary Num_Bins Bin_Unit
$ java SalaryAnalyzer.java PlotHistogram Salary.txt 50000 130000 10 10
[ 5000000 - 5800000): *******
[ 5800000 - 6600000): ***********
[ 6600000 - 7400000): ***************
[ 7400000 - 8200000): *******************
[ 8200000 - 9000000): **********
[ 9000000 - 9800000): **********
[ 9800000 - 10600000): ***********
[10600000 - 11400000): *******
[11400000 - 12200000): *****
[12200000 - 13000000): *****