Investigate the implementation strategy of a MapReduce system

  • Give a sample program using the MapReduce programming pattern in your favorite programming language for one of the following problems:
    1. Count word frequencies in text files on a single computer
    2. Compute and output TF-IDF scores for text files on a single computer
  • Design a MapReduce system and document the design in pseudocode. Consider the following components:
    • User program
    • Master
    • Map worker
    • Reduce worker
  • Set up Apache Hadoop on a cluster of virtual machines, then write and run a program for one of the problems above.
    • To run multiple virtual machines, it is advisable to set up Linux guests without a GUI
  • Prepare a slide deck and record a video of your experiments.
  • List the resources you used (LLMs such as ChatGPT, websites, papers, GitHub repositories, blogs, etc.)
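
As a starting point for problem 1, the MapReduce pattern on a single computer might be sketched as below, in Python. This is only a minimal illustration under stated assumptions: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and not part of any framework, an in-memory dictionary stands in for the text files, and everything runs in one process with no master or workers.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    # Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce over a {doc_id: text} mapping."""
    pairs = (
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    )
    groups = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    # Hypothetical sample input; real use would read the text from files.
    docs = {
        "a.txt": "the quick brown fox",
        "b.txt": "the lazy dog and the fox",
    }
    for word, count in sorted(word_count(docs).items()):
        print(word, count)
```

Keeping map, shuffle, and reduce as separate functions mirrors the roles a distributed design would assign to map workers, the framework, and reduce workers, so the same structure can carry over to the pseudocode design.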

Be prepared to present selected slides and your video.