
Assignment Title:
Assignment Type: Report
Word Limit: 3000 words (+/- 300)
Weighting: 100%
Issue Date: 4/9/2025
Submission Date: 30/9/2025
Feedback Date: 21/10/2025
Plagiarism:
When submitting work for assessment, students should be aware of the
InterActive/Canvas guidance and regulations concerning plagiarism. All submissions
should be your own, original work. Please note that you must not submit the same
assignment for two different modules within your course.
You must submit an electronic copy of your work. Your submission will be
electronically checked.
Harvard Referencing:
The Harvard Referencing System must be used. Wikipedia, UKEssays.com and
similar websites must not be used or referenced in your work.
Student signature: Date:
Smart City Urban Mobility Analysis using Hadoop and Predictive AI
Introduction
The goal of this assignment is to provide you with hands-on experience in designing and
implementing a Big Data analytics solution that incorporates a predictive AI component.
You will address a hypothetical smart city challenge by using the Hadoop ecosystem to
process large-scale data and derive actionable insights for urban planning. This
assignment requires you to design a solution using Hadoop, HDFS, YARN, and
MapReduce to analyse transportation data. The final step involves using the processed
data to train a simple predictive model, thereby connecting Big Data processing with AI
applications. This will help you understand the end-to-end pipeline from raw data to
business intelligence in a modern context.
Learning Outcomes:
LO1. Demonstrate an understanding of the basic concepts of Big Data, its importance and
its need in a business context.
LO2. Explain the various components of Hadoop and HDFS along with their role in the Big Data
ecosystem.
LO3. Summarize the learning on Big Data analytics using YARN, HDFS and MapReduce.
Assessment Criteria: Weighting 100%
3000 words
Task Description: You are a Data Engineer tasked with designing and implementing a
proof-of-concept Big Data analytics solution for a city’s transport authority.
Scenario: The city council wants to analyse urban mobility patterns using data from road
sensors, taxi trips, or public transit records. The objective is to identify congestion
hotspots, understand their causes, and predict future traffic patterns to enable proactive
traffic management and better infrastructure planning.
Phase 1: Conceptual Design & Architecture (LO 1, LO 2) – 20% of Total Grade
Business Context and Problem Statement (5%)
• Describe the smart city scenario, focusing on the challenges of urban mobility.
• Define a clear problem statement. For example: “To analyse historical traffic data to
predict the likelihood of traffic congestion at key intersections based on time of day
and day of the week”.
• Explain how solving this problem provides tangible value to the city (e.g., reduced
commute times, lower pollution, improved public safety).
Hadoop Ecosystem and Architecture (15%)
• Explain why a Big Data approach is necessary for this scenario.
• Identify the roles of HDFS, YARN, and MapReduce in your proposed solution.
• Justify your choice of these components for the defined problem.
• Create a clear architectural diagram illustrating how data flows from source to
HDFS, is processed by MapReduce managed by YARN, and is then used for analysis.
Phase 2: Implementation & Analysis (LO 3) – 50% of Total Grade
Data Acquisition & Preparation (5%)
• Select a suitable public dataset representing urban mobility (e.g., taxi trip records,
traffic sensor data). Platforms like Kaggle or city-specific open data portals are good
sources.
• Describe the dataset’s structure, size, and key attributes relevant to your problem
statement.
Hadoop Environment and Data Ingestion (10%)
• Set up a local single-node Hadoop cluster (e.g., using the official Apache Hadoop
binaries or a Docker image).
• Document the key steps of your setup process.
• Load your chosen dataset into HDFS. Provide the commands and screenshots
showing the data successfully stored in HDFS.
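For illustration only, the ingestion step could be sketched as below. The sketch builds the three HDFS shell commands (create a directory, upload, list) as argument lists so they can be logged in the report before being run. The file name `taxi_trips.csv`, the target directory `/user/student/mobility/raw` and the helper name `hdfs_ingest_commands` are hypothetical; substitute your own.

```python
import shlex


def hdfs_ingest_commands(local_csv, hdfs_dir):
    """Return the HDFS shell commands for one ingestion run as argument lists."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],           # create the target directory
        ["hdfs", "dfs", "-put", "-f", local_csv, hdfs_dir],  # upload, overwriting any old copy
        ["hdfs", "dfs", "-ls", hdfs_dir],                    # list the directory as evidence
    ]


if __name__ == "__main__":
    # Hypothetical paths, used only to show the shape of the commands.
    for cmd in hdfs_ingest_commands("taxi_trips.csv", "/user/student/mobility/raw"):
        print(shlex.join(cmd))
```

On a machine where the `hdfs` command is on the PATH, each list can be passed to `subprocess.run(cmd, check=True)` instead of being printed.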
Data Processing with MapReduce (20%)
• Write a MapReduce program in Java or Python to process the data. Your program
must perform data cleaning and feature engineering to prepare it for the predictive
model.
• Example tasks: calculate average trip duration per route, count vehicle flow per
hour, or identify other relevant features from the raw data.
• Explain the logic of your Mapper and Reducer classes and include the
well-commented source code in your report’s appendix.
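As a minimal sketch of one of the example tasks (vehicle flow per hour), a Python job for Hadoop Streaming might look like the following. It assumes a CSV input whose second column holds a pickup timestamp in the form `2025-09-01 08:15:00`; the column index, the timestamp format and the function names are assumptions, not part of any particular dataset. Rows that fail to parse, including the header, are dropped, which is the data-cleaning step.

```python
import csv
import sys
from datetime import datetime
from itertools import groupby

PICKUP_COL = 1                    # assumed position of the pickup timestamp
TS_FORMAT = "%Y-%m-%d %H:%M:%S"   # assumed timestamp layout


def map_line(line):
    """Return 'weekday,hour<TAB>1' for a valid row, or None for a row to drop."""
    try:
        row = next(csv.reader([line]))
        ts = datetime.strptime(row[PICKUP_COL].strip(), TS_FORMAT)
    except (StopIteration, IndexError, ValueError, csv.Error):
        return None  # header, blank line or malformed record
    return f"{ts.weekday()},{ts.hour}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    pairs = (ln.rstrip("\n").split("\t", 1) for ln in sorted_lines if ln.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else ""
    if role == "map":
        for line in sys.stdin:
            out = map_line(line)
            if out is not None:
                print(out)
    elif role == "reduce":
        for out in reduce_lines(sys.stdin):
            print(out)
```

Saved under a hypothetical name such as `flow_job.py`, the script would be passed to the Hadoop Streaming jar with `-files flow_job.py`, `-mapper "python3 flow_job.py map"` and `-reducer "python3 flow_job.py reduce"`, together with `-input` and `-output` paths in HDFS. The location of the streaming jar depends on your installation.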
Predictive Analysis and Visualization (15%)
• Export the processed data from HDFS.
• Use the processed data to train a simple predictive model. You can use a library like
Scikit-learn in Python to build a classification or regression model that addresses
your problem statement.
• Analyze and interpret the output of your MapReduce job and your predictive model.
• Create meaningful visualizations (e.g., graphs showing congestion by time of day, a
confusion matrix for your model) to present your findings.
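One possible shape for the modelling step is sketched below. It assumes the MapReduce output has been copied to a local file (for example with `hdfs dfs -getmerge`) and that each line reads `weekday,hour<TAB>count`. The congestion threshold, the choice of a decision tree and the function names are illustrative assumptions rather than requirements of this brief.

```python
def load_features(lines, threshold):
    """Parse 'weekday,hour<TAB>count' lines into features X and labels y.

    A (weekday, hour) slot is labelled 1 (congested) when its count reaches
    the threshold, which is an assumed, hand-picked cut-off.
    """
    X, y = [], []
    for line in lines:
        key, _, count = line.strip().partition("\t")
        if not count:
            continue  # skip blank or incomplete lines
        weekday, hour = (int(part) for part in key.split(","))
        X.append([weekday, hour])
        y.append(1 if int(count) >= threshold else 0)
    return X, y


def train_and_evaluate(X, y, seed=0):
    """Fit a small decision tree and return it with its confusion matrix."""
    # Imported here so load_features can be used without scikit-learn installed.
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = DecisionTreeClassifier(max_depth=4, random_state=seed)
    model.fit(X_train, y_train)
    matrix = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1])
    return model, matrix
```

The returned matrix can be plotted directly for the report. With only weekday and hour as features the model is deliberately simple, so its accuracy should be interpreted with that limitation in mind.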
Phase 3: Reflection and Documentation (LO 1, LO 2, LO 3) – 30% of Total Grade
Critical Reflection (10%)
• Reflect on the key challenges you encountered during implementation (e.g., data
cleaning, debugging MapReduce, model accuracy) and how you addressed them.
• Critically discuss the performance and scalability of your MapReduce solution.
Could it be optimized (e.g., by using a Combiner)?
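To make the Combiner question concrete, the sketch below simulates a job that computes average trip duration per route. It assumes mappers emit `(route, (duration, 1))` pairs; the route names and figures are made up. The point it illustrates is that a combiner for an average has to pass on partial sums and counts, not partial averages, so that the reducer still produces the correct result while fewer records cross the network.

```python
from collections import defaultdict


def combine(pairs):
    """Merge (key, (total, count)) pairs locally, as a combiner would per mapper."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, (total, count) in pairs:
        acc[key][0] += total
        acc[key][1] += count
    return [(key, (total, count)) for key, (total, count) in sorted(acc.items())]


def reduce_average(pairs):
    """Merge partial sums from all mappers and divide once at the end."""
    return {key: total / count for key, (total, count) in combine(pairs)}


if __name__ == "__main__":
    # Made-up output of two mappers: (route, (duration_minutes, 1)).
    mapper_1 = [("A-B", (10.0, 1)), ("A-B", (20.0, 1)), ("C-D", (5.0, 1))]
    mapper_2 = [("A-B", (60.0, 1))]

    shuffled = combine(mapper_1) + combine(mapper_2)
    print(len(mapper_1) + len(mapper_2), "records without a combiner")
    print(len(shuffled), "records with a combiner")
    print(reduce_average(shuffled))
```

Averaging the two mappers' own averages for route A-B (15 and 60) would give 37.5, whereas the true mean of 10, 20 and 60 is 30, which is why the reducer logic cannot simply be reused as the combiner for this task.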
Final Report Documentation (20%)
• Compile a detailed, professional report of no more than 3000 words documenting
the entire project.
• The report must be well-structured with clear headings, proper grammar, and
academic language.
• Ensure all phases (Conceptual Design, Implementation, Reflection) are thoroughly
covered, including diagrams, code snippets, commands, and visualizations to
support your work.
• Include a bibliography using the Harvard referencing style.
Submission Guidelines:
• Document Format: Submit your assignment as a single document following the BSBI
assignment template provided in Canvas.
• Writing Quality: Ensure clear and concise writing with proper grammar and spelling.
Use headings and subheadings to organize your work logically according to the tasks
outlined above.
• Visuals: Include visuals like diagrams (process flow, conceptual model sketches), tables
(data assumptions, results), and graphs (simulation output) where appropriate to
enhance understanding.
• Task Coverage: Address each part thoroughly, demonstrating your understanding of Big
Data concepts and their application to the business scenario.
• Implementation Details: Provide relevant examples and details of your model
implementation, including code snippets, commands, and calculations.
• Referencing Style: Use Harvard referencing style for your bibliography.
• Discussion: Discuss your findings, insights, and the implications of your
recommendations. Reflect on the challenges faced and how you overcame them.
• Submission: Submit your assignment electronically (Canvas) by the specified deadline.

GUIDANCE ON ASSESSMENT
All materials must be properly referenced under Harvard conventions. The length required is 3000 words with tasks equally weighted. The writing style should be formal academic/report
writing style with in-text referencing to support your comments and observations. Originality, quality of argument and good structure are required. The report should demonstrate
sound understanding and ability to apply knowledge and theory of Simulation Techniques. Additional marks are awarded for juxtaposition and insight into issues.
Grading Criteria
Generic Criteria
Knowledge of contexts, concepts, technologies and processes
The extent to which:
• relevant contextual or theoretical issues are identified, defined and described
• historical or contemporary practices are identified, defined and described
• appropriate technologies, methods and processes are identified, defined and described
90 – 100 80 – 89 70 – 79 60 – 69 50 – 59 40 – 49 30 – 39 0 – 29
Level 6
Understanding through application of knowledge
The degree to which research methods are demonstrated:
• relevant knowledge and information is compared, contrasted, manipulated,
translated and interpreted
• knowledge and information is selected, analysed, synthesized and evaluated in order
to generate creative ideas, practices, solutions, arguments or hypotheses
[grading descriptions continue exactly as shown]

Application of technical and professional skills
The degree to which:
• appropriate materials and media are selected, tested and utilised to realise and
present ideas and solutions
• appropriate technologies, methods and processes are demonstrated
• transferable, professional skills are effectively demonstrated
• self-management and independent learning are demonstrated
[grading descriptions continue exactly as shown]
