
Module Code: COMP5111M01 
Module Title: Big Data Systems © UNIVERSITY OF LEEDS
School of Computing Semester 2 2018/2019 
Calculator instructions: 
- You are not allowed to use any calculator in this examination. 
Dictionary instructions: 
- A basic English dictionary is available to use: raise your hand and ask an invigilator, if you 
need it. 
Examination Information 
- There are 4 pages to this examination. 
- There are 2 hours to complete the examination. 
- Answer all 3 questions. 
- The number in brackets [ ] indicates the marks available for each question or part.
- You are reminded of the need for clear presentation in your answers. 
- The total number of marks for this examination paper is 60. 
- You are allowed to use annotated materials. 
Question 1 
(a) Facebook is an example of a massively connected social media platform, generating huge 
volumes of data. Give an example scenario where Facebook may batch process some data, 
and an example scenario where Facebook may need to process data in real-time. 
[2 marks] 
(b) There are several big data platforms available with different characteristics, and choosing
the right platform requires in-depth knowledge of the capabilities of these platforms.
You need to decide which platform to choose, and you therefore investigate what
your application’s needs are. Give two fundamental issues that you will consider before
making your decision.
[2 marks] 
(c) Data volume is predicted to grow at an enormous rate, with some studies predicting a 
10-fold growth in world data by 2025. Give two reasons - with real-world examples - why 
this trend is occurring. 
[2 marks] 
(d) State the similarities and differences between traditional computing clusters and the
computing clouds launched in recent years, considering the technical and economic aspects as
listed below: 
• Hardware, software, and technical support. 
• Resource allocation and provisioning methods. 
• Infrastructure management and protection. 
• Support of utility computing services. 
[8 marks] 
(e) You are designing an application that requires both data acquisition and pre-processing of 
raw data for event filtering. Moreover, you have the freedom to describe the underlying
hardware to use to perform the pre-processing. Which hardware architecture would you 
choose for such an application? Justify your answer. 
[3 marks] 
(f) How does specialist hardware deployment and the use of a technology like Apache Storm 
compare to the more traditional MapReduce solution? 
[3 marks] 
[Question 1 Total: 20 marks] 
Question 2 
(a) Self-driving vehicles are a technology that is rapidly moving towards mass-market
production. Give examples of how a self-driving vehicle relates to the 5 Vs of Big Data (Volume,
Velocity, Variety, Veracity, Value). 
[5 marks] 
(b) The Hadoop Distributed File System (HDFS) is a popular storage mechanism for large 
quantities of data. Explain how HDFS ensures the fault-tolerance of data stored on its 
data nodes. 
[2 marks] 
(c) Containers are used in Hadoop V2. They are viewed as the Virtual Machine killer. Compare 
containers and Virtual Machines using three criteria of your choice. 
[3 marks] 
(d) The original Hadoop MapReduce is used to process large sets of data on a large collection
of servers. However, it often performs poorly when too many servers are involved, e.g.
running 40K concurrent tasks over 4K servers. Clearly explain why such poor performance
occurs. Outline a possible mitigation strategy.
[5 marks] 
(e) Apache Storm is an example of a Continuous Operator Model (COM) system, used to 
process streaming data. Explain how Apache Storm guarantees that all data emitted by its 
spouts will be processed. 
[3 marks] 
(f) Discuss two disadvantages of using Apache Storm to process streamed data. 
[2 marks] 
[Question 2 Total: 20 marks] 
Question 3 
(a) Apache Spark is one of the most popular Big Data Systems in today’s industry. Discuss 
two advantages that Spark offers over the more traditional Apache Hadoop framework, and 
explain why these advantages are significant. Explain why Hadoop is still useful, and give 
an example of how Hadoop could still be used. 
[5 marks] 
(b) Data deduplication is a specialized data compression technique for eliminating duplicate 
copies of repeating data. Explain the concepts of both source-based and target-based 
deduplication. Discuss an advantage and a disadvantage of each approach in the context
of Cloud Computing. 
[5 marks] 
(c) NoSQL is a broad class of database management systems which do not use a relational 
database management model. Discuss two advantages and two disadvantages of using 
NoSQL in the context of a big data system. Give an example scenario where use of a 
NoSQL database would be appropriate. 
[5 marks] 
(d) Neo4j is an example of a NoSQL Graph database. Use an example to explain what type of 
application a Graph database is suitable for. Discuss two advantages and two disadvantages 
of graph databases. 
[5 marks] 
[Question 3 Total: 20 marks] 
[Grand Total: 60 marks] 