Module Code: COMP5111M01

Module Title: Big Data Systems © UNIVERSITY OF LEEDS

School of Computing Semester 2 2018/2019

Calculator instructions:

- You are not allowed to use any calculator in this examination.

Dictionary instructions:

- A basic English dictionary is available to use: raise your hand and ask an invigilator if you need it.

Examination Information

- There are 4 pages to this examination.

- There are 2 hours to complete the examination.

- Answer all 3 questions.

- The number in brackets [ ] indicates the marks available for each question or part question.

- You are reminded of the need for clear presentation in your answers.

- The total number of marks for this examination paper is 60.

- You are allowed to use annotated materials.

Question 1

(a) Facebook is an example of a massively connected social media platform, generating huge volumes of data. Give an example scenario where Facebook may batch process some data, and an example scenario where Facebook may need to process data in real-time.

[2 marks]

(b) There are several big data platforms available with different characteristics, and choosing the right platform requires in-depth knowledge of the capabilities of these platforms. You need to decide which platform to choose, and therefore you investigate what your application’s needs are. Give two fundamental issues that you will consider before making your decision.

[2 marks]

(c) [...] 10-fold growth in world data by 2025. Give two reasons - with real-world examples - why this trend is occurring.

[2 marks]

(d) State the similarities and differences between traditional computing clusters and the computing clouds launched in recent years, considering the technical and economic aspects as listed below:

• Hardware, software, and technical support.

• Resource allocation and provisioning methods.

• Infrastructure management and protection.

• Support of utility computing services.

[8 marks]

(e) You are designing an application that requires both data acquisition and pre-processing of raw data for event filtering. Moreover, you have the freedom to describe the underlying hardware used to perform the pre-processing. Which hardware architecture would you choose for such an application? Justify your answer.

[3 marks]

(f) How does specialist hardware deployment and the use of a technology like Apache Storm compare to the more traditional MapReduce solution?

[3 marks]

[Question 1 Total: 20 marks]

Question 2

(a) Self-driving vehicles are a technology that is rapidly moving towards mass-market production. Give examples of how a self-driving vehicle relates to the 5 Vs of Big Data (Volume, Velocity, Variety, Veracity, Value).

[5 marks]

(b) The Hadoop Distributed File System (HDFS) is a popular storage mechanism for large quantities of data. Explain how HDFS ensures the fault-tolerance of data stored on its data nodes.

[2 marks]

(c) [...] containers and Virtual Machines using three criteria of your choice.

[3 marks]

(d) The original Hadoop MapReduce is used to process large sets of data on a large number of collective servers. However, it often performs poorly when too many servers are involved, e.g. running 40K concurrent tasks over 4K servers. Clearly explain why such poor performance occurs. Outline a possible mitigation strategy.

[5 marks]

(e) Apache Storm is an example of a Continuous Operator Model (COM) system, used to process streaming data. Explain how Apache Storm guarantees that all data emitted by its spouts will be processed.

[3 marks]

(f) Discuss two disadvantages of using Apache Storm to process streamed data.

[2 marks]

[Question 2 Total: 20 marks]

Question 3

(a) Apache Spark is one of the most popular Big Data Systems in today’s industry. Discuss two advantages that Spark offers over the more traditional Apache Hadoop framework, and explain why these advantages are significant. Explain why Hadoop is still useful, and give an example of how Hadoop could still be used.

[5 marks]

(b) Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Explain the concepts of both source-based and target-based deduplication. Discuss an advantage and a disadvantage of each approach in the context of Cloud Computing.

[5 marks]

(c) [...] database management model. Discuss two advantages and two disadvantages of using NoSQL in the context of a big data system. Give an example scenario where use of a NoSQL database would be appropriate.

[5 marks]

(d) Neo4j is an example of a NoSQL Graph database. Use an example to explain what type of application a Graph database is suitable for. Discuss two advantages and two disadvantages of graph databases.

[5 marks]

[Question 3 Total: 20 marks]

[Grand Total: 60 marks]
