Data Flow Interview Questions


Take as many assessments as you can to improve and validate your skill rating.

Total Questions: 5

1. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

A. Hive

B. MapReduce

C. Pig

D. Lucene

Correct Answer is : MapReduce
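
As a rough illustration of the model named in this answer, the sketch below counts words in plain Python rather than on Hadoop. The function names (`map_chunk`, `reduce_counts`, `word_count`) and the thread pool standing in for cluster nodes are assumptions made for this example, not part of any Hadoop API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map task: count words in one independent chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, chunk_size=2, workers=4):
    # Split the input into chunks that can be processed independently.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Run the map tasks in parallel (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(text))
```

Because each chunk is counted without reference to any other chunk, the map tasks can run in any order or on any worker; only the final merge needs to see all partial results.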

2. Point out the correct statement:

A. Data locality means moving the algorithm to the data instead of moving the data to the algorithm

B. When data is processed, the algorithm is moved across the Action Nodes rather than the data being moved to the algorithm

C. Moving computation is more expensive than moving data

D. None of the mentioned

Correct Answer is : Data locality means moving the algorithm to the data instead of moving the data to the algorithm

3. The daemons associated with the MapReduce phase are ________ and task-trackers.

A. job-tracker

B. map-tracker

C. reduce-tracker

D. all of the mentioned

Correct Answer is : job-tracker

4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.

A. DataNodes

B. TaskTracker

C. ActionNodes

D. All of the mentioned

Correct Answer is : TaskTracker
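
The placement idea in this question can be sketched as a toy scheduler. This is not Hadoop's actual JobTracker logic; the `assign_tasks` function, the node names, and the greedy one-pass strategy are hypothetical, chosen only to show a preference for nodes that already hold the input block.

```python
def assign_tasks(block_locations, free_slots):
    """Assign each input block to a worker node, preferring data-local nodes.

    block_locations: dict mapping block id -> list of nodes holding a replica
    free_slots: dict mapping node name -> number of free task slots
    Returns a dict mapping block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment = {}
    for block, replicas in block_locations.items():
        # First choice: a node that stores a replica and has a free slot.
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; data must cross the network.
            remote = [n for n, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free task slots in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["node1", "node2"], "b2": ["node1"], "b3": ["node3"]}
    slots = {"node1": 1, "node2": 1, "node3": 0, "node4": 1}
    for block, (node, is_local) in assign_tasks(blocks, slots).items():
        print(block, node, "local" if is_local else "remote")
```

The greedy pass is deliberately simple and can miss a better overall placement; it only demonstrates the "local first, remote as fallback" preference.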

5. Point out the wrong statement:

A. The map function in Hadoop MapReduce has the following general form: map: (K1, V1) → list(K2, V2)

B. The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) → list(K3, V3)

C. MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs

D. None of the mentioned

Correct Answer is : MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs (the statement is wrong because the model is simple, not complex)
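
The general forms quoted in options A and B can be written down as type signatures. The sketch below uses Python type hints and a hypothetical single-process driver, `run_mapreduce`, to show how the key-value types line up; the mapper, the reducer, and the temperature records are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")

# map:    (K1, V1)       -> list(K2, V2)
Mapper = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce: (K2, list(V2)) -> list(K3, V3)
Reducer = Callable[[K2, List[V2]], Iterable[Tuple[K3, V3]]]


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)  # the "shuffle": collect values per key
    output = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


def year_temp_mapper(offset, line):
    # K1 = line offset (int), V1 = line (str); emits (year, temperature).
    year, temp = line.split(",")
    yield year, int(temp)


def max_temp_reducer(year, temps):
    # K2 = year (str), list(V2) = temperatures; emits (year, max temperature).
    yield year, max(temps)


if __name__ == "__main__":
    records = [(0, "1950,22"), (8, "1950,31"), (16, "1949,11")]
    print(run_mapreduce(records, year_temp_mapper, max_temp_reducer))
```

Note that the map output types (K2, V2) must match the reduce input types, while the reduce output types (K3, V3) are free to differ.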
