Snaprecruit.com

Interview questions based on skill :

Take as many assessments as you can to validate and improve your skill rating.

Total Questions: 20

1. Hive, Pig, and Cascading all use a _________ data model.

Correct Answer is : tuple-centric

2. A __________ represents a distributed, immutable collection of elements of type T.

Correct Answer is : PCollection

3. ___________ executes the pipeline as a series of MapReduce jobs.

Correct Answer is : MRPipeline
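
Questions 2 and 3 fit together: MRPipeline is the Pipeline implementation that plans and runs the work as MapReduce jobs, and reading a text file yields a PCollection. A minimal sketch, assuming an input file at the hypothetical path /tmp/input.txt:

    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;

    public class PipelineSketch {
        public static void main(String[] args) {
            // MRPipeline executes the pipeline as a series of MapReduce jobs.
            Pipeline pipeline = new MRPipeline(PipelineSketch.class);

            // PCollection<String>: a distributed, immutable collection of lines.
            PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");

            pipeline.writeTextFile(lines, "/tmp/output");
            pipeline.done(); // nothing runs until the pipeline is executed
        }
    }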

4. __________ represent the logical computations of your Crunch pipelines.

Correct Answer is : DoFns
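
As a sketch of the idea: each logical computation is written as a DoFn subclass (the class name Tokenizer here is just illustrative), which Crunch later applies to a PCollection via parallelDo.

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;

    // One logical computation: split each input line into words.
    public class Tokenizer extends DoFn<String, String> {
        @Override
        public void process(String line, Emitter<String> emitter) {
            for (String word : line.split("\\s+")) {
                emitter.emit(word);
            }
        }
    }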

5. PCollection, PTable, and PGroupedTable all support a __________ operation.

Correct Answer is : union
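
A short sketch of union, continuing the pipeline from the sketch above (variable names hypothetical); the resulting PCollection contains the elements of both inputs:

    PCollection<String> logs2019 = pipeline.readTextFile("/data/2019");
    PCollection<String> logs2020 = pipeline.readTextFile("/data/2020");

    // union combines the two collections into one PCollection.
    PCollection<String> allLogs = logs2019.union(logs2020);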

6. Point out the correct statement :

Correct Answer is : The MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster

7. Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.

Correct Answer is : DoFns

8. The inline DoFn that splits a line up into words is an inner class of :

Correct Answer is : MyPipeline
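
Questions 7 and 8 describe the same pattern: a DoFn written inline as an inner class of a pipeline class (here called MyPipeline, following the question). Because Crunch ships DoFns to the cluster with Java serialization, the inline DoFn and anything it references must be serializable. A sketch:

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    public class MyPipeline {
        public static void main(String[] args) {
            Pipeline pipeline = new MRPipeline(MyPipeline.class);
            PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");

            // Inline DoFn defined as an inner class of MyPipeline; Crunch
            // serializes it (via Java serialization) to send it to the cluster.
            PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    for (String word : line.split("\\s+")) {
                        emitter.emit(word);
                    }
                }
            }, Writables.strings());

            pipeline.writeTextFile(words, "/tmp/words");
            pipeline.done();
        }
    }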

9. Point out the wrong statement :

Correct Answer is : None of the mentioned

10. DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.

Correct Answer is : TaskInputOutputContext
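
A sketch of a DoFn using that hook (the counter group and name are illustrative):

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;

    public class CountingFn extends DoFn<String, String> {
        @Override
        public void process(String input, Emitter<String> emitter) {
            // getContext() exposes the TaskInputOutputContext of the running
            // Map or Reduce task, e.g. for incrementing Hadoop counters.
            getContext().getCounter("MyGroup", "RecordsSeen").increment(1);
            emitter.emit(input);
        }
    }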

11. The top-level ___________ package contains three of the most important specializations in Crunch.

Correct Answer is : org.apache.crunch
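
Assuming the three specializations meant here are the DoFn subclasses the Crunch user guide highlights (FilterFn, MapFn, and CombineFn), they all live directly in that package:

    import org.apache.crunch.CombineFn;  // DoFn for associative combine/reduce logic
    import org.apache.crunch.FilterFn;   // DoFn that keeps or drops each element
    import org.apache.crunch.MapFn;      // DoFn that emits exactly one output per input

    // Example: a FilterFn keeps an element whenever accept() returns true.
    FilterFn<String> nonEmpty = new FilterFn<String>() {
        @Override
        public boolean accept(String s) {
            return !s.isEmpty();
        }
    };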

12. The Avros class also has a _____ method for creating PTypes for POJOs using Avro’s reflection-based serialization mechanism.

Correct Answer is : reflects
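
A sketch, assuming a hypothetical POJO named Person (Avro's reflection mechanism requires a no-arg constructor):

    import org.apache.crunch.types.PType;
    import org.apache.crunch.types.avro.Avros;

    // Build a PType for a plain Java object via Avro reflection.
    PType<Person> personType = Avros.reflects(Person.class);

    // The PType is then passed wherever Crunch needs to know how to
    // serialize the data, e.g. as the second argument to parallelDo.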

13. The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.

Correct Answer is : NLineInputFormat
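
In Hadoop's mapreduce API the parameter can be set through a helper method or directly via the configuration key that LINES_PER_MAP names; a sketch:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    Job job = Job.getInstance();
    job.setInputFormatClass(NLineInputFormat.class);

    // Each split (and therefore each map task) receives 1000 input lines.
    NLineInputFormat.setNumLinesPerSplit(job, 1000);
    // Equivalently: job.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, 1000);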

14. The ________ class allows developers to exercise precise control over how data is partitioned, sorted, and grouped by the underlying execution engine.

Correct Answer is : GroupingOptions
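
A sketch of GroupingOptions applied to a hypothetical PTable named counts (the reducer count and partitioner are illustrative):

    import org.apache.crunch.GroupingOptions;
    import org.apache.crunch.PGroupedTable;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    GroupingOptions options = GroupingOptions.builder()
        .numReducers(10)                          // control parallelism of the shuffle
        .partitionerClass(HashPartitioner.class)  // control how keys are partitioned
        .build();

    // Applied when grouping a PTable by key.
    PGroupedTable<String, Long> grouped = counts.groupByKey(options);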

15. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

Correct Answer is : MapReduce
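
The canonical illustration of that model is word count: map tasks process input splits independently, and the framework groups the intermediate key-value pairs before the reduce tasks run. A compact sketch in Hadoop's mapreduce API:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        // Each map task runs independently over one input split.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String word : value.toString().split("\\s+")) {
                    context.write(new Text(word), ONE);
                }
            }
        }

        // Values are grouped by key before each reduce() call.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }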

16. Point out the correct statement :

Correct Answer is : Data locality means moving the algorithm to the data instead of the data to the algorithm

17. The daemons associated with the MapReduce phase are ________ and task-trackers.

Correct Answer is : job-tracker

18. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.

Correct Answer is : TaskTracker

19. Point out the wrong statement :

Correct Answer is : MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs

20. The InputFormat class calls the ________ function, computes splits for each file, and then sends them to the JobTracker.

Correct Answer is : getSplits
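
For reference, the method's signature in the org.apache.hadoop.mapreduce API (the JobTracker-based flow the question describes is classic MRv1, but the contract is the same):

    // From org.apache.hadoop.mapreduce.InputFormat<K, V>: computes the logical
    // splits for the job's input; the framework schedules one map task per split.
    public abstract List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException;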