Take as many assessments as you can to validate and improve your skill rating.
Total Questions: 20
1. Hive, Pig, and Cascading all use a _________ data model.
Correct Answer is : tuple-centric
2. A __________ represents a distributed, immutable collection of elements of type T.
Correct Answer is : PCollection
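For reference, a minimal sketch (paths hypothetical) of reading a text file into a PCollection<String>; because a PCollection is immutable, a transformation such as filter() derives a new collection rather than modifying the original:

```java
import org.apache.crunch.FilterFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class PCollectionExample {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(PCollectionExample.class);
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");
    // filter() never mutates `lines`; it returns a new, equally immutable PCollection.
    PCollection<String> nonEmpty = lines.filter(new FilterFn<String>() {
      @Override
      public boolean accept(String line) {
        return !line.isEmpty();
      }
    });
    pipeline.writeTextFile(nonEmpty, "/tmp/output");
    pipeline.done();
  }
}
```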
3. ___________ executes the pipeline as a series of MapReduce jobs.
Correct Answer is : MRPipeline
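A minimal sketch (input path hypothetical): nothing executes until done() or run() is called, at which point MRPipeline compiles the logical plan into one or more MapReduce jobs:

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.hadoop.conf.Configuration;

public class MRPipelineExample {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(MRPipelineExample.class, new Configuration());
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");
    pipeline.writeTextFile(lines, "/tmp/copied");
    pipeline.done(); // plans and runs the pipeline as a series of MapReduce jobs
  }
}
```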
4. __________ represent the logical computations of your Crunch pipelines.
Correct Answer is : DoFns
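For example, a simple DoFn (name and logic illustrative): process() receives one input element and may emit zero or more outputs. It would be attached to a PCollection via parallelDo, together with a PType such as Writables.strings():

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;

// A DoFn encapsulates one logical computation in the pipeline.
public class ToLowerFn extends DoFn<String, String> {
  @Override
  public void process(String input, Emitter<String> emitter) {
    emitter.emit(input.toLowerCase());
  }
}
```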
5. PCollection, PTable, and PGroupedTable all support a __________ operation.
Correct Answer is : union
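A minimal sketch (paths hypothetical) of unioning two PCollections of the same element type into one:

```java
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;

public class UnionExample {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(UnionExample.class);
    PCollection<String> logs2013 = pipeline.readTextFile("/data/logs/2013");
    PCollection<String> logs2014 = pipeline.readTextFile("/data/logs/2014");
    // union() concatenates the two collections into a single PCollection.
    PCollection<String> allLogs = logs2013.union(logs2014);
    pipeline.writeTextFile(allLogs, "/data/logs/all");
    pipeline.done();
  }
}
```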
6. Point out the correct statement:
Correct Answer is : The MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster
7. Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.
Correct Answer is : DoFns
8. The inline DoFn that splits a line up into words is an inner class of:
Correct Answer is : MyPipeline
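This mirrors the canonical Crunch word-count example; a sketch (paths hypothetical) where the inline DoFn is an anonymous inner class defined inside MyPipeline:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.apache.crunch.impl.mr.MRPipeline;
import org.apache.crunch.types.writable.Writables;

public class MyPipeline {
  public static void main(String[] args) {
    Pipeline pipeline = new MRPipeline(MyPipeline.class);
    PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");
    // The inline DoFn below is an inner (anonymous) class of MyPipeline.
    PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
      @Override
      public void process(String line, Emitter<String> emitter) {
        for (String word : line.split("\\s+")) {
          if (!word.isEmpty()) {
            emitter.emit(word);
          }
        }
      }
    }, Writables.strings());
    pipeline.writeTextFile(words, "/tmp/words");
    pipeline.done();
  }
}
```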
9. Point out the wrong statement:
Correct Answer is : None of the mentioned
10. DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.
Correct Answer is : TaskInputOutputContext
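A sketch (counter group and name hypothetical) of a DoFn reaching into the task context directly; note that DoFn also offers convenience methods such as increment() for the common counter case:

```java
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;

// getContext() returns the Hadoop TaskInputOutputContext of the Map or
// Reduce task currently running this DoFn.
public class RecordCountingFn extends DoFn<String, String> {
  @Override
  public void process(String input, Emitter<String> emitter) {
    getContext().getCounter("QuizApp", "RecordsProcessed").increment(1L);
    emitter.emit(input);
  }
}
```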
11. The top-level ___________ package contains three of the most important specializations in Crunch.
Correct Answer is : org.apache.crunch
12. The Avros class also has a _____ method for creating PTypes for POJOs using Avro’s reflection-based serialization mechanism.
Correct Answer is : reflects
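For instance (WeatherRecord is a hypothetical POJO; Avro's reflection-based serialization needs a no-arg constructor):

```java
import org.apache.crunch.types.PType;
import org.apache.crunch.types.avro.Avros;

public class WeatherRecord {
  private int year;
  private double temperature;
  private String stationId;

  public WeatherRecord() { } // required by Avro reflection

  // Avros.reflects builds a PType for this plain Java class.
  public static final PType<WeatherRecord> PTYPE = Avros.reflects(WeatherRecord.class);
}
```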
13. The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.
Correct Answer is : NLineInputFormat
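A minimal sketch of setting that parameter in a MapReduce driver, either through the helper method or via the LINES_PER_MAP constant directly:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration());
    job.setInputFormatClass(NLineInputFormat.class);
    // Two equivalent ways to ask for 1000 input lines per split (and hence per mapper):
    NLineInputFormat.setNumLinesPerSplit(job, 1000);
    job.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, 1000);
  }
}
```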
14. The ________ class allows developers to exercise precise control over how data is partitioned, sorted, and grouped by the underlying execution engine.
Correct Answer is : GroupingOptions
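A sketch (the `counts` table is hypothetical) of fixing the reducer count for a groupByKey; the builder also accepts custom partitioner, sort, and grouping comparator classes:

```java
import org.apache.crunch.GroupingOptions;
import org.apache.crunch.PGroupedTable;
import org.apache.crunch.PTable;

public class GroupingExample {
  public static PGroupedTable<String, Long> group(PTable<String, Long> counts) {
    GroupingOptions options = GroupingOptions.builder()
        .numReducers(10) // pin the number of reduce tasks for this shuffle
        .build();
    return counts.groupByKey(options);
  }
}
```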
15. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
Correct Answer is : MapReduce
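The canonical illustration is word count; a minimal sketch of the map and reduce functions in the Java MapReduce API, where each map task tokenizes its input independently and reduce tasks sum the counts per word:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  public static class TokenizeMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String word : value.toString().split("\\s+")) {
        if (!word.isEmpty()) {
          context.write(new Text(word), ONE); // emit (word, 1)
        }
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get(); // sum all counts for this word
      }
      context.write(key, new IntWritable(sum));
    }
  }
}
```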
16. Point out the correct statement:
Correct Answer is : Data locality means movement of the algorithm to the data instead of the data to the algorithm
17. The daemons associated with the MapReduce phase are ________ and task-trackers.
Correct Answer is : job-tracker
18. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.
Correct Answer is : TaskTracker
19. Point out the wrong statement:
Correct Answer is : MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs
20. The InputFormat class calls the ________ function and computes splits for each file and then sends them to the jobtracker.
Correct Answer is : getSplits
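User code rarely calls getSplits() itself; the framework invokes it during job submission. A sketch (subclass hypothetical) showing where it sits in the new-API InputFormat hierarchy:

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class LoggingTextInputFormat extends TextInputFormat {
  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    // FileInputFormat's default implementation computes roughly one split per block.
    List<InputSplit> splits = super.getSplits(job);
    System.out.println("Computed " + splits.size() + " splits");
    return splits;
  }
}
```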