Snaprecruit.com

Interview questions based on skill :

Take as many assessments as you can to validate and improve your skill rating.

Total Questions: 20

1. Hive, Pig, and Cascading all use a _________ data model.

Correct Answer is : tuple-centric

2. A __________ represents a distributed, immutable collection of elements of type T.

Correct Answer is : PCollection

3. ___________ executes the pipeline as a series of MapReduce jobs.

Correct Answer is : MRPipeline
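
Questions 2 and 3 fit together: MRPipeline is the Pipeline implementation that plans and runs the work as MapReduce jobs, and reading a text file yields a PCollection. A minimal sketch, assuming an input file at the hypothetical path /tmp/input.txt:

    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;

    public class PipelineSketch {
        public static void main(String[] args) {
            // MRPipeline executes the pipeline as a series of MapReduce jobs.
            Pipeline pipeline = new MRPipeline(PipelineSketch.class);

            // PCollection<String>: a distributed, immutable collection of lines.
            PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");

            pipeline.writeTextFile(lines, "/tmp/output");
            pipeline.done(); // nothing runs until the pipeline is executed
        }
    }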

4. __________ represent the logical computations of your Crunch pipelines.

Correct Answer is : DoFns
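
As a sketch of the idea: each logical computation is written as a DoFn subclass (the class name Tokenizer here is just illustrative), which Crunch later applies to a PCollection via parallelDo.

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;

    // One logical computation: split each input line into words.
    public class Tokenizer extends DoFn<String, String> {
        @Override
        public void process(String line, Emitter<String> emitter) {
            for (String word : line.split("\\s+")) {
                emitter.emit(word);
            }
        }
    }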

5. PCollection, PTable, and PGroupedTable all support a __________ operation.

Correct Answer is : union
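
A short sketch of union, continuing the pipeline from the sketch above (variable names hypothetical); the resulting PCollection contains the elements of both inputs:

    PCollection<String> logs2019 = pipeline.readTextFile("/data/2019");
    PCollection<String> logs2020 = pipeline.readTextFile("/data/2020");

    // union combines the two collections into one PCollection.
    PCollection<String> allLogs = logs2019.union(logs2020);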

6. Point out the correct statement :

Correct Answer is : The MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster

7. Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.

Correct Answer is : DoFns

8. The inline DoFn that splits a line up into words is an inner class of :

Correct Answer is : MyPipeline
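
Questions 7 and 8 describe the same pattern: a DoFn written inline as an inner class of a pipeline class (here called MyPipeline, following the question). Because Crunch ships DoFns to the cluster with Java serialization, the inline DoFn and anything it references must be serializable. A sketch:

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.PCollection;
    import org.apache.crunch.Pipeline;
    import org.apache.crunch.impl.mr.MRPipeline;
    import org.apache.crunch.types.writable.Writables;

    public class MyPipeline {
        public static void main(String[] args) {
            Pipeline pipeline = new MRPipeline(MyPipeline.class);
            PCollection<String> lines = pipeline.readTextFile("/tmp/input.txt");

            // Inline DoFn defined as an inner class of MyPipeline; Crunch
            // serializes it (via Java serialization) to send it to the cluster.
            PCollection<String> words = lines.parallelDo(new DoFn<String, String>() {
                @Override
                public void process(String line, Emitter<String> emitter) {
                    for (String word : line.split("\\s+")) {
                        emitter.emit(word);
                    }
                }
            }, Writables.strings());

            pipeline.writeTextFile(words, "/tmp/words");
            pipeline.done();
        }
    }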

9. Point out the wrong statement :

Correct Answer is : None of the mentioned

10. DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.

Correct Answer is : TaskInputOutputContext
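
A sketch of a DoFn using that hook (the counter group and name are illustrative):

    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;

    public class CountingFn extends DoFn<String, String> {
        @Override
        public void process(String input, Emitter<String> emitter) {
            // getContext() exposes the TaskInputOutputContext of the running
            // Map or Reduce task, e.g. for incrementing Hadoop counters.
            getContext().getCounter("MyGroup", "RecordsSeen").increment(1);
            emitter.emit(input);
        }
    }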

11. The top-level ___________ package contains three of the most important specializations in Crunch.

Correct Answer is : org.apache.crunch
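
Assuming the three specializations meant here are the DoFn subclasses the Crunch user guide highlights (FilterFn, MapFn, and CombineFn), they all live directly in that package:

    import org.apache.crunch.CombineFn;  // DoFn for associative combine/reduce logic
    import org.apache.crunch.FilterFn;   // DoFn that keeps or drops each element
    import org.apache.crunch.MapFn;      // DoFn that emits exactly one output per input

    // Example: a FilterFn keeps an element whenever accept() returns true.
    FilterFn<String> nonEmpty = new FilterFn<String>() {
        @Override
        public boolean accept(String s) {
            return !s.isEmpty();
        }
    };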

12. The Avros class also has a _____ method for creating PTypes for POJOs using Avro’s reflection-based serialization mechanism.

Correct Answer is : reflects
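
A sketch, assuming a hypothetical POJO named Person (Avro's reflection mechanism requires a no-arg constructor):

    import org.apache.crunch.types.PType;
    import org.apache.crunch.types.avro.Avros;

    // Build a PType for a plain Java object via Avro reflection.
    PType<Person> personType = Avros.reflects(Person.class);

    // The PType is then passed wherever Crunch needs to know how to
    // serialize the data, e.g. as the second argument to parallelDo.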

13. The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.

Correct Answer is : NLineInputFormat
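
In Hadoop's mapreduce API the parameter can be set through a helper method or directly via the configuration key that LINES_PER_MAP names; a sketch:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    Job job = Job.getInstance();
    job.setInputFormatClass(NLineInputFormat.class);

    // Each split (and therefore each map task) receives 1000 input lines.
    NLineInputFormat.setNumLinesPerSplit(job, 1000);
    // Equivalently: job.getConfiguration().setInt(NLineInputFormat.LINES_PER_MAP, 1000);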

14. The ________ class allows developers to exercise precise control over how data is partitioned, sorted, and grouped by the underlying execution engine.

Correct Answer is : GroupingOptions
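
A sketch of GroupingOptions applied to a hypothetical PTable named counts (the reducer count and partitioner are illustrative):

    import org.apache.crunch.GroupingOptions;
    import org.apache.crunch.PGroupedTable;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    GroupingOptions options = GroupingOptions.builder()
        .numReducers(10)                          // control parallelism of the shuffle
        .partitionerClass(HashPartitioner.class)  // control how keys are partitioned
        .build();

    // Applied when grouping a PTable by key.
    PGroupedTable<String, Long> grouped = counts.groupByKey(options);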

15. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.

Correct Answer is : MapReduce
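
The canonical illustration of that model is word count: map tasks process input splits independently, and the framework groups the intermediate key-value pairs before the reduce tasks run. A compact sketch in Hadoop's mapreduce API:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        // Each map task runs independently over one input split.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String word : value.toString().split("\\s+")) {
                    context.write(new Text(word), ONE);
                }
            }
        }

        // Values are grouped by key before each reduce() call.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }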

16. Point out the correct statement :

Correct Answer is : Data locality means moving the algorithm to the data instead of the data to the algorithm

17. The daemons associated with the MapReduce phase are ________ and task-trackers.

Correct Answer is : job-tracker

18. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.

Correct Answer is : TaskTracker

19. Point out the wrong statement :

Correct Answer is : MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs

20. The InputFormat class calls the ________ function, computes splits for each file, and then sends them to the JobTracker.

Correct Answer is : getSplits
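
For reference, the method's signature in the org.apache.hadoop.mapreduce API (the JobTracker-based flow the question describes is classic MRv1, but the contract is the same):

    // From org.apache.hadoop.mapreduce.InputFormat<K, V>: computes the logical
    // splits for the job's input; the framework schedules one map task per split.
    public abstract List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException;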