Hadoop Interview Questions - 45

1. The ________ method in the ModelCountReducer class “reduces” the values the mapper collects into a derived value.

A. count

B. add

C. reduce

D. all of the mentioned

Correct Answer is : reduce
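
To illustrate what "reducing" collected values into a derived value means, here is a minimal plain-Python sketch. It is an analogy only, not the Hadoop Java API: the names `map_record`, `reduce_counts`, and `run_job`, and the sample records, are hypothetical and assume a job that counts records per model.

```python
from collections import defaultdict


def map_record(record):
    """Hypothetical mapper: emit a (model, 1) pair for each input record."""
    yield record["model"], 1


def reduce_counts(key, values):
    """Hypothetical reduce step: collapse all values for one key into a sum."""
    return key, sum(values)


def run_job(records):
    # Shuffle phase: group every emitted value under its key.
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Made-up sample input, for illustration only.
records = [{"model": "A"}, {"model": "B"}, {"model": "A"}]
model_counts = run_job(records)
```

The mapper only emits pairs; the derived value (here, a count per model) is produced in the reduce step.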

2. Which of the following works well with Avro?

A. Lucene

B. Kafka

C. MapReduce

D. None of the mentioned

Correct Answer is : MapReduce

3. __________ is a tool used to generate proxy objects in Java so that the objects are easier to work with.

A. Lucene

B. Kafka

C. MapReduce

D. Avro

Correct Answer is : Avro

4. Spark was initially started by ____________ at UC Berkeley AMPLab in 2009.

A. Mahek Zaharia

B. Matei Zaharia

C. Doug Cutting

D. Stonebraker

Correct Answer is : Matei Zaharia

5. Point out the correct statement:

A. RSS abstraction provides distributed task dispatching, scheduling, and basic I/O functionalities

B. For cluster management, Spark supports standalone mode and Hadoop YARN

C. Hive SQL is a component on top of Spark Core

D. None of the mentioned

Correct Answer is : For cluster management, Spark supports standalone mode and Hadoop YARN

6. ____________ is a component on top of Spark Core.

A. Spark Streaming

B. Spark SQL

C. RDDs

D. All of the mentioned

Correct Answer is : Spark SQL

7. Spark SQL provides a domain-specific language to manipulate ___________ in Scala, Java, or Python.

A. Spark Streaming

B. Spark SQL

C. RDDs

D. All of the mentioned

Correct Answer is : RDDs

8. Point out the wrong statement:

A. For distributed storage, Spark can interface with a wide variety of systems, including the Hadoop Distributed File System (HDFS)

B. Spark also supports a pseudo-distributed mode, usually used only for development or testing purposes

C. Spark has over 465 contributors in 2014

D. None of the mentioned

Correct Answer is : None of the mentioned

9. ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics.

A. MLlib

B. Spark Streaming

C. GraphX

D. RDDs

Correct Answer is : Spark Streaming

10. ____________ is a distributed machine learning framework on top of Spark.

A. MLlib

B. Spark Streaming

C. GraphX

D. RDDs

Correct Answer is : MLlib

11. ________ is a distributed graph processing framework on top of Spark.

A. MLlib

B. Spark Streaming

C. GraphX

D. All of the mentioned

Correct Answer is : GraphX

12. GraphX provides an API for expressing graph computation that can model the __________ abstraction.

A. GaAdt

B. Spark Core

C. Pregel

D. None of the mentioned

Correct Answer is : Pregel
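
Pregel is a vertex-centric model in which computation proceeds in synchronous supersteps: each active vertex sends messages to its neighbors, and a vertex stays active only while its value keeps changing. The following is a minimal plain-Python sketch of that idea, not the GraphX API. The function name `pregel_min_label`, its parameters, and the sample edges are hypothetical, and the graph is assumed to be undirected.

```python
def pregel_min_label(edges, max_supersteps=20):
    """Label each vertex with the smallest vertex id in its connected component."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every vertex starts with its own id as its label and is active.
    labels = {v: v for v in neighbors}
    active = set(neighbors)

    for _ in range(max_supersteps):
        if not active:
            break  # All vertices have halted.
        # Message phase: each active vertex sends its current label to its neighbors.
        inbox = {}
        for v in active:
            for n in neighbors[v]:
                inbox.setdefault(n, []).append(labels[v])
        # Vertex program: adopt the smallest label received, if it is an improvement.
        active = set()
        for v, messages in inbox.items():
            best = min(messages)
            if best < labels[v]:
                labels[v] = best
                active.add(v)  # A changed vertex stays active for the next superstep.
    return labels


# Made-up sample graph with two components: {1, 2, 3} and {4, 5}.
components = pregel_min_label([(1, 2), (2, 3), (4, 5)])
```

All messages for a superstep are collected before any label is updated, which is what keeps the supersteps synchronous.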

13. Spark's architecture is ___________ times as fast as the Hadoop disk-based Apache Mahout and even scales better than Vowpal Wabbit.

A. 10

B. 20

C. 50

D. 100

Correct Answer is : 10

14. Users can easily run Spark on top of Amazon’s __________.

A. Infosphere

B. EC2

C. EMR

D. None of the mentioned

Correct Answer is : EC2

15. Point out the correct statement:

A. Spark enables Apache Hive users to run their unmodified queries much faster

B. Spark interoperates only with Hadoop

C. Spark is a popular data warehouse solution running on top of Hadoop

D. None of the mentioned

Correct Answer is : Spark enables Apache Hive users to run their unmodified queries much faster

16. Spark runs on top of ___________, a cluster manager system which provides efficient resource isolation across distributed applications.

A. Mesjs

B. Mesos

C. Mesus

D. All of the mentioned

Correct Answer is : Mesos

17. Which of the following can be used to launch Spark jobs inside MapReduce?

A. SIM

B. SIMR

C. SIR

D. RIS

Correct Answer is : SIMR

18. Point out the wrong statement:

A. Spark is intended to replace the Hadoop stack

B. Spark was designed to read and write data from and to HDFS, as well as other storage systems

C. Hadoop users who have already deployed or are planning to deploy Hadoop YARN can simply run Spark on YARN

D. None of the mentioned

Correct Answer is : Spark is intended to replace the Hadoop stack

19. Which of the following languages is not supported by Spark?

A. Java

B. Pascal

C. Scala

D. Python

Correct Answer is : Pascal

20. Spark is packaged with higher-level libraries, including support for _________ queries.

A. SQL

B. C

C. C++

D. None of the mentioned

Correct Answer is : SQL
