Apache Spark Interview Questions and Answers
Here is a list of the most frequently asked Spark interview questions and answers in technical interviews. These Apache Spark questions and answers are suitable for both freshers and experienced professionals at any level. The questions target intermediate to somewhat advanced Apache Spark professionals, but even if you are a beginner or fresher you should be able to understand the answers and explanations given here. These Apache Spark interview questions and answers will guide you to clear your interviews.
Best Apache Spark Interview Questions and Answers
Besant Technologies supports students by providing Spark interview questions and answers for job placements and interview preparation. We also provide Apache Spark online training for students around the world through the Gangboard medium. These are top interview questions and answers, prepared by our institute's experienced trainers. Stay tuned; we will update new Apache Spark interview questions with answers frequently. If you want to learn Apache Spark hands-on, please go through this Apache Spark Training in Chennai.
1. What are the various ways to create contexts in Spark?
a. SparkContext
b. SQLContext
c. SparkSession
d. sqlContext.sparkContext
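A minimal PySpark sketch of these entry points (since Spark 2.x, SparkSession is the preferred entry point and wraps the older contexts; the app name here is just illustrative):
from pyspark.sql import SparkSession
from pyspark.sql import SQLContext

# SparkSession is the unified entry point (Spark 2.x+)
spark = SparkSession.builder \
    .appName("context-demo") \
    .master("local[*]") \
    .getOrCreate()

# The underlying SparkContext is available from the session
sc = spark.sparkContext

# SQLContext is kept for backward compatibility
sqlContext = SQLContext(sc)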
2. What is the difference between map and flatMap?
a. map – one input row produces exactly one output row.
b. flatMap – one input row can produce zero or more output rows, as shown below.
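A quick illustration in PySpark (assuming sc is an existing SparkContext):
rdd = sc.parallelize(["hello world", "besant technologies"])

# map: one output element per input element (here, a list per line)
rdd.map(lambda line: line.split(" ")).collect()
# [['hello', 'world'], ['besant', 'technologies']]

# flatMap: the nested results are flattened into individual elements
rdd.flatMap(lambda line: line.split(" ")).collect()
# ['hello', 'world', 'besant', 'technologies']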
3. What is the difference between repartition and coalesce?
a. repartition can increase or decrease the number of partitions of the data.
b. coalesce can only reduce the number of partitions of the input data.
c. repartition always performs a full shuffle, so it is less efficient than coalesce when you only need to reduce the number of partitions (see the sketch below).
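A short PySpark sketch (partition counts are illustrative):
df = spark.range(1000)

# repartition triggers a full shuffle and can scale partitions up or down
df_more = df.repartition(8)

# coalesce merges existing partitions without a full shuffle; it can only go down
df_fewer = df.coalesce(2)

print(df_more.rdd.getNumPartitions())   # 8
print(df_fewer.rdd.getNumPartitions())  # 2 (or fewer, if df started smaller)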
4. How do you create a stream in Spark?
a. DStream (the classic Spark Streaming API)
b. Structured Streaming
c. DirectStream (e.g., the Kafka direct approach)
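A minimal Structured Streaming sketch; the socket source on localhost:9999 is just an assumed example input:
# Read a stream of lines from a socket source
lines = spark.readStream \
    .format("socket") \
    .option("host", "localhost") \
    .option("port", 9999) \
    .load()

# Write each micro-batch to the console
query = lines.writeStream \
    .format("console") \
    .start()

query.awaitTermination()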
5. How do you handle data shuffle in Spark?
Use mapPartitions and foreachPartition instead of collect wherever possible, so records are processed on the executors rather than being pulled back to the driver.
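A sketch of the idea (save_batch is a hypothetical stand-in for any per-partition side effect, such as a bulk database write):
rdd = sc.parallelize(range(100), 4)

# mapPartitions: transform a whole partition at a time, e.g., to reuse one
# expensive resource per partition instead of one per record
def add_one(partition):
    for value in partition:
        yield value + 1

result = rdd.mapPartitions(add_one)

# foreachPartition: perform side effects on the executors; nothing is
# collected back to the driver
def save_batch(partition):
    batch = list(partition)
    print("writing %d records" % len(batch))  # hypothetical sink

result.foreachPartition(save_batch)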
6. What are the file formats supported by Spark?
Avro, Parquet, JSON, XML, CSV, TSV, ORC, and RC are file formats supported by Spark (Snappy is a compression codec commonly used with them).
Spark supports efficient reading of raw text files as well as these structured file formats.
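For example, reading and writing a few of these formats in PySpark (the paths are placeholders):
# CSV with a header row, letting Spark infer column types
df = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .csv("/path/to/input.csv")

# Write out as Parquet with Snappy compression (the default codec)
df.write.option("compression", "snappy").parquet("/path/to/output.parquet")

# JSON and ORC work the same way
spark.read.json("/path/to/input.json")
df.write.orc("/path/to/output.orc")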
7. What are the internal components and daemons used in Spark?
ACLs, BlockManager, MemoryStore, DAGScheduler, SparkContext, Driver, Worker, Executor, and Tasks.
8. What is the Spark UI, and how do you monitor a Spark job?
Jobs – to view all the Spark jobs
Stages – to check the DAG and the stages of each job
Storage – to check all the cached RDDs
Streaming – to monitor the statistics of streaming batches
Spark History Server – to check the logs of finished Spark jobs
9. What are the cluster managers in Spark?
Standalone
YARN – client and cluster deploy modes (efficient for master-slave architectures)
Mesos – container orchestration (efficient for master-master architectures)
Kubernetes – container orchestration
10. How do you submit a Spark job?
Use spark-submit, following the pattern below (the application JAR path at the end is a placeholder):
spark-submit --class org.apache.spark.examples.ClassJobName --master yarn --deploy-mode client --driver-memory 4g --num-executors 2 --executor-memory 2g --executor-cores 10 /path/to/application.jar
In the above sample:
--master specifies the cluster manager
--driver-memory is the memory allocated to the driver
--executor-memory is the memory allocated to each executor
--num-executors is the total number of executors running on the worker nodes
--executor-cores is the number of CPU cores each executor can use, i.e., how many tasks an executor can run concurrently
11. What is the difference between DataFrame and Dataset?
A DataFrame is untyped: a schema mismatch throws an exception at runtime.
A Dataset is typed (available in Scala/Java only): a schema mismatch is caught at compile time.
12. What are the memory tuning parameters, and how do you achieve parallelism in Spark?
a. Leverage the Tungsten engine.
b. Analyse the Spark job execution plan.
c. Use caching, broadcast variables, and accumulators, together with partition-level parallelism settings, as sketched below.
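A hedged sketch of common tuning settings (the values here are illustrative, not recommendations):
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("tuning-demo") \
    .config("spark.executor.memory", "4g") \
    .config("spark.memory.fraction", "0.6") \
    .config("spark.sql.shuffle.partitions", "200") \
    .config("spark.default.parallelism", "100") \
    .getOrCreate()

# Broadcast a small lookup table to every executor instead of shuffling it
lookup = spark.sparkContext.broadcast({"a": 1, "b": 2})

# Cache a DataFrame that is reused across several actions
df = spark.range(1000).cache()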
13. How do you start the Spark History Server?
./sbin/start-history-server.sh --properties-file history.properties
Once the server is up, you can check the logs of all the containers of finished Spark jobs.
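A history.properties file might look like the following sketch; spark.history.* and spark.eventLog.* are standard Spark configuration keys, and the log directory path is a placeholder (the eventLog settings normally belong to the application's configuration, shown here only for context):
spark.history.fs.logDirectory  hdfs:///spark-logs
spark.eventLog.enabled         true
spark.eventLog.dir             hdfs:///spark-logs
spark.history.ui.port          18080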
14. How do you join two DataFrames in Spark?
df1.join(df2, df1.col1 == df2.col1)
An explicit join type can be passed as a third argument, e.g. df1.join(df2, df1.col1 == df2.col1, "left").
15. What are UDFs, and how are they used?
UDFs are user-defined functions, used to apply a custom transformation across all the rows of a specific column, such as timestamp-to-day or timestamp-to-week conversion.
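A minimal PySpark sketch of a UDF that extracts the day name from a timestamp column (the column names are illustrative; built-in functions like dayofweek are preferred when they fit):
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Assumes df has a 'ts' column of TimestampType
@udf(returnType=StringType())
def ts_to_day(ts):
    return ts.strftime("%A") if ts is not None else None

df = df.withColumn("day", ts_to_day(df.ts))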
16. Code sample to read data from a text file?
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "besant")
sqlContext = SQLContext(sc)

# 'filename' is a placeholder for the path to your text file
rdd = sc.textFile(filename)
17. Code sample to read data from MySQL?
spark.read.format("jdbc").options(
    driver="com.mysql.jdbc.Driver",
    url="jdbc:mysql://<host>:3306/<db>?user=<user>&password=<password>",
    dbtable="besant",
    numPartitions=4
).load()