Spark Interview Question Set 1
Introduction
How to add an index column to a Spark DataFrame?
What are the differences between Apache Spark and Apache Storm?
How to limit the number of retries on Spark job failure in YARN?
Is there any way to get the Spark application ID while running a job?
How to stop a running Spark application?
In Spark standalone mode, how to compress Spark output written to HDFS?
Is there any way to get the current number of partitions of a DataFrame?
How to get good performance with Spark?
Why does a job fail with “No space left on device” when df says otherwise?
Where are the logs in Spark on YARN, and how to view them?
Spark Interview Question Set 2
How to prevent Spark executors from getting lost when using YARN client mode?
In which situations would you use client mode and cluster mode?
How to print the contents of an RDD?
What is the difference between Apache Spark and Apache Flink?
How to remove the parentheses from output?
What are possible reasons for receiving TimeoutException: [n seconds]?
How to open/stream .zip files through Spark?
How to read multiline JSON in Apache Spark?
How to replace NULL values in a Spark DataFrame?
How does Spark partitioning work on files in HDFS?
Scenario Based Question (Memory Management)
Scenario Based Question (Cache)
Scenario Based Question (Cluster)
Scenario Based Question (Recovery)
Let’s say you have a 100 GB table and a small 1 GB table. How do you join them?
Spark Interview Question Set 3
How to read an AWS S3 file in Spark?
How to find the moving average of a time series using Apache Spark?
How to change column types in a Spark SQL DataFrame?
I've got a big RDD (1 GB) in a YARN cluster and can't use collect(). How to handle this?
Is there any way for Spark to create primary keys?
How to add a constant column in a Spark DataFrame?
What does "Stage Skipped" mean in the Apache Spark web UI?
How to concatenate columns in an Apache Spark DataFrame?
While processing a CSV file, the output is split into multiple files. How to get a single file?
Explain the sortByKey() operation.
Spark Interview Question Set 4
List the advantages of Parquet files in Apache Spark.
Do you need to install Spark on all nodes of a YARN cluster while running Spark
What is PageRank?
What does MLlib do?
What is GraphX?
What do you understand by receivers in Spark Streaming?
Name some companies that are already using Spark Streaming.
Name some sources from which the Spark Streaming component can process real-time data.
What are the key features of Apache Spark that you like?
What are the various data sources available in Spark SQL?
Spark Interview Question Set 5
What is the difference between map and flatMap, and what is a good use case for each?
How to read multiple text files into a single RDD?
Does Spark SQL support subqueries?
Have you ever encountered a Spark java.lang.OutOfMemoryError? How to fix this issue?
How do I skip a header from CSV files in Spark?
What happens to an RDD when one of the nodes on which it is distributed goes down?
For data that we want to use again and again, how to improve performance?
How does the Spark Streaming API work?
What is a write-ahead log (journaling)?
What are the advantages of DataFrames?
Spark Interview Question Set 6
What are DataFrames?
What is the Spark driver?
What are the benefits of Spark over MapReduce?
What does the Spark engine do?
Explain the difference between Spark SQL and Hive.
What are the various levels of persistence in Apache Spark?
Which one would you choose for a project: Hadoop MapReduce or Apache Spark?
What is a DStream?
What is the significance of the sliding window operation?
How can you minimize data transfers when working with Spark?
Spark Interview Question Set 7
Is it possible to run Apache Spark on Apache Mesos?
Can you use Spark to access and analyse data stored in Cassandra databases?
Explain transformations and actions in the context of RDDs.
What is Apache Spark Streaming?
How can you define Spark accumulators?
What is a broadcast variable?
What is data locality/placement?
Which cluster managers can be used with Spark?
What is speculative execution of tasks?
What is a stage, with regard to Spark job execution?
Spark Interview Question Set 8
What is the DAGScheduler and how does it work?
Please define executors in detail.
Please explain how workers work when a new job is submitted to them.
What are workers?
Define the Spark architecture.
What is checkpointing?
What is the difference between groupByKey and reduceByKey?
What is shuffling?
What is the difference between the cache() and persist() methods of an RDD?
What is the coalesce transformation?
Spark Interview Question Set 9
Data is spread across all the nodes of a cluster. How does Spark try to process this data?
How would you control the number of partitions of an RDD?
What does a lazily evaluated RDD mean?
How do you define an RDD?
How do you evaluate your Spark application?
How do you disable INFO messages when running a Spark application?
What is the advantage of broadcasting values across a Spark cluster?
Is it possible to have multiple SparkContexts in a single JVM?
What is the default level of parallelism in Spark?
What are the ways to configure Spark properties, and how are they ordered?
Spark Interview Question Set 10
What kinds of data processing are supported by Spark?
Why is Spark good at low-latency iterative workloads?
We understand Spark Streaming uses micro-batching. Does this increase latency?
Does Spark require modified versions of Scala or Python?
Do I need Hadoop to run Spark?
How can I run Spark on a cluster?
Does my data need to fit in memory to use Spark?
How large a cluster can Spark scale to?
How does Spark relate to Apache Hadoop?
Who is using Spark in production?
Preview - Apache Spark Interview Question and Answer (100 FAQ)