Introduction to SparkContext
Creating the Spark context is the first and essential step in any Spark application: it is the entry point to Apache Spark functionality for the Spark driver. The driver program contains the main function, and as soon as we run a Spark application, a SparkContext is initialized and created there; the driver then coordinates the operations that the executors run on the worker nodes. The Spark driver application passes its parameters to the SparkContext. Access to the Spark cluster is granted through a resource manager, the two main types being Mesos and YARN. A SparkConf (Spark configuration) object should be made first in order to create a SparkContext.
Syntax for Apache SparkContext:
from pyspark import SparkContext
sc = SparkContext("local", "First App")
How Is an Apache SparkContext Created?
Initially, a SparkConf should be made if one has to create a SparkContext. SparkConf holds the configuration parameters that our Spark driver application will pass to the SparkContext. A few of these parameters define properties of the Spark driver application, while the others are used to allocate cluster resources, such as the memory size and the number of cores on the worker nodes used by the executors that Spark runs. To put it simply, the Spark context guides access to the Spark cluster. After the SparkContext object is created, methods such as textFile (reading a text file), sequenceFile (reading a sequence file), parallelize, and a few others can be invoked.
Parameters:
profiler_cls | A custom profiler class used for profiling; the default is pyspark.profiler.BasicProfiler. |
jsc | An instance of the Java SparkContext. |
gateway | Use an existing gateway and JVM; otherwise, initialize a new JVM. |
serializer | The RDD serializer. |
batchSize | The number of Python objects represented as a single Java object. Set 1 to disable batching, 0 to choose the batch size automatically based on object sizes, or -1 to use an unlimited batch size. |
environment | Environment variables for the worker nodes. |
pyFiles | .zip or .py files to send to the cluster and add to the PYTHONPATH. |
sparkHome | The Spark installation directory. |
appName | The name of the job. |
master | The URL of the cluster to connect to. |
conf | A SparkConf (Spark configuration) object used to set all the Spark properties. |
The following represents the data flow of the Spark context:
The Spark context uses Py4J to launch a Java virtual machine, which in turn creates a Java Spark context. PySpark makes the Spark context available by default as sc. That is the reason why creating a new Spark context will not work in the shell.
Code:
class pyspark.SparkContext (
master = None,
appName = None,
sparkHome = None,
pyFiles = None,
environment = None,
batchSize = 0,
serializer = PickleSerializer(),
conf = None,
gateway = None,
jsc = None,
profiler_cls = <class 'pyspark.profiler.BasicProfiler'>
)
Example:
Code:
package com.dataflair.spark
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
object Word_Count {
  def main(args: Array[String]) {
    // Configuration for the Spark context is set and created
    val conf = new SparkConf().setAppName("WordCount")
    // An object for the Spark context is created
    val sc = new SparkContext(conf)
    // Check whether sufficient params are supplied
    if (args.length < 2) {
      println("Usage: ScalaWordCount <input> <output>")
      System.exit(1)
    }
    // Read the input, split it into words, and count each word
    val rawData = sc.textFile(args(0))
    val words = rawData.flatMap(line => line.split(" "))
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)
    // Save the result and release the context
    wordCounts.saveAsTextFile(args(1))
    sc.stop()
  }
}
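To see what a flatMap / map / reduceByKey word-count pipeline computes, the same steps can be emulated in plain Python on a couple of hypothetical input lines (no cluster required; the data is illustrative):

```python
from collections import Counter
from itertools import chain

# Hypothetical input lines, standing in for the contents of the input file
lines = ["to be or not to be", "to spark or not"]

# flatMap: split every line into individual words
words = list(chain.from_iterable(line.split(" ") for line in lines))

# map + reduceByKey: pair each word with 1, then sum the counts per key
counts = Counter(words)
print(counts["to"], counts["or"], counts["be"])  # 3 2 2
```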
Conclusion
To sum up, Spark helps to simplify the challenging and computationally intensive task of processing high volumes of real-time or archived data, both structured and unstructured, seamlessly integrating relevant complex capabilities such as machine learning and graph algorithms; Spark brings Big Data processing to the masses. SparkContext provides various functions in Spark, such as getting the current status of the Spark application, setting the configuration, cancelling a job, cancelling a stage, and much more. It is the entry point to Spark functionality and thus acts as a backbone.
Recommended Articles
This is a guide to SparkContext. Here we discuss the introduction to SparkContext and how an Apache SparkContext is created, with a respective example. You may also have a look at the following articles to learn more –
- Spark Accumulator
- Spark Parallelize
- Spark Functions
- Spark Versions
The post SparkContext appeared first on EDUCBA.