I am wondering how to choose the best settings to tune my Spark job. Basically, I am just reading a big CSV file into a DataFrame and counting some string occurrences. The input file is over 500 GB, and the Spark job seems too slow.
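For context, the job is essentially equivalent to this minimal sketch (the path, column name, and search string are placeholders, not my real values):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CountOccurrences {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-count")
      .getOrCreate()

    // Read the ~500 GB CSV into a DataFrame.
    val df = spark.read
      .option("header", "true")
      .csv("/path/to/input.csv")

    // Count rows whose "text" column contains the target string.
    val matches = df.filter(col("text").contains("some-string")).count()
    println(s"matches: $matches")

    spark.stop()
  }
}
```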
Terminal progress bar:
[Stage 1:=======> (4174 + 50) / 18500]
NumberCompletedTasks: (4174), which took around one hour.
NumberActiveTasks: (50); I believe I can control this with --conf spark.dynamicAllocation.maxExecutors=50 (I tried different values; see the sketch below for how I apply it).
TotalNumberOfTasks: (18500). Why is this fixed? What does it mean? Is it only related to the file size?

Since I am just reading a CSV with very little logic on top, how can I optimize the Spark job?
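For reference, here is roughly how I apply that setting in code; it should be equivalent to passing the --conf flag to spark-submit (the values shown are just from my latest attempt):

```scala
import org.apache.spark.sql.SparkSession

// In-code equivalent of the --conf flags passed to spark-submit.
// Note: dynamic allocation typically also requires the external shuffle
// service (or shuffle tracking) to be enabled on the cluster.
val spark = SparkSession.builder()
  .appName("csv-count")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .getOrCreate()
```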
I also tried changing: