Overview
This blog post is a follow-up to the previous post, in which raw loan application data was brought into the data lake managed by Hadoop, normalized into the CUFX model in the transient phase, and finally aggregated and correlated in the refined phase. At this point, the data lake contains raw, transient, and refined data in different shapes that are useful for data scientists, curators, and business analysts.
Tableau for Hadoop
Tableau comes in multiple product types, such as Tableau Desktop, Server, Online, Public, and Reader, that cater to different user groups and needs. Our original work was based on Tableau Professional, which provides connectors to Hadoop via Hive Thrift, Drill JDBC/ODBC, and SparkSQL. Many references on the internet explain in detail how to configure Tableau with Hadoop; some of them are listed in the References section at the bottom of this blog.
The screenshot below shows the Hive configuration in Tableau used to query Hive tables via the metastore.
Loan Application Analytics with Tableau Public
To keep things simple for this blog, we used Tableau Public and CSV files extracted from Hive to create different visualizations. These data files are based on the CUFX normalized model obtained from the refined phase and mostly use the Applicant and Application entities. Please refer to the previous blog post for the relation between Applicant and Application and the associated schema. The interactive visualization below was created with the Tableau Public desktop application, published online, and is served from Tableau Public. Please note that these visualizations are based on mock data, not real data.
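Before loading the CSV extracts into Tableau Public, it can be useful to sanity-check that the Applicant and Application files join as expected. The following is a minimal sketch only: the column names (`applicant_id`, `state`, `application_id`, `requested_amount`), the inline sample rows, and the helper functions are hypothetical and are not taken from the CUFX schema or from our actual extracts. It assumes each application row references exactly one applicant.

```python
import csv
import io
from collections import defaultdict

# Hypothetical mock extracts; column names are illustrative, not the CUFX schema.
APPLICANT_CSV = """applicant_id,state
A1,CA
A2,TX
A3,CA
"""

APPLICATION_CSV = """application_id,applicant_id,requested_amount
L1,A1,10000
L2,A1,5000
L3,A2,20000
L4,A3,7500
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def amount_by_state(applicants, applications):
    """Join applications to applicants and total the requested amount per state."""
    state_of = {row["applicant_id"]: row["state"] for row in applicants}
    totals = defaultdict(float)
    for row in applications:
        state = state_of.get(row["applicant_id"])
        if state is None:
            # Skip applications whose applicant is missing from the extract.
            continue
        totals[state] += float(row["requested_amount"])
    return dict(totals)


totals = amount_by_state(load_rows(APPLICANT_CSV), load_rows(APPLICATION_CSV))
print(totals)
```

With real extracts, `load_rows` would read from the exported files instead of inline strings, and the per-state totals could be compared against the figures shown in the Tableau visualization.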
References
- Connect Hive with Tableau
- Connect SparkSQL with Tableau
- Connect Drill with Tableau
- Tableau Public
This post first appeared on Front-end Code Review & Validation Tools | Treselle Systems; please read the original post: here