Overview
Zenko (“Good fox” in Japanese) is a reporting system (see code on Github here) I’ve created over the last couple of weeks at Mozilla. Basically my non-technical coworkers were getting so frustrated by Tableau (“what the heck is the difference between INNER JOIN and OUTER JOIN?”) that I decided to create a simple dashboard interface for them.
Its a simple bootstrap front-end to a database containing campaign stats for sponsored tiles. You can drill down to each tile or client/partner and pivot by things like locale, country and date.
Zenko’s stack (high to low)
- Highcharts
- Bootstrap
- Flask running locally (due to security restrictions)
- Python scripts using psycopg2
- Postgres/OLAP Col Store running on Amazon Redshift
A new feature
When loading one of the analyses pages, a table will be shown. My coworker wanted to be able to download the data to Excel. I came up with 4 possible ways to implement this:
- Simply rerun the query, format the results as a csv on the backend, save it and window.open() the file location.
- Automatically save the data from each analysis request server request and periodically clear old files.
- Use a javascript library like ExcelBuilder
- Send the data back to the server, format it, and then back to the client via an iframe
Which is the best solution?
- This is problematic because our sticking point is the query speed. The redshift database is an OLAP Column Oriented database, and append-only. This means that it is insanely fast to add data to, but quite slow (often 6+ seconds) to query. Yes, it is dealing with billions of rows so excusable, but its not so great in terms of user experience to wait so long.The user doesn’t want to wait another 6 seconds for the analysis to rerun when they have the data already.
- This sounds like it could just end up storing a lot of data on the client, but it could work quite well. In terms of security though, I’m not sure that the data should be lingering on the user’s PC unrequested though.
- This didn’t work out so well – in Firefox, the file is incorrectly named. In the future, I’d like to name the files according to the parameters of the analysis e.g. <client>-<date>-<country>.xls
- This is the weirdest solution, but it works! Flask is running locally so it is actually very fast. There are no huge JQuery/JavaScript complications with file permissions and the fact that you can manipulate the data easily on the server is nice too.
Solution 4
The process is as follows when the “Download for Excel” button is clicked:
- Reference the HTML table using JavaScript and convert it to an array of arrays
- Append an iframe to the DOM
- Append a form with a POST action and hidden field to the iframe
- Insert the table contents into the hidden field’s value
- Submit the form
- Let Flask receive the POST request and format the information as a CSV
- Return an HTTP response with a file attachment containing the CSV
Let’s implement it
function convert_table_to_array() { //convert the current table to a list of lists itable = document.getElementById("impressions_table") //the table will always be called this in zenko //convert the table to a list of lists (i.e. array of arrays) var data = []; //meta data col_count = itable.children[0].children[0].children.length //number of cols row_count = itable.children[1].children.length //number of rows //grab the header (i.e. first row containing column titles) header_cells = itable.children[0].children[0].children header = [] for (i=0;i<header_cells.length;i++) { header.push(header_cells[i].textContent) } data.push(header) //iterate through each row row_cells = itable.children[1].children for (i=0;i<row_cells.length;i++) { row = itable.children[1].children[i].children //get each cell in the row row_content = [] for (j=0;j<row.length;j++) { cell_content = row[j].textContent.replace(",", "") //some textual entries already contained a comma which messed with things row_content.push(cell_content) } data.push(row_content) } return data }
There were various ways to do this in JQuery with
iterable.each()but I ran into complications and simply referencing cells using .children was much easier.
function download_xls() { //Downloads the current table as an excel file //1. Create an iframe iframe = document.createElement("iframe") iframe.setAttribute("width", 1) iframe.setAttribute("height", 1) iframe.setAttribute("frameborder", 0) iframe.setAttribute("src", "about:blank") //2. Create a form that can send data to Flask form = document.createElement("form") form.setAttribute("method", "POST") form.setAttribute("action", "/download_excel") //3. Put the table data into a hidden field in that form data = document.createElement("input") data.setAttribute("type", "hidden") data.setAttribute("value", convert_table_to_array()) data.setAttribute("name", "data") //4. Append these new elements to the DOM form.appendChild(data) iframe.appendChild(form) document.body.appendChild(iframe) //5. Send off the data form.submit() }
The (locally running) Flask will then recieve a POST request at
/download_excel. Let’s set up the route:
#accepts POST data (larger than GET data) @app.route('/download_excel', methods=['GET', 'POST']) def download_excel(): """Creates a file download from received post information""" #POST data is accessed via a dictionary at request.form data = request.form["data"] data = data.split(",") #Split it up by comma #The data is a list of lists like [[1,2], [3,4]] but is unfortunately sent #as [1,2,3,4]. However, we know that there are 6 columns, so we can split it #up into sixes with a list comprehension data = [data[x:x+6] for x in xrange(0, len(data), 6)] #Now re-join each one with commas, so it is nicely csv-ish data = "\n".join([','.join(x) for x in data]) response = make_response(data) #Return an HTTP Response with an attachment response.headers["Content-Disposition"] = "attachment; filename=data.csv" return response
Now, when the user clicks the button:
They instantly get:
Sorry, I can’t show what it looks like in Excel because the data isn’t public at the moment. All code is however available here on github!
One bizarre thing, however, is that the form doesn’t appear in the inspector (in either Chrome or Firefox):
Though, you can access it with some fairly lengthy getters:
Future features
- The files could be named something more intuitive than data.csv – perhaps a combination of various things seen in the URL’s query string
- Accommodate for a table wider than 6 rows. This could be done easily by stringifying the array using a different delimiter such as a “###”.
- Create an .xls file rather than a CSV, if there is any advantage
This post first appeared on Ikigomu | A Data Science, NLP And Personal Blog By, please read the originial post: here