Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Converting an HTML table to an Excel download HTTP Response: A hack for slow OLAP DB connections

Overview

Zenko (“Good fox” in Japanese) is a reporting system (see code on Github here) I’ve created over the last couple of weeks at Mozilla. Basically my non-technical coworkers were getting so frustrated by Tableau (“what the heck is the difference between INNER JOIN and OUTER JOIN?”) that I decided to create a simple dashboard interface for them.

Its a simple bootstrap front-end to a database containing campaign stats for sponsored tiles. You can drill down to each tile or client/partner and pivot by things like locale, country and date.

Zenko’s stack (high to low)

  • Highcharts
  • Bootstrap
  • Flask running locally (due to security restrictions)
  • Python scripts using psycopg2
  • Postgres/OLAP Col Store running on Amazon Redshift

A new feature

When loading one of the analyses pages, a table will be shown. My coworker wanted to be able to download the data to Excel. I came up with 4 possible ways to implement this:

  1. Simply rerun the query, format the results as a csv on the backend, save it and window.open() the file location.
  2. Automatically save the data from each analysis request server request and periodically clear old files.
  3. Use a javascript library like ExcelBuilder
  4. Send the data back to the server, format it, and then back to the client via an iframe

Which is the best solution?

  1. This is problematic because our sticking point is the query speed. The redshift database is an OLAP Column Oriented database, and append-only. This means that it is insanely fast to add data to, but quite slow (often 6+ seconds) to query. Yes, it is dealing with billions of rows so excusable, but its not so great in terms of user experience to wait so long.The user doesn’t want to wait another 6 seconds for the analysis to rerun when they have the data already.
  2. This sounds like it could just end up storing a lot of data on the client, but it could work quite well. In terms of security though, I’m not sure that the data should be lingering on the user’s PC unrequested though.
  3. This didn’t work out so well – in Firefox, the file is incorrectly named. In the future, I’d like to name the files according to the parameters of the analysis e.g. <client>-<date>-<country>.xls
  4. This is the weirdest solution, but it works! Flask is running locally so it is actually very fast. There are no huge JQuery/JavaScript complications with file permissions and the fact that you can manipulate the data easily on the server is nice too.

Solution 4

The process is as follows when the “Download for Excel” button is clicked:

  1. Reference the HTML table using JavaScript and convert it to an array of arrays
  2. Append an iframe to the DOM
  3. Append a form with a POST action and hidden field to the iframe
  4. Insert the table contents into the hidden field’s value
  5. Submit the form
  6. Let Flask receive the POST request and format the information as a CSV
  7. Return an HTTP response with a file attachment containing the CSV

Let’s implement it

function convert_table_to_array() {
	//convert the current table to a list of lists
	itable = document.getElementById("impressions_table") //the table will always be called this in zenko
	
	//convert the table to a list of lists (i.e. array of arrays)
	var data = [];
	
	//meta data
	col_count = itable.children[0].children[0].children.length //number of cols
	row_count = itable.children[1].children.length //number of rows
	
	//grab the header (i.e. first row containing column titles)
	header_cells = itable.children[0].children[0].children
	header = []
	for (i=0;i<header_cells.length;i++) {
		header.push(header_cells[i].textContent)
	}
	data.push(header)
	
	//iterate through each row
	row_cells = itable.children[1].children
	
	for (i=0;i<row_cells.length;i++) {
		row = itable.children[1].children[i].children
		
		//get each cell in the row
		row_content = []
		for (j=0;j<row.length;j++) {
			cell_content = row[j].textContent.replace(",", "") //some textual entries already contained a comma which messed with things
			row_content.push(cell_content)
		}
		data.push(row_content)
	}
	return data
}

There were various ways to do this in JQuery with

iterable.each()
  but I ran into complications and simply referencing cells using .children was much easier.
function download_xls() {
	//Downloads the current table as an excel file

	//1. Create an iframe
	iframe = document.createElement("iframe")
	iframe.setAttribute("width", 1)
	iframe.setAttribute("height", 1)
	iframe.setAttribute("frameborder", 0)
	iframe.setAttribute("src", "about:blank")

	//2. Create a form that can send data to Flask
	form = document.createElement("form")
	form.setAttribute("method", "POST")
	form.setAttribute("action", "/download_excel")

	//3. Put the table data into a hidden field in that form
	data = document.createElement("input")
	data.setAttribute("type", "hidden")
	data.setAttribute("value", convert_table_to_array())
	data.setAttribute("name", "data")
	
	//4. Append these new elements to the DOM	
	form.appendChild(data)
	iframe.appendChild(form)
	document.body.appendChild(iframe)
	
	//5. Send off the data
	form.submit()
}

The (locally running) Flask will then recieve a POST request at

/download_excel
 . Let’s set up the route:
#accepts POST data (larger than GET data)
@app.route('/download_excel', methods=['GET', 'POST']) 
def download_excel():
	"""Creates a file download from received post information"""
	#POST data is accessed via a dictionary at request.form
	data = request.form["data"] 
	data = data.split(",") #Split it up by comma
	#The data is a list of lists like [[1,2], [3,4]] but is unfortunately sent 
	#as  [1,2,3,4]. However, we know that there are 6 columns, so we can split it 
	#up into sixes with a list comprehension
	data = [data[x:x+6] for x in xrange(0, len(data), 6)]
	#Now re-join each one with commas, so it is nicely csv-ish
	data = "\n".join([','.join(x) for x in data])
	response = make_response(data)
	#Return an HTTP Response with an attachment
	response.headers["Content-Disposition"] = "attachment; filename=data.csv"
	return response

Now, when the user clicks the button:

They instantly get:

Sorry, I can’t show what it looks like in Excel because the data isn’t public at the moment. All code is however available here on github!

One bizarre thing, however, is that the form doesn’t appear in the inspector (in either Chrome or Firefox):

Though, you can access it with some fairly lengthy getters:

Future features 

  • The files could be named something more intuitive than data.csv  – perhaps a combination of various things seen in the URL’s query string
  • Accommodate for a table wider than 6 rows. This could be done easily by stringifying the array using a different delimiter such as a “###”.
  • Create an .xls file rather than a CSV, if there is any advantage


This post first appeared on Ikigomu | A Data Science, NLP And Personal Blog By, please read the originial post: here

Share the post

Converting an HTML table to an Excel download HTTP Response: A hack for slow OLAP DB connections

×

Subscribe to Ikigomu | A Data Science, Nlp And Personal Blog By

Get updates delivered right to your inbox!

Thank you for your subscription

×