Zombie.js in node.js fails to scrape certain websites
Problem
The simple script below returns a bunch of rubbish. It works for most websites, but not for William Hill:
var Browser = require("zombie");
var assert = require("assert");

// Load the William Hill football page
var browser = new Browser();
browser.visit("http://sports.williamhill.com/bet/en-gb/betting/y/5/et/Football.html", function () {
  browser.wait(function () {
    console.log(browser.html());
  });
});
Run with node. Output:
S����J����ꪙRUݒ�kf�6���Efr2�Riz�����^��0�X� ��{�^�a�yp��p�����Ή��`��(���S]-��'N�8q�����/���?�ݻ��u;�݇�ׯ�Eiٲ>��-���3�ۗG�Ee�,��mF���MI��Q�۲������ڊ�ZG��O�J�^S�C~g��JO�緹�Oݎ���P����ET�n;v������v���D�tvJn��J�8'��햷r�v:��m��J��Z�nh�]�� ����Z����.{Z��Ӳl�B'�.¶D�~$n�/��u"�z�����Ni��"Nj��\00_I\00\��S��O�E8{"�m;�h��,o��Q�y��;��a[������c��q�D�띊?��/|?:�;��Z!}��/�wے�h�
(actual output is much longer)
Does anyone know why this happens, and specifically why it happens on the only site I actually want to scrape?
Thanks
Solution
I abandoned this method long ago, but in case anyone is interested, I got a reply from one of the zombie.js devs:
https://github.com/assaf/zombie/issues/251#issuecomment-5969175
He says: "Zombie will now send accept-encoding header to indicate it does not support gzip."
Thank you all who looked into this.