
Simple ways to spider / crawl / duplicate / mirror websites for later use

This is an unusual post for me if you follow most of mine, but I had a few requests about copying sites so they could be tested later, so here are a couple of really quick ways to mirror another site:

wget --spider --force-html -r -l5 https://www.internetsales.biz 2>&1 | grep '^--' | awk '{print $3}' > urls.txt
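The filtering half of that pipeline can be pulled into a small helper so it is easier to reuse and to check offline. This is only a sketch: the function name extract_urls is made up for illustration, and it assumes wget's request lines start with a "--timestamp--" prefix followed by the URL, which is what the grep and awk stages above rely on. The sort -u step is an addition that drops repeated URLs.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): read wget log text on stdin
# and print each requested URL once.
extract_urls() {
    # Request lines look like:
    #   --2017-05-20 10:00:00--  https://example.com/
    # so the URL is the third whitespace-separated field.
    grep '^--' | awk '{print $3}' | sort -u
}

# Usage against a live site (needs network access), left commented out:
# wget --spider --force-html -r -l5 https://www.internetsales.biz 2>&1 | extract_urls > urls.txt

# Assumption: to download the pages rather than only list them, a common
# wget invocation is:
# wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.internetsales.biz
```

Note that --spider only checks that pages exist and does not save them, so urls.txt is a list of addresses rather than a copy of the site.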

HTTrack is a free and open-source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License Version 3.

HTTrack allows users to download World Wide Web sites from the Internet to a local computer.[5][6] By default, HTTrack arranges the downloaded site by the original site’s relative link-structure. The downloaded (or “mirrored”) website can be browsed by opening a page of the site in a browser.
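For the command line version, a minimal sketch of a mirror run might look like the following. The wrapper name mirror_site and the DRY_RUN switch are made up for illustration, and the depth of 5 simply mirrors the -l5 used with wget above. The httrack options used are -O (output directory), a "+" filter pattern to stay on the same host, -rN (mirror depth) and -v (verbose).

```shell
#!/bin/sh
# Hypothetical wrapper around the httrack command line client.
# Usage: mirror_site URL OUTPUT_DIR
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    url=$1
    out_dir=$2
    # Strip the scheme and any path to get the host, e.g. www.example.com
    host=$(printf '%s\n' "$url" | sed -e 's|^[a-zA-Z]*://||' -e 's|/.*$||')
    # -O sets the output directory, the +filter keeps the crawl on this host,
    # -r5 limits the link depth to 5, -v prints progress.
    set -- httrack "$url" -O "$out_dir" "+${host}/*" -r5 -v
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example (needs httrack installed and network access), left commented out:
# mirror_site https://www.internetsales.biz/ ./internetsales-mirror
```

Running it with DRY_RUN=1 first is a cheap way to see the exact command before pointing it at a real site.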

HTTrack downloads (platform, file to download, version):

Windows (from Windows 2000 to Windows 10 and above), installer version
  WinHTTrack (also included: command line version)
  File: httrack-3.49.2.exe
  Version 3.49-2, 4 MiB (4199216 B), 20/May/2017

Windows (from Windows Vista to Windows 10 and above), 64-bit installer version (recommended)
  WinHTTrack (also included: command line version)
  File: httrack_x64-3.49.2.exe
  Version 3.49-2, 4.3 MiB (4513224 B), 20/May/2017

Windows (from Windows 2000 to Windows 10 and above), without installer (e.g. USB key)
  WinHTTrack (also included: command line version)
  File: httrack-noinst-3.49.2.zip
  Version 3.49-2, 4.43 MiB (4640129 B), 20/May/2017

Windows (from Windows Vista to Windows 10 and above), 64-bit without installer (e.g. USB key)
  WinHTTrack (also included: command line version)
  File: httrack_x64-noinst-3.49.2.zip
  Version 3.49-2, 4.83 MiB (5064206 B), 20/May/2017

Linux/OSX/BSD/Unix, sources version
  WebHTTrack (also included: httrack, command line version)
  File: httrack-3.49.2.tar.gz
  Version 3.49-2, 1.75 MiB (1835116 B), 20/May/2017

Android (>= 2.2), on Google Play
  HTTrack (Android)
  Package: com.httrack.android
  Version 3.47.99 (trunk), 2.22 MiB

Linux distributions and other packages:

Debian: apt-get install webhttrack
Ubuntu: apt-get install webhttrack
Gentoo: emerge httrack
RPM (RedHat & Suse): search at rpmfind.net
OSX (MacPorts): sudo port install httrack
OSX (Homebrew): brew install httrack
Fedora: yum install httrack
FreeBSD i386: search at www.freebsd.org




This post first appeared on Computer Security.org - CyberSecurity News, Inform.
