
SOLVED: Scrapy Override file_path from FilesPipeline

Joseph:

I want to change the output folder of the downloaded files. Based on the source code of the files pipeline, file_path can be overridden. I tried the code below, but it doesn't seem to work. By the way, I'm new to Python and Scrapy.

pipelines.py


from scrapy.pipelines.files import FilesPipeline

class secFilesPipeline(FilesPipeline):
    def file_path(self, request, response=None, info=None):
        ## start of deprecation warning block (can be removed in the future)
        def _warn():
            from scrapy.exceptions import ScrapyDeprecationWarning
            import warnings
            warnings.warn('FilesPipeline.file_key(url) method is deprecated, please use '
                          'file_path(request, response=None, info=None) instead',
                          category=ScrapyDeprecationWarning, stacklevel=1)

        # check if called from file_key with url as first argument
        if not isinstance(request, Request):
            _warn()
            url = request
        else:
            url = request.url

        # detect if file_key() method has been overridden
        if not hasattr(self.file_key, '_base'):
            _warn()
            return self.file_key(url)
        ## end of deprecation warning block

        media_guid = hashlib.sha1(to_bytes(url)).hexdigest()  # change to request.url after deprecation
        media_ext = os.path.splitext(url)[1]  # change to request.url after deprecation
        return 'test/%s%s' % (media_guid, media_ext)

settings.py


ITEM_PIPELINES = {
    'myproject.pipelines.secFilesPipeline': 2,
    'scrapy.pipelines.files.FilesPipeline': 1,
}

FILES_STORE = '/home/joseph/pdf'

Expected output, for example: FILES_STORE + month + filename.pdf = /home/joseph/pdf/September/filename.pdf

Any idea? Thank you.
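
The expected output above comes down to building a path relative to FILES_STORE out of a month name and the file name taken from the URL. Below is a minimal sketch of that step only, using just the standard library. The helper name month_file_path and its when parameter are hypothetical and are not part of Scrapy. The sketch also assumes the default English month names and a URL whose last path segment is the file name.

```python
import os
from datetime import date
from urllib.parse import urlparse


def month_file_path(url, when=None):
    """Return a path relative to FILES_STORE, e.g. 'September/filename.pdf'.

    Hypothetical helper for illustration; not a Scrapy API.
    """
    # Default to today's date when the caller does not pass one.
    when = when or date.today()
    # Keep only the last segment of the URL path; the query string is dropped.
    filename = os.path.basename(urlparse(url).path)
    # %B is the full month name in the current locale (English by default).
    return '%s/%s' % (when.strftime('%B'), filename)


if __name__ == '__main__':
    print(month_file_path('http://example.com/docs/filename.pdf',
                          date(2017, 9, 1)))
```

In secFilesPipeline, file_path could then return month_file_path(request.url) instead of carrying the copied deprecation block, which relies on hashlib, os, Request and to_bytes that pipelines.py as shown never imports. Recent Scrapy versions also pass a keyword-only item argument to file_path, so the override may need to accept it. It is probably also worth listing only the subclass in ITEM_PIPELINES, since leaving scrapy.pipelines.files.FilesPipeline enabled alongside it means the stock pipeline still handles the same items with its default paths.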



Posted in S.E.F
via StackOverflow & StackExchange Atomic Web Robots