How to save scraped data to DigitalOcean with Scrapy (Python)

Posted on March 7, 2024

I have created a spider to scrape a website, uploaded it to GitHub, and followed the guide to run Scrapy spiders on a DigitalOcean Droplet with ScrapeOps. Now I want to save the scraped data back to DigitalOcean. Could anyone guide me through the steps: what to enter in the Scrapy settings, how to create a Space in DigitalOcean, and how to save the scraped data to that Space?




Hey!

To save your scraped data to DigitalOcean Spaces from Scrapy, you could do the following:

  1. Create a DigitalOcean Spaces Bucket:

    • Log in to your DigitalOcean account.
    • Go to the ‘Spaces’ section and create a new space. During the creation process, you’ll choose a data center region, a unique name for your space, and whether it should be public or private.
    • Once the space is created, note down your Space name and the endpoint URL.

    https://docs.digitalocean.com/products/spaces/how-to/create/
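
    The control panel is the simplest way, but since Spaces exposes the S3 API you could also create the Space from Python with boto3 (using the access keys from step 2 below). A minimal sketch, where the fra1 region, the key values, and the Space name are placeholders:

    import boto3

    # Point boto3 at the regional Spaces endpoint instead of AWS
    client = boto3.client(
        's3',
        region_name='fra1',  # placeholder region
        endpoint_url='https://fra1.digitaloceanspaces.com',
        aws_access_key_id='your_access_key',
        aws_secret_access_key='your_secret_key',
    )

    # On Spaces, creating a "bucket" creates a new Space
    client.create_bucket(Bucket='your-space-name')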

  2. Get Your Access Keys:

    • You need to generate Spaces access keys to authenticate your requests. Go to the API section in the DigitalOcean control panel.
    • Generate a new Spaces access key and secret. Note these down securely.
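
    To keep those keys out of your GitHub repository, one option is to read them from environment variables in settings.py instead of hardcoding them. A small sketch, where SPACES_KEY and SPACES_SECRET are made-up variable names:

    import os

    # Pull the Spaces credentials from the environment so they are
    # never committed alongside the code
    AWS_ACCESS_KEY_ID = os.environ.get('SPACES_KEY')
    AWS_SECRET_ACCESS_KEY = os.environ.get('SPACES_SECRET')
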
  3. Configure Scrapy to Use DigitalOcean Spaces: Scrapy supports Amazon S3 feed storage, and since DigitalOcean Spaces is S3-compatible, you can reuse it. In your settings.py, set the Spaces credentials and endpoint via the top-level AWS_* settings (Scrapy’s S3 feed storage reads these) and point a feed at your Space:

    https://docs.scrapy.org/en/latest/topics/feed-exports.html#storages

    # settings.py
    # Spaces is S3-compatible, so Scrapy's built-in S3 feed storage works
    # once it is pointed at the Spaces endpoint via these settings.
    AWS_ACCESS_KEY_ID = 'your_access_key'
    AWS_SECRET_ACCESS_KEY = 'your_secret_key'
    AWS_ENDPOINT_URL = 'https://your_region.digitaloceanspaces.com'  # Replace your_region with the actual region, e.g. nyc3 or fra1

    FEEDS = {
        's3://your_space_name/your_folder/%(name)s/%(time)s.json': {
            'format': 'json',
            'store_empty': False,
            'encoding': 'utf8',
        },
    }
    

    Replace placeholders (your_space_name, your_folder, etc.) with your actual Space details.
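
    Before kicking off a long crawl, it can be worth verifying the credentials and endpoint with a quick standalone script that lists the Space’s contents (same placeholder values as above):

    import boto3

    client = boto3.client(
        's3',
        endpoint_url='https://your_region.digitaloceanspaces.com',
        aws_access_key_id='your_access_key',
        aws_secret_access_key='your_secret_key',
    )

    # List every object currently stored in the Space; an auth or
    # endpoint problem will surface here as a clear error
    response = client.list_objects_v2(Bucket='your_space_name')
    for obj in response.get('Contents', []):
        print(obj['Key'])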

  4. Install AWS SDK Packages: Scrapy relies on botocore to talk to S3-compatible services, and installing boto3 pulls it in as a dependency. Install the required packages:

    pip install boto3 botocore
    

With the settings properly configured, execute your Scrapy spider. The scraped data will be uploaded to your specified DigitalOcean Space in JSON format.
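
Normally you would just run scrapy crawl your_spider_name on the Droplet (or let ScrapeOps schedule it for you). If you would rather trigger the crawl from a Python script, a minimal sketch, where 'your_spider_name' is a placeholder for your spider’s name attribute:

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Load the project settings (including the FEEDS config above)
    process = CrawlerProcess(get_project_settings())

    # Scrapy resolves the spider by its `name` attribute
    process.crawl('your_spider_name')
    process.start()  # blocks until the crawl finishes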

Let me know how it goes!

Best,

Bobby
