Automatic scrapy deployment with GitHub actions

Repo here

Scrapy is a nice framework for web scraping. But, as with most local development, you usually end up with some settings / configs disabled while you work.

This wouldn't pose an issue, but to deploy a scrapy project to zyte (a hosted scrapy platform) you need to run shub deploy, and if you run it and forget to reset the config back to prod settings, a Titan may devour your home.

You can set up auto deployment from GitHub via the Zyte UI, but that works with GitHub only. Plus, if you want to run some extra tests during CI/CD, you're out of luck. So here's how to set up CI/CD to deploy automatically:

Note: I assume you already have your scrapy project set up.

Create scrapinghub.yml + add repo secrets

project: ${PROJECT_ID}
requirements:
  file: requirements.txt
apikey: null

Notice that apikey is left as null. This is because it's good practice not to check sensitive information and credentials into version control. Instead, the API key is stored in GitHub secrets, so the workflow can read it as an environment variable.
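One way to keep that rule honest is a small check that runs in CI before anything else and fails if a real key ever lands in the committed file. The sketch below is an illustration only: the function name `committed_apikey` and the set of values treated as "empty" are assumptions, and it looks at the top-level `apikey:` line with a regular expression rather than parsing the YAML.

```python
import re

# Values that count as "no key committed" in this sketch (an assumption)
EMPTY_VALUES = {"", "null", "~"}


def committed_apikey(config_text: str) -> bool:
    """Return True if a top-level `apikey:` line carries a real value."""
    match = re.search(r"^apikey:(.*)$", config_text, flags=re.MULTILINE)
    if match is None:
        # No apikey line at all, so nothing was committed
        return False
    # Drop any trailing comment, surrounding whitespace and quotes
    value = match.group(1).split("#", 1)[0].strip().strip("'\"")
    return value.lower() not in EMPTY_VALUES
```

A CI step could read scrapinghub.yml, pass its text to this function, and exit with a non-zero status when it returns True.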

Create github workflow file

name: Deploy

on:
  push:
    branches: [ master, main ]

jobs:
  deploy:  # the job id is arbitrary
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.9
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pyyaml shub
    - name: Deploy to zyte
      if: github.ref == 'refs/heads/master'
      run: python3 utils/ && shub deploy
      env:
        APIKEY: ${{ secrets.APIKEY }}


What the workflow does:

  • Triggers on push to this repo (this doesn't work for PRs)
  • Checks out this repo
  • Sets up Python 3.9
  • Installs the pip modules the deploy step needs (pyyaml and shub)
  • Runs a script that overwrites the apikey value in scrapinghub.yml with the value taken from GitHub secrets
  • Runs the deploy command (shub deploy)
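The script that fills in the apikey is not shown in this post, so here is a minimal sketch of what it could look like. It is not the script from the repo: the names `set_apikey`, `CONFIG_PATH` and `ENV_VAR` are hypothetical, and although the workflow installs pyyaml, this version deliberately uses only the standard library and rewrites the single `apikey:` line, which leaves the rest of the file (comments included) untouched. It assumes the key is a plain token that needs no YAML quoting.

```python
import os
import re
from pathlib import Path

# Hypothetical names: the post does not show its own script
CONFIG_PATH = Path("scrapinghub.yml")
ENV_VAR = "APIKEY"  # matches the env entry of the deploy step


def set_apikey(config_text: str, apikey: str) -> str:
    """Return config_text with the top-level `apikey:` value replaced."""
    pattern = re.compile(r"^apikey:.*$", flags=re.MULTILINE)
    new_line = f"apikey: {apikey}"
    if pattern.search(config_text):
        # A function replacement keeps any backslashes in the key literal
        return pattern.sub(lambda _match: new_line, config_text, count=1)
    # No apikey line yet: append one at the end of the file
    return config_text.rstrip("\n") + "\n" + new_line + "\n"


def main() -> None:
    apikey = os.environ.get(ENV_VAR)
    if not apikey:
        # Fail loudly instead of deploying with `apikey: null`
        raise SystemExit(f"{ENV_VAR} is not set; refusing to continue")
    CONFIG_PATH.write_text(set_apikey(CONFIG_PATH.read_text(), apikey))
```

To use it as the pre-deploy step, end the file with a call to `main()` and run it right before `shub deploy`. Because the change happens only inside the CI checkout, the committed scrapinghub.yml keeps `apikey: null`.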