By lenartgolob
I have a web scraper written in Python with the Selenium library that works on my local machine. When I push it to my droplet, I cannot run the app. This is one of my web scraping methods:
def defense_dash_lt10(pbp_stats, season):
    # Less than 10 foot
    url = 'https://www.nba.com/stats/players/defense-dash-lt10?Season=' + season
    options = Options()
    options.add_argument('--no-sandbox')
    driver = webdriver.Chrome(service=ChromeService(
        ChromeDriverManager().install()), options=options)
    driver.get(url)
    selects = driver.find_elements(By.CLASS_NAME, "DropDown_select__4pIg9 ")
    for select in selects:
        options = Select(select).options
        for option in options:
            if option.text == 'All':
                option.click()  # select() in earlier versions of webdriver
                break
    # Find the table element
    table = driver.find_element(By.CLASS_NAME, 'Crom_table__p1iZz')
    # Find all rows in the table
    rows = table.find_elements(By.TAG_NAME, 'tr')
    defense_dash_lt10 = []
    # Loop through each row and extract the data from each cell
    for row in rows:
        player_dd_lt10 = []
        # Find all cells in the row
        cells = row.find_elements(By.TAG_NAME, 'td')
        for cell in cells:
            player_dd_lt10.append(cell.text)
        # Add pbp stats to defense dash if not empty
        if player_dd_lt10:
            if player_dd_lt10[0] in pbp_stats:
                defense_dash_lt10.append(player_dd_lt10 + pbp_stats[player_dd_lt10[0]][-2:])
            else:
                defense_dash_lt10.append(player_dd_lt10 + ['NaN', 'NaN'])
    header = ['Player', 'Team', 'Age', 'Position', 'GP', 'Games', 'FREQ%', 'DFGM', 'DFGA', 'DFG%', 'FG%', 'DIFF%', "MP", "BLKR"]
    return header, defense_dash_lt10
Hi there!
Running a Selenium-based web scraper on a DigitalOcean Droplet involves several considerations that differ from running the script on your local machine. You will have to set up a headless browser environment, manage the web driver installation, and make sure your script can run in a non-GUI server environment.
Here’s how you could do that:
Ensure your Droplet is up to date and has Python installed. You’ll also need to install Selenium and a web driver manager, such as webdriver-manager, which simplifies the management of binary drivers for different web browsers.
You can install these using pip. If you haven’t installed pip, you can install it using your package manager (e.g., apt for Ubuntu/Debian).
sudo apt update
sudo apt install python3-pip
pip3 install selenium webdriver-manager
For headless operation, you can use Chrome or Firefox. This example uses Chrome, but the process is similar for Firefox.
Install Google Chrome:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
Install ChromeDriver:
The webdriver-manager package you installed earlier will handle the ChromeDriver installation in your Python script, so you don’t need to manually install ChromeDriver.
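Before moving on, it can help to confirm that a Chrome executable is actually on the PATH of the Droplet. The sketch below uses only the standard library; the helper name `find_browser` and the list of candidate binary names are assumptions for illustration, not part of Selenium or webdriver-manager.

```python
import shutil

# Candidate executable names (assumption): which one exists depends on
# how Chrome or Chromium was installed on the server.
CANDIDATES = ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser"]


def find_browser(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    browser = find_browser()
    print(browser or "No Chrome/Chromium binary found on PATH")
```

If this prints no path, the browser install step most likely failed and is worth repeating before debugging the Selenium script itself.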
To run your browser in headless mode (without a GUI), you need to modify your Selenium script to specify headless options.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless') # Runs Chrome in headless mode.
options.add_argument('--no-sandbox') # Bypass OS security model
options.add_argument('--disable-dev-shm-usage') # Overcome limited resource problems
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=options)
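Building on the options above, one way to make sure Chrome is always shut down on the server, even when a scrape fails, is to wrap the driver in a context manager. This is a minimal sketch: `headless_chrome` and `HEADLESS_FLAGS` are hypothetical names, and the Selenium imports are deferred into the function so the module can be imported before the packages are installed.

```python
from contextlib import contextmanager

# Same flags as in the snippet above.
HEADLESS_FLAGS = ["--headless", "--no-sandbox", "--disable-dev-shm-usage"]


@contextmanager
def headless_chrome(flags=HEADLESS_FLAGS):
    """Yield a headless Chrome driver and always quit it afterwards."""
    # Imported lazily (a choice made for this sketch) so importing the
    # module does not fail when Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service as ChromeService
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for flag in flags:
        options.add_argument(flag)
    driver = webdriver.Chrome(
        service=ChromeService(ChromeDriverManager().install()), options=options
    )
    try:
        yield driver
    finally:
        # Runs on success and on error, so no Chrome process is left behind.
        driver.quit()


# Example usage (not executed here):
# with headless_chrome() as driver:
#     driver.get("https://www.nba.com/stats/players/defense-dash-lt10")
#     print(driver.title)
```

Quitting the driver matters more on a small Droplet than on a desktop, since every leftover Chrome process keeps using memory.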
Now, you should be able to run your script on the server just like you would on your local machine. Ensure you’re using the correct Python command (python3 or python) based on your server’s configuration.
python3 your_script_name.py
If you encounter any issues, reviewing the error messages and logs can provide insights into what might be going wrong!
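Since a server has no terminal window left open to read errors from, it may be worth writing them to a file. The following is a rough sketch using the standard `logging` module; `make_logger`, `run_logged`, and the `scraper.log` file name are hypothetical and only illustrate the idea.

```python
import logging


def make_logger(log_path="scraper.log"):
    """Return a logger that appends timestamped messages to log_path."""
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_logged(task, logger):
    """Call task() and return its result; log the traceback and return None on failure."""
    try:
        result = task()
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("Scraper run failed")
        return None
    logger.info("Scraper run finished")
    return result
```

A scraping function could then be run as `run_logged(lambda: defense_dash_lt10(pbp_stats, season), make_logger())`, and any failure on the Droplet would end up in the log file with its full traceback.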
Hope that this helps.
Best,
Bobby
Heya,
To run a Selenium-based web scraper on a server, such as a DigitalOcean droplet, you need to configure it to work in a headless environment. Servers typically don’t have a GUI, so you can’t run browsers in the regular, graphical mode. Here’s how to modify your existing Selenium setup to work on a server:
Ensure Python is installed on the server.
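Install the necessary drivers and browser. For Chrome, you’ll need the Chrome browser itself and a ChromeDriver release that matches its version. The 2.41 download below is only an example and works only with old Chrome releases, so substitute the version that matches your installed Chrome. Note also that google-chrome-stable is not in Ubuntu’s default repositories, so add Google’s apt repository or install the .deb package first. For example, on Ubuntu: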
sudo apt-get update
sudo apt-get install -y unzip xvfb libxi6 libgconf-2-4
sudo apt-get install default-jdk
sudo apt-get install -y google-chrome-stable
wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
sudo chown root:root /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
pip install selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
driver = webdriver.Chrome(options=options)
This will initialize Chrome in headless mode, allowing it to run without a GUI.
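If the driver was moved to /usr/bin/chromedriver as in the commands above, the script can also point Selenium at that file explicitly and fail early with a clear message when it is missing. This is a sketch under that assumption; `driver_is_executable` and `build_driver` are hypothetical helper names, while `Service(executable_path=...)` is the Selenium 4 way to pass a driver path.

```python
import os

# Path used by the install commands above; adjust if the binary lives elsewhere.
DRIVER_PATH = "/usr/bin/chromedriver"


def driver_is_executable(path=DRIVER_PATH):
    """Return True if path is an existing file with the execute bit set."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def build_driver(path=DRIVER_PATH):
    """Create a headless Chrome driver using the ChromeDriver binary at path."""
    if not driver_is_executable(path):
        raise FileNotFoundError(f"No executable ChromeDriver at {path}")

    # Imported after the check so a missing driver is reported first.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for flag in ("--headless", "--no-sandbox", "--disable-dev-shm-usage", "--disable-gpu"):
        options.add_argument(flag)
    return webdriver.Chrome(service=Service(executable_path=path), options=options)
```

Checking the path up front separates "the driver is not installed" from "the driver and browser versions do not match", which otherwise tend to surface as similar-looking startup errors.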
Remember, running a web scraper on a server is essentially the same as running it locally, with the key difference being the headless setup and ensuring all dependencies are correctly installed on the server.