Example input
[Type of Program]: web scraper, [Programming Language]: Python, [Specific Task]: extracting product details from an e-commerce website, [Error Handling, Performance Optimization, User Interaction]: Error Handling, Performance Optimization, User Interaction
Example output
This script scrapes product names and prices from a hypothetical e-commerce website. The code is documented and considers error handling, performance optimization, user interaction, and future expansion.
Please note that web scraping should be done ethically and legally. Always check the website's robots.txt file (e.g., http://website.com/robots.txt) and terms of service to ensure that your scraping activity is permitted. For actual e-commerce websites, the structure will likely be more complex and may require more advanced techniques to deal with JavaScript rendering, pagination, and so on.
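The robots.txt check mentioned above can also be done programmatically. Python's standard library ships urllib.robotparser for evaluating a site's rules; the sketch below works on robots.txt text that has already been fetched, and the rules and paths in it are purely hypothetical.

```python
from urllib.robotparser import RobotFileParser

def is_path_allowed(robots_txt, path, user_agent='Mozilla/5.0'):
    """Return True if the given robots.txt rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

# Hypothetical rules: everything under /private/ is off-limits to all agents
rules = "User-agent: *\nDisallow: /private/"
print(is_path_allowed(rules, "/products"))      # True
print(is_path_allowed(rules, "/private/page"))  # False
```

In practice the robots.txt text would first be downloaded, e.g. from http://website.com/robots.txt, before being passed to this helper.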
import requests
from bs4 import BeautifulSoup
import time

class Scraper:
    def __init__(self, url):
        self.url = url
        self.headers = {'User-Agent': 'Mozilla/5.0'}
        self.products = []

    def fetch_data(self):
        try:
            # A timeout prevents the request from hanging indefinitely
            response = requests.get(self.url, headers=self.headers, timeout=10)
            # Check if the request was successful
            response.raise_for_status()
        except requests.HTTPError as http_err:
            print(f'HTTP error occurred: {http_err}')
        except Exception as err:
            print(f'Other error occurred: {err}')
        else:
            return response.text

    def parse_data(self, html):
        soup = BeautifulSoup(html, 'html.parser')
        # Extract product details based on the HTML structure
        for product in soup.find_all('div', class_='product'):
            name = product.find('h2', class_='product-name')
            price = product.find('span', class_='product-price')
            # Skip entries whose markup is missing either field
            if name and price:
                self.products.append({
                    'product_name': name.text.strip(),
                    'product_price': price.text.strip()
                })

    def run_scraper(self):
        print("Starting the scraper...")
        start_time = time.time()
        html = self.fetch_data()
        if html:
            self.parse_data(html)
        print(f"Scraping completed in {time.time() - start_time:.2f} seconds.")
        return self.products

# User interaction
url = input("Enter the URL of the e-commerce website: ")
scraper = Scraper(url)
products = scraper.run_scraper()
for product in products:
    print(product)
Some points about the code:
The Scraper class is designed in such a way that it can be extended or modified for other websites by simply changing the parse_data method.
Error handling is done with try-except blocks, the Pythonic approach, so the program fails gracefully instead of crashing if the server doesn't respond or any other issue occurs.
Performance is measured with the time module, which reports the elapsed wall-clock time of the scraping run.
The user is asked to input the URL, demonstrating user interaction.
The fetch_data and parse_data methods are designed to be reusable and maintainable. They can easily be modified to suit different scraping needs.
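To illustrate that reusability, the extraction logic of parse_data can be exercised offline as a free function against a canned HTML snippet, with no network access. The markup below simply mirrors the hypothetical product structure assumed by the script.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<div class="product"><h2 class="product-name">Widget</h2>
  <span class="product-price">$9.99</span></div>
<div class="product"><h2 class="product-name">Gadget</h2>
  <span class="product-price">$19.99</span></div>
"""

def parse_products(html):
    # Same extraction logic as Scraper.parse_data, written as a free function
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    for product in soup.find_all('div', class_='product'):
        name = product.find('h2', class_='product-name')
        price = product.find('span', class_='product-price')
        if name and price:
            products.append({'product_name': name.text.strip(),
                             'product_price': price.text.strip()})
    return products

print(parse_products(SAMPLE_HTML))
# [{'product_name': 'Widget', 'product_price': '$9.99'},
#  {'product_name': 'Gadget', 'product_price': '$19.99'}]
```

Separating parsing from fetching in this way also makes the parsing logic unit-testable.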
For more complex or large-scale scraping tasks, more advanced tools like Scrapy or Selenium might be more suitable. Additionally, handling more specific error types, adding logging, managing sessions/cookies, dealing with captchas, and respecting rate limits are important considerations for a robust, real-world web scraper.
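As one sketch of the robustness and rate-limit concerns above, a retry helper with exponential backoff can wrap any fetch callable; the retry counts and delays below are illustrative, not prescriptive.

```python
import time

def fetch_with_retries(fetch, max_retries=3, base_delay=1.0):
    """Call fetch() with exponential backoff between failed attempts.

    fetch is any zero-argument callable that raises on failure,
    e.g. lambda: requests.get(url, timeout=10).
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as err:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"Attempt {attempt + 1} failed ({err}); retrying in {delay:g}s")
            time.sleep(delay)
```

The backoff doubles after each failure, which spreads retries out and avoids hammering a struggling server; a production scraper would typically combine this with a fixed delay between successful requests as well.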