Example input
[ programming language or library ]: Python (BeautifulSoup and Requests)
[ target website or data source ]: example.com
[ specify data points, e.g., titles, prices, descriptions ]: product titles and prices
Example output
Sure! Below is an example of a web scraping application using Python with BeautifulSoup and Requests to scrape product titles and prices from "example.com". For demonstration purposes, I'll assume "example.com" has a fictitious structure that contains product titles and prices within specific HTML elements.
You need to have the `requests` and `beautifulsoup4` libraries installed. You can install them using pip if you haven't already:
```bash
pip install requests beautifulsoup4
```
Here's the source code for the web scraping application:
```python
import requests
from bs4 import BeautifulSoup
import json
import csv

# Function to fetch HTML content from a URL
def fetch_html(url):
    try:
        response = requests.get(url, timeout=10)  # Timeout prevents hanging indefinitely
        response.raise_for_status()  # Raises an HTTPError for bad responses
        return response.text
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return None

# Function to parse HTML and extract product titles and prices
def parse_products(html):
    soup = BeautifulSoup(html, 'html.parser')
    products = []
    # Assuming product titles are within <h2 class="product-title">
    # and prices are within <span class="product-price">
    for product in soup.find_all('div', class_='product'):
        title = product.find('h2', class_='product-title')
        price = product.find('span', class_='product-price')
        if title and price:
            products.append({
                'title': title.get_text(strip=True),
                'price': price.get_text(strip=True)
            })
    return products

# Function to save extracted data to JSON format
def save_to_json(data, filename):
    with open(filename, 'w', encoding='utf-8') as json_file:
        json.dump(data, json_file, ensure_ascii=False, indent=4)
    print(f"Data saved to {filename}")

# Function to save extracted data to CSV format
def save_to_csv(data, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=['title', 'price'])
        writer.writeheader()
        writer.writerows(data)
    print(f"Data saved to {filename}")

# Main function to orchestrate the scraping
def main():
    url = 'http://example.com/products'  # Change to the actual URL you want to scrape
    html = fetch_html(url)
    if html:
        products = parse_products(html)
        # Save data in both JSON and CSV formats
        save_to_json(products, 'products.json')
        save_to_csv(products, 'products.csv')

# Run the scraper
if __name__ == '__main__':
    main()
```
### Explanation of the Code:
1. **Fetching HTML**:
- The `fetch_html` function makes an HTTP GET request to the specified URL and handles potential errors (like network issues or invalid responses) using a try-except block. If the request is successful, it returns the HTML content; otherwise, it prints an error message.
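For real-world scraping, a plain `requests.get` can be made more resilient with a `Session` that retries transient failures. The sketch below is an optional variant (the retry counts and status codes are illustrative assumptions, not requirements of the original code):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Build a Session that automatically retries transient errors.
# The specific values (3 retries, 0.5s backoff) are illustrative choices.
def make_session(retries=3, backoff=0.5):
    session = requests.Session()
    retry = Retry(total=retries, backoff_factor=backoff,
                  status_forcelist=[429, 500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

def fetch_html(url, session=None, timeout=10):
    session = session or make_session()
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return None
```

Mounting the adapter on both `http://` and `https://` prefixes means every request through the session inherits the retry policy.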
2. **Parsing HTML**:
- The `parse_products` function uses BeautifulSoup to parse the HTML content. It looks for specific elements (in this case, product titles and prices) within the HTML structure. It assumes that each product is contained within a `div` with the class `product`, and the title and price are found in specific child elements.
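You can verify the parsing logic without touching the network by feeding `BeautifulSoup` an inline HTML fragment that matches the assumed structure (the fragment below is made up for illustration):

```python
from bs4 import BeautifulSoup

# A hypothetical HTML fragment matching the assumed product structure
html = """
<div class="product">
  <h2 class="product-title">Widget</h2>
  <span class="product-price">$9.99</span>
</div>
<div class="product">
  <h2 class="product-title">Gadget</h2>
  <span class="product-price">$19.99</span>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
products = []
for product in soup.find_all('div', class_='product'):
    title = product.find('h2', class_='product-title')
    price = product.find('span', class_='product-price')
    if title and price:
        products.append({'title': title.get_text(strip=True),
                         'price': price.get_text(strip=True)})

print(products)
# [{'title': 'Widget', 'price': '$9.99'}, {'title': 'Gadget', 'price': '$19.99'}]
```

Testing against a saved or inline snippet like this is a quick way to confirm your selectors before pointing the scraper at a live site.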
3. **Saving Data**:
- The `save_to_json` and `save_to_csv` functions handle saving the scraped data to JSON and CSV formats, respectively.
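If you want to sanity-check the serialization logic without writing files to disk, the same `json` and `csv` calls work against in-memory buffers (the sample record below is made up):

```python
import csv
import io
import json

data = [{'title': 'Widget', 'price': '$9.99'}]

# JSON round-trip: serialize, then parse back
json_text = json.dumps(data, ensure_ascii=False, indent=4)
assert json.loads(json_text) == data

# CSV round-trip using an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['title', 'price'])
writer.writeheader()
writer.writerows(data)
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)  # [{'title': 'Widget', 'price': '$9.99'}]
```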
4. **Main Function**:
- The `main` function coordinates the overall process: it fetches the HTML, parses it to extract product data, and then saves that data to the specified file formats.
### Important Note:
- Since this example uses "example.com" as a placeholder, you need to adjust the URL and the HTML structure according to the actual website you intend to scrape. Always ensure you have permission to scrape a website by checking its `robots.txt` file and the website's terms of service.
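The standard library's `urllib.robotparser` can automate the `robots.txt` check. Below, the rules are supplied inline for illustration; against a real site you would call `rp.set_url('http://example.com/robots.txt')` followed by `rp.read()`:

```python
from urllib.robotparser import RobotFileParser

# Parse a (hypothetical) robots.txt policy supplied as inline lines
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "http://example.com/products"))    # True
print(rp.can_fetch("*", "http://example.com/private/x"))   # False
```

Calling `can_fetch(user_agent, url)` before each request lets the scraper skip paths the site has disallowed.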