PromptBase

Prompt Details

Model: claude-4-7-opus
Token size: 933
Example input
[TASK_DESCRIPTION]: Extract all text from every PDF in a folder and save each as a .txt file
[OUTPUT_FORMAT]: TXT
[FILE_COUNT]: 50
Example output
Task: Extract all text from every PDF in a folder and save each as a .txt file
Desired output format: TXT
Number of files to process: 50

A. Approach: pdfplumber is the best choice here because it handles multi-column layouts and preserves whitespace better than PyPDF2. Each PDF maps to one .txt output file.

B. Install: pip install pdfplumber

C. Script:

    # pip install pdfplumber
    import logging
    from pathlib import Path

    import pdfplumber

    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

    INPUT_DIR = Path("pdfs")
    OUTPUT_DIR = Path("output_texts")


    def extract_text(pdf_path: Path) -> str:
        # extract_text() can return None for pages with no text layer,
        # so fall back to an empty string for those pages.
        with pdfplumber.open(pdf_path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)


    def main() -> None:
        OUTPUT_DIR.mkdir(exist_ok=True)
        files = list(INPUT_DIR.glob("*.pdf"))
        success, failed = 0, 0
        for pdf_path in files:
            try:
                text = extract_text(pdf_path)
                out = OUTPUT_DIR / pdf_path.with_suffix(".txt").name
                # errors="replace" substitutes any character that cannot be encoded.
                out.write_text(text, encoding="utf-8", errors="replace")
                logging.info(f"Saved: {out.name}")
                success += 1
            except Exception as e:
                # One bad file is logged and counted; the loop continues.
                logging.error(f"Failed {pdf_path.name}: {e}")
                failed += 1
        logging.info(f"Done: {success} ok, {failed} failed.")


    if __name__ == "__main__":
        main()

D. Usage: python extract_text.py

E. Edge cases:
- Scanned PDFs return empty strings; use pytesseract for OCR fallback.
- Password-protected files raise an exception, which is caught and logged; the script continues.
- Output is written as UTF-8, and any character that cannot be encoded is replaced (errors="replace") instead of raising.
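The OCR fallback mentioned under the edge cases could look roughly like the sketch below. It is an illustration and not part of the listed example output: the names `needs_ocr`, `text_or_ocr`, `extract_text_with_ocr`, and the `MIN_CHARS` threshold are hypothetical, and it assumes pytesseract plus the Tesseract binary are installed alongside pdfplumber.

```python
from pathlib import Path
from typing import Callable, Optional

MIN_CHARS = 20  # hypothetical threshold: below this, treat the page as scanned


def needs_ocr(text: Optional[str], min_chars: int = MIN_CHARS) -> bool:
    """Return True when a page yielded little or no extractable text."""
    return len((text or "").strip()) < min_chars


def text_or_ocr(text: Optional[str], run_ocr: Callable[[], str]) -> str:
    """Keep the extracted text, or call run_ocr() when the page looks scanned."""
    return run_ocr() if needs_ocr(text) else (text or "")


def extract_text_with_ocr(pdf_path: Path) -> str:
    # Third-party imports are kept local so the helpers above work without them.
    import pdfplumber
    import pytesseract  # assumption: the Tesseract binary is on PATH

    parts = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            def run_ocr(page=page) -> str:
                # Render the page to a PIL image, then OCR it.
                image = page.to_image(resolution=300).original
                return pytesseract.image_to_string(image)

            parts.append(text_or_ocr(page.extract_text(), run_ocr))
    return "\n".join(parts)
```

Under these assumptions, swapping `extract_text` for `extract_text_with_ocr` in the script would run OCR only on pages that produce almost no text, so PDFs that already have a text layer are not slowed down.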

Python PDF/Document Automation Engineer

Instant access
Usage rights: Commercial use
Money-back guarantee
Turn Claude into a Python document automation expert. Get complete, runnable scripts for PDF extraction, generation, merging, OCR, and Word/Excel automation. Covers pdfplumber, ReportLab, python-docx, and more. Scripts include full error handling, logging, and type hints. Works with Python 3.10+.
Added 5 days ago