Prompt Details
Model
(claude-4-7-opus)
Token size
933
Example input
[TASK_DESCRIPTION]: Extract all text from every PDF in a folder and save each as a .txt file
[OUTPUT_FORMAT]: TXT
[FILE_COUNT]: 50
Example output
Task: Extract all text from every PDF in a folder and save each as a .txt file
Desired output format: TXT
Number of files to process: 50
A. Approach: Use pdfplumber. It generally preserves whitespace and reading order
better than PyPDF2, though multi-column pages may still need cropping per column.
Each PDF maps to one .txt output file.
B. Install:
pip install pdfplumber
C. Script:
# pip install pdfplumber
import logging
from pathlib import Path

import pdfplumber

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

INPUT_DIR = Path("pdfs")
OUTPUT_DIR = Path("output_texts")


def extract_text(pdf_path: Path) -> str:
    # extract_text() returns None for pages with no text layer, hence the `or ""`.
    with pdfplumber.open(pdf_path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def main() -> None:
    OUTPUT_DIR.mkdir(exist_ok=True)
    files = list(INPUT_DIR.glob("*.pdf"))
    success, failed = 0, 0
    for pdf_path in files:
        try:
            text = extract_text(pdf_path)
            # Same base name as the PDF, with a .txt suffix, inside OUTPUT_DIR.
            out = OUTPUT_DIR / pdf_path.with_suffix(".txt").name
            out.write_text(text, encoding="utf-8")
            logging.info(f"Saved: {out.name}")
            success += 1
        except Exception as e:
            # Log the failure and continue with the remaining files.
            logging.error(f"Failed {pdf_path.name}: {e}")
            failed += 1
    logging.info(f"Done: {success} ok, {failed} failed.")


if __name__ == "__main__":
    main()
D. Usage:
python extract_text.py
E. Edge cases:
- Scanned PDFs return empty strings — use pytesseract for OCR fallback.
- Password-protected files raise an exception — caught and logged, script continues.
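- Output is written as UTF-8 with strict error handling, so nothing is replaced implicitly. Text that cannot be encoded (for example, lone surrogates) raises UnicodeEncodeError, which is caught and logged; pass errors="replace" to write_text to substitute instead.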
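The scanned-PDF and encoding cases in the list above could be handled with two small standard-library helpers. This is a minimal sketch, not part of the example script: the names looks_scanned and write_text_lenient are hypothetical, and treating whitespace-only output as a sign of a scanned PDF is an assumption that may not hold for every file.

```python
import tempfile
from pathlib import Path


def looks_scanned(text: str) -> bool:
    """Hypothetical helper: True when extraction produced no visible characters."""
    return not text.strip()


def write_text_lenient(out_path: Path, text: str) -> None:
    """Hypothetical helper: write UTF-8, substituting '?' for unencodable characters.

    Path.write_text defaults to strict error handling, so the replacement
    behavior has to be requested explicitly with errors="replace".
    """
    out_path.write_text(text, encoding="utf-8", errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "sample.txt"
        # "\ud800" is a lone surrogate, which strict UTF-8 encoding rejects.
        write_text_lenient(out, "ok \ud800 end")
        print(out.read_text(encoding="utf-8"))
        print(looks_scanned("  \n\n "))
```

In the script's main loop, a file for which looks_scanned returns True could be logged as a warning and queued for an OCR pass rather than counted as a plain success.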
Turn Claude into a Python document automation expert. Get complete, runnable scripts for PDF extraction, generation, merging, OCR, and Word/Excel automation — every time. Covers pdfplumber, ReportLab, python-docx & more. Full error handling, logging, and type hints included. Works with Python 3.10+.
Added 5 days ago
