Privvert - private browser-based file tools

PDF → Text / Markdown

Extract plain text

Drop a PDF here
or click to browse - files stay on your device
Maximum size: 100 MB

About this tool

Extract plain text or Markdown from a PDF. Reading order is reconstructed by sorting text items by their position on each page, so the output reads like the document - not like a random shuffle of words.
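To illustrate the idea of sorting text items by position, here is a minimal sketch. It is not Privvert's actual code. It assumes a simplified `TextItem` with just `str`, `x` and `y` (in pdf.js the x and y origin of a text item sit in `transform[4]` and `transform[5]`). Items whose baselines fall within a hypothetical `tolerance` are grouped into one line. Lines are then ordered top to bottom, and items within each line left to right.

```typescript
// Simplified stand-in for a pdf.js text item (assumption: only these fields are used).
// In pdf.js, transform[4] and transform[5] hold the item's x and y origin.
interface TextItem {
  str: string;
  x: number; // left edge in PDF user-space units
  y: number; // baseline; PDF y grows upward from the bottom of the page
}

// Group items whose baselines lie within `tolerance` units into one line,
// order lines top to bottom, and order items within a line left to right.
function reconstructReadingOrder(items: TextItem[], tolerance = 2): string {
  // Top-to-bottom first (descending y), then left-to-right as a tiebreak.
  const sorted = [...items].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: TextItem[][] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current[0].y - item.y) <= tolerance) {
      current.push(item); // close enough vertically: same visual line
    } else {
      lines.push([item]); // start a new line
    }
  }
  // Re-sort each line by x, because small baseline jitter can upset the first sort.
  return lines
    .map((line) => line.sort((a, b) => a.x - b.x).map((i) => i.str).join(" "))
    .join("\n");
}
```

Joining items with a single space is a simplification. Real pdf.js items sometimes carry their own spacing, so a production version would check the horizontal gap between items before adding one.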

Use it to feed PDFs into ChatGPT, build a searchable archive, copy content into another document, or quickly scan the contents of a long report. Privvert runs the extraction locally with pdf.js - confidential PDFs never leave your machine.

Features

  • Plain text or Markdown output
  • Preserves natural reading order
  • Per-page text or single concatenated document
  • Works on any text-based PDF
  • Detects bullets and (heuristically) headings for Markdown output
  • Browser-only - files never uploaded
  • Free and unlimited
  • Optionally preserves paragraph breaks based on PDF text layout
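One common way to preserve paragraph breaks from layout is to compare each vertical gap between lines with the page's typical line spacing. Below is a minimal sketch, assuming lines have already been reconstructed with their baseline `y`. The `Line` type, the helper names, and the 1.5× threshold are all hypothetical choices, not Privvert's documented behavior.

```typescript
// A reconstructed line of text with its baseline y (PDF units, y grows upward).
interface Line {
  text: string;
  y: number;
}

// Median of a non-empty numeric array.
function median(values: number[]): number {
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Insert a blank line wherever the gap to the previous line is noticeably
// larger than the typical line spacing (assumed threshold: factor x median gap).
function joinWithParagraphBreaks(lines: Line[], factor = 1.5): string {
  if (lines.length < 2) return lines.map((l) => l.text).join("\n");
  // gaps[i] is the vertical distance between lines[i] and lines[i + 1].
  const gaps = lines.slice(1).map((l, i) => lines[i].y - l.y);
  const typical = median(gaps);
  let out = lines[0].text;
  for (let i = 1; i < lines.length; i++) {
    out += gaps[i - 1] > typical * factor ? "\n\n" : "\n";
    out += lines[i].text;
  }
  return out;
}
```

The median is used instead of the mean so that a few large gaps (the paragraph breaks themselves) do not inflate the "typical" spacing.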

How to use it

  1. Drop in your PDF.
  2. Pick plain text or Markdown.
  3. Click Extract.
  4. Download the .txt or .md file.
🔒 100% private

Everything happens inside your browser using JavaScript and WebAssembly. Your files are never uploaded to a server, never stored, and we never see them.

Frequently asked questions

Will it work on scanned PDFs?

No - scanned pages are images of text, not actual text. Run the OCR tool first to produce a text layer, then extract.

How accurate is the reading order?

Very accurate for normal single-column documents. Multi-column layouts (newspapers, academic papers) sometimes interleave columns; manual cleanup is occasionally needed.
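Here is why columns interleave, and one heuristic that can help. A plain top-to-bottom sort reads straight across both columns. The sketch below looks for an empty vertical strip (a "gutter") between text items and, if it finds one, reads the left column before the right. The `Item` type, the function names, and the 12-unit minimum gutter are hypothetical. The heuristic also fails whenever a full-width title or figure caption spans the gutter, which is part of why multi-column pages remain a hard case.

```typescript
// Simplified text item (assumption): left edge, width, and baseline in PDF units.
interface Item {
  str: string;
  x: number;
  width: number;
  y: number; // PDF y grows upward
}

// Naive order: top to bottom, then left to right. Assumes items on the same
// visual line share the same baseline (no tolerance grouping, for brevity).
function readSimple(items: Item[]): string {
  return [...items].sort((a, b) => b.y - a.y || a.x - b.x).map((i) => i.str).join(" ");
}

// Return the x at the centre of the widest horizontal range not covered by any
// item, or null if no such gap is at least `minGutter` units wide.
function findGutter(items: Item[], minGutter = 12): number | null {
  const spans = items.map((i) => [i.x, i.x + i.width] as const).sort((a, b) => a[0] - b[0]);
  let best: number | null = null;
  let bestWidth = minGutter;
  let reach = -Infinity; // rightmost edge covered so far
  for (const [start, end] of spans) {
    if (reach !== -Infinity && start - reach >= bestWidth) {
      bestWidth = start - reach;
      best = (reach + start) / 2;
    }
    reach = Math.max(reach, end);
  }
  return best;
}

// Read the left column fully, then the right one; fall back to the naive order.
function readTwoColumn(items: Item[]): string {
  const gutter = findGutter(items);
  if (gutter === null) return readSimple(items);
  // No item crosses the gutter, so each item lands on exactly one side.
  const left = items.filter((i) => i.x + i.width <= gutter);
  const right = items.filter((i) => i.x >= gutter);
  return readSimple(left) + "\n" + readSimple(right);
}
```

On a two-column page, `readSimple` produces the interleaved order described above. `readTwoColumn` keeps each column intact.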

Will Markdown headings be correctly detected?

The Markdown converter uses font-size heuristics to guess heading levels. It's good for most documents but not perfect - review and adjust as needed.
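To show how a font-size heuristic can guess heading levels, here is a minimal sketch. It takes the size that covers the most characters as the body size and maps larger sizes to `#`, `##` and `###`. The `StyledLine` type, the function names, and the 1.8 / 1.4 / 1.15 ratio thresholds are hypothetical. Where `fontSize` comes from is also an assumption: it might be derived from each pdf.js item's transform matrix or height. Privvert's own thresholds are not documented here.

```typescript
// A line of text with an estimated font size (assumption: already computed).
interface StyledLine {
  text: string;
  fontSize: number;
}

// Body size = the (rounded) font size used by the most characters on the page.
function bodyFontSize(lines: StyledLine[]): number {
  const weights: { [size: string]: number } = {};
  for (const l of lines) {
    const key = String(Math.round(l.fontSize * 2) / 2); // round to 0.5pt
    weights[key] = (weights[key] || 0) + l.text.length;
  }
  let best = 0;
  let bestWeight = -1;
  for (const key of Object.keys(weights)) {
    if (weights[key] > bestWeight) {
      best = Number(key);
      bestWeight = weights[key];
    }
  }
  return best;
}

// Map each line's size ratio to a Markdown heading level (hypothetical thresholds).
function toMarkdown(lines: StyledLine[]): string {
  const body = bodyFontSize(lines);
  return lines
    .map((l) => {
      const ratio = l.fontSize / body;
      if (ratio >= 1.8) return `# ${l.text}`;
      if (ratio >= 1.4) return `## ${l.text}`;
      if (ratio >= 1.15) return `### ${l.text}`;
      return l.text; // body-sized text stays a plain paragraph line
    })
    .join("\n");
}
```

Weighting by character count rather than line count keeps a page with many short headings from being mistaken for one whose body text is large.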

Can I extract just one page?

Yes - switch to per-page mode and copy only the page you want.

Why is the extracted text jumbled?

A PDF stores text at absolute positions on the page, not as a linear stream. Text in columns, footnotes or text boxes can come out in an unexpected order. The tool tries to reconstruct natural reading order, but multi-column layouts are a known hard case.

Does it OCR scanned PDFs?

No - it extracts text that's already inside the PDF. Pure image scans return nothing. Run them through the OCR tool first to get real text underneath the images.