HYVE Docs
Dump Features

Dump automatically detects the source type from the input and routes it to the appropriate extractor. Detection is based on pattern matching against the input, primarily its URL.

Detection Priority

Source detection follows a priority order: the first matching pattern wins.

| Priority | Pattern | Source Type |
| --- | --- | --- |
| 1 | twitter.com, x.com | twitter |
| 2 | youtube.com, youtu.be | youtube |
| 3 | instagram.com | instagram |
| 4 | linkedin.com | linkedin |
| 5 | reddit.com, redd.it | reddit |
| 6 | .pdf in URL | pdf |
| 7 | Any other http:// or https:// URL | article |
| 8 | Image extension (.png, .jpg, .webp) | image |
| 9 | Plain text (no URL) | text |
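The priority table above can be read as an ordered list of checks. The following is a minimal sketch of that ordering, not Dump's actual detector: `detectSourceType` is a hypothetical name, domains are matched against the parsed hostname (an assumption that keeps, for example, dropbox.com from matching x.com), the PDF rule checks the path suffix, and the image extension list is borrowed from the Image section below.

```typescript
// Source types from the "Source Type" column of the priority table.
type SourceType =
  | "twitter"
  | "youtube"
  | "instagram"
  | "linkedin"
  | "reddit"
  | "pdf"
  | "article"
  | "image"
  | "text";

// True when `host` is one of `domains` or a subdomain of it (e.g. www.x.com).
function hostMatches(host: string, domains: string[]): boolean {
  return domains.some((d) => host === d || host.slice(-(d.length + 1)) === "." + d);
}

// Assumed image extensions, taken from the formats listed in the Image section.
const IMAGE_EXT = /\.(png|jpe?g|gif|webp|svg|bmp|ico)$/i;

// Hypothetical detector: checks run in table order and the first match wins.
function detectSourceType(raw: string): SourceType {
  const input = raw.trim();
  let url: URL | null = null;
  if (/^https?:\/\//i.test(input)) {
    try {
      url = new URL(input);
    } catch {
      url = null; // malformed URL: fall through to the non-URL rules
    }
  }

  if (url) {
    const host = url.hostname.toLowerCase();
    // Rules 1-5: platform domains.
    if (hostMatches(host, ["twitter.com", "x.com"])) return "twitter";
    if (hostMatches(host, ["youtube.com", "youtu.be"])) return "youtube";
    if (hostMatches(host, ["instagram.com"])) return "instagram";
    if (hostMatches(host, ["linkedin.com"])) return "linkedin";
    if (hostMatches(host, ["reddit.com", "redd.it"])) return "reddit";
    // Rule 6: PDF links (this sketch checks the path suffix only).
    if (/\.pdf$/i.test(url.pathname)) return "pdf";
    // Rule 7: any other http(s) URL.
    return "article";
  }

  // Rule 8: image file names or paths. Rule 9: everything else is plain text.
  if (IMAGE_EXT.test(input)) return "image";
  return "text";
}
```

Taken literally, this ordering means an image URL such as https://example.com/photo.png matches rule 7 and is treated as an article; only non-URL input ending in an image extension reaches rule 8. The table does not say whether the real detector behaves this way.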

Extractor Details

Twitter/X

Extracts tweet content, author, and media. On Vercel (serverless), it uses a VPS proxy that runs the Bird CLI; when running locally, it extracts directly.

Extracts: Tweet text, author, author URL, media URLs, thread content.
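The serverless/local split above amounts to a routing decision plus URL parsing. A minimal sketch, assuming the serverless case is detected through the `VERCEL` environment variable that Vercel exposes to deployments; `chooseTwitterStrategy`, `parseTweetUrl`, and the strategy names are hypothetical, and the Bird CLI proxy call itself is not shown.

```typescript
// Hypothetical names for the two extraction paths described above.
type TwitterStrategy = "vps-proxy" | "direct";

// Assumption: a set VERCEL variable means we are running on Vercel (serverless).
function chooseTwitterStrategy(env: Record<string, string | undefined>): TwitterStrategy {
  return env.VERCEL ? "vps-proxy" : "direct";
}

// Pull the handle and numeric status ID out of a twitter.com or x.com post URL.
function parseTweetUrl(raw: string): { handle: string; id: string } | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a URL at all
  }
  const match = /^\/([^/]+)\/status\/(\d+)/.exec(url.pathname);
  return match ? { handle: match[1], id: match[2] } : null;
}
```

In Node this would typically be called as `chooseTwitterStrategy(process.env)`; passing the environment in as a parameter keeps the decision easy to test.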

YouTube

Extracts the video transcript using the youtube-transcript library.

Extracts: Video title, transcript text, channel info.
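Before a transcript can be fetched, the video ID has to be recovered from whichever URL form matched the detector, and the returned caption segments have to be joined into one text. A minimal offline sketch of those two steps: the segment shape `{ text, offset, duration }` is assumed to mirror what youtube-transcript's `YoutubeTranscript.fetchTranscript` resolves with, the network call is omitted, and `youTubeVideoId` and `joinTranscript` are hypothetical helpers.

```typescript
// Assumed caption segment shape (text plus timing).
interface TranscriptSegment {
  text: string;
  offset: number;
  duration: number;
}

// Resolve the video ID from youtu.be, watch?v=, /shorts/, /embed/ or /live/ URLs.
function youTubeVideoId(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null;
  }
  const host = url.hostname.replace(/^(www|m)\./, "");
  if (host === "youtu.be") return url.pathname.slice(1).split("/")[0] || null;
  if (host === "youtube.com") {
    const v = url.searchParams.get("v");
    if (v) return v;
    const m = /^\/(?:shorts|embed|live)\/([\w-]+)/.exec(url.pathname);
    return m ? m[1] : null;
  }
  return null;
}

// Order segments by start time and join them, normalizing whitespace.
function joinTranscript(segments: TranscriptSegment[]): string {
  return segments
    .slice()
    .sort((a, b) => a.offset - b.offset)
    .map((s) => s.text.replace(/\s+/g, " ").trim())
    .filter((t) => t.length > 0)
    .join(" ");
}
```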

Instagram

Uses the oEmbed API together with Open Graph meta tags for extraction.

Extracts: Post caption, author, media URLs, thumbnail.

Instagram extraction uses oEmbed and Open Graph tags because VPS-based extraction returned empty results.
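Combining the two sources could look roughly like the sketch below. The `OEmbedResponse` fields are standard oEmbed response parameters; the Open Graph parsing is a regex-based approximation rather than a real HTML parser, and the rule for which source wins each field is an assumption, as are the names `parseOpenGraph` and `toInstagramPost`. Fetching the oEmbed response and the page HTML is not shown.

```typescript
// Standard oEmbed response fields that this sketch reads.
interface OEmbedResponse {
  title?: string;
  author_name?: string;
  author_url?: string;
  thumbnail_url?: string;
}

// Hypothetical output shape matching the "Extracts" list above.
interface InstagramPost {
  caption?: string;
  author?: string;
  authorUrl?: string;
  mediaUrls: string[];
  thumbnail?: string;
}

// Undo common HTML entities in meta content (&amp; last, to avoid double-decoding).
function decodeEntities(s: string): string {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Collect every <meta property="og:*" content="..."> value, keeping repeats (e.g. several og:image).
// Expects `property` to appear before `content` inside the tag.
function parseOpenGraph(html: string): Record<string, string[]> {
  const tags: Record<string, string[]> = {};
  const re = /<meta\b[^>]*?\bproperty=(["'])(og:[^"']+)\1[^>]*?\bcontent=(["'])([\s\S]*?)\3/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const list = tags[m[2]] || [];
    list.push(decodeEntities(m[4]));
    tags[m[2]] = list;
  }
  return tags;
}

// Merge oEmbed and Open Graph data; oEmbed wins where both provide a value (an assumption).
function toInstagramPost(oembed: OEmbedResponse, og: Record<string, string[]>): InstagramPost {
  const first = (key: string): string | undefined => (og[key] || [])[0];
  return {
    caption: oembed.title ?? first("og:description"),
    author: oembed.author_name,
    authorUrl: oembed.author_url,
    mediaUrls: (og["og:image"] || []).concat(og["og:video"] || []),
    thumbnail: oembed.thumbnail_url ?? first("og:image"),
  };
}
```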

LinkedIn

Parses Open Graph meta tags with a Googlebot user agent to access public post data.

Extracts: Post content, author name, author URL.

LinkedIn extraction uses the Googlebot user agent because VPS-based extraction returned empty results.
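A sketch of the request side, assuming the page is fetched with a plain HTTP GET whose `User-Agent` header is set to Googlebot's published desktop string. The fetcher is injected so the sketch runs without network access (Node 18+'s global `fetch` can be passed in its place). `fetchLinkedInOpenGraph` and `ogTags` are hypothetical names. Mapping the returned tags onto the content and author fields is left out, because which tags LinkedIn fills in is not documented here.

```typescript
// Googlebot's desktop user-agent string as published by Google.
const GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

// Minimal fetch-like signature so the sketch can be exercised without network access.
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// First value of each og:* meta tag (regex-based; expects property before content).
function ogTags(html: string): Record<string, string> {
  const out: Record<string, string> = {};
  const re = /<meta\b[^>]*?\bproperty=(["'])(og:[^"']+)\1[^>]*?\bcontent=(["'])([\s\S]*?)\3/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (!(m[2] in out)) out[m[2]] = m[4];
  }
  return out;
}

// Request the public post with a Googlebot user agent and return its Open Graph tags.
async function fetchLinkedInOpenGraph(url: string, fetcher: Fetcher): Promise<Record<string, string>> {
  const res = await fetcher(url, { headers: { "User-Agent": GOOGLEBOT_UA } });
  if (!res.ok) throw new Error(`LinkedIn responded with HTTP ${res.status}`);
  return ogTags(await res.text());
}
```

Usage might look like `await fetchLinkedInOpenGraph(postUrl, fetch)`.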

Reddit

Uses Reddit's JSON API directly by appending .json to the post URL.

Extracts: Post title, body text, author, subreddit, comments.
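The `.json` trick and the shape of the reply can be sketched as follows. For a post URL, Reddit returns an array of two Listings: the first holds the post (`t3`), the second holds the comments (`t1`, plus `more` placeholders). Only top-level comments are read here. The types cover just the fields used, and `toRedditJsonUrl` and `parseRedditThread` are hypothetical names. Whether short redd.it links are resolved before `.json` is appended is not stated above, so this sketch does not handle them.

```typescript
// Only the fields this sketch reads; the real payload has many more.
interface RedditThing<T> { kind: string; data: T }
interface Listing<T> { kind: string; data: { children: RedditThing<T>[] } }
interface PostData { title: string; selftext: string; author: string; subreddit: string }
interface CommentData { author?: string; body?: string }
type ThreadResponse = [Listing<PostData>, Listing<CommentData>];

interface RedditPost {
  title: string;
  body: string;
  author: string;
  subreddit: string;
  comments: { author: string; body: string }[];
}

// Drop the query string, fragment and trailing slash, then append ".json".
function toRedditJsonUrl(raw: string): string {
  const url = new URL(raw);
  url.search = "";
  url.hash = "";
  url.pathname = url.pathname.replace(/\/+$/, "") + ".json";
  return url.toString();
}

// Map the two-Listing response onto the fields listed above (top-level comments only).
function parseRedditThread(json: ThreadResponse): RedditPost {
  const [postListing, commentListing] = json;
  const first = postListing.data.children[0];
  if (!first) throw new Error("Reddit response contained no post");
  const post = first.data;
  const comments = commentListing.data.children
    .filter((c) => c.kind === "t1") // skip "more" placeholders
    .map((c) => ({ author: c.data.author ?? "", body: c.data.body ?? "" }));
  return { title: post.title, body: post.selftext, author: post.author, subreddit: post.subreddit, comments };
}
```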

PDF

Parses PDF documents using the pdf-parse library.

Extracts: Full text content, metadata (title, author, page count).
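The mapping from a parsed PDF to the fields above could look like this. It starts from an already-parsed result so it runs without the library. `PdfParseResult` is an assumed subset modeled on the object that pdf-parse (v1) resolves with (`numpages`, `info`, `text`); the exact shape depends on the library version. `toPdfExtraction` is a hypothetical name.

```typescript
// Assumed subset of a pdf-parse result; `info` is the PDF's document information dictionary.
interface PdfParseResult {
  numpages: number;
  info?: { Title?: string; Author?: string };
  text: string;
}

// Hypothetical output shape matching the "Extracts" list above.
interface PdfExtraction {
  title?: string;
  author?: string;
  pageCount: number;
  text: string;
}

function toPdfExtraction(result: PdfParseResult): PdfExtraction {
  // Treat empty or whitespace-only metadata as missing.
  const clean = (s?: string): string | undefined => (s && s.trim() ? s.trim() : undefined);
  return {
    title: clean(result.info?.Title),
    author: clean(result.info?.Author),
    pageCount: result.numpages,
    // Collapse runs of blank lines into a single paragraph break.
    text: result.text.replace(/\n{3,}/g, "\n\n").trim(),
  };
}
```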

Article

Generic web page extractor for any http:// or https:// URL that does not match a higher-priority pattern. Extracts the main content using readability algorithms.

Extracts: Title, main content, author, description, Open Graph metadata.
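Main-content extraction is the job of the readability step and is not reproduced here. The metadata fields, however, usually come from a fallback chain over meta tags. The sketch below shows one such chain. Preferring `og:title` over `<title>`, `og:description` over `description`, and `author` over `article:author` is an assumption, not documented behavior. The regex scanning is approximate, and `extractArticleMeta` is a hypothetical name.

```typescript
interface ArticleMeta {
  title?: string;
  description?: string;
  author?: string;
  openGraph: Record<string, string>;
}

// Collect <meta name|property="..." content="..."> pairs; the first occurrence of a key wins.
// Expects name/property before content and lowercases keys.
function collectMeta(html: string): Record<string, string> {
  const meta: Record<string, string> = {};
  const re = /<meta\b[^>]*?\b(?:name|property)=(["'])([^"']+)\1[^>]*?\bcontent=(["'])([\s\S]*?)\3/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const key = m[2].toLowerCase();
    if (!(key in meta)) meta[key] = m[4].trim();
  }
  return meta;
}

function extractArticleMeta(html: string): ArticleMeta {
  const meta = collectMeta(html);
  const t = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const titleTag = t ? t[1].trim() : undefined;

  // Keep all og:* tags as the Open Graph metadata.
  const openGraph: Record<string, string> = {};
  for (const key of Object.keys(meta)) {
    if (key.indexOf("og:") === 0) openGraph[key] = meta[key];
  }

  // Fallback chains: the first non-empty value wins.
  return {
    title: meta["og:title"] || titleTag || undefined,
    description: meta["og:description"] || meta["description"] || undefined,
    author: meta["author"] || meta["article:author"] || undefined,
    openGraph,
  };
}
```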

Image

Processes image files via the VPS proxy. Supports PNG, JPEG, GIF, WebP, SVG, BMP, and ICO.

Extracts: Image metadata, OCR text (when available).
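Before an image is handed to the VPS proxy, the input has to be checked against the supported formats, and a content type is usually attached. A minimal sketch of that check, using common MIME types for the formats listed above. ICO is mapped to `image/x-icon`; `image/vnd.microsoft.icon` is the IANA-registered alternative. `imageMimeType` is a hypothetical helper, and the proxy request is not shown.

```typescript
// Common MIME types for the supported formats, keyed by lowercase extension.
const IMAGE_MIME: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  webp: "image/webp",
  svg: "image/svg+xml",
  bmp: "image/bmp",
  ico: "image/x-icon",
};

// Return the MIME type for a supported image path or URL, or null if unsupported.
function imageMimeType(pathOrUrl: string): string | null {
  const clean = pathOrUrl.split(/[?#]/)[0]; // ignore query string and fragment
  const m = /\.([a-z0-9]+)$/i.exec(clean);
  if (!m) return null;
  const ext = m[1].toLowerCase();
  return Object.prototype.hasOwnProperty.call(IMAGE_MIME, ext) ? IMAGE_MIME[ext] : null;
}
```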

Text

Stores plain text input directly without extraction.

Extracts: The input text as-is, with no transformation.
