Files

Learn about the various file formats Hyper supports for syncing text, image, and audio data with your vector database.


Supported File Formats for Vector Sync

Hyper's vector database can ingest data from the following file formats, grouped by content type:

Text Files

Text files are processed for their textual content, which is embedded as searchable vectors and is suitable for NLP tasks:

  • JSON (application/json): Structured data ideal for complex embeddings.
  • PDF (application/pdf): Includes metadata and supports text coordinate extraction.
  • CSV (text/csv): Easily converted into vector representations for each row.
  • TXT (text/plain): Pure text for straightforward vectorization.
  • MD (text/markdown): Markdown files processed as text.
  • RTF (application/rtf): Rich Text Format converted to plain text.
  • TSV (text/tab-separated-values): Similar to CSV for structured data representation.
  • DOCX (application/vnd.openxmlformats-officedocument.wordprocessingml.document): Text content extraction for document embeddings.
  • XLSX (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet): Parses sheets into structured data.
  • PPTX (application/vnd.openxmlformats-officedocument.presentationml.presentation): Presentation text useful for creating searchable content.

Image Files

Images are analyzed for content and text, enabling visual search and classification:

  • JPG (image/jpeg)
  • PNG (image/png)

Audio Files

Audio files, and the audio in MP4 video files, are transcribed to text so their content is searchable within the vector database:

  • MP3 (audio/mpeg)
  • MP4 (video/mp4)
  • WAV (audio/wav)
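
The extension-to-MIME-type mapping in the lists above can be sketched as a small shell function. This is only an illustration: the name hyper_mime_type is hypothetical and not part of Hyper, and the function looks at the file extension only, without inspecting the file contents.

```shell
# Hypothetical helper (not part of Hyper): print the MIME type listed above
# for a given file name, based only on its extension.
hyper_mime_type() {
  local name ext
  name="${1##*/}"                                   # strip any directory part
  ext="$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    json) echo "application/json" ;;
    pdf)  echo "application/pdf" ;;
    csv)  echo "text/csv" ;;
    txt)  echo "text/plain" ;;
    md)   echo "text/markdown" ;;
    rtf)  echo "application/rtf" ;;
    tsv)  echo "text/tab-separated-values" ;;
    docx) echo "application/vnd.openxmlformats-officedocument.wordprocessingml.document" ;;
    xlsx) echo "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" ;;
    pptx) echo "application/vnd.openxmlformats-officedocument.presentationml.presentation" ;;
    jpg)  echo "image/jpeg" ;;
    png)  echo "image/png" ;;
    mp3)  echo "audio/mpeg" ;;
    mp4)  echo "video/mp4" ;;
    wav)  echo "audio/wav" ;;
    *)    echo "unsupported file extension: $ext" >&2; return 1 ;;
  esac
}
```

For example, hyper_mime_type ./slides/deck.pptx prints the PPTX MIME type, while an extension that is not in the lists above makes the function return a non-zero status.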

Uploading Files to the Vector Database

Upload files to Hyper to automatically process and store their data as vectors in the database, enhancing searchability and analysis.

How to Upload

Send a POST request to the /v1/files endpoint with the Content-Type set to multipart/form-data. Include the file in the file field and its MIME type in the type field:

curl --request POST \
  --url https://api.gethyper.ai/v1/files \
  --header 'Authorization: Bearer YOUR_HYPER_API_KEY' \
  --header 'Content-Type: multipart/form-data' \
  --form 'file=@"/path/to/your/file.jpg"' \
  --form 'type="image/jpeg"'
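
For repeated uploads, the request above can be wrapped in a shell function. The following is a minimal sketch, not an official client: the function name hyper_upload and the HYPER_API_KEY environment variable are assumptions made for illustration. The explicit Content-Type header is left out because curl sets multipart/form-data itself when --form is used.

```shell
# Sketch only: wraps the documented upload request in a reusable function.
# hyper_upload and HYPER_API_KEY are hypothetical names, not documented by Hyper.
hyper_upload() {
  local file="$1" mime="$2"

  if [ -z "${HYPER_API_KEY:-}" ]; then
    echo "HYPER_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -r "$file" ]; then
    echo "cannot read file: $file" >&2
    return 1
  fi

  # --fail makes curl exit non-zero on HTTP error responses;
  # --silent --show-error hides the progress meter but keeps error messages.
  curl --fail --silent --show-error \
    --request POST \
    --url https://api.gethyper.ai/v1/files \
    --header "Authorization: Bearer ${HYPER_API_KEY}" \
    --form "file=@\"${file}\"" \
    --form "type=${mime}"
}
```

A call such as hyper_upload ./diagram.png image/png then sends the same request as the curl command above, with the file path and type filled in from the arguments.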

File Type Specification

Always specify the type field when uploading. Its value is the file's MIME type, which tells Hyper how to process the file and integrate it into the vector database:

{
  "type": "image/jpeg"
}

Note: The maximum file size is 20 MB for text and image files; larger files can be accommodated on request. Audio and video files are transcribed to text, and both the raw and processed data are stored for efficient retrieval.
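
The 20 MB limit can be checked locally before uploading. The sketch below assumes that 20 MB means 20,000,000 bytes, which is the more conservative reading; the exact byte count Hyper enforces is not stated here. The names HYPER_MAX_BYTES and within_hyper_size_limit are hypothetical.

```shell
# Hypothetical pre-flight check for text and image files.
# Assumption: 20 MB is read as 20,000,000 bytes (the conservative interpretation).
HYPER_MAX_BYTES=$((20 * 1000 * 1000))

within_hyper_size_limit() {
  local file="$1" size
  if [ ! -f "$file" ]; then
    echo "not a regular file: $file" >&2
    return 2
  fi
  # wc -c prints the size in bytes; the arithmetic expansion strips any padding.
  size=$(($(wc -c < "$file")))
  [ "$size" -le "$HYPER_MAX_BYTES" ]
}
```

For example, within_hyper_size_limit ./report.pdf || echo "file exceeds 20 MB" reports an oversized file before any request is sent. The check applies to text and image files only, since no limit for audio and video files is given above.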