URLs & Sitemaps

Learn how Hyper ingests URLs and sitemaps and keeps the indexed data synchronized.

Hyper's data ingestion is not limited to files: it also accepts URLs and sitemaps. Hyper indexes and vectorizes content directly from the web, so users can query web content that is kept current through scheduled synchronization.

Ingesting URLs

To ingest content from individual URLs, Hyper provides an API that fetches and processes web pages. It supports several content types, including HTML pages and web-accessible PDFs.

How to Ingest a URL

  1. Submit a URL: Use the Hyper API endpoint to submit a URL for ingestion.
  2. Data Processing: Hyper will retrieve the content from the URL and process it to extract text, images, or any other supported data type for vectorization.
  3. Indexing: The extracted content is then indexed and stored in the vector database, ready for querying.
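
The submission step above can be sketched from the client side as follows. This is a minimal illustration, not Hyper's documented API: the endpoint `https://api.hyper.example/v1/ingest/url`, the bearer-token `Authorization` header, and the JSON field name `url` are all placeholders, so consult the API documentation for the real values. The sketch only builds the request; sending it is left as a comment.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Hypothetical endpoint: replace with the one given in the Hyper API documentation.
INGEST_URL_ENDPOINT = "https://api.hyper.example/v1/ingest/url"


def build_url_ingest_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits one URL for ingestion."""
    parsed = urlparse(page_url)
    # Reject anything that is not an absolute http(s) URL before calling the API.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")

    body = json.dumps({"url": page_url}).encode("utf-8")  # field name is assumed
    return urllib.request.Request(
        INGEST_URL_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
        },
    )


request = build_url_ingest_request("https://example.com/docs/intro.html", "YOUR_API_KEY")
# To actually submit it: urllib.request.urlopen(request)
```

Validating the URL locally before submission avoids spending an API call on input that could never be fetched.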

Ingesting Sitemaps

For more extensive web content ingestion, Hyper can process entire sitemaps. This allows for bulk indexing of web pages, ensuring comprehensive coverage of a website's content.

How to Ingest a Sitemap

  1. Submit a Sitemap: Provide the sitemap URL to Hyper through the designated API endpoint.
  2. Automated Crawling: Hyper reads the sitemap and processes each URL it lists, following the sitemap's structure and metadata.
  3. Continuous Sync: Optionally, set up a recurring ingestion schedule to keep the indexed data up to date with the live content.
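
To make the crawling step concrete, the sketch below reads the URLs listed in a sitemap, using the standard sitemap XML format (a `<urlset>` of `<url>` entries with `<loc>` and an optional `<lastmod>`, in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace). It illustrates the file format only. How Hyper's crawler parses sitemaps internally is not described here, and the function name `list_sitemap_entries` is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_entries(sitemap_xml: str) -> List[Tuple[str, Optional[str]]]:
    """Return (loc, lastmod) pairs for every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml.strip())
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url_el.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:  # skip entries that have no usable location
            entries.append((loc, lastmod.strip() if lastmod else None))
    return entries


SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>
"""

entries = list_sitemap_entries(SAMPLE_SITEMAP)
for loc, lastmod in entries:
    print(loc, lastmod)
```

Running a check like this on your own sitemap before submitting it is a quick way to see which pages would be covered by a bulk ingestion.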

Synchronization Schedule

Hyper can be configured to synchronize ingested URLs and sitemaps automatically on a recurring schedule, every 24 hours by default. Each sync refreshes the vector database with the most recent version of the web content, so search results stay accurate and current.

Setting Up Sync

  1. Enable Sync: During the ingestion process, specify that you want to enable automatic synchronization.
  2. Schedule Frequency: By default, synchronization occurs every 24 hours. To use a different interval, contact Hyper support.
  3. Monitor and Maintain: Hyper will handle the sync process, but you can monitor the ingestion logs and make adjustments as necessary.
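
As a rough illustration of what a 24-hour schedule implies, the sketch below selects the sources whose last successful sync is at least one interval old. Hyper performs this scheduling itself; the `SyncedSource` record, its field names, and the `due_for_sync` helper are hypothetical and exist only to show the logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

DEFAULT_SYNC_INTERVAL = timedelta(hours=24)  # default interval stated in this guide


@dataclass
class SyncedSource:
    """Hypothetical record of one ingested URL or sitemap."""
    url: str
    sync_enabled: bool
    last_synced_at: Optional[datetime]  # None means the source was never synced


def due_for_sync(
    sources: List[SyncedSource],
    now: datetime,
    interval: timedelta = DEFAULT_SYNC_INTERVAL,
) -> List[SyncedSource]:
    """Return the sync-enabled sources whose last sync is at least `interval` old."""
    due = []
    for source in sources:
        if not source.sync_enabled:
            continue  # sync is opt-in, so disabled sources are never refreshed
        if source.last_synced_at is None or now - source.last_synced_at >= interval:
            due.append(source)
    return due


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
sources = [
    SyncedSource("https://example.com/a", True, datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)),
    SyncedSource("https://example.com/b", True, datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)),
    SyncedSource("https://example.com/c", False, datetime(2023, 12, 1, 0, 0, tzinfo=timezone.utc)),
    SyncedSource("https://example.com/d", True, None),
]
due_urls = [source.url for source in due_for_sync(sources, now)]
```

The same comparison is useful when reading ingestion logs: a sync-enabled source whose last sync is much older than the configured interval is worth investigating.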

Use Cases

  • Competitive Analysis: Stay updated on competitors' website changes and product offerings.
  • Content Aggregation: Build a content aggregator that stays current with the latest web articles and posts.
  • Market Research: Keep a pulse on market trends by analyzing regularly updated industry web pages.

Getting Started

To begin using URL and sitemap ingestion with Hyper, review the API documentation for detailed instructions on submitting URLs and sitemaps. If you need to adjust the synchronization schedule or run into any issues, contact Hyper support.

Note: When ingesting web content, be mindful of website terms of service and copyright laws. Ensure that you have the right to access and index the content you plan to ingest.