crawl-urls

A container for elements that are enqueued to the crawler. The crawler will not reply to the caller until all of the contained nodes are processed.

Attributes

  • synchronization (Any of: none, enqueued, to-be-indexed, indexed, indexed-no-sync) - If present, this value is applied as the default synchronization to every enqueued element that does not specify its own. Every synchronization other than none causes the enqueue to be committed to secondary storage before a synchronous reply is issued. The value determines when the reply is issued:
    • none: immediately after receiving the enqueue.
    • enqueued: after all the child nodes have been found to satisfy the crawl conditions, so they will be processed.
    • to-be-indexed: immediately before the child nodes are sent to the indexer.
    • indexed: after the child nodes have been recorded by the indexer. This forces the indexer to do additional work so that it can reply as promptly as possible.
    • indexed-no-sync: after the child nodes have been recorded by the indexer, without forcing the indexer to do additional work.
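
The defaulting rule for synchronization can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not the crawler's own logic. It assumes that a child element states its own synchronization through an attribute that is also named synchronization, and that crawl-url carries a url attribute; neither detail is confirmed on this page, and the helper name effective_synchronization is hypothetical.

```python
import xml.etree.ElementTree as ET

# The five values listed for the synchronization attribute.
SYNC_VALUES = ("none", "enqueued", "to-be-indexed", "indexed", "indexed-no-sync")


def effective_synchronization(container, child):
    """Return the child's own synchronization, else the container default.

    Returns None when neither element specifies a value. Assumes (not
    confirmed by this page) that children use an attribute that is also
    called "synchronization".
    """
    value = child.get("synchronization", container.get("synchronization"))
    if value is not None and value not in SYNC_VALUES:
        raise ValueError(f"unknown synchronization: {value!r}")
    return value


# Hypothetical request: the "url" attribute and the URLs are illustrative only.
doc = ET.fromstring(
    '<crawl-urls synchronization="indexed-no-sync">'
    '<crawl-url url="http://example.com/a"/>'
    '<crawl-url url="http://example.com/b" synchronization="none"/>'
    "</crawl-urls>"
)

resolved = [effective_synchronization(doc, child) for child in doc]
```

Here the first crawl-url inherits indexed-no-sync from the container, while the second keeps its own value of none.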

Children

  • Use these in the listed order. The sequence may repeat one or more times, with no upper bound.
    • crawl-url: (At least 1) - A node that encapsulates all crawler state for a particular URL.
    • crawl-delete: (At least 1) - A node used to remove a URL or set of URLs from the index.
    • index-atomic: (At least 1) - Container for crawl-url and crawl-delete elements that should be indexed together in an atomic operation.
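
As a rough illustration of the child ordering, the following Python sketch builds a crawl-urls element containing one crawl-url, one crawl-delete, and one index-atomic, in that order. The element names and the synchronization attribute come from this page; the url attribute, the example URLs, and the helper name build_crawl_urls are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_crawl_urls(add_url, delete_url, atomic_urls, synchronization=None):
    """Build a crawl-urls element with its children in the listed order.

    The "url" attribute on crawl-url and crawl-delete is an assumption;
    this page does not document the attributes of the child elements.
    """
    root = ET.Element("crawl-urls")
    if synchronization is not None:
        root.set("synchronization", synchronization)

    # 1. A URL to crawl.
    ET.SubElement(root, "crawl-url", {"url": add_url})
    # 2. A URL to remove from the index.
    ET.SubElement(root, "crawl-delete", {"url": delete_url})
    # 3. URLs that should be indexed together in one atomic operation.
    atomic = ET.SubElement(root, "index-atomic")
    for url in atomic_urls:
        ET.SubElement(atomic, "crawl-url", {"url": url})
    return root


root = build_crawl_urls(
    add_url="http://example.com/new",
    delete_url="http://example.com/old",
    atomic_urls=["http://example.com/part-1", "http://example.com/part-2"],
    synchronization="indexed",
)
xml_text = ET.tostring(root, encoding="unicode")
```

The resulting xml_text holds the serialized request; how it is submitted to the crawler is outside the scope of this sketch.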