FAST Crawler restart, db timeout and refresh period

We have some people looking at the logs now. There can be several reasons why the URL is missing. Hopefully the logs will provide us with some answers.

The hbdWebWeekly collection is configured to start from scratch, with a db interval of 3 and a refresh cycle of 10 days. "From scratch" means that each time the crawler is started, the work queues are truncated. The db interval determines how many crawler restarts or elapsed refresh cycles a URL can go without being found by the crawler before it is deleted. This means that if the crawler is stopped and restarted often, and it has not gone through a refresh cycle after 3 restarts, some URLs will be deleted. It is therefore recommended that a complete refresh cycle is completed within the db interval. In this case, the crawler should run for at least 10 days for every 3 restarts.
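To make the interaction between restarts and the db interval concrete, here is a minimal sketch in Python. It is a simplified model of the behaviour described above, not the crawler's actual implementation: the function name `survives` and the assumption that a URL is only found again once a run lasts a full refresh cycle are hypothetical and chosen for illustration. Only the values 3 and 10 come from the configuration mentioned in this post.

```python
DB_INTERVAL = 3          # db interval from the hbdWebWeekly configuration
REFRESH_CYCLE_DAYS = 10  # refresh cycle length in days


def survives(run_lengths_days, db_interval=DB_INTERVAL,
             refresh_days=REFRESH_CYCLE_DAYS):
    """Return True if a worst-case URL is still kept after the given runs.

    Each entry in run_lengths_days is how long the crawler ran before it
    was stopped and restarted. Assumption: the URL is only re-found when
    a run lasts at least one full refresh cycle, because starting from
    scratch truncates the work queues on every restart.
    """
    missed = 0  # consecutive runs in which the URL was not found
    for days in run_lengths_days:
        if days >= refresh_days:
            missed = 0          # full cycle completed, URL seen again
        else:
            missed += 1         # restarted before the cycle finished
            if missed >= db_interval:
                return False    # db interval exhausted, URL is deleted
    return True


if __name__ == "__main__":
    print(survives([2, 3, 4]))         # three short runs in a row
    print(survives([2, 3, 10, 1, 1]))  # a full cycle in between
```

In this model, three short runs in a row delete the URL, while a single 10-day run within every 3 restarts keeps it, which matches the recommendation above.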

