I used Amazon Kendra to build a semantic search engine. I used two datasets for this task: a CSV file and an Excel file. The CSV dataset is mostly text; the Excel dataset is mostly numbers.
The data is first connected to a data source, which is linked to a Kendra index. When the data source is synced, Kendra crawls and indexes the data.
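A minimal sketch of triggering that sync from code and waiting for it to finish. It assumes the data source already exists and that you use the AWS SDK for Python (boto3). `start_data_source_sync_job` and `list_data_source_sync_jobs` are real Kendra client calls, but the helper name `sync_and_wait` and the polling and timeout values are hypothetical choices for illustration. Only the first page of sync job history is examined.

```python
import time

# Kendra sync job statuses that mean the job has not finished yet.
RUNNING_STATES = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def sync_and_wait(kendra, index_id, data_source_id,
                  poll_seconds=60, timeout_seconds=3 * 60 * 60):
    """Start a data source sync and poll until it leaves a running state.

    `kendra` is assumed to be a boto3 client, e.g. boto3.client("kendra").
    Returns the sync job's history entry, or raises TimeoutError if the
    job is still running after `timeout_seconds` (a hypothetical limit).
    """
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        # Only the first page of history is checked in this sketch.
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        # Look for the entry belonging to the job started above.
        job = next((h for h in history if h["ExecutionId"] == execution_id), None)
        if job is not None and job["Status"] not in RUNNING_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"sync job {execution_id} is still running")
        time.sleep(poll_seconds)
```

Passing the client in as a parameter keeps the polling logic independent of AWS credentials, so it can be exercised with a stub object before pointing it at a real index.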
What I have tried:
When I sync the data source with the text-based CSV files, crawling and indexing finish within 5 minutes, but the same process on the Excel file seems to run indefinitely. Crawling the Excel file took 2 hours, and indexing has been running for the past 7 hours. I converted the Excel file to CSV and tried again, but the issue persists. The datasets contain no more than 20 rows, so data size doesn't seem to be the issue.
The disclaimer on the website says:
Amazon Kendra is syncing the following data source: 'dq-rule-fail'. It can take from a few minutes to a few hours. Syncing is a two-step process. First documents are crawled to determine the ones to index. Then the selected documents are indexed. Sync speeds are limited by factors such as remote repository throughput and throttling, network bandwidth, and the size of documents.
None of the factors listed in the disclaimer should be a problem in my case.
My questions: Is Kendra not designed to index numerical data? How can I make sure my data is synced properly?
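One way to check whether a sync actually indexed anything is to read the sync job's status and document counts. A minimal sketch, assuming you already have one history entry from boto3's `list_data_source_sync_jobs` (fields such as `Status`, `ErrorMessage` and `Metrics`, whose counts Kendra returns as strings). The helper name `summarize_sync_job` and the "ok" rule (succeeded with zero failed documents) are hypothetical choices for illustration.

```python
def summarize_sync_job(job):
    """Condense one ListDataSourceSyncJobs history entry into a report.

    Metrics counts arrive as strings, so they are converted to int;
    missing counts default to 0 and a missing error message to None.
    """
    metrics = job.get("Metrics") or {}
    counts = {
        key: int(metrics.get(key) or 0)
        for key in ("DocumentsScanned", "DocumentsAdded", "DocumentsModified",
                    "DocumentsDeleted", "DocumentsFailed")
    }
    status = job.get("Status")
    return {
        "status": status,
        "error": job.get("ErrorMessage"),
        # Hypothetical rule: treat the sync as healthy only if it
        # succeeded and no document failed to index.
        "ok": status == "SUCCEEDED" and counts["DocumentsFailed"] == 0,
        **counts,
    }
```

A non-zero `DocumentsFailed` count, or a status such as `FAILED` or `INCOMPLETE` together with `ErrorMessage`, points at a per-document problem rather than slow throughput.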