I figured out how to do exactly what I wanted in terms of citing sources with .md files stored in Spaces buckets (https://www.digitalocean.com/community/questions/linking-to-source-documents). It's actually quite easy if you just request the sources/context from the API. Now I have a new challenge: I would like to switch to a .csv file, because I want to add columns of metadata next to the content so that the metadata and content are unified in a single knowledge base.
The problem is that I was relying on the file names, bucket names, and directories to reconstruct the original source. With a single large .csv file, every source ends up as /bucketname/foldername/some_big.csv.
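One way to keep each document's origin inside a single file is to store the path as its own column, one row per original .md file. The sketch below assumes a hypothetical `source` column holding the path relative to the folder the files came from. Whether the knowledge base actually returns that column in its retrieved context (so it can be used for citations) is an assumption you would need to verify.

```python
import csv
from pathlib import Path


def md_files_to_csv(root: str, out_csv: str) -> int:
    """Collect every .md file under `root` into one CSV, one row per file.

    The hypothetical `source` column keeps each file's path relative to
    `root` (e.g. "foldername/doc.md"), so the origin survives the merge.
    Extra metadata columns could be added to `fieldnames` the same way.
    Returns the number of rows written.
    """
    root_path = Path(root)
    rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["source", "content"])
        writer.writeheader()
        # sorted() gives a stable, reproducible row order
        for md in sorted(root_path.rglob("*.md")):
            writer.writerow({
                "source": md.relative_to(root_path).as_posix(),
                "content": md.read_text(encoding="utf-8"),
            })
            rows += 1
    return rows
```

The csv module quotes fields containing commas or newlines automatically, so multi-line Markdown content round-trips safely.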
Maybe the only solution is to create an individual .csv for each .md file. That is an OK solution, but I'm curious whether that is the best practice, or whether there is a better way to accomplish this with a single large .csv.
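If you do go the one-file-per-document route, you can still maintain a single master .csv and generate the per-document files from it. The sketch below assumes the master file has a hypothetical `source` column (such as "foldername/doc.md"); each row is written to its own .csv at a path mirroring that value, so the uploaded file path still points back to the original document. All columns, including metadata, are kept in every output file.

```python
import csv
from pathlib import Path


def split_csv_by_source(in_csv: str, out_dir: str, source_col: str = "source") -> list[str]:
    """Write each row of `in_csv` to its own CSV under `out_dir`.

    The output path mirrors the row's `source_col` value with the extension
    swapped to .csv (e.g. "foldername/doc.md" -> "foldername/doc.csv").
    Assumes `source_col` holds trusted relative paths; rows sharing the same
    source overwrite each other. Returns paths relative to `out_dir`.
    """
    out_root = Path(out_dir)
    written = []
    with open(in_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            rel = Path(row[source_col]).with_suffix(".csv")
            target = out_root / rel
            # Recreate the original folder structure before writing
            target.parent.mkdir(parents=True, exist_ok=True)
            with open(target, "w", newline="", encoding="utf-8") as out:
                writer = csv.DictWriter(out, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerow(row)
            written.append(rel.as_posix())
    return written
```

The resulting directory tree could then be uploaded to the bucket as-is, keeping the folder-based source reconstruction that already works with .md files.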
This may be useful for other projects as well, since datasets are often distributed as .parquet files, which are easy to dump into one large .csv file.
(It would be great if knowledge bases supported .parquet files directly.)