By setec
I figured out how to do exactly what I was looking to do in terms of citing sources with .md files stored in Spaces buckets (https://www.digitalocean.com/community/questions/linking-to-source-documents). It’s actually quite easy if you just request the sources/context from the API. Now I have a new challenge: I would like to move to using a .csv file because I want to add columns of metadata next to the content, so that the metadata and content are unified in a single knowledge base.
The problem is that I was relying on the filenames, bucket names and directories to construct the original source. With a single large .csv file, every source ends up as /bucketname/foldername/some_big.csv
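For context, the path-based approach amounts to something like the sketch below. The function name `split_source_path` is hypothetical, and the layout is just the /bucketname/foldername/filename pattern from my own setup, not something the API guarantees.

```python
from pathlib import PurePosixPath


def split_source_path(source_path: str) -> dict:
    """Split '/bucket/folder/.../file.ext' into bucket, folder and filename.

    Hypothetical helper: assumes the first path segment is the bucket name.
    """
    # Drop the leading '/' that PurePosixPath reports for absolute paths.
    parts = [p for p in PurePosixPath(source_path).parts if p != "/"]
    if len(parts) < 2:
        raise ValueError(f"expected at least bucket and filename: {source_path!r}")
    return {
        "bucket": parts[0],
        "folder": "/".join(parts[1:-1]),  # empty string if the file sits at the bucket root
        "filename": parts[-1],
    }


info = split_source_path("/bucketname/foldername/some_big.csv")
```

With one large .csv, every retrieved chunk yields the same bucket, folder and filename, so there is nothing left to tell the sources apart.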
Maybe the only solution is to create an individual .csv for each and every .md file. That is an OK solution, but I was curious whether that would be the best practice, whether there is a better way, and whether there is a way to accomplish it with a single large .csv.
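Here is a rough sketch of the one-.csv-per-source idea, using only the Python standard library. It assumes the big file has a column holding the original .md name; the column name `source_name` and the function `split_csv_by_source` are hypothetical, and uploading the resulting files to the bucket is left out.

```python
import csv
import re
from pathlib import Path


def split_csv_by_source(big_csv: str, out_dir: str, name_column: str = "source_name") -> list[str]:
    """Write one small .csv per distinct value of `name_column` (assumed column name)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with open(big_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        if name_column not in fieldnames:
            raise KeyError(f"column {name_column!r} not found in {big_csv}")
        # Group rows so that repeated source names end up in the same file.
        groups: dict[str, list[dict]] = {}
        for row in reader:
            groups.setdefault(row[name_column], []).append(row)

    written = []
    for name, rows in groups.items():
        # Replace anything that is unsafe in a filename, e.g. '/' in 'guides/setup.md'.
        stem = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_") or "unnamed"
        target = out / f"{stem}.csv"
        with open(target, "w", newline="", encoding="utf-8") as g:
            writer = csv.DictWriter(g, fieldnames=fieldnames)
            writer.writeheader()  # every small file keeps the metadata columns
            writer.writerows(rows)
        written.append(target.name)
    return written
```

Each output file keeps the full header row, so the metadata columns stay next to the content, and the filename once again identifies the source.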
This may be useful for other projects as well since datasets are often provided as .parquet files which are easy to dump into a large .csv file.
(it would be great if knowledge bases had support for .parquet files directly)
Hi there,
Really great to see you pushing the GenAI platform this way. You’re getting into the kind of real-world use cases that can really help shape future improvements!
From what I understand, with a single large .csv, it’s tricky to track individual sources properly since everything points back to the same file. Splitting into multiple .csv files (one per logical source) might be the more reliable approach right now, similar to how multiple .md files work. But I’m not 100% sure if that’s the only way; it might be worth checking directly with DigitalOcean Support.
Also, full support for .parquet files or more flexible metadata would definitely be a great improvement.
I’d really encourage you to send this feedback to DigitalOcean Support. You’re raising exactly the kinds of points that could help improve the product and documentation over time.
- Bobby