Build an AI Agent to Automate Document Analysis with GenAI

Introduction

In today’s SaaS ecosystem, companies are routinely asked to complete detailed security and compliance questionnaires as part of due diligence processes during procurement, vendor onboarding, or partnership evaluations.

These questionnaires are often lengthy, repetitive, and time-sensitive, which makes them a bottleneck for both technical and compliance teams. According to research by Vanta, completing them typically takes anywhere from a few days to a week, depending on the complexity of the questions, the company’s security posture, and the level of documentation required. The manual effort delays sales cycles, produces inconsistent responses across questionnaires, and pulls valuable technical resources away from core development work.

Who Needs This Solution?

This AI-powered solution is particularly valuable for:

  • SaaS companies undergoing rapid growth and frequent security reviews.
  • Compliance and security teams overwhelmed by repetitive questionnaires.
  • Sales and business development teams seeking to accelerate deal closures.
  • Startups with limited resources who can’t afford dedicated compliance staff.

Benefits of AI-Powered Questionnaire Automation

  • Time Efficiency: Reduce response time from days to hours.
  • Consistency: Ensure uniform answers across all questionnaires.
  • Resource Optimization: Free up technical staff for core development.
  • Scalability: Handle increasing questionnaire volume without adding headcount.
  • Accuracy: Leverage your actual documentation for evidence-based responses.

In this tutorial, you will build an AI-powered application that leverages Retrieval-Augmented Generation (RAG) to automatically read and understand a company’s publicly available legal, privacy, and security documents, and use them to generate accurate responses to security questionnaires. This solution can reduce response time by up to 80%, ensure consistency across all submissions, and free up technical teams to focus on product development rather than administrative tasks.

Prerequisites

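Before proceeding with the demo, make sure you have the following:

  • A DigitalOcean account with access to the GenAI Platform and App Platform.
  • A GitHub account to host the application code.
  • Your company’s public-facing legal, privacy, and security documents.
  • Basic familiarity with Python.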

Step-by-Step Guide to Building an AI-Powered Security Questionnaire App

This tutorial covers how to:

  1. Create a GenAI Agent using DigitalOcean’s platform.

  2. Configure a private endpoint for secure API access.

  3. Build a Streamlit + Python app that processes Excel files with security questions.

  4. Deploy the app on DigitalOcean’s App Platform.

Why is an AI Agent needed for Security Questionnaire Automation?

Let’s be honest—no one enjoys digging through legal and compliance docs. An AI agent is needed to intelligently understand complex security questions, retrieve relevant information from dense legal documents, and generate accurate, context-aware responses—saving time and reducing human error.

Security questionnaires often contain hundreds of questions across various domains like data protection, access controls, network security, and compliance frameworks such as:

  • GDPR (General Data Protection Regulation): The European Union’s comprehensive data privacy law that governs how organizations must handle the personal data of individuals in the EU.
  • HIPAA (Health Insurance Portability and Accountability Act): U.S. legislation that provides data privacy and security provisions for safeguarding medical information.
  • SOC 2 (System and Organization Controls 2): An audit framework for assessing how service providers manage customer data against five Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy.
  • ISO 27001: An international standard for information security management systems (ISMS) that provides a systematic approach to managing sensitive company information.
  • NIST CSF (National Institute of Standards and Technology Cybersecurity Framework): A set of cybersecurity guidelines that helps organizations better manage and reduce cybersecurity risk.
  • DPDP (Digital Personal Data Protection Act): India’s comprehensive data protection law that establishes rules for the processing of digital personal data, protecting the privacy of individuals in India.

Manually answering these questions typically requires:

  1. Searching through multiple policy documents
  2. Consulting with different teams (security, legal, engineering)
  3. Ensuring consistency with previous questionnaire responses
  4. Customizing answers to match the specific context of each question

By leveraging DigitalOcean’s GenAI Platform with RAG capabilities, we can automate this process by having the AI agent understand the question intent, search across your knowledge base for relevant information, and formulate professional responses that align with your company’s actual security posture and documentation. This not only accelerates response time from days to minutes but also ensures higher accuracy and consistency across all questionnaire submissions.
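
To make the retrieve-then-generate flow concrete, here is a toy sketch of the retrieval step. It is not how the GenAI Platform works internally: the platform uses an embedding model and a vector database, while this example substitutes simple word-count vectors and cosine similarity. The names `PASSAGES`, `vectorize`, `cosine`, and `retrieve`, as well as the sample passages, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical policy snippets standing in for an indexed knowledge base
PASSAGES = [
    "Customer data is encrypted at rest using AES-256.",
    "Employees complete security awareness training every year.",
    "Backups are taken daily and retained for thirty days.",
]


def vectorize(text):
    """Turn text into a bag-of-words vector (a stand-in for a real embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=1):
    """Return the top_k passages most similar to the question."""
    q = vectorize(question)
    ranked = sorted(passages, key=lambda p: cosine(q, vectorize(p)), reverse=True)
    return ranked[:top_k]


question = "Is customer data encrypted at rest?"
context = retrieve(question, PASSAGES)
# The retrieved text is placed in the prompt so the model answers from it
prompt = "Context:\n" + "\n".join(context) + "\nQuestion: " + question
```

In a real deployment the agent performs this lookup against your knowledge base for you; the sketch only shows why the quality of the indexed documents drives the quality of the answers.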

High Level Design of GenAI Platform

High Level Design of Application

Hands-On Tutorial

Watch this video demonstration to see the application in action:

Automating Security Questionnaires with GenAI

Step 1 - Creating the GenAI Agent

Prepare Compliance Documents

  • Gather your public-facing legal and compliance documents (e.g., Privacy Policy, ISO policy, SOC 2 overview).
  • Format them as Markdown or plain text files for clear text extraction.
  • Upload these documents to a DigitalOcean Spaces bucket or keep them locally for the demo.

Log into the DigitalOcean GenAI Platform

Create a Knowledge Base

  • Go to the “Knowledge Base” section and click Create Knowledge Base. Name it (e.g., demo-kb).
  • Upload your documents directly or connect a Spaces bucket.
  • Select your OpenSearch vector DB (you can reuse an existing one).
  • Choose an embedding model (e.g., a small multilingual model for simple tasks).
  • Run the indexing job to convert and embed your documents into the vector DB.
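
Indexing generally means splitting each document into chunks and embedding each chunk. The platform’s indexing job handles this for you; the sketch below only illustrates the chunking idea. The function name `chunk_text` and the default sizes are assumptions for illustration, not platform settings.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that overlap so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

Overlapping chunks reduce the chance that a sentence answering a question is cut in half at a chunk boundary.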

Create a GenAI Agent

  • Go to the “Agents” section and click Create Agent.
  • Name your agent (e.g., demo-agent).
  • Select a foundation model (e.g., LLaMA 3 8B).
  • Attach the knowledge base you created.
  • Add a custom system prompt to format output as JSON (useful for integration).
  • Save and deploy the agent.
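
As a rough illustration of the JSON-formatting system prompt, the sketch below pairs an example prompt with a small check you could run on replies. The prompt wording, the allowed answer values, and the helper name `is_valid_reply` are assumptions; only the `answer` and `reasoning` keys come from the app code later in this tutorial.

```python
import json

# Hypothetical system prompt; adapt the wording to your own agent
SYSTEM_PROMPT = (
    "You answer security questionnaire questions using only the attached "
    "knowledge base. Reply with a single JSON object that has two keys: "
    '"answer" (Yes, No, or Not Sure) and "reasoning" (one or two sentences).'
)


def is_valid_reply(text):
    """Check that a reply is a JSON object with the two expected string keys."""
    try:
        data = json.loads(text)
    except (TypeError, ValueError):
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("answer"), str)
        and isinstance(data.get("reasoning"), str)
    )
```

Testing a few replies against a check like this while tuning the prompt helps catch formatting drift before the app depends on it.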

Step 2 - Configuring the Private Endpoint

Enable Private API Access

  • In the agent dashboard, under Endpoint Access Keys, click “Manage Endpoint Access Keys” and generate a new key.
  • Save this access key securely in your app’s environment.

Get the Endpoint URL

  • Copy the API endpoint from the agent’s dashboard.
  • You’ll use this in your Python code to send requests.
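
The sketch below shows how the endpoint URL and access key fit into a request, using only the standard library and without sending anything. The `/api/v1/chat/completions` path and the bearer-token header match the `chatbot.py` code in this tutorial; the function name `build_request` and the example URL and key are placeholders.

```python
import json
import urllib.request

CHAT_PATH = "/api/v1/chat/completions"  # same path that chatbot.py appends


def build_request(endpoint, access_key, question):
    """Build (but do not send) a POST request for the agent endpoint."""
    url = endpoint.rstrip("/") + CHAT_PATH
    body = json.dumps({"messages": [{"role": "user", "content": question}]})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_key}",
        },
        method="POST",
    )


# Placeholder values for illustration only; use your own endpoint and key
req = build_request(
    "https://agent.example.com/", "example-key", "Do you encrypt data at rest?"
)
```

Stripping a trailing slash before appending the path avoids a malformed URL when the endpoint is copied with a slash at the end.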

Step 3 - Building the Streamlit + Python App

Log in to your GitHub account and create a repository with these files. You will need this GitHub repository to deploy your application to App Platform in the upcoming steps.

Application Structure

project/
├── app.py              # Main Streamlit app
├── chatbot.py          # Backend logic for API calls
├── requirements.txt    # Python dependencies
├── Dockerfile          # For deployment

Create a file chatbot.py

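This Python file will:

  • Load AGENT_ENDPOINT and AGENT_ACCESS_KEY from the environment.
  • Send a POST request with the question to the agent API.
  • Return the response as parsed JSON.

import os
import requests
from dotenv import load_dotenv

# Load AGENT_ENDPOINT and AGENT_ACCESS_KEY from a local .env file, if present
load_dotenv()

# Base URL copied from the agent dashboard, plus the chat completions path
AGENT_ENDPOINT = os.getenv("AGENT_ENDPOINT", "").rstrip("/") + "/api/v1/chat/completions"
AGENT_ACCESS_KEY = os.getenv("AGENT_ACCESS_KEY")

# Instructions prepended to every question. This wording is only an example
# (the original snippet did not define base_prompt); keep it consistent with
# the agent's system prompt and with the keys that app.py reads.
base_prompt = (
    "Answer the security questionnaire question using the knowledge base. "
    'Respond only with JSON containing the keys "answer" and "reasoning".'
)


def ask_question(question):
    # Append the user's question to the base prompt
    prompt = base_prompt + "\nQuestion: " + question
    payload = {
        "messages": [
            {"role": "user", "content": prompt}
        ]
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {AGENT_ACCESS_KEY}",
    }
    # A timeout keeps the app from hanging on a stalled request
    response = requests.post(AGENT_ENDPOINT, json=payload, headers=headers, timeout=60)
    return response.json()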
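
Network calls to the agent can fail intermittently, so one option is to wrap the call in a retry with exponential backoff. This is a minimal sketch, not part of the original app: the helper name `with_retries` and its defaults are assumptions, and it only helps when the wrapped call raises on failure (for example a timeout or connection error).

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on an exception, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            # Wait 1x, 2x, 4x ... the base delay between attempts
            sleep(base_delay * (2 ** attempt))
```

A call such as `with_retries(lambda: ask_question(question))` would then retry up to three times before giving up.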

Create another file app.py

This Python file will:

  • Let users upload an Excel file.
  • Iterate over each row and send the question to the chatbot.
  • Add the AI-generated answer and reasoning to the DataFrame.
  • Offer a “Download CSV” button with updated responses.

import json
import time

import pandas as pd
import streamlit as st

from chatbot import ask_question


def process_security_questions(uploaded_file):
    """
    Process security questions from an Excel file and get answers using the chatbot.

    Args:
        uploaded_file: The uploaded Excel file

    Returns:
        DataFrame with questions and answers, or None on failure
    """
    try:
        df = pd.read_excel(uploaded_file)

        # Find the first column whose header contains "question"
        question_col_index = None
        for i, col in enumerate(df.columns):
            if 'question' in str(col).lower():
                question_col_index = i
                break

        if question_col_index is None:
            st.error("Could not find a column containing 'question' in its name")
            return None

        answers = []
        progress_bar = st.progress(0)
        num_rows = len(df)

        for i in range(num_rows):
            question = str(df.iloc[i, question_col_index])
            st.write(f"Processing question {i+1}: {question}")
            try:
                response = ask_question(question)
                # Parse the JSON response from the agent
                content = response["choices"][0]["message"]["content"]
                answer_data = json.loads(content)
                if not isinstance(answer_data, dict):
                    raise ValueError("Expected a JSON object")
            except Exception:
                # Fall back to a placeholder so one bad response does not stop the run
                answer_data = {
                    "answer": "Not Sure",
                    "reasoning": "Failed to get a proper response",
                }
            answers.append(answer_data)
            progress_bar.progress((i + 1) / num_rows)
            time.sleep(1)  # Delay to avoid overwhelming the endpoint

        # Add the results to the DataFrame as new columns
        df["Answer"] = [a.get("answer", "") for a in answers]
        df["Reasoning"] = [a.get("reasoning", "") for a in answers]

        return df
    except Exception as e:
        st.error(f"Error processing file: {str(e)}")
        return None
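
Models sometimes wrap their JSON in a Markdown code fence or add a sentence around it, which makes a plain `json.loads` fail. The helper below is one hedged way to tolerate that; the name `parse_agent_content` is hypothetical, and the fallback values mirror the ones used in the function above.

```python
import json
import re

FALLBACK = {"answer": "Not Sure", "reasoning": "Failed to get a proper response"}


def parse_agent_content(content):
    """Extract the JSON object from a model reply, tolerating fences or extra text."""
    # Grab everything from the first "{" to the last "}" in the reply
    match = re.search(r"\{.*\}", content or "", re.DOTALL)
    if not match:
        return dict(FALLBACK)
    try:
        return json.loads(match.group(0))
    except ValueError:
        return dict(FALLBACK)
```

Because it always returns a dictionary, a helper like this could stand in for the `json.loads(content)` call without changing how the `Answer` and `Reasoning` columns are built.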

Create this Dockerfile

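FROM python:3.11-slim-buster

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt ./
RUN pip3 install -U pip && pip3 install -r requirements.txt

# Copy the application code into /app/app
COPY . ./app

# Streamlit listens on port 8501 by default
EXPOSE 8501

CMD ["streamlit", "run", "app/app.py"]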

Step 4 - Deploying the App on DigitalOcean’s App Platform

Below are the steps to deploy your application to the App Platform:

  • Connect your GitHub repository.
  • Point to your branch and Dockerfile.
  • Set the HTTP port (e.g., 8501, Streamlit’s default) and environment variables (AGENT_ENDPOINT, AGENT_ACCESS_KEY).
  • Choose instance size and region.
  • Deploy and test your live app!
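
A missing environment variable is a common cause of a failed first deploy. As a sketch, the app could check its settings at startup and fail with a clear message; the function name `load_settings` is hypothetical, while the two variable names are the ones used in this tutorial.

```python
import os

REQUIRED_VARS = ("AGENT_ENDPOINT", "AGENT_ACCESS_KEY")


def load_settings(env=os.environ):
    """Return the required settings, or raise a clear error listing what is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling this once when the app starts surfaces a configuration problem in the deploy logs instead of as a failed request later.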

Testing the Application

  1. Go to the deployed app URL.
  2. Upload an Excel file with security questions.
  3. Click “Process Questions” to get AI-generated answers.
  4. Review the answers and reasoning.
  5. Download the updated CSV with answers.

Questionnaire

Review the answers from the AI Agent

FAQs

1. What types of security questionnaires work best with this solution?

This solution works best with standardized security questionnaires in Excel format, such as those based on SOC 2, ISO 27001, or GDPR frameworks. Custom questionnaires also work well as long as they’re structured in a tabular format. For more information on compliance frameworks, check out DigitalOcean’s Trust Platform.
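
In practice, “tabular” here means the app must be able to find a question column. The sketch below restates the header check from `process_security_questions` as a standalone function (the name `find_question_column` is hypothetical) so you can see which header layouts would be accepted.

```python
def find_question_column(columns):
    """Return the index of the first header containing 'question', or None."""
    for i, col in enumerate(columns):
        if "question" in str(col).lower():
            return i
    return None
```

For example, headers such as `ID`, `Security Question`, `Notes` would match on the second column, while a sheet whose headers never mention “question” would be rejected.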

2. How accurate are the AI-generated answers?

The accuracy depends on the quality of your knowledge base. With well-curated security documentation, accuracy rates typically exceed 85%. Always review AI-generated answers before sending them to clients. You can improve accuracy by following best practices in DigitalOcean’s RAG tutorial.
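
One simple way to focus the manual review is to flag rows where the agent was unsure or returned nothing. This is a sketch under the assumption that each row is a dictionary with an `answer` key, as in the app code; the function name `rows_needing_review` is hypothetical.

```python
def rows_needing_review(rows):
    """Return 1-based row numbers whose answer is missing or 'Not Sure'."""
    flagged = []
    for number, row in enumerate(rows, start=1):
        answer = str(row.get("answer", "")).strip().lower()
        if answer in ("", "not sure"):
            flagged.append(number)
    return flagged
```

Flagged rows are a starting point for review, not a replacement for checking the remaining answers.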

3. Can I customize the AI responses to match my company’s tone?

Yes, you can adjust the prompt templates in the AI agent configuration to match your company’s communication style and terminology. This customization allows you to maintain brand consistency across all questionnaire responses.

To customize the tone:

  • Consider adding company-specific terminology to your knowledge base. GenAI Platform supports the .txt, .html, .md, .pdf, .doc, .json, and .csv formats.
  • Add Guardrails to the Security Questionnaire Automation agent.
  • Add specific instructions about formality level, technical depth, and company voice.
  • Include examples of preferred phrasing for common security concepts.

The AI will then generate responses that sound more authentic to your organization’s communication style. Learn more about effective prompt engineering in DigitalOcean’s GenAI Platform documentation.
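
As an illustration of the tone instructions listed above, the sketch below composes them into text you could append to the agent’s system prompt. The function name `build_tone_instructions`, its parameters, and the wording are all assumptions for illustration.

```python
def build_tone_instructions(formality, technical_depth, preferred_phrases):
    """Compose tone guidance to append to the agent's system prompt."""
    lines = [
        f"Write in a {formality} tone.",
        f"Aim for {technical_depth} technical depth.",
    ]
    if preferred_phrases:
        lines.append("Prefer these phrasings where they apply:")
        lines.extend(f"- {phrase}" for phrase in preferred_phrases)
    return "\n".join(lines)
```

Keeping the tone guidance in one place makes it easier to adjust the voice without touching the instructions that control the JSON output format.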

4. How do I handle questions that require attachments or evidence?

The current solution focuses on text-based answers. For questions requiring evidence, the AI can suggest appropriate documents to attach, but you’ll need to manually include them. Consider integrating with DigitalOcean Spaces for document storage and retrieval.

5. Is my data secure when using this solution?

Yes, when deployed on DigitalOcean’s GenAI Platform, your data remains private. The knowledge base is isolated to your account, and all processing happens within your environment. For additional security, consider implementing DigitalOcean VPC networks and Cloud firewalls.

6. How much time does this solution typically save?

Most users report 70-90% time savings compared to manual questionnaire completion. A questionnaire that might take 8-10 hours to complete manually can be processed in under an hour with this solution.

7. Can I integrate this with other business systems?

Yes, this solution can be integrated with CRM systems, ticketing platforms, or document management systems using APIs. Check out DigitalOcean’s API documentation for integration options or consider using DigitalOcean Functions for serverless integration.

8. What resources do I need to run this solution effectively?

For optimal performance, we recommend deploying on DigitalOcean App Platform with at least a Basic plan. The GenAI component works best with DigitalOcean’s GenAI Platform, which provides the necessary infrastructure for AI processing and knowledge base management.

Conclusion

Manually filling out security questionnaires is a pain—especially when you’re doing it repeatedly for different customers. With a mix of GenAI, RAG, OpenSearch, and a bit of Python glue code, we automated the whole thing.

This project not only saves hours of effort per submission, but also ensures consistent, compliant responses that reflect your company’s actual security posture. Based on user feedback, teams typically reduce questionnaire completion time by 70-90%, turning days of work into hours.

The solution leverages DigitalOcean’s GenAI Platform to create an intelligent system that:

  • Understands security compliance questions in context
  • Retrieves relevant information from your knowledge base
  • Generates accurate, tailored responses with supporting rationale
  • Maintains consistency across all questionnaire submissions

By splitting it into modular subsystems (AI Agent + Private APIs + Python App), the solution remains scalable, customizable, and easy to plug into any workflow. You can extend it to handle different questionnaire formats, integrate with your CRM, or connect to document management systems.

More importantly, it shows how DigitalOcean’s GenAI platform can be used to build real-world, powerful AI applications—without wrangling complex infrastructure or reinventing the wheel.

The combination of App Platform for deployment and GenAI Platform for intelligence creates a production-ready solution that grows with your business needs.

Ready to automate your security questionnaire process? Deploy this solution today and reclaim valuable time for your security and compliance teams.

References

You can find the code to this application here.

Continue building with DigitalOcean GenAI Platform.

About the author(s)

Meher Vamsi
Staff Solutions - AI/ML