How We Built an AI Content Engine for Crohn's and Colitis Canada

Feb 22
6 min read

Crohn's and Colitis Canada (CCC) is the country's largest volunteer-based charity focused on finding cures for Crohn's disease and ulcerative colitis. They support over 300,000 Canadians living with inflammatory bowel disease (IBD) through research funding, advocacy, and patient education -- including the MyGUT mobile app and their Connect digital platform.

Their challenge: years of high-quality patient education content spread across hundreds of web pages, 800+ YouTube videos, research papers, and internal resources -- but no scalable way to turn that knowledge into new articles for their platforms. Every piece of content required manual research, manual writing, and manual review. Their small content team couldn't keep up with the volume their patient community needed.

We built them an AI-powered content generation system that ingests their entire knowledge base and produces medically-grounded, on-brand articles on demand -- directly from an Airtable dashboard their team already knows how to use.

What Problem CCC Was Facing?

CCC's digital health team runs the MyGUT app and Connect platform, both of which rely on a steady stream of educational content for patients and caregivers. Topics range from managing flares and nutrition during remission to mental health, insurance navigation, and fertility considerations.

The content had to be accurate. It had to reflect CCC's organizational voice. And it had to be grounded in their existing peer-reviewed and medically-approved resources -- not generic AI output pulled from the open internet.

Their team was spending significant hours per article: researching across their own scattered resources, drafting, reviewing, and revising. The bottleneck wasn't quality -- their content was excellent. The bottleneck was speed and volume.

They needed a system that could leverage everything they'd already published and turn it into new, trustworthy content without starting from scratch every time.

What Did We Build?

We designed and delivered an end-to-end AI content generation system with three core layers: a knowledge base that stores CCC's entire content library as searchable vectors, an automation backbone that handles ingestion and article generation, and an Airtable-based interface where CCC's team controls everything without touching code.

The system uses RAG -- Retrieval Augmented Generation. Instead of asking an AI model to generate content from its general training data (which would produce generic, potentially inaccurate health information), the system first searches CCC's own knowledge base for relevant content, then uses that as context when generating new articles. Every generated article is grounded in CCC's approved sources.

The Technical Architecture

Here is how the system works from end to end.

Knowledge Ingestion: CCC's website content (231 pages across topics like treatments, diet, symptom management, mental health, and lifestyle) was scraped using Apify's website content crawler. YouTube transcripts from their video library were extracted separately. All content was chunked into searchable segments, converted into vector embeddings via OpenAI, and stored in a Pinecone vector database.

The RAG API: A Cloudflare Worker serves as the system's brain -- a lightweight API layer with two endpoints. The /ingest endpoint accepts new content, chunks it, generates embeddings, and stores it in Pinecone. The /retrieve endpoint takes a query (like a proposed article topic), searches the vector database for the most relevant content chunks, and returns them as context for generation.

Content Generation Workflow: When a CCC team member wants a new article, they add a row in Airtable with a topic and optional instructions. A Make.com automation fires immediately, calls the /retrieve endpoint to pull relevant knowledge base content, sends that context plus the topic to OpenAI with a system prompt tuned for CCC's voice and medical accuracy standards, and writes the generated article back into Airtable for human review.

Change Detection: A scheduled workflow monitors CCC's tracked URLs for content changes using MD5 hashing. When a page is updated, the system automatically re-scrapes and re-ingests the new content -- keeping the knowledge base current without manual intervention.

Human Review Layer: Every generated article lands in a "Ready for Review" status. CCC's content reviewer (a medical communications specialist) approves, requests revisions, or rejects. Revision requests trigger a new generation pass with the feedback incorporated. Nothing publishes without human sign-off.

The Tools We Used

Pinecone as the vector database for storing and searching CCC's knowledge base. Cloudflare Workers as the serverless API layer handling chunking, embedding, and retrieval logic. Apify for web scraping and YouTube transcript extraction. Make.com for workflow orchestration connecting all the pieces. Airtable as the user-facing dashboard where CCC's team interacts with the entire system. OpenAI (GPT-4o) for embeddings and content generation.

Why RAG Instead of Just Using ChatGPT?

This is the most common question we get when explaining this build.

If CCC's team simply pasted a topic into ChatGPT and asked it to write an article about managing a Crohn's flare, the output would be based on ChatGPT's general training data. It might be accurate. It might not. It wouldn't cite CCC's specific resources. It wouldn't match their organizational voice. And there would be no way to verify where the information came from.

RAG solves all of these problems. The AI model only sees CCC's own content as source material. Every generated article can be traced back to specific pages and videos from CCC's library. The output reflects CCC's actual positions and guidelines, not a generic internet summary. And if CCC updates a page on their website, the knowledge base updates automatically -- so future articles reflect the latest information.

For a health charity serving hundreds of thousands of patients with serious chronic conditions, this distinction is not a nice-to-have. It's the difference between a useful tool and a liability.

How We Handled Compliance and Security

CCC is a national health charity with IT governance requirements. Before any code was deployed, we prepared a full technical specification document covering data flow architecture, privacy compliance, and security measures.

Key compliance points: no patient data or PII is processed by the system -- only CCC's publicly available educational content. All API keys are stored as encrypted environment variables, never in code. Data in transit is encrypted via TLS 1.2+. Pinecone provides AES-256 encryption at rest. The system passed review by CCC's IT manager and received approval before going live.

We also addressed the common concern about AI model training. OpenAI's API does not use customer data for model training -- a critical distinction from the consumer ChatGPT product. CCC's content remains CCC's content.

What Were the Results?

The system was built and delivered within the project timeline. Here is what CCC's team got:

231 website pages ingested across their full content library -- from "What is Crohn's Disease" to fertility, insurance, travel, and mental health resources.

800+ YouTube video transcripts extracted and stored as searchable knowledge, making years of video content accessible to the AI for the first time.

On-demand article generation from an Airtable interface -- no technical skills required. Enter a topic, click a button, get a draft grounded in CCC's own sources.

Automated knowledge base maintenance -- the change detection workflow keeps the system current as CCC updates their website.

Full account ownership -- the entire system was transferred to CCC's organizational accounts (Pinecone, Cloudflare, Apify) so they own and control everything. Rex Automaton doesn't hold the keys.

Documentation and training -- user guides, training videos, and a live walkthrough with the content team so they're self-sufficient going forward.

The system is architected to scale beyond content generation. The same knowledge base can power future use cases CCC is already exploring: a patient-facing chatbot for the MyGUT app, an internal support tool for their team, and a virtual nurse assistant -- all retrieving from the same Pinecone index.

Why This Matters for Nonprofits and Health Organizations

Nonprofits sit on enormous knowledge bases -- years of resources, guides, videos, and educational content -- but rarely have the team size or budget to fully leverage that library. Content gets published once and buried. Institutional knowledge lives in scattered documents and old YouTube uploads.

RAG changes the equation. It turns a nonprofit's existing content into a living, searchable, generative resource -- without requiring AI expertise, without exposing sensitive data to third-party models, and without replacing human judgment.

The content team still reviews everything. The AI just removes the hours of manual research and first-draft writing. The humans focus on what they're best at: accuracy, tone, and patient sensitivity.

Want Something Like This for Your Organization?

Whether you're a nonprofit with years of content that could be working harder, a health organization that needs AI-grounded content generation, or a company that wants to build internal knowledge systems powered by your own data -- we build these systems.

Custom RAG architecture, workflow automation, secure deployment, full documentation, and training so your team can run it independently.

Book a free consultation to discuss your project --> 🤖 AI x Automation Discovery Call | Jacky | Cal.com