CapeChat automatically encrypts your documents and redacts any sensitive data. It’s powered by the ChatGPT API, so you get the best language model while preserving your privacy.
Hello Product Hunt! 🚀
We built CapeChat so you can chat privately and securely with your documents.
CapeChat automatically encrypts your data and redacts any sensitive details. It’s powered by the ChatGPT API, so you get the best language model while preserving your privacy.
CapeChat is fully self-serve, free to use, and available today!
Here’s what you can accomplish with CapeChat:
✅ Upload sensitive documents like contracts, financials, and meeting notes.
✅ Ask questions, summarize, calculate or anything else you can think of.
✅ Transform messy documents into organized, structured data.
✅ Upload and prompt across multiple documents.
👉 Try it now: https://chat.capeprivacy.com/
👥 Join our Discord: https://discord.com/invite/nQW7Y...
We can’t wait to hear your feedback and suggestions!
P.S. You can also build your own secure, private products powered by GPT-4 with the CapeChat API!
@gavinuhma This sounds great and what people are looking for, but I have many questions:
At what point are the documents encrypted?
Where are the documents stored?
Is redaction done before or after encryption?
What encryption is being used?
Are you using embeddings with OpenAI to query the documents?
Are the embeddings encrypted? If so, how did you manage to do this and get it to work with OpenAI?
How do I get access to the API?
@mark_craddock Great questions! I’ll answer each one.
- Documents are uploaded over TLS and processed in an AWS Nitro Enclave.
- Documents are then client-side encrypted and stored in RDS.
- Envelope encryption is used: AES-256 for the data and RSA-4096 for the data key.
- We use a local embeddings model and vector storage/search (FAISS) which also runs in a Nitro Enclave.
- The final prompts are de-identified of PII, PHI, and PCI before being processed by the ChatGPT API.
As for API access, join our Discord or send me an email and I’ll get you set up! Or I can let you know when self-serve API onboarding is released, which should be soon.
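The local retrieval step mentioned in the reply above can be illustrated with a toy sketch. This is not Cape's implementation: the reply names a local embeddings model and FAISS, while the code below substitutes a bag-of-words vector and a brute-force cosine search so it runs with the standard library only. The names `embed`, `cosine`, and `top_k` and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words vector; a stand-in for a real embeddings model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, chunks, k=1):
    # Brute-force search; a vector index would replace this at scale.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Net profit rose in 2021.",
    "The meeting is scheduled for Monday.",
    "Contract term is three years.",
]
best = top_k("What was the net profit?", chunks)
print(best)
```

The point of running this step locally is that only the chunks selected here need to go into the prompt, and nothing is sent to an external API during indexing or search.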
@edsim Thank you!! We've kept the latency very minimal relative to the ChatGPT API, so it should feel about the same. This was a lot of work so appreciate the question! :)
Congrats on the launch!
To be honest, I don't really understand it:
The data has to be decrypted on your servers before sending it to OpenAI, right? So I do not really understand the benefit of encrypting it, instead of sending it directly to OpenAI.
@timobechtel Great question. There are many components involved like embeddings, vector storage, and vector search that happen before any communication with OpenAI. Any prompts are additionally de-identified of PII, PHI, PCI before being processed by the ChatGPT API.
@timobechtel The main component that protects the data while it's being prepared for the OpenAI API is that we don't decrypt the data on normal servers, but instead inside of an AWS Nitro Enclave. AWS Nitro Enclaves are designed in such a way that even we, the operators, cannot view the data that's passing through them, which is guaranteed through a process called remote attestation. Here is an old blog post our team wrote before we developed CapeChat that goes into more detail about the security of enclaves: https://capeprivacy.com/blog/how...
@jvmncs Oh I see. So the main benefit is that documents are processed (at all) before they are sent to OpenAI, and this is done in a secure cloud environment while being end-to-end encrypted (Zero Trust).
So the only thing that is sent to Open AI is a prompt without personal information.
Did I get this right? That's very interesting!
I've just got confused a bit with the encryption part in combination with ChatGPT. I think I have totally misunderstood the product 😅
Thanks for your answer!
@timobechtel Yes exactly! Our service accomplishes two separate security goals: (1) prevent Cape or AWS from being able to spy on the data while it's being processed by using AWS Nitro Enclaves, and (2) prevent OpenAI from learning about PII/PCI in documents and prompts by using redaction/de-identification.
GPT (especially GPT-4) is smart enough to reason about redacted data by using placeholders and contextual information, so you can still make it do a ton of useful things without it being able to infer what the sensitive terms are.
A big problem that definitely needs to be addressed, huge congrats on the launch!
I had a quick question while watching your demo video: so for the example where you ask "concisely compare the net profit or loss for Illustrative Corporation Group from 2020 to 2021", how does the LLM (the OpenAI model in this case) actually know how to compare those two numbers when the actual figures are redacted before the OpenAI API call is made? It correctly answered that the net profit "increased from 11,195 in 2020 to 14,737 in 2021", so wondering how it was able to come up with that answer.
On that note, I'm super curious as to how CapeChat would deal with questions that ask directly or indirectly about the sensitive data (that are to be redacted before OpenAI API calls) themselves?
@channy_hong I'll let @miiklay respond to the first question since he recorded the video.
Regarding your second question, the goal of the service is to redact private information, but leave all of the surrounding context alone. For example, if you ask a question that GPT sees as "Where was [NAME_1] born?" and have uploaded a document that contains some context like "[NAME_1] was born in [LOCATION_1] on [DATE_1].", GPT is smart enough to recognize and use those placeholders in its output. The response should look like this: "[NAME_1] was born in [LOCATION_1]."
In general, you can expect useful answers to questions about sensitive info as long as the answer can be determined from (1) surrounding (un-redacted) context, or (2) other sensitive info that has a clear relationship to your question according to the surrounding context. And it turns out this is exactly the functionality we should hope for -- if GPT could do anything more than that, it means that you've leaked some of your sensitive data to it (and therefore to OpenAI, and/or any malicious actors who might gain access to your OpenAI API call history).
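The placeholder scheme described above can be sketched in a few lines. This is a minimal illustration, not CapeChat's actual redaction code: the `deidentify` function, the `entities` argument (assumed to come from some upstream PII detector), and the single ISO-date regex are all hypothetical, and the sample person and place are made up.

```python
import re

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # toy detector: ISO dates only

def deidentify(text, entities):
    """Replace sensitive values with numbered placeholders.

    `entities` maps a sensitive value to its label, e.g. {"Jane Roe": "NAME"}.
    Returns the redacted text and a placeholder -> original mapping.
    """
    mapping = {}
    counters = {}

    def placeholder_for(label, value):
        # Reuse the placeholder if this exact value was already seen.
        for placeholder, original in mapping.items():
            if original == value:
                return placeholder
        counters[label] = counters.get(label, 0) + 1
        placeholder = f"[{label}_{counters[label]}]"
        mapping[placeholder] = value
        return placeholder

    # Longest values first, so a full name is replaced before any shorter part of it.
    for value, label in sorted(entities.items(), key=lambda kv: -len(kv[0])):
        if value in text:
            text = text.replace(value, placeholder_for(label, value))
    text = DATE_RE.sub(lambda m: placeholder_for("DATE", m.group(0)), text)
    return text, mapping

doc = "Jane Roe was born in Lisbon on 1990-04-12."
redacted, mapping = deidentify(doc, {"Jane Roe": "NAME", "Lisbon": "LOCATION"})
print(redacted)  # [NAME_1] was born in [LOCATION_1] on [DATE_1].
```

Because the same value always maps to the same placeholder, the model can still relate "[NAME_1]" in a question to "[NAME_1]" in the document without ever seeing the name itself.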
@channy_hong Keen eye :) That is possible with some simple prompt augmentation. For folks who subscribe to CapeChat Plus, you get access to GPT-4 and the ability to customize the prompt (amongst other features). I instructed GPT to format any math questions into a formula that could be interpreted with JavaScript, along with examples using the redacted placeholder syntax.
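One way to picture this: the model never sees the numbers, it only returns a formula over placeholders, and the real values are substituted and evaluated locally. The sketch below is an assumption about how such a step could work, written in Python rather than the JavaScript mentioned above. The `evaluate` helper is hypothetical, and the assignment of the two figures from the demo to `[MONEY_37]` and `[MONEY_11]` is assumed for illustration.

```python
import ast
import operator
import re

# Only basic arithmetic is allowed; anything else is rejected.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(formula, values):
    """Substitute placeholder values locally, then evaluate the arithmetic."""
    expr = re.sub(r"\[[A-Z]+_\d+\]", lambda m: repr(values[m.group(0)]), formula)

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval"))

# Formula as the model might return it; real values are known only locally.
values = {"[MONEY_37]": 14737, "[MONEY_11]": 11195}
change = evaluate("[MONEY_37] - [MONEY_11]", values)
print(change)  # 3542
```

A positive result is what lets the final answer say the figure "increased", even though the model itself only ever handled the placeholders.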
@miiklay Super cool, and thank you :) Sorry to bother you with a follow-up, but I just love digging into details like this 😎
In this exact example for instance, I'm curious as to how the LLM (through OpenAI's API) 'knows' that [MONEY_37] is greater than [MONEY_11], such that it answers that the net profit "increased". I'm probably just not understanding what prompt augmentation is 😂
@miiklay @jvmncs Gotcha gotcha, that makes a lot of sense. If I am understanding this correctly, then I guess the only type of questions that would be difficult for CapeChat to answer would be ones that require reasoning on top of the content of the redacted information?
For instance, if one's 'place of birth' is a piece of information that is redacted before the OpenAI API call is made, it would be hard to answer the question "name three countries that neighbor the country John Doe was born in", right? Or is there some way that y'all are handling this situation too already?
@miiklay @jvmncs @channy_hong Thanks for the support @channy_hong! And great questions. Yes, the model will fail to answer any questions where the answer is “outside” of the context. That’s why it is good with documents: the context is all inside.
You can try experimenting with this yourself. Type “Who is Wayne Gretzky?” and the model will tell you it doesn’t know. But you can give it a PDF about Gretzky and it can then answer.
I’d be curious to see what you think! It’s fun to feed it hints about a famous person and then see if it can eventually identify them (since famous people are obvious statistical outliers)
Incredible launch, CapeChat team!
Love the concept of integrating privacy with GPT-4's language capabilities.
I'm curious, though, have you considered getting certified by a third-party security vendor to ensure absolute trustworthiness?
Keep up the great work!
@maxtpham Great question! CapeChat is built on our underlying confidential compute platform which is based on AWS Nitro Enclaves. We've already completed a third-party security audit with NCC on Cape, and will be completing a new audit now specifically for the features of CapeChat. In addition, AWS recently released their report from NCC on the underlying Nitro System which you can access here: https://aws.amazon.com/blogs/com...
Thank you for the support @maxtpham !
Thank you for the interest and questions @maxtpham. Security and privacy are our top priorities, and we'll continue to help users understand how our system ensures both.
@nuno_ms_reis Great question! The documents are indexed and searched prior to any communication with OpenAI. This happens with a confidential vector store and embeddings model running in a Nitro Enclave. Once the prompt is constructed it is de-identified of PII, PCI, and PHI before being processed by the ChatGPT API. Then we re-identify client side so only the end user can see the final result!
Hi @nuno_ms_reis, before anything is sent to OpenAI it first needs to be preprocessed (embeddings, vector search, redaction, etc). We do all of this in a way that keeps it from being viewed by anyone (including us!) using AWS Nitro Enclaves. And anything that we need to store will be client-side encrypted within the enclave first. Then, what is actually sent to OpenAI will be de-identified of PII, PHI, and PCI. Hope this helps!
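The re-identification step mentioned in these replies can be sketched as a simple reverse lookup. This is a hedged illustration only: the `reidentify` function, the placeholder format regex, and the sample mapping are hypothetical, and the mapping is assumed to be kept on the client side so that only the end user sees the restored text.

```python
import re

PLACEHOLDER_RE = re.compile(r"\[[A-Z]+_\d+\]")

def reidentify(text, mapping):
    """Swap placeholders in a model response back to the original values.

    Placeholders that are not in the mapping are left untouched.
    """
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)

# Mapping produced earlier during de-identification (made-up sample values).
mapping = {"[NAME_1]": "Jane Roe", "[LOCATION_1]": "Lisbon"}

model_response = "[NAME_1] was born in [LOCATION_1]."
restored = reidentify(model_response, mapping)
print(restored)  # Jane Roe was born in Lisbon.
```

Since the mapping never leaves the trusted side, the response that travels back from the model contains placeholders only.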
Cape looks amazing! With the rise in generative AI, I'm happy to know that there will be ways to maintain privacy while still getting the benefit of an LLM.
I am curious, what document types do you support exactly?
@jevon Surprised no one asked that yet! We support PDF, Word, Excel, PowerPoint, CSV, Text, and Markdown.
Moving forward we are even adding speech-to-text, and image-to-text, so you can add context from a bunch of different media types.
Thanks for the support!!
As a user researcher, I think this could be really useful and helpful in my field. UX research participants' PII is a big concern, for example. Excited to see more.
I've always been a fan of what they're doing at Cape Privacy. This is a great new release that gives people more comfort using ChatGPT with their secure data. Congrats on the launch, and I look forward to seeing what comes next.
The de-identifying of PII and PCI data is huge! Data leakage is something every enterprise user of LLMs is worried about. Love how the secure enclaves are abstracted away for users so it's an easier API and product to use! Congrats on the launch
@sg2 Thank you!! You nailed it! There is a lot of security tech behind the scenes which was the bulk of our development efforts. We designed the UI/UX so none of that gets in the way of the end-user and they can focus on a beautiful chat experience. Thanks again Shomik!
Congrats on the launch. A lot more tools related to generative AI are being released. In the current situation, this appears to be a promising and much-needed tool.
@velusamy_subramaniam You are correct. There is a lot of activity (and noise) around generative AI, and we're hoping the value of CapeChat becomes clear to potential users. Thanks for your comment!