Since I'm not an English speaking resident, grammar can be awkward. Please understand.
---
I was initially trying to solve this problem with a semantic layer. I thought that if semantic existed, we would be able to leave auditable evidence based on the code that referenced semantic.
But the reality was not easy.
It has been confirmed that there is more involvement than expected before a single AI answer is made.
What prompt was in it,
What documents or data were retrieved,
Which semantic definitions were referenced,
Which models and settings were used,
What was the final answer,
And how you can prove that the record didn't change later.
At first, I thought it would be enough to leave all of this as a log.
However, logs are usually located inside the system.
The operator can correct it, it can be omitted, and someone later sees the question, "Can I trust this record?"
So the perspective has changed a little bit.
Audit of AI answers is not just a log-gathering problem,
I've come to think that it's more of a question of leaving "what really happened at that point" to be verified later.
So I came up with the idea of storing all of this in a reliable external system.
The primary gate is Merkletree, the secondary gate is a distributed ledger, and the tertiary gate is a witness system.
However, since LLM's answer is not fixed, sensitive information may flow in.
So the idea of saving the original text was to allow the user to turn it on and off, and to chaining both inside and outside.
The service I am creating is a service that logs LLM inputs and outputs internally and logs proofs to the outside.
Originally, the service was created for the basic AI laws of Korea, but in the end, I thought that the conditions required anywhere would be the same, so I pivoted and posted it on the product hunt.
I don't know if this approach is necessary for every team yet.
Maybe it's too much structure for early AI products.
However, as AI goes deeper into customer response, data analysis, internal approval, and policy judgment, these questions are likely to become more and more important.
"What made this answer?"
"What were the documents and definitions you used then?"
"How do you know that the record hasn't changed afterwards?"
"Can you prove your sensitive information without leaving it outside?"
If you're making an AI product, do you feel you need this proof of confidence in your answer unit?
Or do you think it's still too early?