Explore the intersection of artificial intelligence and law. This hub offers insights into regulations, confidentiality, and copyright in the context of AI.
AI Model Provider Confidentiality
Is your data safe with AI providers?
Imagine you're a criminal defense lawyer. A client confesses they killed someone, and you need to write an urgent brief on a tight deadline. You input the facts—including the confession—into ChatGPT, explain what you need, and it delivers a reasonably good first draft that you can refine. Great, right?
Hopefully, by now, we're not that naive.
Confidentiality in generative artificial intelligence (GenAI) is a widespread concern among industries and legal professionals. Sending confidential information to GenAI services like OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini raises concerns about data leaks, privilege waivers, and exposure of confidential information. To our knowledge, no court has yet addressed whether sending information to a GenAI provider waives a privilege or destroys confidentiality protection.
Understanding GenAI Provider Policies
The likely answer, unsurprisingly, is "it depends." Mostly, it depends on the GenAI provider's terms and security, and on the user's settings in their GenAI accounts. If the provider uses your inputs to train future AI models, you run a serious risk of confidentiality leaks and privilege waivers. This must be avoided!
If the GenAI provider stores your inputs—but expressly says it won't use them to train AI—there is a reasonable argument against waiver. Providers of other cloud services like email, chat, and file storage often have similar terms, and using sufficiently secure cloud providers generally does not waive privilege. See, e.g., Harleysville Ins. Co. v. Holding Funeral Home, Inc., No. 1:15CV00057, 2017 WL 4368617 (W.D. Va. Oct. 2, 2017).
Making Informed Choices
For many organizations, using "zero-retention" AI services will be the correct answer. While you still need to trust the provider—and data breaches can happen even with highly secure companies—established providers like Microsoft Azure and Amazon Bedrock offer enterprise-grade security comparable to their other cloud services.
The information herein is based on publicly available documentation from providers. Grounds LLP has no affiliation with these providers other than being a user. For the most current and accurate information, please refer to the providers' official documentation and legal agreements.
We looked through a whole lot of provider terms to compile this resource, and providers update their terms frequently. Mistakes are possible! Let us know if anything needs updating, or if we should add another provider.
Provider Comparison
| Provider | Trains on Data | Data Storage | Indemnity |
|---|---|---|---|
| OpenAI | Varies: Consumer tier trains by default (can opt out); business offerings do not train by default | Storage for abuse monitoring and service improvement, with zero-retention option for enterprise (Data Usage FAQ →) | Available for API and Enterprise users with conditions, for claims that Customer's use or distribution of Output infringes a third party's intellectual property right (Business Terms →) |
| Azure OpenAI | No: Does not train on customer data | 30 days for abuse monitoring; enterprise can opt out (Data Privacy Documentation →) | Indemnity for allegedly infringing outputs, provided customer implements certain mitigations (Copyright Commitment →) |
| Anthropic | No: By default, does not train on inputs or outputs; exceptions for trust and safety reviews and explicit feedback | Data stored for up to 30 days for safety monitoring; can be deleted upon request (Consumer Terms of Service →) | Indemnity for claims that outputs infringe third-party intellectual property rights, subject to certain conditions (Commercial Terms of Service →) |
| Amazon Bedrock | No: Your content is not used to improve the base models and is not shared with any model providers | Data is encrypted in transit and at rest, with optional customer-key encryption (AWS Bedrock FAQs →) | Uncapped IP indemnity for copyright claims arising from generative output of Amazon Bedrock services (AWS Service Terms →) |
| Perplexity | Varies: Consumer tier may store data for AI improvements; Enterprise tier does not train on customer data | Inputs and responses stored for 7 days by default on the consumer tier; configurable retention for enterprise (Enterprise Terms of Service →) | Enterprise customers receive indemnification against third-party claims related to use of the services (Enterprise Terms of Service →) |
| Google Gemini | Varies: Google uses data submitted through its unpaid tiers to improve its services, but does not use user input to train models for paid tiers | Data retained for up to 3 years by default, with options to limit retention to 3 or 36 months (Gemini API Terms →) | Google offers indemnification for both training data and generated output (Protecting customers with generative AI indemnification →) |
| Grok | Varies: Free tier data may be used for training, with opt-out option; paid services do not train by default | X may use, store, and distribute input and output 'to maintain and provide the Service,' but '[e]xcept for anonymized and aggregated statistics, [Grok] will not use your Input or Output to develop or improve the Service' (Privacy Policy →) | Indemnification available for enterprise customers against third-party claims related to use of the services (Enterprise Terms of Service →) |