
Unpacking California’s Data Digester Bill: An Aspirational Model?


California has been at the forefront of crafting crucial data privacy laws, the latest addition being the proposed data digester bill. The state’s proactiveness in this area is evident from its earlier legislation. California enacted the California Consumer Privacy Act (“CCPA”) in 2018, which granted consumers various rights, including the ability to know what data is collected, to have it deleted, and to opt out of its sale. The CCPA was subsequently amended by the California Privacy Rights Act of 2020 (“CPRA”), also known as Proposition 24, which armed consumers with additional rights, such as the right to correct inaccurate personal information and to limit the use of sensitive personal information. The CPRA also established the California Privacy Protection Agency (“CPPA”), a body vested with full administrative power, authority, and jurisdiction to enforce the CCPA.

Expanding on these efforts, California introduced the Delete Act to regulate data brokers – companies that gather and sell personal information collected from the internet for purposes such as advertising and identity verification. More recently, on October 8, 2023, two bills – Assembly Bill (“AB”) 947 and AB 1194 – were signed into law. AB 947 revised the CCPA’s definition of “sensitive personal information,” thereby increasing protection for consumers. AB 1194, on the other hand, provided additional safeguards for individuals seeking reproductive care by limiting the exceptions to data collection under the CCPA/CPRA.

The latest bill, AB 3204, introduced by State Assembly member Bauer-Kahan, aims to extend regulatory oversight to a new category of entities: data digesters. In this article we first unpack the contents of the bill. Subsequently, we explore the possibilities of this bill serving as an aspirational regulatory model, particularly for developing countries.

Unpacking AB 3204

AB 3204 is a crucial intervention in the AI regulatory landscape and is tied to various critical ends like privacy, transparency, effective enforcement, consumer protection, etc. In essence, it expands the regulatory framework introduced by CCPA and CPRA to data digesters – a rather unfamiliar term. Simply, the bill reads that a data digester is a “business that uses personal information to train artificial intelligence.”

Among others, the bill imposes a few key obligations. First, data digesters must register with the CPPA annually. Second, during registration, they must provide details about their AI training practices, including a) the types of personal data they hold; b) whether they use information from minors; and c) any relevant regulations they comply with, such as the Fair Credit Reporting Act or the Health Insurance Portability and Accountability Act. Third, the CPPA will create a public web page listing registered data digesters and their information. Fourth, a “Data Digester Registry Fund,” sourced from registration fees and fines, is created to cover the costs of implementing and enforcing the law. This fund, administered by the CPPA, is intended to offset the costs of maintaining the informational internet website and to cover the expenses incurred by state courts and the agency in enforcing these provisions. Fifth, the CPPA has discretion to act where it “believes” a data digester has defaulted on its duty to register within the 90-day window the bill proposes. The failure to register attracts fines that may run as high as $5,000 for each day of non-compliance.

This law’s clear parameters for stakeholders regarding personal data can enhance legal certainty and promote responsible data practices. Its effectiveness as an aspirational model hinges critically on demonstrably positive outcomes and successful implementation, and especially on the state capacity and data governance realities of potential adopters. The next section delves deeper into these considerations.

An Aspirational Model?

While AB 3204 presents a compelling framework for regulating the use of personal information in AI training, its applicability as a universally adaptable model merits careful consideration. Potential hurdles include the possibility of excessively burdensome requirements, limitations placed on businesses, and the potential stifling of innovation. These concerns primarily center around three key areas: a) costs associated with compliance, b) the broad definition of “data digester,” and c) the feasibility of meeting the proposed statutory timelines.

A. Costs

At the implementation stage, the regulatory measure brings to the forefront various attached costs: administrative, enforcement, and compliance costs, among others. First, a primary concern is the registration fee imposed on data digesters. While it is crucial to establish a fee structure that covers the costs of the regulatory framework, there is also a need to ensure that it does not inadvertently stifle innovation. This balancing act has proven to be a perennial policy challenge. For developing economies, the challenge is even greater. These nations often aspire to promote innovation and industry growth while fostering homegrown industries and ensuring access for their citizens. In such contexts, it becomes imperative to ensure that registration fees are reasonable and proportionate to the size and revenue of the data digester. Imposing excessive fees could create undue barriers, particularly for smaller entities and startups, hindering the very innovation these economies aim to cultivate.

Second, the bill significantly raises the stakes by elevating fines to as high as $5,000 for each day of delay in registration. While the intention behind these fines is to encourage compliance, they may also be viewed as punitive and discouraging. Striking a balance between encouraging compliance and avoiding excessive financial burdens, while key, remains technically elusive. Moreover, in the interest of transparency and trust-building, it is crucial to clarify how the fines and fees collected will be utilized; they should genuinely contribute to covering enforcement and informational website costs, as intended. Third, the Indian experience with the Competition Commission of India (CCI) serves as a cautionary tale of fines-based regulation falling short: the enforcement costs of antitrust decisions are burdensome, and fines imposed by the CCI are frequently not collected because of immediate appeals to higher judicial forums. This raises the concern that a similar financially self-reliant body in India may eventually lack the resources to fulfill its obligations.

At the risk of a slight digression, it is crucial to consider the second- and third-order effects of the overall costs of compliance for companies. Since a significant number of artificial intelligence companies are headquartered in the United States, a wider or national adoption of such laws could create a ripple effect. Companies might be incentivized to implement these regulations globally, even in regions where they are not mandated, because maintaining separate compliance systems for different markets is expensive. Such widespread adoption, driven by cost-efficiency, could mirror the “Brussels effect” – the phenomenon by which EU regulations, owing to the size and economic power of the European Union, unintentionally shape global standards across sectors. In the context of this bill, a similar effect could produce a de facto global standard for data digester regulation, even without explicit international mandates.

B. The broad definition

Another concern is the broad definition of “data digester.” This definition could have significant implications, especially as artificial intelligence (AI) becomes more prevalent across industries. Many businesses now rely on personal information to train their AI algorithms, and the challenge lies in distinguishing entities that significantly impact privacy from those that use data more innocuously. In essence, different businesses’ impact on privacy varies considerably. Striking this balance is essential to avoid placing unnecessary and undue burdens on smaller businesses – such as entities using anonymized data for basic functionalities – while still addressing the privacy risks posed by bigger market players.

Instead of a broad and potentially stifling definition, developing economies might consider classifying AI systems based on their inherent risk to user privacy. This would effectively create a hybrid model amalgamating US and EU AI regulations. Such a risk-based classification would address two key concerns: fostering responsible AI development and safeguarding individual privacy. By defining different tiers of regulation based on risk levels, smaller businesses utilizing data ethically would not be burdened by excessive regulations, while entities with the potential to significantly impact privacy would face appropriate oversight. Overall, a tiered definition might allow developing countries to benefit from this transformative technology while ensuring the protection of their citizens’ privacy.

C. Statutory Timelines

Two key provisions of the bill raise potential concerns: the 90-day registration deadline and the five-year statute of limitations. The tight registration window, particularly for businesses navigating the complexities of AI training processes, could pose a significant challenge. Offering clearer guidelines and potentially establishing an outreach program to educate businesses, especially smaller entities, on their obligations might be beneficial solutions. Additionally, the five-year statute of limitations for administrative actions against non-compliance might warrant further discussion. Considering the rapid evolution of AI technology and its potential for long-term privacy impacts, a more flexible timeframe could be explored. This would allow authorities to address emerging issues that might not become evident within the current five-year window.


California’s AB 3204 represents a bold step towards regulating the use of personal data in AI training. While its intent to promote transparency and accountability is commendable, the bill raises several crucial questions that need careful consideration, particularly for countries aiming to adopt it as an aspirational model. The impact of registration fees and fines, the potential for unintended consequences due to a broad definition of “data digester,” and the limitations of the proposed statutory timelines call for further discussion and potential refinement.

Samriddh Sharma
Undergraduate law student at West Bengal National University of Juridical Sciences

Samriddh Sharma is an undergraduate law student at West Bengal National University of Juridical Sciences, Kolkata. He is interested in critical legal studies and law & tech.

Puneet Srivastava
Undergraduate law student at West Bengal National University of Juridical Sciences

Puneet is an undergraduate law student at West Bengal National University of Juridical Sciences, Kolkata. He is interested in the field of legal and tech policy.
