Deterministic Data Collection Using Probabilistic Methods

Published: 2026 May 09

Previously, in Automate Thoughtfully with Agents, the key takeaway was to “prioritize using agents as architects over using agents as executors” so that “an upper bound on financial cost” is set and employees and entrepreneurs can focus on optimizing the gains from those investments. With this in mind, this blog post elaborates on cases where an agent may be well suited to act as an executor.

In this article, an agent is defined as a computer program which has the following properties:

  1. Is accessible by using a harness (e.g. Google Antigravity)
  2. Is powered by an AI model (e.g. Google Gemini 3.1 Pro)
  3. Is served by a software company (e.g. Alphabet’s Google) [1, 2, 3]

Agents may be well suited to act as executors in data collection workflows that involve ambiguity along dimensions including, but not limited to, the following:

  1. Differences in languages
  2. Differences in naming conventions
  3. Differences in data structures
  4. Differences in data accuracy
  5. Differences in data usability
  6. Differences in data completeness
  7. Differences in data comprehensiveness
  8. Differences in data freshness

Before the “agents era”, data collection workflows that dealt with ambiguity involved:

  1. Understanding the problem to solve
  2. Planning the data structure to enable eventual analysis
  3. Designing the information storage, retrieval, extraction, and verification methods
  4. Formulating process guidelines
  5. Requesting people to apply the guidelines
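As a rough illustration of steps 2 to 4, the planned data structure and the verification guidelines can be written down as code, so that whoever collects the data (a person or an agent) is checked against the same rules. This is only a sketch under assumptions: the CompanyRecord fields, the verify function, and the one-year freshness limit are hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CompanyRecord:
    """Hypothetical target structure (step 2); the fields are illustrative."""
    name: str
    country_code: str             # guideline: two upper-case letters
    revenue_usd: Optional[float]  # guideline: None when the source is silent
    as_of: date                   # date the source was last known to be correct


def verify(record: CompanyRecord, today: date, max_age_days: int = 365) -> List[str]:
    """Return every guideline the record violates (steps 3 and 4)."""
    problems = []
    if not record.name.strip():
        problems.append("name is empty")
    code = record.country_code
    if len(code) != 2 or not code.isalpha() or not code.isupper():
        problems.append("country_code is not a two-letter upper-case code")
    if record.revenue_usd is not None and record.revenue_usd < 0:
        problems.append("revenue_usd is negative")
    if (today - record.as_of).days > max_age_days:
        problems.append("record is stale")
    return problems
```

An empty list means the record passed every check. Returning all violations, rather than stopping at the first one, lets the collector fix a record in a single pass.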

By nature, people do not behave consistently; by extension, it could be said that people behave probabilistically. As a result, both before and after the arrival of the “agents era”, data collection workflows have employed probabilistic methods to create deterministic datasets.
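One way to picture this is a majority vote: several collectors (people or agents) answer the same question independently, and a fixed rule collapses their answers into one value. The sketch below is an assumption made for illustration, not a method prescribed by this article; the resolve function and the 0.6 agreement threshold are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def resolve(answers: Sequence[str], min_agreement: float = 0.6) -> Optional[str]:
    """Collapse several independent answers into one value, or None."""
    # Normalize case and surrounding whitespace so trivial differences in
    # naming conventions do not split the vote; drop blank answers.
    cleaned = [a.strip().casefold() for a in answers if a.strip()]
    if not cleaned:
        return None
    value, votes = Counter(cleaned).most_common(1)[0]
    # Accept the winner only when enough collectors agree on it.
    return value if votes / len(cleaned) >= min_agreement else None
```

For example, resolve(["Alphabet", "alphabet ", "Google"]) returns "alphabet", while resolve(["Alphabet", "Google"]) returns None, which flags the item for review. Because the threshold is above 0.5, at most one value can pass it, so the same set of answers gives the same result regardless of their order.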

Footnotes:
1. To date, I mostly use Google’s suite of agentic products. The main reason is that the quotas are observably higher when using Gemini models in Google Antigravity than when using Sonnet and/or Opus models in Claude. This observation still holds after Anthropic’s announcement on 2026 May 06.
2. At times, I may own common stock securities in the companies named in this article.
3. This is not a recommendation to buy, hold or sell a common stock security.