Deterministic Data Collection Using Probabilistic Methods

Published: 2026 May 09

Previously, in Automate Thoughtfully with Agents, the key takeaway was to “prioritize using agents as architects over using agents as executors” so that “an upper bound on financial cost” is set and employees and entrepreneurs can focus on optimizing the gains from those investments. With this in mind, this blog post elaborates on cases where an agent may be well suited to act as an executor.

In this article, an agent is defined as a computer program that has the following properties:

Is accessible by using a harness (e.g. Google Antigravity)
Is powered by an AI model (e.g. Google Gemini 3.1 Pro)
Is served by a software company (e.g. Alphabet’s Google)^{1, 2, 3}

Agents may be well suited to act as executors in data collection workflows that involve ambiguity along dimensions including, but not limited to, the following:

Differences in languages
Differences in naming conventions
Differences in data structures
Differences in data accuracy
Differences in data usability
Differences in data completeness
Differences in data comprehensiveness
Differences in data freshness
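
To make a few of these dimensions concrete, the sketch below shows three hypothetical records that describe the same company but differ in language, naming conventions, and data structure. Every source, field name, and value is invented for illustration. A fixed-rule extractor written for the first source covers only that source, which is the kind of gap where an executor that tolerates ambiguity may help.

```python
from datetime import date
from typing import Optional

# Hypothetical records for the same company from three sources.
# All field names and values are invented for illustration.
SOURCE_A = {"company_name": "Acme GmbH", "founded": "2001-03-15"}
SOURCE_B = {"Firmenname": "ACME", "Gründungsjahr": 2001}          # other language, year only
SOURCE_C = {"org": {"legal_name": "Acme GmbH", "est": "15/03/2001"}}  # nested, other date format


def extract_with_fixed_rules(record: dict) -> Optional[dict]:
    """Rule-based extractor that only understands source A's conventions."""
    try:
        return {
            "name": record["company_name"],
            "founded": date.fromisoformat(record["founded"]),
        }
    except (KeyError, TypeError, ValueError):
        # The fixed rule does not cover this source's conventions.
        return None


results = [extract_with_fixed_rules(r) for r in (SOURCE_A, SOURCE_B, SOURCE_C)]
covered = sum(r is not None for r in results)
print(f"Fixed rules covered {covered} of {len(results)} sources")
```

Each new source would need its own hand-written rule under this approach, so the maintenance effort grows with the number of conventions encountered.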

Before the “agents era”, data collection workflows that involved ambiguity typically consisted of the following steps:

Understanding the problem to solve
Planning the data structure to enable eventual analysis
Designing the information storage, retrieval, extraction, and verification methods
Formulating process guidelines
Requesting people to apply the guidelines

People do not, by nature, behave consistently; by extension, people can be described as behaving probabilistically. As a result, both before and since the start of the “agents era”, data collection workflows have employed probabilistic methods to create deterministic datasets.
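
One way to picture this claim is a consolidation step that takes several possibly inconsistent answers to the same question, whether from people or from agents, and reduces them to a single stored value. The sketch below uses a majority vote with an alphabetical tie-break. This technique is an assumption chosen for illustration, not a method described in this article, and the function name and sample answers are hypothetical.

```python
from collections import Counter


def consolidate(answers: list[str]) -> str:
    """Reduce inconsistent answers to one value by majority vote.

    Answers are normalized (trimmed, lowercased) before counting.
    Ties are broken alphabetically, so the same set of answers
    yields the same result regardless of their order.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(a.strip().lower() for a in answers)
    best = max(counts.values())
    return min(value for value, n in counts.items() if n == best)


# Hypothetical answers from three collectors to "Where is the company based?"
answers = ["Berlin", "berlin ", "Munich"]
city = consolidate(answers)
print(city)
```

The individual collectors remain inconsistent, but the consolidation rule is fixed, so the dataset receives one reproducible value per question.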