Per this GE docs page, the great_expectations team added a bit of code that lets them track usage of their library, and it can be disabled in the great_expectations.yml file. That page says there's more information in a blog post from 2020, but the given link is dead. Still, per the Wayback Machine's copy of that post, the GE team states:
"We do not track credentials, validation results, or arguments passed to Expectations. We consider these private, and frankly none of our business. User-created names are always hashed, to create a longitudinal record without leaking any private information. We track types of Expectations, to understand which are most useful to the community."
This is very reasonable, and I'm keen to give the GE team information that helps them figure out which features are worth working on. However, my project is intended to be both a specific project and a platform that other people can fork to build their own pipelines (though the traffic page shows people mainly cloning the repo rather than forking it). If the UUID stays in the repo, every clone would report usage under the same ID and pollute that longitudinal record, so I'm not sure whether I should strip it out.
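To see why a shared UUID would muddy the record, here is a toy model of salted name hashing. It assumes, as a simplification, that user-created names are hashed together with the project's ID acting as a salt. The function `hash_name` is hypothetical and is not GE's actual anonymizer code. Under that assumption, two clones that keep the committed ID emit identical hashes for the same checkpoint name, so their events can't be told apart.

```python
import hashlib
import uuid


def hash_name(context_id: str, name: str) -> str:
    """Hypothetical anonymizer: hash a user-created name salted with the context ID."""
    return hashlib.sha256((context_id + name).encode("utf-8")).hexdigest()


shared_id = str(uuid.uuid4())  # stands in for the UUID committed to the public repo
fresh_id = str(uuid.uuid4())   # stands in for a clone that regenerated its own ID

# Two clones that kept the committed UUID produce the same hash for the same name...
clone_a = hash_name(shared_id, "my_checkpoint")
clone_b = hash_name(shared_id, "my_checkpoint")
# ...while a clone with its own UUID produces a different one.
clone_c = hash_name(fresh_id, "my_checkpoint")

print(clone_a == clone_b)
print(clone_a == clone_c)
```

The hash still hides the name itself; what the salt controls is whether events from different installs look like they came from the same one.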
So I should experiment with stripping out this UUID (from both the /great_expectations/expectations/.ge_store_backend_id and .../great_expectations.yml files; per grep, all other appearances of the UUID are in the /.uncommitted/ dir) and see whether anything complains when I run checkpoints.
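Before and after stripping it, a quick way to re-run the "per grep" check without shell tools is a small standard-library scan. `files_containing` and its `skip_dirs` parameter are hypothetical helpers for illustration. It behaves roughly like `grep -rl <uuid> <dir>`, optionally ignoring directories such as the uncommitted one. Note that `pathlib` globbing does match dotfiles, so `.ge_store_backend_id` is included.

```python
from __future__ import annotations

from pathlib import Path


def files_containing(root: str, needle: str, skip_dirs: tuple[str, ...] = ()) -> list[str]:
    """Return sorted paths (relative to root) of text files that contain `needle`.

    Roughly `grep -rl needle root`; any file under a directory whose name is in
    `skip_dirs` is ignored.
    """
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):  # pathlib's glob includes hidden files
        if not path.is_file():
            continue
        rel = path.relative_to(root_path)
        # Skip files whose parent directories include any of the skipped names.
        if any(part in skip_dirs for part in rel.parts[:-1]):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        if needle in text:
            hits.append(rel.as_posix())
    return sorted(hits)
```

For example, `files_containing("great_expectations", "<your-uuid>")` lists every file mentioning the ID; passing the uncommitted directory's name in `skip_dirs` should leave just the two files above, and an empty result after stripping confirms nothing was missed.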