Comments (4)
I added this to the destination team backlog to investigate. @evantahler fyi
from airbyte.
Can you please confirm the sync modes you are using and the size of the data you are moving (e.g. how big do you expect the tables to be)?
Previously, were you using normalization? If not, this is expected - the destination stores the data both in a raw table and in a final (typed and deduped) table.
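To see how storage splits between raw and final tables, it may help to list on-disk sizes per table. The following is a minimal sketch, not part of the original thread: it assumes a DB-API style connection (for example from psycopg2), and the function names are hypothetical. The query relies only on standard PostgreSQL catalog tables and `pg_total_relation_size`.

```python
from collections import defaultdict

# Total on-disk size per user table (heap + indexes + TOAST), largest first.
TABLE_SIZES_SQL = """
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY total_bytes DESC
"""


def fetch_table_sizes(conn):
    """Run the size query on an open DB-API connection (assumed, e.g. psycopg2)."""
    cur = conn.cursor()
    try:
        cur.execute(TABLE_SIZES_SQL)
        return cur.fetchall()
    finally:
        cur.close()


def bytes_per_schema(rows):
    """Sum total_bytes per schema from (schema_name, table_name, total_bytes) rows."""
    totals = defaultdict(int)
    for schema_name, _table_name, total_bytes in rows:
        totals[schema_name] += total_bytes
    return dict(totals)
```

Comparing the per-schema totals before and after a sync should show whether the growth comes from the raw tables, the final tables, or both.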
from airbyte.
Hi
Thank you for answering so quickly
We're using a mix of full-refresh/overwrite and deduped + history modes.
I don't have the exact size of the data being moved, but 500 MB was the size of the data after a reset.
And yes, we are using normalization and never used the raw data.
Hoping this will help.
from airbyte.
A few thoughts:
- Postgres isn't a great data warehouse (for many reasons), one of which is that it doesn't actually free up disk space until a vacuum runs. You can try vacuuming after each sync - that might help.
- If you are using the final tables now but in the past were only using the raw tables (no normalization), that will likely increase your storage by about 2x: you have gone from one table to two in the destination (raw only to raw + final).
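
A post-sync vacuum could be scripted along these lines. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the connection is assumed to be a psycopg2-style object exposing an `autocommit` attribute. Note that a plain `VACUUM` mostly marks dead rows as reusable inside the existing files, while `VACUUM FULL` rewrites the table and returns space to the operating system, at the cost of an exclusive lock for the duration.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vacuum_statements(tables, full=False):
    """Build one VACUUM statement per (schema, table) pair."""
    verb = "VACUUM (FULL, ANALYZE)" if full else "VACUUM (ANALYZE)"
    return [f"{verb} {quote_ident(schema)}.{quote_ident(table)}"
            for schema, table in tables]


def run_vacuum(conn, tables, full=False):
    """Execute the statements on an open connection (assumed psycopg2-style)."""
    # VACUUM cannot run inside a transaction block, so autocommit is required.
    conn.autocommit = True
    cur = conn.cursor()
    try:
        for statement in vacuum_statements(tables, full=full):
            cur.execute(statement)
    finally:
        cur.close()
```

For example, `vacuum_statements([("public", "users")])` yields `VACUUM (ANALYZE) "public"."users"`; the schema and table names here are placeholders.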
from airbyte.