Comments (9)
Hi, I previously worked on using CRIU to implement the "lazy migration" you describe for serverless functions. I also noticed that CRIU's current lazy migration is actually lazy live migration, not lazy restore.
If I understand correctly, your idea is to track hot pages and prefetch them, then use userfaultfd to load the remaining cold pages on demand. I would recommend a paper (https://dl.acm.org/doi/pdf/10.1145/3445814.3446714) whose authors implemented a similar idea in vHive. I hope their design helps you develop yours.
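The record-and-prefetch idea above can be modeled in a few lines. This is a toy sketch, not vHive's or CRIU's code: the snapshot is a dict mapping page index to contents, `coalesce` and `LazyMemory` are hypothetical names, and the counted "faults" stand in for the missing-page events that userfaultfd would deliver in a real restore.

```python
from __future__ import annotations

from typing import Iterable


def coalesce(pages: Iterable[int]) -> list[tuple[int, int]]:
    """Merge page indices into (start, count) runs so a prefetch can use a few large reads."""
    runs: list[tuple[int, int]] = []
    for page in sorted(set(pages)):
        if runs and runs[-1][0] + runs[-1][1] == page:
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            runs.append((page, 1))
    return runs


class LazyMemory:
    """Toy restore: recorded (hot) pages are prefetched, all others load on first access."""

    def __init__(self, snapshot: dict[int, bytes], working_set: Iterable[int]) -> None:
        self.snapshot = snapshot
        self.resident: dict[int, bytes] = {}
        self.faults = 0
        # Prefetch phase: copy every recorded page before the workload resumes.
        for start, count in coalesce(working_set):
            for page in range(start, start + count):
                self.resident[page] = snapshot[page]

    def read(self, page: int) -> bytes:
        if page not in self.resident:
            # Stand-in for a missing-page fault that userfaultfd would report.
            self.faults += 1
            self.resident[page] = self.snapshot[page]
        return self.resident[page]


snapshot = {i: bytes([i]) * 4 for i in range(16)}
mem = LazyMemory(snapshot, working_set=[3, 1, 2, 9])
mem.read(2)   # prefetched: no fault
mem.read(12)  # cold page: one fault
mem.read(12)  # now resident: no new fault
print(coalesce([3, 1, 2, 9]), mem.faults)
```

Coalescing the recorded pages into contiguous runs is one assumed way to keep the prefetch cheap; everything outside the recorded set is paid for one fault at a time.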
from criu.
In fact, I've already read the vHive paper, and this idea is directly inspired by it. vHive works with Firecracker VMs, whose snapshot is dumped from the anonymous memory of the VM monitor process and therefore contains the VM's full memory, which makes REcord-And-Prefetch (REAP) easier to implement. CRIU, however, works on processes, and its checkpoint contains only a process's private data (anonymous and dirty file-backed pages). That's why I need to track and dump not only private data but all accessed pages.
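To make the difference concrete, the sketch below parses /proc/<pid>/maps lines (address range, perms, offset, dev, inode, pathname) and applies the rule stated above: anonymous pages are always in the checkpoint, file-backed private pages only when dirty. This is a simplification for illustration, not CRIU's actual dump logic; `Vma`, `parse_maps` and `in_private_checkpoint` are hypothetical names, and treating unnamed, `[heap]` and `[stack]` mappings as anonymous is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vma:
    start: int
    end: int
    perms: str  # e.g. "rw-p"; the 4th character is 'p' (private) or 's' (shared)
    path: str   # "" for unnamed (anonymous) mappings

    @property
    def private(self) -> bool:
        return self.perms[3] == "p"

    @property
    def anonymous(self) -> bool:
        # Simplification: unnamed mappings plus [heap]/[stack] count as anonymous.
        return self.path in ("", "[heap]", "[stack]")


def parse_maps(text: str) -> list[Vma]:
    """Parse /proc/<pid>/maps lines: address perms offset dev inode [pathname]."""
    vmas = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        lo, hi = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else ""
        vmas.append(Vma(lo, hi, fields[1], path))
    return vmas


def in_private_checkpoint(vma: Vma, dirty: bool) -> bool:
    """Anonymous pages always; file-backed pages only if private and dirty."""
    if vma.anonymous:
        return True
    return vma.private and dirty


sample = """\
00400000-00401000 r-xp 00000000 08:01 1234 /usr/bin/app
01000000-01021000 rw-p 00000000 00:00 0 [heap]
7f0000000000-7f0000002000 rw-p 00000000 00:00 0
7f0000100000-7f0000200000 r--p 00000000 08:01 5678 /usr/lib/libfoo.so
"""
for vma in parse_maps(sample):
    print(vma.path or "<anon>", "clean page in checkpoint:", in_private_checkpoint(vma, dirty=False))
```

A clean page of `/usr/lib/libfoo.so` that was read during initialization falls outside such a checkpoint, which is exactly the kind of page an access-based record phase would also have to dump.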
But thanks for your kindness.
:->
> Is that possible to achieve that only restore pages touched in phase 2, and lazily restore pages touched in phase 3 ?

Adrian has a good blog post on how this could be achieved:
https://lisas.de/~adrian/posts/2016-Oct-14-combining-pre-copy-and-post-copy-migration.html
Thanks for your reply. The blog post is quite good, but I think my description was ambiguous. By 'touch' I mean 'access' rather than 'modify'. What I am asking is whether it is possible to restore only the pages accessed in phase 2; these pages are not necessarily dirty. In the blog post, Adrian first pre-dumps all pages (in my case, all pages allocated in phases 1 and 2), then dumps the pages dirtied in phase 2, then restores all pages pre-dumped to /tmp/cp/1, and finally lazily restores the pages dumped to /tmp/cp/2. This does not look like lazy restore to me, since it restores all pages dumped in the pre-dump phase.
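A toy model of the flow as described above makes the objection concrete: the set of eagerly restored pages is determined by what was pre-dumped and later dirtied, not by what the process will access. This is a sketch under stated assumptions, not CRIU's restore logic; `predump_then_lazy_plan` is a hypothetical name, and it assumes a page present in both image directories is served from the newer /tmp/cp/2.

```python
from __future__ import annotations


def predump_then_lazy_plan(
    predumped: set[int], dirtied_after_predump: set[int]
) -> tuple[set[int], set[int]]:
    """Return (eager, lazy) page sets for the pre-copy + post-copy flow (toy model).

    eager: pages whose newest copy is in the pre-dump (/tmp/cp/1); restored up front.
    lazy:  pages written after the pre-dump (/tmp/cp/2); fetched on demand.
    """
    lazy = set(dirtied_after_predump)
    eager = set(predumped) - lazy
    return eager, lazy


# Phases 1 and 2 allocated pages 0..999; only pages 10..19 were written in phase 2.
eager, lazy = predump_then_lazy_plan(set(range(1000)), set(range(10, 20)))
print(len(eager), len(lazy))
```

Pages that were only needed during initialization still land in `eager`, which is why this flow restores most of the memory up front.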
> This does not look like lazy restore to me, since it restores all pages dumped in the pre-dump phase.

That's because it was designed for lazy live migration. The behavior that you expect can be easily implemented. How are you going to use it? What benefits do you see in this use case?
> The behavior that you expect can be easily implemented.

Could you give me some ideas? I have a naive one: CRIU uses `echo 4 > /proc/pid/clear_refs` to track dirty pages. If I want to track accessed pages instead, I could use `echo 1 > /proc/pid/clear_refs` and then dump only the accessed pages. But how can I restore only the accessed pages rather than all pages?
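For reference, the dirty-tracking half works through the soft-dirty bit: after `echo 4` clears it, the kernel sets bit 55 of a page's /proc/<pid>/pagemap entry when the page is written. The sketch below decodes those entries using the bit positions from the kernel's pagemap documentation; the helper names are hypothetical. Note that pagemap has no per-page accessed bit, so after `echo 1 > /proc/pid/clear_refs` the referenced state is visible only as per-VMA `Referenced:` totals in /proc/<pid>/smaps. Per-page access tracking would need another mechanism, for example idle page tracking (/sys/kernel/mm/page_idle/bitmap, which requires root). Treat this as a starting point, not a complete tracker.

```python
from __future__ import annotations

import mmap
import struct

# Bits of a 64-bit /proc/<pid>/pagemap entry (kernel pagemap documentation).
PM_SOFT_DIRTY = 1 << 55
PM_FILE_OR_SHARED_ANON = 1 << 61
PM_SWAPPED = 1 << 62
PM_PRESENT = 1 << 63


def pagemap_offset(vaddr: int, page_size: int = mmap.PAGESIZE) -> int:
    """Byte offset of the 8-byte pagemap entry that describes `vaddr`."""
    return (vaddr // page_size) * 8


def decode(entry: int) -> dict[str, bool]:
    """Split a raw pagemap entry into the flags relevant for dirty tracking."""
    return {
        "present": bool(entry & PM_PRESENT),
        "swapped": bool(entry & PM_SWAPPED),
        "file_or_shared_anon": bool(entry & PM_FILE_OR_SHARED_ANON),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
    }


def clear_soft_dirty(pid: int | str = "self") -> None:
    """Same effect as `echo 4 > /proc/<pid>/clear_refs` (Linux only)."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def read_entry(vaddr: int, pid: int | str = "self") -> int:
    """Read the raw pagemap entry for `vaddr` (Linux only; needs access rights)."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr))
        return struct.unpack("=Q", f.read(8))[0]
```

A tracker built on this would call `clear_soft_dirty`, let the workload run, and then check `decode(read_entry(addr))["soft_dirty"]` for each page of interest; that only answers "written since", not "accessed since".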
> What benefits do you see in this use case?

For many applications (serverless functions, for example), the memory working set first inflates and then deflates. During the initialization phase (loading lots of libraries), they access far more pages than during the serving phase (just an RPC server waiting for requests). A large fraction of the pages accessed during initialization (I call them cold pages) are rarely accessed in the serving phase, so I want to keep those cold pages on disk and restore only the hot pages (those accessed in the serving phase). Cold pages can be brought into memory on demand via page faults.
I think this can save memory and speed up restoration.
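The hot/cold split described above boils down to two set operations. In this sketch the per-phase access sets are assumed inputs (obtaining them per page is the open question in this thread), and `RestorePlan`, `plan_by_access` and `eager_fraction` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RestorePlan:
    eager: set[int]  # hot pages: restored before the process resumes
    lazy: set[int]   # cold pages: left in the image, faulted in on first access


def plan_by_access(init_accessed: set[int], serving_accessed: set[int]) -> RestorePlan:
    """Hot = touched while serving; cold = touched only during initialization."""
    hot = set(serving_accessed)
    cold = set(init_accessed) - hot
    return RestorePlan(eager=hot, lazy=cold)


def eager_fraction(plan: RestorePlan) -> float:
    """Share of recorded pages that must be in memory before the process resumes."""
    total = len(plan.eager) + len(plan.lazy)
    return len(plan.eager) / total if total else 0.0


plan = plan_by_access(init_accessed=set(range(1000)), serving_accessed={1, 2, 3, 1500})
print(len(plan.eager), len(plan.lazy), round(eager_fraction(plan), 4))
```

Unlike the pre-dump flow, the eager set here depends only on what the serving phase touched; page 1500 stands for memory allocated after initialization.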
I also wonder whether the CRIU developers are interested in supporting true "lazy restore".
> The behavior that you expect can be easily implemented.

Although I'm interested and think it's useful, it's a bit difficult for me to implement.
@LanYuqiao, thank you for clarifying your use case. As Andrei mentioned, the original implementation was designed for live migration. In this scenario, we have residual dependencies between the source and destination machines. In particular, we need to make sure that all pages are restored because if the source machine becomes unavailable (e.g., due to system failure), the restored application would fail.
> Is that possible to achieve that only restore pages touched in phase 2, and lazily restore pages touched in phase 3 ?

It should be fairly easy to modify CRIU to restore memory pages only when a page fault occurs. However, this will result in poor performance for the restored application: loading a memory page from disk (or over the network) is significantly slower than accessing it directly in memory. This can be observed in the following demo: https://asciinema.org/a/4QgtYPW9XtTngTyCX5Jsibqth (the application is significantly slower for a few seconds after restore).
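The trade-off above can be put as a back-of-the-envelope model. The per-page costs are parameters, not measurements; the assumption is that a fault costs more per page than a bulk read because each one involves trapping, fetching a single page, and resuming the faulting thread. The function names are hypothetical, and the model ignores any overlap between fetching and execution.

```python
from __future__ import annotations


def restore_cost(n_eager: int, n_faulted: int, bulk_page_cost: float, fault_cost: float) -> float:
    """Toy model of time spent bringing pages into memory.

    Eagerly restored pages are read in bulk before the process resumes; every lazily
    restored page that the process touches pays a full fault round trip instead.
    """
    return n_eager * bulk_page_cost + n_faulted * fault_cost


def lazy_beats_eager(total_pages: int, touched_pages: int, bulk_page_cost: float, fault_cost: float) -> bool:
    """Fully lazy restore wins only if touched * fault_cost < total * bulk_page_cost."""
    eager = restore_cost(total_pages, 0, bulk_page_cost, fault_cost)
    lazy = restore_cost(0, touched_pages, bulk_page_cost, fault_cost)
    return lazy < eager
```

The fault term is paid while the application is already running, which matches the slowdown visible right after restore in the demo; a hot/cold split tries to keep that term small by making sure few cold pages are touched.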
What is the main problem you are trying to solve? Why do you want pages to be restored only when accessed in phase 3?
> What is the main problem you are trying to solve? Why do you want pages to be restored only when accessed in phase 3?

In my case, the number of pages accessed in phases 2 and 3 is much smaller than in phase 1. A large fraction of the pages accessed in phase 1 are never accessed again, which means we don't need to restore all pages to run the app; we only need to restore the pages that are actually accessed. Restoring all pages wastes both memory and time.