Comments (5)
This was implemented before I started contributing, but I personally like knowing that I'm not exposed to a man-in-the-middle attack, since 99.99% of people are syncing from PyPI to local bandersnatch repos. It's also trivial to add HTTPS to your "DMZ" instance, so I would recommend that route.
Even though you're pointing at your DMZ mirror, you're still hitting PyPI's XML-RPC API to calculate the differences from where your "internal" mirror is (we are unable to mirror that). You are, however, hitting the JSON API and pulling the packages down locally from your DMZ.
- You could also look at using rsync or some other mirroring technology (e.g. btrfs send).
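To illustrate the serial-based difference calculation mentioned above, here is a minimal sketch, not bandersnatch's actual code. It assumes a client object exposing PyPI's `changelog_since_serial` XML-RPC method, which returns `(name, version, timestamp, action, serial)` entries; the helper name `changes_since` and the `ChangelogClient` protocol are made up for this example.

```python
from typing import Iterable, Protocol


class ChangelogClient(Protocol):
    """Anything exposing PyPI's `changelog_since_serial` XML-RPC method."""

    def changelog_since_serial(self, serial: int) -> Iterable[tuple]: ...


def changes_since(
    client: ChangelogClient, last_serial: int
) -> tuple[dict[str, int], int]:
    """Return ({package name: newest serial seen}, new high-water-mark serial).

    Hypothetical helper: it only shows why the mirror needs the upstream
    changelog, since the serials are computed by PyPI itself.
    """
    changed: dict[str, int] = {}
    # Each entry is (name, version, timestamp, action, serial).
    for name, _version, _timestamp, _action, serial in client.changelog_since_serial(
        last_serial
    ):
        changed[name] = max(serial, changed.get(name, 0))
    # If nothing changed, the high-water mark stays where it was.
    newest = max(changed.values(), default=last_serial)
    return changed, newest
```

With a real connection the client would presumably be something like `xmlrpc.client.ServerProxy("https://pypi.org/pypi")`; a static DMZ mirror has no equivalent endpoint, which is why the internal instance still needs to reach PyPI.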
That said, I will accept a PR that defaults to enforcing HTTPS only, as we do today, and allows you to negate that check.
- If you go this route, please add a comment in the sample config stating why this is bad and that it really should only be used internally.
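A rough sketch of what such an opt-out check could look like. The function name `check_master_url` and the flag `allow_non_https` are hypothetical names chosen for this example, not existing bandersnatch options; the default keeps today's HTTPS-only behaviour.

```python
from urllib.parse import urlsplit


def check_master_url(url: str, allow_non_https: bool = False) -> str:
    """Return `url` if its scheme is acceptable, otherwise raise ValueError.

    `allow_non_https` is a hypothetical opt-out flag. Plain HTTP exposes the
    sync to man-in-the-middle attacks, so it should only ever be enabled for
    a trusted internal mirror.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == "https":
        return url
    if scheme == "http" and allow_non_https:
        return url
    raise ValueError(f"Master URL {url!r} is not HTTPS")
```

Calling `check_master_url("http://dmz.example.internal")` raises `ValueError` by default and only passes when `allow_non_https=True` is given explicitly, so the insecure mode is a deliberate choice.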
Thanks for asking. Feel free to ask any more questions.
from bandersnatch.
@cooperlees thanks for the clear answer.
I think the rsync option from DMZ to internal is the way to go.
This is what we have in place now; we are just not quite happy with the performance of rsync given the gazillion files in PyPI.
Due to my lack of knowledge of PEP 381, I only now realized that the internal bandersnatch needs a connection to the (real) PyPI server anyway for the XML-RPC API call.
Just for my awareness: what's the best way to have the server side of the XML-RPC API in the DMZ? Should I use something like devpi?
Thanks
Giampaolo
from bandersnatch.
Yeah, the thousands of files and directories do not help. This is why I also suggested btrfs (or it could be ZFS) differential sends. They will be fast because they work at the block level, based on snapshots.
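As a sketch of what a differential send could look like, the helpers below only build the command lines and do not run anything. They assume the mirror directory is a btrfs subvolume, that the parent snapshot already exists on both hosts, and that the commands run with sufficient privileges; all paths and the host name are placeholders.

```python
import shlex


def snapshot_cmd(subvolume: str, snapshot: str) -> list[str]:
    """Command for a read-only snapshot (btrfs send needs read-only snapshots)."""
    return ["btrfs", "subvolume", "snapshot", "-r", subvolume, snapshot]


def incremental_send_pipeline(
    parent_snapshot: str, new_snapshot: str, remote_host: str, remote_dir: str
) -> str:
    """Shell pipeline that streams only the blocks changed since the parent."""
    send = ["btrfs", "send", "-p", parent_snapshot, new_snapshot]
    receive = ["ssh", remote_host, "btrfs", "receive", remote_dir]
    return f"{shlex.join(send)} | {shlex.join(receive)}"
```

For example, `incremental_send_pipeline("/srv/snap/pypi-1", "/srv/snap/pypi-2", "internal-mirror", "/srv/snap")` yields a pipeline that could be run from the DMZ host after each sync.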
devpi can't replicate the XML-RPC API either; nothing really can, as it's the source of truth for the package and mirror serials that PyPI calculates in real time.
I've never used devpi, but for its PyPI operations it could run in your DMZ and cache all the PyPI packages that your infra needs, although it seems to run as a proxy. devpi does have a "replication" feature that could possibly satisfy your needs. I don't know all your goals, but it sounds like it could do what you want, especially if the replication also syncs the PyPI cache, which I am not sure it does.
from bandersnatch.
Were you able to find a workaround for this? If so, please share and we'll close this issue.
from bandersnatch.
Sorry for the late answer.
I gave up and we are sticking with rsync from the DMZ to the internal mirror, even if the performance is not outstanding.
Thanks for your help
from bandersnatch.