Comments (15)

ReactorScram commented on June 12, 2024

I'll ask ChatGPT about how DNS works on Windows. If we can shadow everything, yeah, that's probably best because timeouts wouldn't get cached.

from firezone.

jamilbk commented on June 12, 2024

Yeah, this will happen any time a Gateway flaps, the Relays are being restarted, or the Portal is being restarted since those all can cause a timeout when the client tries a lookup.

jamilbk commented on June 12, 2024

I think it was done like this because we don't know how many of each IPv4/IPv6 dummy addresses to generate until we do an actual lookup, and also because we don't have a good way to reverse the mapping when it reaches a Gateway.

If we immediately generated 100.96.100.100 for example, we would still need some way to tell whichever Gateway we land on that 100.96.100.100 corresponds to the original DNS name requested, and then do the actual lookup.

We should probably review this architecture.

cc @conectado
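To make the mapping problem above concrete, here is a minimal sketch of assigning a dummy address immediately and keeping a reverse mapping. It is not connlib's actual code: `ProxyIpMap`, `assign`, and `name_for` are hypothetical names, and the sketch assumes one IPv4 dummy address per name, which sidesteps the open question of how many addresses a real lookup would return.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical allocator that hands out dummy IPv4 addresses for DNS names
/// before any real lookup has happened.
struct ProxyIpMap {
    next: u32,
    by_name: HashMap<String, Ipv4Addr>,
    by_ip: HashMap<Ipv4Addr, String>,
}

impl ProxyIpMap {
    /// `first` is the first dummy address to hand out.
    fn new(first: Ipv4Addr) -> Self {
        Self {
            next: u32::from(first),
            by_name: HashMap::new(),
            by_ip: HashMap::new(),
        }
    }

    /// Returns the dummy address for `name`, allocating one on first use.
    fn assign(&mut self, name: &str) -> Ipv4Addr {
        if let Some(ip) = self.by_name.get(name) {
            return *ip;
        }
        let ip = Ipv4Addr::from(self.next);
        self.next += 1;
        self.by_name.insert(name.to_owned(), ip);
        self.by_ip.insert(ip, name.to_owned());
        ip
    }

    /// Reverse mapping: which DNS name a dummy address stands for.
    /// This is the piece a Gateway would need in order to do the real lookup.
    fn name_for(&self, ip: Ipv4Addr) -> Option<&str> {
        self.by_ip.get(&ip).map(String::as_str)
    }
}
```

With a map like this the client could answer the DNS query right away, but the name returned by `name_for` would still have to travel to the Gateway alongside the first packet, which is the part the current design does not have.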

ReactorScram commented on June 12, 2024

Yeah that would explain some things I've seen lately when trying the Windows Client.

I'll see if there's any other way to handle it. I don't like removing the system DNS servers because then it'll be a mix of the iOS resolver dance and the /etc/resolv.conf thing, where we have to remove the servers, but also keep them safe, and revert them, and if we crash the user is in deep trouble, and if DHCP kicks in we have to re-remove them and update our stashed servers.

Could connlib's DNS have a more aggressive timeout and lie and say the domain doesn't exist? Then traffic can't escape the tunnel.

jamilbk commented on June 12, 2024

Ah I see, on other platforms the nameservers are reverted when bringing the tunnel back down. Maybe Windows has a similar option?

> Could connlib's DNS have a more aggressive timeout and lie and say the domain doesn't exist?

Yeah maybe that's a better option. If a Gateway is flapping or no Gateways are online, what should connlib do though? I think it's probably most appropriate to time out instead of incorrectly returning an NXDOMAIN, which the app might cache.

ReactorScram commented on June 12, 2024

I'm not sure exactly how it works on Windows but we want something that shadows the system resolvers without removing them from their own interfaces. That was frustrating about the /etc/resolv.conf thing on Linux.

ReactorScram commented on June 12, 2024

Yeah, this replicates for me on the dev laptop on c036d1a (tip of main).

Even after I enable the Policy again, it's stuck outside the tunnel.

ReactorScram commented on June 12, 2024

And was there a reason why we can't assign an IP and respond to the DNS query before the Gateway responds to us?

I remember asking this and I think Gabi said there was a good reason, but I can't remember. If we could just always assign an IP before knowing whether the Resource was even reachable, it would avert this: the DNS query would come back in milliseconds and then the connect() call would just time out harmlessly.

ReactorScram commented on June 12, 2024

ChatGPT suggested blocking traffic to the system's other DNS servers 🤔
Maybe we could claim their routes. Would that cause a packet loop when we try to send traffic to an IP address we've already claimed?

ReactorScram commented on June 12, 2024

Yeah I think it was about the number of IPs we get back.

I could poke around in the DNS code and try things like returning NXDOMAIN or SERVFAIL if we run out of time. ChatGPT thought SERVFAIL might be treated as a temporary error and the resolver would try again, but I haven't found MS docs to prove it yet.
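For reference, a rough sketch of what "return NXDOMAIN or SERVFAIL" means at the packet level, using only the header layout and response codes from RFC 1035. It is not taken from connlib's DNS code: `error_response` is a hypothetical helper, and it assumes a standard query with exactly one question and an uncompressed name. It says nothing about how the Windows resolver reacts to either code.

```rust
/// DNS response codes from RFC 1035.
const RCODE_SERVFAIL: u8 = 2;
const RCODE_NXDOMAIN: u8 = 3;

/// Turns a raw DNS query into an answer-less response carrying `rcode`.
/// Returns `None` if the packet is malformed or has more than one question.
fn error_response(query: &[u8], rcode: u8) -> Option<Vec<u8>> {
    // The fixed DNS header is 12 bytes long.
    if query.len() < 12 {
        return None;
    }
    // Assumption: exactly one question (QDCOUNT == 1).
    if u16::from_be_bytes([query[4], query[5]]) != 1 {
        return None;
    }
    // Walk the QNAME labels to find where the question section ends.
    let mut end = 12;
    loop {
        let len = *query.get(end)? as usize;
        end += 1;
        if len == 0 {
            break;
        }
        if len & 0xC0 != 0 {
            return None; // compression pointers are not expected here
        }
        end += len;
    }
    end += 4; // QTYPE + QCLASS
    if query.len() < end {
        return None;
    }
    // Keep ID and question, drop anything after it (e.g. an EDNS record).
    let mut resp = query[..end].to_vec();
    resp[2] = (query[2] & 0x79) | 0x80; // QR = 1, keep Opcode and RD, clear AA/TC
    resp[3] = 0x80 | (rcode & 0x0F); // RA = 1, Z = 0, RCODE = rcode
    resp[6..12].fill(0); // ANCOUNT, NSCOUNT, ARCOUNT = 0
    Some(resp)
}
```

Calling `error_response(&query, RCODE_SERVFAIL)` once a deadline expires would produce the "temporary failure" variant, and `RCODE_NXDOMAIN` the "domain doesn't exist" variant. Which of the two, if either, avoids being cached is exactly the open question here.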

jamilbk commented on June 12, 2024

Claiming routes or adding firewall rules might be a creative way to solve it.

I don't think it'd cause a loop -- we'd either drop the packet in connlib or if it's a resource it would be sent through a tunnel anyway

ReactorScram commented on June 12, 2024

I was thinking if we use the system resolvers and we also claim their routes, we won't be able to reach them. Unless we have a way to bypass our own routes

jamilbk commented on June 12, 2024

If we add them as routes and ensure our tun interface has metric priority, they should be effectively blackholed by connlib for the duration of the tunnel

ReactorScram commented on June 12, 2024

Yeah, so connlib can't reach them if connlib is blackholing them.
Like my 192.168.1.1 DNS cache on my home router will be unreachable because connlib tries to send it a query and it just comes back to connlib.
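A toy model of the loop described above, assuming the simplified rule "longest prefix wins, then lowest metric". Real Windows route selection has more inputs, and `Route`, `matches`, and `pick` are made-up names for illustration only.

```rust
use std::cmp::Reverse;
use std::net::Ipv4Addr;

/// One entry in a simplified routing table. `prefix` must be at most 32.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Route {
    net: Ipv4Addr,
    prefix: u8,
    metric: u32,
    iface: &'static str,
}

/// True if `dst` falls inside the route's network.
fn matches(route: &Route, dst: Ipv4Addr) -> bool {
    let mask = if route.prefix == 0 {
        0
    } else {
        u32::MAX << (32 - route.prefix as u32)
    };
    u32::from(dst) & mask == u32::from(route.net) & mask
}

/// Picks the route with the longest prefix, breaking ties by lowest metric.
fn pick(table: &[Route], dst: Ipv4Addr) -> Option<&Route> {
    table
        .iter()
        .filter(|r| matches(r, dst))
        .max_by_key(|r| (r.prefix, Reverse(r.metric)))
}
```

With only a `192.168.1.0/24` route on the LAN interface, `pick` sends a query for `192.168.1.1` out the LAN. Once the tunnel claims `192.168.1.1/32`, the same query, including one sent by connlib itself, is handed back to the tunnel interface unless there is some way to exempt connlib's own sockets.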

jamilbk commented on June 12, 2024

Ah, I'm following now. Yeah this seems tricky on Windows. On Linux we have fwmark, on Android protect, and on Apple platforms NECP, which handle this nicely for us.

Can think more about it, but unless we make major changes in connlib's DNS proxy I don't know whether responding with NXDOMAIN or SERVFAIL would cause unexpected application behavior. How long are those cached for? If a DNS server doesn't respond, we're at least pretty confident the application will ask again and again on each request.

It might not be perfect but we could do this:

> but also keep them safe, and revert them, and if we crash the user is in deep trouble, and if DHCP kicks in we have to re-remove them and update our stashed servers.

and just run the risk of them not being reverted.

I would imagine our likelihood of crashing is lower than the likelihood of customers' Gateways going down or flapping.
