
Comments (21)

msrd0 commented on June 11, 2024

This is less of a security "threat" and more of a privacy issue. If I were hosting a DNS server using trust-dns software, I wouldn't want any log of my users' IP addresses or requests. I will always have the opportunity to modify the code on my server to report those details, so a little bit of trust is always necessary.

That being said, here in the EU, IP addresses can be considered personal data under the GDPR. Given the nature of DNS queries, it is impractical to ask for a user's consent to store their IP address. This means it is only lawful to do so if storing the IP address is necessary under certain criteria. Unfortunately, I believe that logging IP addresses is unnecessary, so it will be hard to argue should someone complain.

The only option left currently is to manually implement the logger downstream and apply some regular expression that tries to remove sensitive data such as domain names or IP addresses from the log messages, given that even error messages can contain such details. It would, however, be much easier if trust-dns just removed those details from the log messages again.
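
For illustration, the downstream workaround described above could look roughly like the following. This is a minimal Python sketch, not trust-dns code: `RedactingFilter`, `redact_ips`, and the `<redacted-ip>` placeholder are hypothetical names, and the pattern only covers IPv4 addresses in dotted notation. IPv6 addresses and domain names are deliberately not handled.

```python
import logging
import re

# Naive dotted-quad pattern (assumption: good enough for a sketch).
# It does not validate octet ranges and ignores IPv6 and domain names.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ips(text: str) -> str:
    """Replace anything that looks like an IPv4 address with a placeholder."""
    return IPV4_RE.sub("<redacted-ip>", text)


class RedactingFilter(logging.Filter):
    """Logging filter that scrubs IPv4 addresses from the final message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so addresses passed as arguments are caught too.
        record.msg = redact_ips(record.getMessage())
        record.args = None
        return True  # keep the record, only its text was changed
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` would scrub every record that passes through that handler, at every log level.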

from hickory-dns.

djc commented on June 11, 2024

I would argue that info shouldn't contain PII either (as IIRC info is commonly enabled by default), so only debug and trace should. Also I think we can be explicit that this is only about source IPs? That is, it mainly concerns the server logging the IP of requests it receives -- which reduces the scope quite a bit. IMO IPs contained in DNS responses are unlikely to be PII, as are name server IPs.

djc commented on June 11, 2024

I think there's still value in terms of figuring out how many messages are coming in, and the message ID is useful to correlate with packet captures.

jpds commented on June 11, 2024

to be clear, I think you mean, “logs the client IP address”, correct?

The opposite, obfuscating a client IP address / the domain it requested:

$ sudo ./coredns
.:53
CoreDNS-1.11.1
linux/amd64, go1.20.7, ae2bbc2
[INFO] 127.0.0.1:46610 - 40617 "A IN bbc.com. udp 48 false 1232" NOERROR qr,aa,rd 79 0.000286699s

bluejekyll commented on June 11, 2024

Is there any guidance you want to provide on why this is a privacy issue? This is valuable debug information, so could we split this into two separate log lines, one with full context and one with only the request parameters and not the source IP? Another option would be to have a parameter to choose whether to log it or not, or a config for the log line itself.

I'm open to options.

LuckyTurtleDev commented on June 11, 2024

Is there any guidance you want to provide on why this is a privacy issue?

I have written a Pi-hole clone. It can also run on a public server so that Android can access it via DoT.
But after updating the trust-dns server to 0.23.0, it started logging every IP that connects to it, along with the requested domain.
Since this is very private data, I think this is a big privacy issue.

I can raise the log level to warning, but then it no longer shows the open ports/protocols and some other important information. And logging those at warn would also be wrong, because they are not warnings.

msrd0 commented on June 11, 2024

Setting log level to warn.

This is not sufficient, as I have seen IP addresses and domains queried in error messages as well.

XOR-op commented on June 11, 2024

What is your threat model? I think your program can always see the details of those requests. So my assumption is that you want to dump your logs into a file, but don't want other users or processes under the same user to read those logs. For this case, I would suggest setting a write-only permission on the log file (-w-------). Correct me if my assumption is wrong.
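
The suggested permission could be applied as in the sketch below, assuming a POSIX system. The helper name `open_write_only_log` is made up for this example. Note that the file owner (and root) can still change the mode back, so this only guards against casual or accidental reads.

```python
import os


def open_write_only_log(path):
    """Open (or create) a log file for appending with mode -w-------."""
    # Create the file with owner-write only; O_APPEND keeps writes at the end.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o200)
    # Force the mode in case the file already existed with wider permissions.
    os.fchmod(fd, 0o200)
    return os.fdopen(fd, "a")
```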

bluejekyll commented on June 11, 2024

I think this is a reasonable feature request, I just don't know off the top of my head how we should implement it. It would require a fairly extensive audit to identify all the potential areas where we log source IPs, and then have a solution in place to not log them.

I'm definitely open to us making this change. I think one thing we could do is say that all info logs and lower (debug, etc) can contain IPs but warnings or maybe error and up messages should be treated as restricted to not containing IPs. Then disabling info logs and only enabling warning or error would ensure that no IPs would be logged. That seems like the simplest thing to do, would that be sufficient?

bluejekyll commented on June 11, 2024

I think the problem I have right now is that we have, imo, a conflict between operational issues and privacy. I think the IP information for requests coming in on a publicly accessible server is important in case there is some malicious behavior. This could be used to help identify if there are folks working to DoS or DDoS a system. Then there is the use case mentioned here in regards to making sure we're not disclosing the behavior of an individual user. To me, the trust-dns service operator will always have access to this information (if they want it) regardless of what we do in the trust-dns server code. That is, it's not hard to sniff traffic on a server and collect all of these details. QUIC and HTTPS can make that harder, but the potential still exists.

So I see a couple of options for us. The trust-dns server currently supports these log levels out of the box: info is default, debug, and quiet, where quiet will only report warning or error messages. I'd like us to come up with a policy that balances all the needs that folks have. I think to answer what should appear in log messages at what time, we need to identify the various people involved. First, I'm going to assume we are all ok with any information being present in DEBUG log lines. This gives the power to an operator to collect a lot of information, but that is a certain amount of trust that is being put into the DNS resolver/server being used, so I think this is acceptable. So let's not discuss DEBUG log lines for now.

That leaves INFO, WARN, and ERROR. When the INFO log level is disabled, there is no output from the server at all on any request. Based on this conversation, what I'm gathering is that INFO is enabled by default in many cases (it is by the trust-dns binary as well). What I want to know is: given that it can be disabled today, why is INFO being enabled at all? Is it reasonable for us to suggest to people that they do not enable INFO on trust-dns when privacy is important? It sounds like, based on this conversation, people think that isn't putting us in a privacy-first posture. I'm not sure I agree with that at the moment, but I can be convinced. So assuming we remove this information from INFO, my question becomes: what information do people want in the standard INFO log line? For reference, this is the default INFO log line on a request from trust-dns:

1694015948:INFO:trust_dns_server::server::server_future:909:request:43742 src:UDP://127.0.0.1#60636 QUERY:www.example.com.:A:IN qflags:RD response:NoError rr:1/0/0 rflags:RD,AA

Let's split this out and discuss what things are ok, I'll have checkmarks for each item, unchecked things are what I assume the folks on this issue believe should not be included:

  • 1694015948 - timestamp
  • INFO:trust_dns_server::server::server_future:909:request - level, location, and type
  • 43742 - DNS message ID from the request packet
  • src:UDP - request is via UDP
  • //127.0.0.1#60636 - source IP address and port (disclosing PII)
  • www.example.com. - requested name (disclosing the name requested)
  • A - record type (is this safe?)
  • IN - DNS class
  • qflags:RD - query flags set
  • response:NoError - response code
  • rr:1/0/0 - response record counts, answer/authority/additional
  • rflags:RD,AA - response flags set

I realize I'm being a little pedantic here, but I'm trying to figure out the best path forward. Without the source IP and query name, what is valuable out of this message to keep? Or said another way, what information do people need to track on each request made to the server and for what reason?
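
To make the question concrete, here is a small Python sketch of what the sample line looks like once the source address and the query name are removed. It assumes the exact token layout of the single example line quoted above (space-separated tokens, a `src:` token containing `://`, and a `QUERY:` token ending in `:type:class`); `strip_pii` is a hypothetical helper, not part of trust-dns.

```python
def strip_pii(line: str) -> str:
    """Drop the source address and query name from a request log line."""
    kept = []
    for token in line.split(" "):
        if token.startswith("src:"):
            # "src:UDP://127.0.0.1#60636" -> "src:UDP"
            kept.append(token.split("://", 1)[0])
        elif token.startswith("QUERY:"):
            # "QUERY:www.example.com.:A:IN" -> "QUERY:A:IN"
            _, rtype, rclass = token.rsplit(":", 2)
            kept.append(f"QUERY:{rtype}:{rclass}")
        else:
            kept.append(token)
    return " ".join(kept)
```

What remains is the timestamp, level and location, message ID, transport, record type and class, flags, response code, and record counts.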

In terms of error and warn messages, it will be problematic not to include the requested details of what query failed, and then if there is malicious behavior, the errors and warnings could help identify that (like badly configured zone or server, slow TCP connections, scanning operations trying to disclose and query the entire catalog or zone file, etc). This is why I'm struggling with a proper response to this issue. It feels like there are competing priorities depending on the needs people have for the logs.

msrd0 commented on June 11, 2024

I guess to prevent DoS attacks and similar, a rate limit per IP would be much more effective, at least for people like me who don't inspect what their server does 99% of the time. That being said, I agree that the log message is kinda irrelevant, especially when you remove the IP and domain name from it, so why don't you just make those log messages optional? If you suspect your server is under attack, you could enable logging, and if that's not the case, you can just leave it disabled?
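
A per-IP rate limit of the kind mentioned here is commonly built as a token bucket keyed by client address. The Python sketch below is one possible shape under that assumption; `PerIpRateLimiter` and its parameters are illustrative names, and nothing here reflects an existing trust-dns feature. The addresses live only in memory and are never written to a log, though a real deployment would also need to evict idle entries.

```python
import time


class PerIpRateLimiter:
    """Token bucket per client IP: `rate` tokens per second, up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock      # injectable so the limiter can be tested
        self.buckets = {}       # ip -> (tokens, time of last update)

    def allow(self, ip):
        now = self.clock()
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```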

bluejekyll commented on June 11, 2024

so why don't you just make those log messages optional?

Currently I'd argue the INFO log lines are optional, you must enable them for them to be visible. In the trust-dns binary, we default to INFO being on but it's easily disabled with the quiet flag. I think we'd still need to go through and cleanse WARN and ERROR messages of any IP data, and then we'd set a policy in the project recognizing that... (maybe we could add some tests as well somehow)

djc commented on June 11, 2024

From the message contents you listed, I think it's really only the source IP that's problematic in terms of privacy. I do think a good default would be to avoid logging PII at levels above debug. It's fine that the operator still has ways to access this data, but at the same time it's good to avoid making it easy to leak by accident.

bluejekyll commented on June 11, 2024

Thanks, @djc, do you feel like it's ok to keep the requested name in the logs? Is that something we might want to leave in INFO, but remove from WARN and ERROR?

LuckyTurtleDev commented on June 11, 2024

I think requested domains are very private data too, especially if the server is only used by a few users. I would like to avoid logging them at info/warn/error.

djc commented on June 11, 2024

That's fair, I guess the requested domain is also something that should not be logged at the default log levels (IMO there's no difference between INFO and WARN in terms of privacy). There's a reason a bunch of people are working on EncryptedClientHello for TLS, and it's mostly because they want to avoid leaking the server name.
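
One way to read this policy is as two log lines per request: a default-level line without the source address or query name, and a debug-level line with full context. The following Python sketch shows the idea with the standard `logging` module; the logger name `dns.request`, the function `log_request`, and the field layout are assumptions made for the example, not the trust-dns log format.

```python
import logging

log = logging.getLogger("dns.request")  # hypothetical logger name


def log_request(msg_id, proto, src, name, rtype, rcode):
    # Default level: nothing that identifies the client or what was asked.
    log.info("request:%s proto:%s type:%s response:%s", msg_id, proto, rtype, rcode)
    # Debug level: full context, only visible when the operator opts in.
    log.debug("request:%s src:%s://%s name:%s", msg_id, proto, src, name)
```

With the logger at INFO, only the first line is emitted; switching to DEBUG adds the second one.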

bluejekyll commented on June 11, 2024

If we are agreeing to not log name/IP etc., how much value is there in the rest of the message? Should we just drop it to a DEBUG line, and then do an audit of other messages to see if there's data we don't want there?

jpds commented on June 11, 2024

I cannot think of any other DNS server software that implements this (or even an HTTP server that does this). Unbound, BIND, CoreDNS, and NSD do not do this.

As for GDPR, from a production systems and networks perspective, logging client IPs would fall squarely under https://gdpr-info.eu/recitals/no-49/, and I'm not going to wait for an attack to happen, or worse, "suspect" that one has happened, before I turn on logging. You'd have very little to go on when figuring out what actually happened to the compromised system (which, at this point, cannot be trusted anymore, and you'd also have senior management breathing down your neck for answers).

still value in terms of figuring out how many messages are coming in

It's far better to do this at a metric level; I filed #2032 about this.
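
As a rough idea of what counting at a metric level can mean, the Python sketch below aggregates requests by transport, record type, and response code without ever storing a client address or query name. It is a generic illustration with hypothetical names (`RequestMetrics`, `observe`, `total`) and is not based on the contents of #2032.

```python
from collections import Counter


class RequestMetrics:
    """Aggregate request counters that hold no per-client data."""

    def __init__(self):
        self.counts = Counter()

    def observe(self, proto, rtype, rcode):
        # Only low-cardinality labels are recorded: no IP, no query name.
        self.counts[(proto, rtype, rcode)] += 1

    def total(self):
        return sum(self.counts.values())
```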

bluejekyll commented on June 11, 2024

any other DNS server software that implements this

To be clear, I think you mean “logs the client IP address”, correct?

I think we’re all in agreement with this at this point. I’ll even happily make the change, the rest of the discussion was just around the value here.

bluejekyll commented on June 11, 2024

Thank you for clarifying, I was able to interpret your comment both ways. This gets me back to some of my original questions about just making these guarantees in different logging levels, and saying INFO will have this, but WARN and ERROR will not?

LuckyTurtleDev commented on June 11, 2024

I still think that having the user manually add this information to a log message would be easier than filtering it out with a regex. Especially for domains this would not be easy, because they end with a "." like many other things, so a regex would match too much or miss some domains.

Another option would be a setting that controls the logging of this information independently of the log level,
for example a function or a feature flag.
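
A setting that is independent of the log level could be as small as two boolean switches consulted when the request line is built. The Python sketch below illustrates that shape; `RequestLogOptions`, `format_request`, and the field names are hypothetical and do not correspond to any existing trust-dns configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLogOptions:
    # Both default to off, so sensitive fields must be enabled explicitly.
    log_source_ip: bool = False
    log_query_name: bool = False


def format_request(opts, msg_id, proto, src, name, rtype, rcode):
    """Build the request log line, adding sensitive fields only if enabled."""
    parts = [f"request:{msg_id}", f"proto:{proto}"]
    if opts.log_source_ip:
        parts.append(f"src:{src}")
    if opts.log_query_name:
        parts.append(f"name:{name}")
    parts += [f"type:{rtype}", f"response:{rcode}"]
    return " ".join(parts)
```

Because the switches are separate from the level, an operator could keep INFO enabled for startup messages while leaving both fields off.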
