I’ve been experiencing some connectivity issues when trying to use Telepresence with my Kubernetes cluster, which is behind Twingate. I wanted to share my situation and see if anyone has encountered similar issues or has suggestions to resolve them.
Environment:
Kubernetes cluster hosted on Azure (Azure VNet)
Twingate connector deployed within the same VNet
Telepresence v2.14.2
Symptoms:
I can ping the Kubernetes service by its service name, but the IP it resolves to (in the 100.x.x.x range) seems incorrect (see the sketch after this list).
Directly curling the service IP or pod IP works fine.
When attempting a recursion check with Telepresence, I receive errors related to the lookup of tel2-recursion-check.kube-system.tel2-search.
After adding tel2-recursion-check.kube-system.tel2-search as a resource in Twingate, I get the following warning: “DNS doesn’t seem to work properly”.
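For anyone who wants to reproduce the resolution symptom, here is a minimal Python sketch (the hostname is a placeholder for your own service and namespace) that resolves a name and flags whether the answer lands in the 100.64.0.0/10 CGNAT range:

```python
import ipaddress
import socket

HOSTNAME = "my-service.my-namespace"  # placeholder; substitute your own service name
CGNAT = ipaddress.ip_network("100.64.0.0/10")  # the 100.x.x.x range in question

# Resolve the name through whatever resolver the OS currently uses,
# i.e. whichever of Twingate/Telepresence won the DNS configuration.
for info in socket.getaddrinfo(HOSTNAME, None, family=socket.AF_INET):
    ip = ipaddress.ip_address(info[4][0])
    tag = "CGNAT" if ip in CGNAT else "non-CGNAT"
    print(f"{HOSTNAME} -> {ip} ({tag})")
```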
Observations:
Twingate on its own works flawlessly.
Telepresence works correctly with a different Kubernetes cluster that is not behind Twingate.
I suspect there might be a DNS resolution conflict between Twingate and Telepresence.
Things I’ve Tried:
Checked the IP configuration on Windows and observed that the Twingate adapter is present with DNS servers in the 100.95.x.x range.
Reviewed Telepresence configurations and logs.
Modified Telepresence settings to allow conflicting subnets (sketched below).
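For reference, here’s roughly what that change looks like in the client config.yml (I’m going from memory, so double-check the key name and file location against the Telepresence docs for your version):

```yaml
# ~/.config/telepresence/config.yml (on Windows typically %APPDATA%\telepresence\config.yml)
# Assumed key name; verify against the Telepresence docs for your version.
routing:
  allowConflictingSubnets:
    - 100.64.0.0/10  # the CGNAT range the Twingate adapter also claims
```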
I’m wondering if Twingate might be resolving DNS queries before Telepresence gets a chance. Has anyone else faced a similar situation, or does anyone have insights on how to get Telepresence to work seamlessly with Twingate? I’m also open to suggestions for modifying settings in the Azure VNet if that might help.
Thank you in advance for your assistance and insights!
While I can’t speak specifically to how Telepresence works or how it interacts with Twingate, I can outline how Twingate handles DNS and hope it points you in the right direction.
Here’s how we handle things when a hostname is involved, using tel2-recursion-check.search (shortened, I know) as an example. For this example, assume tel2-recursion-check.search is defined as a Twingate resource, as you have outlined above.
While the Twingate client is running, it will adjust the DNS servers at the system level to CGNAT IPs (100.95.x.x).
When the system attempts to access www.google.com, the client says “www.google.com isn’t a Twingate resource, so I don’t need to get involved. I’ll just forward this to whatever DNS servers were set before we started.”
When the system attempts to access tel2-recursion-check.search, the Twingate client goes “Oh hey, that’s something I’m supposed to handle!” Instead of returning the “correct” IP of 10.1.2.3 (or whatever), it returns an IP in the CGNAT range (100.x.x.x). The system then routes the traffic to that IP, Twingate stays involved, and it sends the traffic to the remote resource you have defined via your deployed connectors.
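If it helps to see that decision in code form, here’s a toy Python model of the split (illustrative only, not our actual client logic; the resource list and IPs are made up):

```python
import ipaddress
import itertools
import socket

# Toy model of the client's DNS decision: NOT real Twingate code.
TWINGATE_RESOURCES = {"tel2-recursion-check.search"}  # resources you defined
_cgnat_ips = itertools.count(int(ipaddress.ip_address("100.96.0.1")))

def resolve(hostname: str) -> str:
    if hostname in TWINGATE_RESOURCES:
        # It's a Twingate resource: return a synthetic CGNAT IP, and route
        # traffic for that IP through the deployed connectors.
        return str(ipaddress.ip_address(next(_cgnat_ips)))
    # Not a Twingate resource: just forward to the pre-existing DNS servers.
    return socket.gethostbyname(hostname)

print(resolve("tel2-recursion-check.search"))  # -> 100.96.0.1 (CGNAT)
print(resolve("www.google.com"))               # -> the real public IP
```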
That’s it in a nutshell, and we have a much more detailed page on DNS with Twingate here.
If Telepresence is expecting a particular IP/response when it is doing lookups, Twingate returning the CGNAT IP could definitely be causing problems.
And/or Telepresence could just be interpreting the CGNAT IPs as “invalid” because they’re CGNAT IPs.
I hope this helps, but feel free to add any other questions or additional info.
Thanks for the response. I think I had picked up most of that from the docs; I’m not super familiar with networking. I believe Telepresence tries to do something similar to Twingate to connect to the resources in a Kubernetes cluster. Here is a link to the Telepresence docs on DNS resolution, and their docs for working with VPNs, but I’m not sure whether they apply to Twingate.
I would really love to get this working, as I find Twingate awesome and I don’t want to have to fall back to a VPN.
I’m still a little fuzzy on Telepresence but I suspect your theory is right.
That being said, I’m not a k8s guy, so forgive me if any of this is completely out to lunch. Looking at the Telepresence docs, it looks like TP requires the publicly available ingress IP of the cluster to get started, and then handles proxying everything back and forth.
To me, this suggests that if you add that particular IP as a Twingate resource (but don’t worry about anything else), Telepresence should be able to do what it does, with Twingate facilitating network communication between your device and the cluster.
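If it helps, one way to find that IP (a sketch; assumes kubectl is installed and your kubeconfig points at the cluster) is to ask kubectl for the API server endpoint of the current context:

```python
import subprocess

# Print the API server URL of the current kubeconfig context; the host part
# is the endpoint you'd add as a Twingate resource. Assumes kubectl is on PATH.
server = subprocess.run(
    ["kubectl", "config", "view", "--minify",
     "-o", "jsonpath={.clusters[0].cluster.server}"],
    capture_output=True, text=True, check=True,
).stdout
print(server)  # e.g. https://<your-cluster-endpoint>:443
```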
In your example, BEFORE you add tel2-recursion-check.kube-system.tel2-search as a Twingate resource, does it resolve at all on your machine, via Telepresence or otherwise? I would assume that because it’s not a Twingate resource, we don’t even care, and as long as TP is still running it should handle it.
Another thing you can do is try stopping the Telepresence service on your machine, connecting to Twingate, and then starting TP after the fact to see if the behaviour changes.
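Something like this sketch (assumes the telepresence CLI is installed; I believe telepresence quit -s stops the daemons, but double-check the flag for your version):

```python
import subprocess

# 1. Stop the Telepresence daemons so Twingate sets up its DNS alone.
subprocess.run(["telepresence", "quit", "-s"], check=False)

# 2. Connect the Twingate client manually, then continue.
input("Connect Twingate now, then press Enter...")

# 3. Start Telepresence last, so it layers its DNS handling on top.
subprocess.run(["telepresence", "connect"], check=True)
```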
Thanks for the help, I really appreciate it. I have spent a lot of time trying to resolve this.
Removing the tel2-recursion-check.kube-system.tel2-search resource causes the following error when connecting TP:
error daemon/session/dns/RecursionCheck : unexpected error during recursion check: lookup tel2-recursion-check.kube-system.tel2-search: operation was canceled
I have tried adding several different IPs and CIDRs as resources, but I couldn’t get anything to work.
I have a partial workaround for now: I can add . as a resource in TG, which is fine while we have just one cluster behind TG, but I think once we add the other clusters there will be issues, since each cluster has the same services and namespaces. It also doesn’t allow me to intercept k8s traffic.
I have tried every combination of starting and stopping TP and TG that I can think of.
That’s very interesting that you’re finding a difference between the two. I will mention it to my client team and see if they know of any reason there might be a difference in behaviour. Let me know if you discover anything else.