So I’ve got a home server that’s having issues with services flapping and I’m trying to figure out what toolchain would be actually useful for telling me why it’s happening, and not just when it happened.

I’m using Uptime Kuma, and it’s happy enough to tell me that it couldn’t connect or that a 503 happened or whatever, but that’s kinda useless because the service is essentially working again by the time I get the notice.

What tooling would give a little more detail into the why, so I can determine the fault and fix it?

I’m not sure if it’s the ISP, something in my networking configuration, something on the home server, a bad cable, or whatever because I see nothing in logs related to the application or the underlying host that would indicate anything even happened.

It’s also not EVERY service on the server at once, but rather just one or two while the other pile doesn’t alert.

In short: it’s annoying, and I’m not really making headway finding something that can do a better job of root-causing what’s going on.

  • hendrik@palaver.p3x.de
    5 months ago

    I agree with the other comment. Look into the actual logs of the services. If they send a 503, they should be able to provide an explanation.

    If you’re asking if your ISP is alright… You can monitor that. Monitor if DNS is working, monitor if a ping to some server has hiccups.
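    As a rough illustration of that kind of probe, here is a minimal Python sketch using only the standard library. It is not a replacement for a real monitoring tool. The targets in the usage note (`example.com`, `1.1.1.1`, `192.168.1.1`) are placeholders I picked, not anything from your setup, and it uses a TCP connect instead of an ICMP ping because raw ICMP usually needs elevated privileges.

```python
import socket
import time
from datetime import datetime, timezone


def check_dns(hostname):
    """Resolve a hostname; return (ok, seconds_taken, detail)."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, None)
        ok, detail = True, infos[0][4][0]  # first resolved address
    except socket.gaierror as exc:
        ok, detail = False, f"DNS error: {exc}"
    return ok, time.monotonic() - start, detail


def check_tcp(host, port, timeout=3.0):
    """Open and close a TCP connection; return (ok, seconds_taken, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok, detail = True, "connected"
    except OSError as exc:  # covers timeouts, refused, unreachable
        ok, detail = False, f"{type(exc).__name__}: {exc}"
    return ok, time.monotonic() - start, detail


def format_result(name, result):
    """Turn one probe result into a timestamped log line."""
    ok, elapsed, detail = result
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    status = "OK" if ok else "FAIL"
    return f"{stamp} {name} {status} {elapsed * 1000:.0f}ms {detail}"


def run(rounds, interval=10.0):
    """Probe DNS, an outside host and the local router, `rounds` times."""
    for _ in range(rounds):
        # All three targets are assumptions; substitute your own.
        print(format_result("dns", check_dns("example.com")), flush=True)
        print(format_result("wan", check_tcp("1.1.1.1", 53)), flush=True)
        print(format_result("lan", check_tcp("192.168.1.1", 80)), flush=True)
        time.sleep(interval)
```

    Calling something like `run(rounds=8640, interval=10)` with the output redirected to a file gives roughly a day of timestamped lines to compare against the Uptime Kuma alerts. If the "lan" probe fails at the same moment, the problem is probably inside your network; if only "wan" or "dns" fails, look at the ISP or the resolver.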

    And then do it methodically. Is it just completely random services? Then it’s likely that your monitoring has connectivity issues. Or is there some structure to what you’re seeing? Do the issues all concern the same server? Or location? Or protocol? Then it’s maybe that. Or it’s a bit more complicated, but the affected services share a common thing: a piece of software or an infrastructure element.

    Edit: Alright, I didn’t notice this was all about the same home server… Maybe set up some local monitoring? See if things look different from the perspective of the computer itself, or only when viewed from the internet. You can also monitor some performance parameters: Is there enough free RAM, is the CPU busy, are you close to maximum upload bandwidth, is there too much I/O… But I suppose the main question is: Is it a network issue? And if yes, where and what kind?
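    For the performance side, a small sketch like the one below could log a snapshot every few seconds on the server itself. It assumes a Linux host (it reads `/proc/meminfo`) and only covers load average, available RAM and free disk space; bandwidth and I/O would need other sources. The function names are made up for this example.

```python
import os
import shutil
import time


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field_name: integer_value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])  # kB for most fields
    return values


def snapshot(path="/"):
    """Collect one timestamped sample of load, memory and disk (Linux only)."""
    with open("/proc/meminfo") as fh:
        mem = parse_meminfo(fh.read())
    load1, _load5, _load15 = os.getloadavg()
    disk = shutil.disk_usage(path)
    return {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "load1": load1,
        "mem_available_mib": mem.get("MemAvailable", 0) // 1024,
        "disk_free_gib": round(disk.free / 2**30, 1),
    }
```

    Printing `snapshot()` in a loop and lining the timestamps up with the alert times should at least show whether the box was starved for memory or under heavy load when a service flapped.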

    If you’re using Cloudflare or some other tunneling solution, that could also be the issue.