• 0 Posts
  • 32 Comments
Joined 19 days ago
Cake day: September 14th, 2025

  • tal@olio.cafe to Selfhosted@lemmy.world: how do I find process that leads to oom?
    1 day ago

    OOMs happen because your system is out of memory.

    You asked how to know which process is responsible. There is no correct answer to which process is “wrong” in using more memory — all one can say is that processes are in aggregate asking for too much memory. The kernel tries to “blame” a process and will kill it, as you’ve seen, to let your system continue to function, but ultimately, you may know better than it which is acting in a way you don’t want.

    It should log something to the kernel log when it OOM kills something.
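
    As a rough sketch of how to pull those entries out of the kernel log, the Python below scans log lines for the kernel's "Killed process" message. The regular expression is modeled on the message format used by recent kernels and is an assumption; wording differs between kernel versions, and `find_oom_kills` is a name made up for this example.

```python
import re

# Modeled on the kernel's OOM-kill log line, for example:
#   Out of memory: Killed process 4321 (java) total-vm:...kB, anon-rss:...kB, ...
# The exact wording is an assumption and may differ on older or newer kernels.
OOM_KILL_RE = re.compile(
    r"(?P<kind>Out of memory|Memory cgroup out of memory): "
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]*)\)"
    r"(?:.*?anon-rss:(?P<anon_rss_kb>\d+)kB)?"
)


def find_oom_kills(log_lines):
    """Return one dict per OOM-kill line found in an iterable of log lines."""
    kills = []
    for line in log_lines:
        match = OOM_KILL_RE.search(line)
        if match is None:
            continue
        rss = match.group("anon_rss_kb")
        kills.append({
            "kind": match.group("kind"),
            "pid": int(match.group("pid")),
            "name": match.group("name"),
            # anon-rss is optional in the pattern, so it may be missing.
            "anon_rss_kb": int(rss) if rss is not None else None,
            "line": line.rstrip("\n"),
        })
    return kills
```

    You could feed it the saved output of `journalctl -k` or `dmesg`, opened as a file; each result keeps the full original line, so the timestamp prefix stays available for matching against other logs.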

    It may be that you simply don’t have enough memory to do what you want to do. You could take a glance in top (sort by memory usage with shift-M). You might be able to get by by adding more paging (swap) space. You can do this with a paging file if it’s problematic to create a paging partition.

    EDIT: I don’t know if there’s a way to get a dump of processes that are using memory at exactly the instant of the OOM, but if you want to get an idea of what memory usage looks like at that time, you can certainly do something like leave a top -o %MEM -b >log.txt process running to get a snapshot of process memory use every few seconds (the interval can be set with -d). top will print a timestamp at the top of each entry, and between the timestamped OOM entry in the kernel log and the timestamped dump, you should be able to look at what’s using memory.
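
    If you'd rather have a smaller log than full top output, the same idea can be sketched in Python by reading resident memory out of /proc directly. This is only an illustration, not a replacement for top: the function names, the log file name, and the choice of the VmRSS field are all assumptions made for the example, and it only works on Linux.

```python
import os
import time


def parse_vmrss_kb(status_text):
    """Return (name, rss_kb) parsed from the text of /proc/<pid>/status."""
    name, rss_kb = None, 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # Line looks like "VmRSS:     123456 kB"; kernel threads have none.
            rss_kb = int(line.split()[1])
    return name, rss_kb


def snapshot(limit=10, proc_root="/proc"):
    """Return the `limit` largest processes as (rss_kb, pid, name) tuples."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                name, rss_kb = parse_vmrss_kb(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        rows.append((rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


def log_snapshots(path="memlog.txt", interval=2.0, count=None):
    """Append a timestamped snapshot to `path` every `interval` seconds.

    Runs until interrupted when `count` is None.
    """
    done = 0
    while count is None or done < count:
        with open(path, "a") as out:
            out.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
            for rss_kb, pid, name in snapshot():
                out.write(f"  {rss_kb:>10} kB  {pid:>7}  {name}\n")
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

    Calling `log_snapshots()` and leaving it running gives a timestamped top-ten list that can be lined up against the OOM entry in the kernel log.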

    There are also various other packages for logging resource usage that provide less information, but also don’t use so much space, if you want to view historical resource usage. sysstat is what I usually use, with the sar command to view logged data, though that’s very elderly. Things like that won’t dump a list of all processes, but they will let you know if, over a given period of time, a server is running low on available memory.




  • tal@olio.cafe to linuxmemes@lemmy.world: We have POSIX at home
    3 days ago

    What’s the big deal with POSIX? Why are ppl constantly discussing what is and isn’t posix compliant?

    The short version: it’s a least-common-denominator standard that spans multiple Unix and Unix-like systems, so if you write to it, your software can fairly-trivially run on various systems.

    https://en.wikipedia.org/wiki/POSIX

    Windows has some level of Microsoft-provided Posix support, which is what the post is alluding to. I am fairly confident that it doesn’t have full Posix compliance. Cygwin, a separate, non-Microsoft, open-source effort, might qualify.

    kagis

    Okay, apparently it does conform to a portion of the Posix standard:

    https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem

    The subsystem only implements the POSIX.1 standard – also known as IEEE Std 1003.1-1990 or ISO/IEC 9945-1:1990 – primarily covering the kernel and C library programming interfaces which allowed a program written for other POSIX.1-compliant operating systems to be compiled and run under Windows NT. The Windows NT POSIX subsystem did not provide the interactive user environment parts of POSIX, originally standardized as POSIX.2. That is, Windows NT did not provide a POSIX shell nor any Unix commands out of the box, except for pax. The NT POSIX subsystem also did not provide any of the POSIX extensions that postdated the creation of Windows NT 3.1, such as those for POSIX Threads or POSIX IPC.






  • Is your concern compromise of your data or loss of the server?

    My guess is that most burglaries don’t wind up with people trying to make use of the data on computers.

    As to loss, I mean, do an off-site backup of stuff that you can’t handle losing and in the unlikely case that it gets stolen, be prepared to replace hardware.

    If you just want to keep the hardware out of sight and create a minimal barrier, you can get locking, ventilated racks. I don’t know how cost-effective that is; I’d think that that might cost more than the expected value of the loss from theft. If a computer costs $1000 and you have a 1% chance of it being stolen, you should not spend more than $10 on prevention in terms of reducing cost of hardware loss, even if that method is 100% effective.
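
    To spell out that arithmetic, here is a tiny Python sketch. The function name and the `effectiveness` parameter are made up for this example, and it only counts hardware replacement cost, not the value of the data or your time.

```python
def max_worthwhile_spend(hardware_value, theft_probability, effectiveness=1.0):
    """Break-even spend on prevention: the expected loss it would avoid.

    effectiveness is the fraction of thefts the measure prevents (0.0 to 1.0).
    """
    return hardware_value * theft_probability * effectiveness


# The example from the comment: a $1000 machine with a 1% chance of theft.
print(max_worthwhile_spend(1000, 0.01))
# The same machine with a lock assumed to stop only half of thefts.
print(max_worthwhile_spend(1000, 0.01, effectiveness=0.5))
```

    A less effective measure lowers the break-even figure further, which is why the $10 in the example is an upper bound.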




  • Setting aside Trump, I have no idea why people who can apparently be mostly reasonable about, say, cars subscribe to utterly batshit insane views about diet and health and buy into all kinds of snake oil.

    I’m not saying that there’s no magical thinking with cars — “my magical fuel additive” or whatever — but I have seen more utterly insane stuff regarding what someone should eat or how to treat medical conditions than in most other areas.

    It’s also not new. You can go back, and find people promoting all kinds of snake oil when it comes to health. Some of my favorites are the utterly crazy stuff that came out when public awareness of radiation was new, and it was being billed as a magic cure for everything.

    I get that not everyone is a doctor or a dietician. But you’d think that any time you see someone promoting something as a fix for a wide, unrelated range of conditions, that it should be enough to raise red flags for someone, layman or no.




  • If GRUB is having problems too, not just Linux, I’d be inclined to blame hardware of some sort. Do you have another stick of NVMe that you can swap in, see if that makes the issue magically go away? Maybe run off a USB drive, see what happens?

    Maybe less likely, but that processor is a 14th gen Intel desktop processor, one of the models affected by the voltage degradation problems. I burned up both a 13th gen and 14th gen processor myself. Looked like a variety of random errors, often related to memory, eventually not even managing to get through boot unless I disabled all but one of my cores. Might look into that. I assume that there’s a potentially-affected serial number range list somewhere.

    And you can run memtest86 to bang on the memory and CPU, see if anything comes up. If it runs into errors, then it probably isn’t the NVMe at fault.


  • Eh. It sounds like the thing is likely going out of business, and people are just batting around ideas to try to bring it back. Probably good odds that it won’t happen.

    Craigslea community kindergarten, a local childcare centre in Chermside West in Brisbane’s north, made national headlines this week after a series of emails to parents. The centre has been in turmoil for weeks and was closed after a mass exodus of staff before the school holidays.

    On Sunday, the management committee sent parents a 1,000-word email claiming the centre was “insolvent”, owing more than $40,314 to the tax office and employees. It proposed to “wind up” the centre, which has been placed into voluntary administration.

    The next day, in a second email the management committee proposed to charge $2,200 for a scrapbook of artwork produced by their children and photographs of them to help pay off the debt.





  • I’d also bet against the CMOS battery, if the pre-reboot logs were off by 10 days.

    The CMOS battery is used to maintain the clock when the PC is powered off. But he has a discrepancy between current time and pre-reboot logs. He shouldn’t see that if the clock only got messed up during the power loss.

    I’d think that the time was off by 10 days prior to power loss.

    I don’t know why it’d be off by 10 days. I don’t know the uptime of the system, but that seems like an implausible amount of drift for a PC RTC, from what I see online as likely RTC drift.

    It might be that somehow, the system was set up to use some other time source, and that was off.

    It looks like chrony is using the Debian NTP pool at boot, though, and I don’t know why it’d change.

    Can DHCP serve an NTP server, maybe?

    kagis

    This says that it can, and at least when the comment was written, 12 years ago, Linux used it.

    https://superuser.com/questions/656695/which-clients-accept-dhcp-option-42-to-configure-their-ntp-servers

    The ISC DHCP client (which is used in almost any Linux distribution) and its variants accept the NTP field. There isn’t another well known/universal client that accepts this value.

    If I have to guess about why OSX nor Windows supports this option, I would say is due the various flaws that the base DHCP protocol has, like no Authentification Method, since mal intentioned DHCP servers could change your systems clocks, etc. Also, there aren’t lots of DHCP clients out there (I only know Windows and ISC-based clients), so that leave little (or no) options where to pick.

    Maybe OS X allows you to install another DHCP client, Windows isn’t so easy, but you could be sure that Linux does.

    My Debian trixie system has the ISC DHCP client installed in 2025, so might still be a factor. Maybe a consumer broadband router on your network was configured to tell the Proxmox box to use it as a NTP server or something? I mean, bit of a long shot, but nothing else that would change the NTP time source immediately comes to mind, unless you changed NTP config and didn’t restart chrony, and the power loss did it.
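
    One way to check the DHCP theory is to look at what the DHCP client actually received. The Python below is a minimal sketch that pulls `option ntp-servers` entries out of the text of an ISC dhclient lease file. The lease syntax it expects and the sample addresses are assumptions for illustration, the file's location varies by distribution, and `ntp_servers_from_leases` is a name invented for this example.

```python
import re

# Matches lines such as:  option ntp-servers 192.168.1.1, 192.168.1.2;
# This assumes the ISC dhclient lease file syntax.
NTP_OPTION_RE = re.compile(r"^\s*option\s+ntp-servers\s+([^;]+);", re.MULTILINE)


def ntp_servers_from_leases(lease_text):
    """Return the distinct NTP servers offered via DHCP, in first-seen order."""
    servers = []
    for match in NTP_OPTION_RE.finditer(lease_text):
        for addr in match.group(1).split(","):
            addr = addr.strip()
            if addr and addr not in servers:
                servers.append(addr)
    return servers


# Made-up lease text, only to show the expected shape of the input.
SAMPLE_LEASE = """\
lease {
  interface "eth0";
  fixed-address 192.168.1.50;
  option routers 192.168.1.1;
  option ntp-servers 192.168.1.1, 192.168.1.2;
}
"""

print(ntp_servers_from_leases(SAMPLE_LEASE))
```

    An empty result for the real lease file would suggest the router isn't handing out an NTP server, which would make this explanation less likely.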