What storage software could I run to have an archive of my personal files (a couple TB of photos) that doesn’t require I keep a full local copy of all the data? I like the idea of a simple and focused tool like Syncthing, but it seems to be geared towards replication.

Is the simple choice to run some S3-like backend and use a CLI or other client to append and browse files? I’d love something with fault tolerance that someone can gradually add disks to. If Ceph were either less complicated or used fewer resources I’d want to do that.

  • lemmyvore@feddit.nl
    ↑14 · 9 months ago

    Borg Backup. It can work locally or over network. Takes snapshots of the files you give it. Performs deduplication, compression and optionally encryption. You can check the integrity of the backups and repair them. There’s a very simple to use GUI for it called Pika Backup to get you started.
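To make "snapshots with deduplication" concrete, here is a toy sketch of a content-addressed chunk store. It is a simplification under stated assumptions and not Borg's implementation: the `ChunkStore` class, its method names, and the fixed 4-byte chunk size are all made up for illustration, and there is no compression or encryption.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: each unique chunk is kept once, keyed by its hash."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size  # illustrative value, far smaller than real tools use
        self.chunks: dict[str, bytes] = {}          # digest -> chunk bytes
        self.snapshots: dict[str, list[str]] = {}   # snapshot name -> ordered digests

    def snapshot(self, name: str, data: bytes) -> int:
        """Record `data` as a snapshot; return how many new chunks had to be stored."""
        new_chunks = 0
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                new_chunks += 1
            digests.append(digest)
        self.snapshots[name] = digests
        return new_chunks

    def restore(self, name: str) -> bytes:
        """Rebuild a snapshot by concatenating its chunks in order."""
        return b"".join(self.chunks[d] for d in self.snapshots[name])

    def check(self) -> bool:
        """Integrity check: every stored chunk must still match its digest."""
        return all(
            hashlib.sha256(chunk).hexdigest() == digest
            for digest, chunk in self.chunks.items()
        )
```

A second snapshot that shares most of its data with the first only stores the chunks that changed, which is why repeated snapshots of a photo library stay cheap.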

  • solrize@lemmy.world
    ↑7 · 9 months ago

    I use Borg Backup to a Hetzner storage box but doing the same thing to a disk array would work fine. How much data are you talking about? What is the usage picture? Backup and archiving are really not the same thing.

    • jkrtn@lemmy.ml (OP)
      ↑1 · 9 months ago

      I was looking at Borg but that’s one of the tools where it seems like I need the entire replicated copy of the dataset locally to add more. I believe Borg can open a view into previous versions of the data, so it’s technically append only, but I’d find that process tedious.

      These are a couple TB and mostly photos I’ve taken. I’d like to be able to browse and edit at some point, but my primary concern right now is keeping a copy of everything.

      • solrize@lemmy.world
        ↑1 · 9 months ago

        Yeah, that’s more of an archive than a backup scenario. I have a small self-hosted Nextcloud that I use for stuff like that. For a few TB, you might consider Hetzner Storage Share, which is really Nextcloud. It is backed up daily, which is a help.

        • jkrtn@lemmy.ml (OP)
          ↑0 · 9 months ago

          How was it setting up and running Nextcloud? I’m very curious about their office software, looks fun.

          • solrize@lemmy.world
            ↑1 · 9 months ago

            As I remember, setting it up was kind of a pain, but once running it hasn’t needed attention. By now there might be an apt package or Docker container or something of that sort. I haven’t used their fancy apps much. My main use of it is to upload photos from my phone so I can access them from other devices.

  • skilltheamps@feddit.de
    ↑6 · 9 months ago

    that doesn’t require I keep a full local copy of all the data

    If you don’t do that, the place that you call “backup” is the only place where the data is stored, and that is not a backup. A backup is an additional place where it is stored, for the case when your primary storage gets destroyed.

    • jkrtn@lemmy.ml (OP)
      ↑0 · 9 months ago

      “Local” as in the machine I am using to work on, which has a 256 GB SSD. Not as in “on-site” and “off-site.”

      • computergeek125@lemmy.world
        ↑2 · edited · 9 months ago

        In the IT world, we just call that a server. The usual golden rule for backups is 3-2-1:

        • 3 copies of the data total, of which
        • 2 are backups (not the primary access), and
        • 1 of the backups is off-site.

        So, if the data is only server side, it’s just data. If the data is only client side, it’s just data. But if the data is fully replicated on both sides, now you have a backup.

        There’s a related adage regarding backups: “if there’s two copies of the data, you effectively have one. If there’s only one copy of the data, you can never guarantee it’s there”. Basically, it means you should always assume one copy somewhere will fail and you will be left with n-1 copies. In your example, if your server failed or got ransomwared, you wouldn’t have a complete dataset since the local computer doesn’t have a full replica.

        I recently had a backup drive fail on me, and all I had to do was buy a new one. No data loss, I just regenerated the backup as soon as the drive was spun up. I’ve also had to restore entire servers that have failed. Minimal data loss since the last backup, but nothing I couldn’t rebuild.

        Edit: I’m not saying what you’re asking for is wrong or bad, I’m just saying “backup” isn’t the right word to ask about. It’ll muddy some of the answers as to what you’re really looking for.
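The 3-2-1 rule as worded in this comment can be written down as a small check. This is only a sketch of that wording: the `Copy` dataclass, its field names, and the example locations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    is_primary: bool  # the copy you actually work from
    offsite: bool     # stored at a different physical location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """3 copies in total, 2 of them backups, at least 1 backup off-site."""
    backups = [c for c in copies if not c.is_primary]
    return (
        len(copies) >= 3
        and len(backups) >= 2
        and any(c.offsite for c in backups)
    )


if __name__ == "__main__":
    plan = [
        Copy("home server", is_primary=True, offsite=False),
        Copy("USB drive", is_primary=False, offsite=False),
        Copy("cloud bucket", is_primary=False, offsite=True),
    ]
    print(satisfies_3_2_1(plan))
```

A laptop that holds only part of the dataset would not count as a `Copy` here, which matches the point above about a partial local replica not being a backup.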

        • jkrtn@lemmy.ml (OP)
          ↑1 · 9 months ago

          Yes, I do see that. I’m definitely getting answers to a question I didn’t intend. I was hoping for something more like rsync, but that also provides viewing and incremental backups to an offsite location. I don’t know how to phrase that, and perhaps for what I want it makes more sense to have rsync/rclone copy files around and something else to view them.

  • deegeese@sopuli.xyz
    ↑5 · 9 months ago

    Are we talking personal offsite backup, or a commercial cloud service?

    For cloud backups I like Backblaze, but I’ve never tried to use it as a general cloud storage drive.

    • jkrtn@lemmy.ml (OP)
      ↑1 · 9 months ago

      This would be self-hosted and local, one of the locations in a 3-2-1 strategy. Backblaze would work for an offsite copy, but I already have that portion covered.

      • deegeese@sopuli.xyz
        ↑2 · 9 months ago

        that doesn’t require I keep a full local copy of all the data

        So you want a local self-hosted backup, but not a full copy? So, like backing up only recently changed files?

        • jkrtn@lemmy.ml (OP)
          ↑1 · 9 months ago

          I want like one local device to have a full copy, but the devices writing new data into that one do not need a full copy.

          • ironsoap@lemmy.one
            ↑2 · 9 months ago

            In technical terms, you mean doing an incremental or differential backup to a local network storage location, correct?

            • jkrtn@lemmy.ml (OP)
              ↑1 · 9 months ago

              “Incremental” sounds right. I want it to act like rsync without deleting files on the destination, so all the folders are merged. (It would be cool if it kept versions but I don’t absolutely need that.) Tools like Borg or Restic look great, but I have been searching to see if they support this kind of usage and they seem not to.
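The behaviour described here (merge folders into the destination, never delete there, optionally keep old versions) can be sketched in a few lines of standard-library Python. This is an illustration of the idea, not a replacement for rsync or rclone: the function name `merge_copy` and the `.old` naming scheme for kept versions are assumptions made up for this example.

```python
import filecmp
import shutil
import time
from pathlib import Path


def merge_copy(src: Path, dst: Path, keep_versions: bool = True) -> list[Path]:
    """Copy every file under src into dst without ever deleting anything from dst.

    Returns the destination paths that were written.
    """
    written = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        target = dst / src_file.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        if target.exists():
            if filecmp.cmp(src_file, target, shallow=False):
                continue  # identical content, nothing to transfer
            if keep_versions:
                # Hypothetical scheme: park the previous version next to the new one.
                stamp = time.strftime("%Y%m%dT%H%M%S")
                target.rename(target.with_name(f"{target.name}.{stamp}.old"))
        shutil.copy2(src_file, target)  # copy2 also preserves timestamps
        written.append(target)
    return written
```

Because the source only needs to hold the files being added, the writing device never needs a full copy of the archive. Note that the timestamped names have one-second resolution, so two changes to the same file within a second would collide.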

  • francisfordpoopola@lemmy.world
    ↑4 · 9 months ago

    Where will the target be? Online or local? Rsync is really easy to use and the target files are browse-able. I could be too dense but I find online buckets aren’t easily browse-able. Even a homemade NAS might be a good choice and it’s easily scalable.

  • DeltaTangoLima@reddrefuge.com
    ↑3 · 9 months ago

    I use rclone, with encryption, to S3. I have close to 3TB of personal data backed up to S3 this way - photos, videos, paperless-ngx (files and database).

    The data is only readable with the passwords, which are configured on my single backup host (a RasPi) and stored in Bitwarden.

  • zeluko@kbin.social
    ↑3 · edited · 9 months ago

    So I understand you just want some local storage system with some fault tolerance.
    ZFS will do that. Nothing fancy, just volumes as either block devices or ZFS filesystems.

    If you want something fancier, maybe even distributed, check out storage cluster systems with erasure coding. They waste less storage than pure replication, though that comes at a reconstruction cost if something goes wrong.

    MinIO comes to mind, though I never used it… my requirements seem to be so rare that these tools only get close :/
    Afaik you can add more disks and nodes more or less dynamically with it.
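The storage saving of erasure coding over replication is simple arithmetic. The sketch below compares raw disk needed for the same usable capacity; the function names, the 2 TB figure, and the 4+2 shard layout are illustrative assumptions, not settings taken from MinIO or any other tool.

```python
def raw_for_replication(usable_tb: float, copies: int) -> float:
    """Raw capacity needed when every byte is stored `copies` times."""
    return usable_tb * copies


def raw_for_erasure(usable_tb: float, data_shards: int, parity_shards: int) -> float:
    """Raw capacity needed when data is split into data shards plus parity shards."""
    return usable_tb * (data_shards + parity_shards) / data_shards


if __name__ == "__main__":
    usable = 2.0  # "a couple TB" of photos, assumed to be 2 TB here
    # Both layouts below survive the loss of any two disks or shards.
    print("3-way replication:", raw_for_replication(usable, 3), "TB raw")
    print("4+2 erasure coding:", raw_for_erasure(usable, 4, 2), "TB raw")
```

The trade-off mentioned above still applies: after a failure, replication just reads a surviving copy, while erasure coding has to read several shards and recompute the missing data.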

    • jkrtn@lemmy.ml (OP)
      ↑1 · 9 months ago

      Yeah it’s hard to find something that perfectly fits just what you want. I think it’s better if I do something simple like ZFS and maybe some kind of file server on top.

    • jkrtn@lemmy.ml (OP)
      ↑1 · 9 months ago

      That’s top of my list for moving the files if I do an S3 or WebDAV backend. I’m overthinking this, aren’t I? Just find a WebDAV server, set it up, use rclone to append files and pretty much everything else will be able to browse.
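A rough sketch of the "use rclone to append files" idea: `rclone copy` transfers new files without deleting anything on the destination, and `--immutable` makes it fail rather than overwrite a file that already exists with different content. The wrapper function names, the local path, and the remote name `photos-dav:archive` are hypothetical; the remote would have to be set up beforehand with `rclone config`.

```python
import subprocess


def rclone_append_cmd(local_dir: str, remote: str, dry_run: bool = True) -> list[str]:
    """Build an rclone command that adds files to `remote` without deleting any."""
    cmd = ["rclone", "copy", "--immutable"]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be transferred
    cmd += [local_dir, remote]
    return cmd


def run(cmd: list[str]) -> None:
    """Execute the command; raises CalledProcessError if rclone exits non-zero."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical path and remote name, printed rather than executed.
    print(" ".join(rclone_append_cmd("/home/me/Pictures", "photos-dav:archive")))
```

Starting with `dry_run=True` shows what would be uploaded before anything is written to the archive.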

      • YurkshireLad@lemmy.ca
        ↑2 · 9 months ago

        Haha it’s easy to overthink things sometimes. I’m guilty of that. I’m using SFTPGo at home to serve files from a small server.

  • Nomecks@lemmy.ca
    ↑2 ↓1 · 9 months ago

    Save your files to a local S3 object storage mount, enable versioning for immutability and use erasure coding for fault tolerance. You can use Lustre or some other S3 software for the mount. S3 is great for single-user file access. You can also replicate to any cloud-based S3 for offsite.

  • Decronym@lemmy.decronym.xyz (bot)
    ↑1 ↓1 · edited · 9 months ago

    Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

    Fewer Letters   More Letters
    Git             Popular version control system, primarily for code
    NAS             Network-Attached Storage
    RAID            Redundant Array of Independent Disks for mass storage
    SSD             Solid State Drive mass storage
    ZFS             Solaris/Linux filesystem focusing on data integrity

    5 acronyms in this thread; the most compressed thread commented on today has 2 acronyms.

    [Thread #523 for this sub, first seen 17th Feb 2024, 23:55]