nave@lemmy.zip · edit-2 1 year ago

why host your files when someone else can do it for you

Borkingheck@lemmy.world · 1 year ago

I don’t know what any of this means. Can I get a dumbed down explanation?

_dev_null@lemmy.zxcvn.xyz · edit-2 1 year ago

A website can be composed of a bunch of files that your browser downloads and then renders to what you see on your device.

One common type of file contains JavaScript code (aka JS assets), which can sometimes be relatively large, like several megabytes (MB). If a website gets hit by a lot of users, those MBs add up and can chew through the bandwidth allotted to the website. Consuming more bandwidth can cost the website operator more money, since they pay a hosting company for the website’s resources (disk space, compute time, network bandwidth).
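To put rough numbers on “those MBs add up”, here is a back-of-the-envelope sketch. The 3 MB bundle and one million page loads are made-up illustration values, and the calculation deliberately ignores browser caching and compression, both of which would shrink the real figure.

```python
def total_transfer_gb(asset_mb: float, page_loads: int) -> float:
    """Data served, in GB, if every page load downloads the asset once.

    Assumes no browser caching, no compression, and 1 GB = 1000 MB.
    """
    return asset_mb * page_loads / 1000


# Hypothetical: a 3 MB JavaScript bundle fetched on one million page loads.
print(total_transfer_gb(3, 1_000_000))  # 3000.0 GB, i.e. about 3 TB
```

Whoever serves that file pays for those terabytes, which is the whole point of the rest of this explanation.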

To help alleviate this, and also to make these downloads faster around the world, Content Distribution Networks (CDNs) exist. The idea is that you upload your large files to the CDN and have your website link to the CDN for those big files. Browsers then pull the big files from the CDN when the website loads, instead of from the website’s own host. However, contracting with a CDN costs money too, just maybe not as much as a web host charges for bandwidth overages.

Another important piece to note: archive.org is run by a non-profit (the Internet Archive) that, among other things, operates a web crawler whose entire purpose is to periodically take a snapshot of every website on the internet. This isn’t just a screen cap of each website, either; it’s a copy of all of the files that actually make up the website. This is an oversimplification, but it’s good enough for the concluding example that follows.

So, back to the case in the OP. Instead of paying for a CDN and linking to it, the dev linked to archive.org’s copy of the large file(s). So when a user loads the website, all of the bandwidth-hogging files are served for free from archive.org. But it’s not really free from archive.org’s perspective, since they’re the ones ultimately paying for the bandwidth.
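Concretely, “linking to archive.org’s copy” means the page’s script tag points at a Wayback Machine address instead of the site’s own server. A minimal sketch of how such an address is formed, assuming the commonly used `https://web.archive.org/web/<timestamp>/<original URL>` pattern and the `id_` flag that requests the file as originally captured. The `example.com` file and timestamp are hypothetical, not the actual URLs from the OP.

```python
def wayback_url(original_url: str, timestamp: str, raw: bool = True) -> str:
    """Build a Wayback Machine URL for an archived copy of original_url.

    timestamp: capture time as YYYYMMDDhhmmss digits.
    raw: if True, append the "id_" flag, which asks the archive for the
         file as captured, without its own URL rewriting.
    """
    if not original_url.startswith(("http://", "https://")):
        raise ValueError("expected an http(s) URL")
    if not timestamp.isdigit():
        raise ValueError("timestamp should be digits, e.g. 20230101000000")
    flag = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{flag}/{original_url}"


# Hypothetical asset: the page would now make every visitor's browser
# download app.js from archive.org's servers instead of the site's own.
url = wayback_url("https://example.com/static/app.js", "20230101000000")
print(f'<script src="{url}"></script>')
```

Every page load that hits a tag like this spends archive.org’s bandwidth, not the site owner’s.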

edit: Added the crawler bit.

aes@lemm.ee · edit-2 1 year ago

You download a copy of a photo I took to your computer.

I have a website that lets people see the photo. It’s a popular website.

Except that photo on my website doesn’t point to a copy of that photo on one of my computers, it points to the copy on yours.

Millions of people visit my website, and each time they do, they download your copy of my photo.

Uploading that photo to millions of computers across the world fucks up your internet service. You could also switch out my photo for another one, maybe even an offensive one, but my website would still point visitors to it.

In the original post, this is what a multibillion-dollar corporation, a bank, did to a not-for-profit service that keeps a historical record of the internet.

I hinted at the security implications of what happened, but explaining that would make the analogy too complex.

Aceticon@lemmy.world · edit-2 1 year ago

Let’s go a little beyond merely hinting at the security implications:

The files being hosted by that 3rd party are JavaScript, which is code that runs in the browser.
Barclays is a bank.

So people go to the website of a bank, and their browser receives code from a 3rd party with which the bank has no contract and which has nothing in place to meet the level of security required of a banking site.

This is way more “interesting” than the photo from that example of yours (which doesn’t contain any executable code, only data, and is fed to very mature image-decoding libraries, so it’s many times harder to find exploits for it than for code).

Consider the implications of getting the Barclays website to serve (from the user’s point of view) what could easily be malware…
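One standard browser-side guard against a third party swapping out a script is Subresource Integrity (SRI): the page pins a hash of the expected file in the script tag’s `integrity` attribute, and the browser refuses to run a download whose hash doesn’t match. The sketch below is a simplified model of that check (a single sha384 hash, no CORS handling); the script contents are invented for illustration, and it says nothing about what the bank actually did.

```python
import base64
import hashlib


def sri_hash(content: bytes) -> str:
    """Compute an SRI-style integrity value (sha384 variant) for a file."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def browser_would_run(downloaded: bytes, pinned: str) -> bool:
    """Simplified model of the browser's check: run only on a hash match."""
    return sri_hash(downloaded) == pinned


# Hypothetical script contents, for illustration only.
original = b"console.log('hello');"
pinned = sri_hash(original)  # value the page would put in integrity="..."

print(browser_would_run(original, pinned))                # True
print(browser_would_run(b"stealCredentials();", pinned))  # False
```

In a real page the tag would carry `integrity` set to that value plus `crossorigin="anonymous"` for a cross-origin file. That blocks a silently swapped file, but it doesn’t change who pays for the bandwidth.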

aes@lemm.ee · edit-2 1 year ago

Fair, although explaining a potential vector for a hypothetical XSS attack, and its implications, to someone who doesn’t know what JavaScript is sounds like information overload.