We’re (a group of friends) building a search engine from scratch to compete with DuckDuckGo. It still needs a name and logo.

Here are some pictures (results not cherry-picked): https://imgur.com/a/eVeQKWB

Unique traits:

  • Backend written in pure Rust, frontend in HTML and CSS only - no JavaScript, PHP, SQL, etc…
  • Has a custom database, schema, engine, indexer, parser, and spider
  • Extensively themeable with CSS - theme submissions welcome
  • Only two crates used - TOML and Rocket (plus Rust’s standard library)
  • Homegrown index - not based on Google, Bing, Yandex, Baidu, or anything else
  • Pages are statically generated - super fast load times
  • If an onion link is available, an “Onion” button appears to the left of the clearnet URL
  • Easy to audit - no JavaScript, WASM, etc… Requests can be audited with the F12 network tab
  • Works over Tor with strictest settings (official Tor hidden service address at the bottom of this post)
  • Allows for modifiers: hacker -news +youtube removes all results containing hacker news and only includes results that contain the word “youtube”
  • Optional tracker removal from results - on by default
  • No censorship - results are what they are (exception: underage material)
  • No ads in results - if we do ever have ads, they’ll be purely text in the bottom right corner, away from results, no media
  • Everything runs in memory, no user queries saved.
  • Would make Richard Stallman smile :)
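
The modifier syntax in the list above could be parsed along these lines. This is only a sketch: the post does not say how the real engine interprets modifiers, so the semantics here (plain words are search terms, `+word` must appear, `-word` must not) and every name (`Query`, `parse_query`, `passes_filters`) are assumptions for illustration. It uses only Rust's standard library.

```rust
/// A parsed query. The three buckets are an assumed interpretation of the
/// modifier syntax, not the engine's actual data model.
#[derive(Debug, Default, PartialEq)]
pub struct Query {
    pub terms: Vec<String>,    // plain search words
    pub required: Vec<String>, // from `+word`
    pub excluded: Vec<String>, // from `-word`
}

/// Split a raw query on whitespace and sort each token into a bucket.
pub fn parse_query(raw: &str) -> Query {
    let mut q = Query::default();
    for token in raw.split_whitespace() {
        if let Some(word) = token.strip_prefix('-') {
            if !word.is_empty() {
                q.excluded.push(word.to_lowercase());
            }
        } else if let Some(word) = token.strip_prefix('+') {
            if !word.is_empty() {
                q.required.push(word.to_lowercase());
            }
        } else {
            q.terms.push(token.to_lowercase());
        }
    }
    q
}

/// Check a result's text against the required and excluded words.
/// Matching is a naive case-insensitive whole-word comparison.
pub fn passes_filters(q: &Query, text: &str) -> bool {
    let lower = text.to_lowercase();
    let words: Vec<&str> = lower
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .collect();
    q.required.iter().all(|r| words.contains(&r.as_str()))
        && q.excluded.iter().all(|e| !words.contains(&e.as_str()))
}
```

Under these assumed semantics, `parse_query("hacker -news +youtube")` yields one plain term, one required word, and one excluded word, and `passes_filters` then drops any result whose text contains "news" or lacks "youtube".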

THIS IS A PRE-ALPHA PRODUCT; it will get much, MUCH better over the coming months. The dataset in the temporary hidden service linked below does not do our algorithm justice; it’s there to prove our concept. Please don’t judge the technology until beta.

Onion URL (hosted on my laptop since so many people asked for the link): ht6wt7cs7nbzn53tpcnliig6zrqyfuimoght2pkuyafz5lognv4uvmqd.onion

  • Onno (VK6FLAB)@lemmy.radio
    130 upvotes · 10 months ago

    I love the notion. The marketing “better than DDG” is a little janky. Perhaps consider a positive statement, like “finally find what you’re looking for”.

    This is a crowded landscape. I’ve been here since Gopher and seen plenty of services come and go. With that in mind, here are some questions you might want to consider:

    How does it compare with products like SearXNG, specifically their ecosystem of plug-in search types?

    How do you plan to pay for it?

    How do you expect to protect the index against spam?

    How will you scale it to a global audience?

    How will you handle language?

    Good luck!

    • UnHidden@lemmy.worldOP
      28 upvotes, 1 downvote · 10 months ago

      To answer your questions in order:

      • We have our own index; it’s not a shitshow of mixed results like Searx tends to be. This also means that we’re not chasing breaking changes of some larger engine when they decide they don’t want us, like Twitter did to Nitter, and Bing did to Searx.
      • We don’t know how to monetize. Ads are the only option that we know of, donations do not work at all, as proven by my previous projects.
      • We’ve already got spam prevention and removal measures in place, but I won’t discuss them.
      • We don’t know how to scale it, since it’s centralized by design and the frontend and backend are tightly integrated, mainly because most of the frontend is generated on the fly by the backend. Maybe host a copy for each region we’re aiming to acquire users from?
      • Our engine already understands 5 languages, and we hope to expand to CJK languages soon.
      • hydroptic@sopuli.xyz
        37 upvotes, 2 downvotes · edited · 10 months ago

        We don’t know how to monetize. Ads are the only option that we know of, donations do not work at all, as proven by my previous projects.

        A subscription-based model might be the only viable one, since ads will inevitably lead to a conflict of interest and voluntary donations are mostly a no-go. The problem is that people are so used to the notion that everything is “free” that many are convinced that online services should always be free and balk at the idea of paying for anything.

        Personally I pay for Kagi which has been decent enough

        • Amerikan Pharaoh@lemmygrad.ml
          14 upvotes, 2 downvotes · edited · 10 months ago

          I mean, a search engine is literally the last thing on the internet I’d pay a subscription for. In a world where literally everything else nickels-and-dimes us for subscription service, search engines, torrent trackers, game modders who paywall their mods, and other kitschy non-essentials are literally the first things to get shuffled off the monthly budget.

          If we weren’t in such a deep recession that I pay as much a week for my gas as I do my groceries, with rent and ACTUAL bills eating the majority of what’s left, I’d feel a bit differently; but if wishes were horses, we’d all ride. I literally had to start growing my own green rather than buying it, the economy’s so shit.

        • KoboldCoterie@pawb.social
          11 upvotes · 10 months ago

          The problem is that people are so used to the notion that everything is “free” that many are convinced that online services should always be free and balk at the idea of paying for anything.

          A huge part of that is that most people don’t consider privacy concerns to be a cost. All they factor into their evaluation is whether it costs them actual money.

        • Orbituary@lemmy.world
          9 upvotes · edited · 10 months ago

          Personally I pay for Kagi which has been decent enough

          What’s “decent enough” mean? I’ve been curious and you’re the only person I’ve known who pays for it.

          • pacmondo@sh.itjust.works
            12 upvotes · 10 months ago

            I pay for it. The results are quality, and the fact that my brain doesn’t have to sift through ad results and can just look at the real data is so nice. Additionally, they have a large number of “lenses” which can change the scope of your search. For example, they have a lens for searching Lemmy as well as lenses for the “small web”, which filters out all the results from massive corporate websites and gives way more personal project sites and the like.

            All in all I’m a fan.

            • sudneo@lemmy.world
              1 upvote · 10 months ago

              I personally really like the gazillion bangs available and the personal up/downranking/blocking of websites, and their quick answer is often fairly good (I mostly use it for documentation lookup). The lenses are definitely the best feature though, especially coupled with bangs. I even converted my wife, who really loves it.

          • ParetoOptimalDev@lemmy.today
            4 upvotes · 10 months ago

            I never thought I’d pay for Kagi; I thought paying for a search engine was ridiculous. Then I kept seeing loudly positive feedback from reputable people in my circle and tried the trial.

            I pay for it and never have the “I only ever use !g on duckduckgo” problem.

            Sorting by web pages with least ad trackers is a cheat code to find old style websites with people sharing knowledge for knowledge’s sake rather than profit.

          • DominusOfMegadeus@sh.itjust.works
            1 upvote · 10 months ago

            Just my two cents, but I keep trying it out and I have not seen anything good enough to warrant paying for it. And I am not against paying for privacy, I pay Proton.

      • WetBeardHairs@lemmy.ml
        5 upvotes · 10 months ago

        You could let people host their own as a method of scaling. But that limits it to geeks like us.

        Use Kubernetes and let it scale, and pay for hosting on CDNs.

  • ProdigalFrog@slrpnk.net
    88 upvotes, 4 downvotes · 10 months ago

    Ahh, you’re the guys who posted over on Reddit before your thread got locked, the ones who think it’s a good idea to promote Russian propaganda equally with Ukrainian content because you don’t want to ‘take sides’ politically. Closed source too, so that’s pretty much a dealbreaker right there, especially for privacy-focused users. We’ve been abused by closed source software for far too long to trust anything less.

    You also have absolutely no plan on how to monetize, as others have said in this thread already.

    I certainly won’t be supporting you, not with those values.

  • ExtremeDullard@lemmy.sdf.org
    70 upvotes · edited · 10 months ago

    I applaud your efforts and I admire your idealism.

    Unfortunately, the minute you get the bill from your internet provider, you’ll need to find a way to pay for it, and your good intentions will instantly dissolve in the murky realities of modern corporate surveillance capitalism.

    But at least while you haven’t gotten your first bill, it’s refreshing to watch your enthusiasm.

    • sugar_in_your_tea@sh.itjust.works
      16 upvotes · 10 months ago

      pay for it

      I wonder what a distributed search engine would look like. Basically, the index would be sharded across user computers, and queries would hit some representative sample of that index. This means:

      • hosting costs are very low - just need a way to proxy requests to the network
      • search times should improve as more people use the service
      • no risk of the service logging anything - individual nodes don’t need to know who requested the data, just who to send the response to

      My biggest concern is how to build the index, but if OP is willing to share that, I might start hacking on a distributed version.
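
One way to picture the sharding idea above is to map each search term to a shard with a hash that every peer computes identically. The sketch below is an illustration only: the function names, the fixed `shard_count`, and the choice of FNV-1a are assumptions, and a real network would also need replication and a way to handle peers joining and leaving.

```rust
/// 64-bit FNV-1a hash. Chosen here because it is tiny and gives the same
/// value on every machine, which peers need in order to agree on placement.
pub fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Map a search term to one of `shard_count` shards (hypothetical layout:
/// each shard holds the posting lists for the terms that hash to it).
pub fn shard_for_term(term: &str, shard_count: u64) -> u64 {
    assert!(shard_count > 0, "shard_count must be positive");
    fnv1a(term.to_lowercase().as_bytes()) % shard_count
}

/// List the shards a multi-word query has to contact, sorted and deduplicated.
pub fn shards_for_query(query: &str, shard_count: u64) -> Vec<u64> {
    let mut shards: Vec<u64> = query
        .split_whitespace()
        .map(|term| shard_for_term(term, shard_count))
        .collect();
    shards.sort_unstable();
    shards.dedup();
    shards
}
```

With this layout a query only touches the shards that own its terms, so a node answering for one shard never needs to see the whole index. Plain modulo placement reshuffles most terms whenever `shard_count` changes, which is why real distributed hash tables use consistent hashing or a similar scheme instead.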

        • sugar_in_your_tea@sh.itjust.works
          3 upvotes · 10 months ago

          Awesome! That’s pretty much exactly what I’m looking for, though I’m interested to see how easy it is to limit certain peers to certain functions. Not everyone has resources to crawl and index pages, but a lot of people can store the index.

          I’m interested in having client-side web storage, so you can participate in the network by just having the search page open (opt-in of course).

          I’m honestly not actively working on it, but if OP provides the database and/or crawler, I’ll do some research on feasibility.

        • Waraugh@lemmy.dbzer0.com
          2 upvotes · 10 months ago

          This is really neat and I’m just hearing about it after over twenty years of development. I need to try it out, thank you. How do you stay in the know about this kind of stuff? I’m curious about all the cool stuff out there I wouldn’t even know I’m curious to find.

          • grue@lemmy.world
            6 upvotes · 10 months ago

            How do you stay in the know about this kind of stuff?

            By being terminally online, I guess?

            More concretely, I’ve spent (probably too much) time on Slashdot, Reddit and now Lemmy over the years (subscribed to Free Software and privacy-related communities in particular). Also, looking through sites like https://awesome-selfhosted.net/ and https://www.privacytools.io/, wiki-walking through articles about Free Software projects on Wikipedia, browsing the Debian repositories, etc.

            I’m sure there are plenty of things I haven’t heard of either, though.

          • ElectroVagrant@lemmy.world
            3 upvotes · edited · 10 months ago

            How do you stay in the know about this kind of stuff? I’m curious about all the cool stuff out there I wouldn’t even know I’m curious to find.

            I was going to mention YaCy as well if nobody else was, so I can chip in to this somewhat. My method is to keep wondering and researching. In this case it was a matter of being interested in alternative search engines and different applications of peer to peer/decentralized technologies that led me to finding this.

            So from this you might go: take something you’re even passingly interested in, try to find more information about it, and follow whatever tangential trails it leads to. With rare exceptions, there are good chances someone out there on the internet will also have had some interest in whatever it is, asked about it, and written about it.

            Also be willing to make throwaway accounts to get into the walled gardens for whatever info might be buried away there and, if you think others may be interested, share it outside of those spaces.

        • grue@lemmy.world
          7 upvotes · 10 months ago

          No, Searx is a metasearch engine that queries and aggregates results from multiple normal search engines (Google, Bing, etc.)

          A distributed search engine would be more like YaCy, which does its own crawling and stores the index as a distributed hash table shared across all instances.

          • sugar_in_your_tea@sh.itjust.works
            1 upvote · 10 months ago

            Exactly. The main difference I would bring is a web client that hooks into the network, and perhaps an alternative client (e.g. I’m interested in Tauri, so I may rewrite part of the backend in Rust).

            But I’m probably not going to start on this project on my own. DDG is good enough for now, so I’m putting my efforts elsewhere.

      • sqw@lemmy.sdf.org
        3 upvotes · 10 months ago

        I feel that decentralized search is an extremely valuable thing to start thinking about, but the devil is in practically every one of the details.

        • sugar_in_your_tea@sh.itjust.works
          1 upvote · 10 months ago

          Yup. Even if you trust all your peers (which isn’t reasonable), there’s still a ton of practical issues that need to be resolved:

          • pagination with a different set of peers
          • moderation of CSAM and whatnot
          • outdated peers and stale data
          • how much data gets sent around, and where results are reduced

          It’s a really complex problem without getting p2p involved, and p2p just adds a ton of other problems.
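
To make the "where are results reduced" question concrete, here is a rough sketch of one possible answer: each peer returns its own top hits, and the node that received the query merges them. Everything here is assumed for illustration (the `Hit` type, comparable scores across peers, deduplication by URL); it sidesteps the trust and staleness problems listed above.

```rust
use std::collections::HashMap;

/// One scored result as a peer might return it (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub url: String,
    pub score: f64,
}

/// Merge per-peer result lists into a single top-`k` list.
/// Duplicate URLs keep their highest score; ties are broken by URL so the
/// output order is deterministic.
pub fn merge_top_k(peer_results: Vec<Vec<Hit>>, k: usize) -> Vec<Hit> {
    let mut best: HashMap<String, f64> = HashMap::new();
    for hit in peer_results.into_iter().flatten() {
        let entry = best.entry(hit.url).or_insert(f64::NEG_INFINITY);
        if hit.score > *entry {
            *entry = hit.score;
        }
    }
    let mut merged: Vec<Hit> = best
        .into_iter()
        .map(|(url, score)| Hit { url, score })
        .collect();
    merged.sort_by(|a, b| {
        b.score
            .total_cmp(&a.score)
            .then_with(|| a.url.cmp(&b.url))
    });
    merged.truncate(k);
    merged
}
```

If every peer sends only its own top `k`, the merging node handles at most `k` hits per peer, which bounds the data transferred. The price is that scores must mean the same thing on every peer, which is hard to guarantee when peers hold different slices of the index.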

          So I’m probably going to stick with building my Reddit clone, which I think is simpler (search doesn’t need to happen at the start).

    • pixelscript@lemmy.ml
      9 upvotes · 10 months ago

      My thoughts exactly when reading this.

      I believe people when they claim to develop free software. Often because it’s software the dev wants for themselves anyway and they’ve merely elected to share it rather than sell it. The only major cost is time to develop, which is “paid” for by the creation of the product itself.

      You (OP) are proposing a service. Services have ongoing fees to run and maintain, and the value they create goes to your users, not you. These are by definition cost centers. You will need a stable source of funding to run this. That does not in any way mix with “free”. Not unless you’re some gajillionaire who pivoted to philanthropy after a life of robber baroning, or you’re relying on a fickle stream of donations and grants.

      You indicate in other comments you will not open the source of your backend because you don’t want someone scooping it and stealing your future revenue. That’s fine, but what revenue? I thought this was free? What’s your business model?

      It sounds like what you want to do here is have a free tier anyone can use, supported by a paid tier that offers extended features. That’s fine, I guess. But if you want to “compete with DuckDuckGo”, you are going to need to generate enough revenue to support the volume of freeloaders that DDG does. If your paid tier base doesn’t cover the bill, you will need to start finding new and exciting ways to passively monetize those non-revenue-generating users. That usually means one or more of taking features away and putting them behind the paywall to drive more subscriptions, increasingly invasive ads on the platform, or data-harvesting dark patterns.

      Essentially what I’m saying here is, as-proposed, the eventual failure and/or enshittification of your service seems inevitable. Which makes it no better than DDG long term.

      It is, at any rate, a very intriguing project.

    • UnHidden@lemmy.worldOP
      1 upvote, 7 downvotes · 10 months ago

      For now we’re going to host on residential connections, and if any ISPs ban us, we’ll just find other ISPs

      • fishos@lemmy.world
        19 upvotes, 1 downvote · 10 months ago

        Yeah, when you say stuff like this, it shows how woefully unprepared you are for the realities of this. You can’t scale, can’t self host for long, don’t see a way to pay for this… When I can already pay Kagi for a fully working, excellent service, why would I choose you? This is guaranteed to crash and burn the moment your ISP tells you you can’t run a commercial grade server through your residential connection. They’ll either cap your bandwidth to unusable levels or disconnect you entirely. If you’re lucky you’ll have 1 or 2 other options to choose from, who will blacklist you shortly after. Then, after you’ve burnt through all the “easy” ways to host, all you’ll be left with is professional grade services that you admit you can’t afford.

        Also, you make zero mention of user privacy. So what happens when you get your first subpoena? Or before that, why should I trust you with my data in general? What policies do you have in place to ensure my legal rights are protected? Do you even know what the legal rights are per state/country and how the location of where someone connects from impacts you? How are you gonna handle visitors from the EU with GDPR?

        Nifty idea, but way too much “I’m gonna single handedly reinvent the wheel” vibes.

  • octopus_ink@lemmy.ml
    50 upvotes · edited · 10 months ago

    Would make Richard Stallman smile :)

    If this is a closed source project, that statement doesn’t work even as a joke.

    However, the screenshots looked good. :)

    • Possibly linux@lemmy.zip
      9 upvotes, 1 downvote · 10 months ago

      Richard Stallman cares more about what is running on your computer than he does about what is running on a server.

      Fair point though

    • UnHidden@lemmy.worldOP
      8 upvotes, 28 downvotes · 10 months ago

      That comment is there specifically to drive engagement up with all of the people correcting me in the comments.

  • My Password Is 1234@lemmy.world
    44 upvotes, 1 downvote · 10 months ago

    I got so excited reading this post, but as I read that the project will not be open source, my excitement immediately faded away

    • wischi@programming.dev
      1 upvote · 10 months ago

      They won’t open source it because the Rust code is very likely a joke. They are proud of using just two dependencies, don’t know that their “statically generated” stuff is actually called server-side rendering, and are hosting this stuff on a fuckin laptop.

      It’s probably a project that will teach them a lot. But in practice their implementation is worthless to everybody else because they are obviously completely inexperienced.

      That said, the project is likely not worthless to them, because they will probably learn a ton about why it’s hard to build a search engine.

  • Lemongrab@lemmy.one
    17 upvotes, 1 downvote · 10 months ago

    Closed source and privacy don’t mix most of the time. Or rather, the privacy crowd and closed source don’t mix. You won’t see much support for your project if it stays that way. Maybe a source-available but still closed license would be better. Think about your monetization strategy a bit as well. Consider having premium features and making it a freemium product.

  • Pantherina@feddit.de
    15 upvotes · 10 months ago

    Wow this is great!

    If you are using your own index, I think you could use a more economical approach to fight the spam bullshit of the modern web.

    • instead of using badness enumeration, crawling everything and filtering malware, use an opt-in principle
    • have a community method of gathering new trusted websites
    • use websites’ internal search functions to get more results
    • use categories to split up the websites, reinventing what people should find: general, news, navigation, science, politics, IT, technology (not code), art, music, philosophy, …
    • have an app or submission website where users can submit new websites, and some form of community control over it (kinda censorship but in a good way)

    This could fix the web as it currently is, by rethinking what should be found, pushed etc. Rating websites by quality could also be helpful.
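
The opt-in principle from the list above could look roughly like this in a crawler: a URL only enters the crawl queue if its host is on a community-maintained allowlist. This is a sketch under stated assumptions. `host_of` and `filter_frontier` are made-up names, the host parsing is deliberately naive (no IPv6 literals, only `http` and `https`), and nothing here reflects how OP's spider actually works.

```rust
use std::collections::HashSet;

/// Extract the host part of an http(s) URL with plain string handling.
/// Naive on purpose: it ignores IPv6 literals and other URL schemes.
pub fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| c == '/' || c == '?' || c == '#').next()?;
    // Drop optional "user:pass@" and ":port".
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Keep only candidate URLs whose host is on the allowlist.
/// The allowlist is assumed to hold lowercase host names.
pub fn filter_frontier<'a>(candidates: &[&'a str], allowlist: &HashSet<String>) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|url| {
            host_of(url).map_or(false, |host| allowlist.contains(host.to_lowercase().as_str()))
        })
        .collect()
}
```

The design choice is that unknown hosts are rejected by default, so spam domains never reach the index unless someone vouches for them. The cost is coverage: the index can only grow as fast as the allowlist does.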

    Also, if you support payments in crypto or cash, there should be no problem making it paid.

  • Sotuanduso@lemm.ee
    11 upvotes · 10 months ago

    I don’t know DuckDuckGo, but what’s the purpose of trying to compete with it? This is not a rhetorical question. Is there something wrong with DuckDuckGo, something you feel you can do better, or are you just making a competitor for the principle?

    • space@lemmy.dbzer0.com
      5 upvotes, 1 downvote · 10 months ago

      Not OP, but there is value in having competition. DDG is just a bing front-end. The big search engines have a major problem with the quality of results going down, as the internet is SEOd to death. The companies behind these engines don’t seem to be very eager to fix it, they are just hoping to replace them with AI. We’ve also seen how these engines have been turned into ad platforms, which changes the incentives… Instead of ranking quality, they are ranking who pays more.

      Taking a different approach to ranking results that isn’t ad driven, that can punish AI generated content and low-quality results, would bring huge value.

      • ShortN0te@lemmy.ml
        7 upvotes · 10 months ago

        DDG is just a bing front-end.

        That is wrong. Yes, they are licensing the Bing search database, but it is not the only one they use. They have their own crawler too.

        source

  • wischi@programming.dev
    10 upvotes, 1 downvote · 10 months ago

    “Only two crates used”. What’s great about reinventing the wheel? A closed source project with big claims trying to reinvent everything from scratch. Nice project 🤣

  • CameronDev@programming.dev
    8 upvotes · 10 months ago

    Pages are statically generated

    Can you elaborate on that? To me, statically generated would mean you are pre-rendering an HTML page for every possible search, which doesn’t sound possible. Do you mean that it’s all generated server-side (at the time of search)?

    • blujan@sopuli.xyz
      9 upvotes · 10 months ago

      I think he means pages are presented as static HTML+CSS pages, generated dynamically on the back end
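
For readers unfamiliar with the distinction, server-side rendering in this sense means building the complete HTML string for a query when the request arrives, so the browser receives a finished page and needs no JavaScript. A minimal standard-library sketch follows; `SearchResult`, `escape_html`, and `render_results_page` are hypothetical names, and how OP's backend really builds its pages is not known.

```rust
/// One result row to display (hypothetical shape).
pub struct SearchResult {
    pub title: String,
    pub url: String,
}

/// Escape the characters that are special in HTML text and attributes,
/// so user input and crawled titles cannot inject markup.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Build the whole results page as one HTML string at request time.
pub fn render_results_page(query: &str, results: &[SearchResult]) -> String {
    let mut html = String::from("<!DOCTYPE html><html><head><title>");
    html.push_str(&escape_html(query));
    html.push_str("</title></head><body><ol>");
    for result in results {
        html.push_str(&format!(
            "<li><a href=\"{}\">{}</a></li>",
            escape_html(&result.url),
            escape_html(&result.title)
        ));
    }
    html.push_str("</ol></body></html>");
    html
}
```

A web framework handler would return this string as the response body. The page is static from the browser's point of view, but it is generated per request, which is different from pre-rendering files ahead of time.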

  • SorteKanin@feddit.dk
    7 upvotes · 10 months ago

    Only two crates used - TOML and Rocket (plus Rust’s standard library)

    This seems like a bit of a weird approach. There are lots of nice existing Rust crates to build with, so why use such a minimal approach?

    Also, Rocket has essentially been superseded by more mature frameworks like Axum.

      • wischi@programming.dev
        3 upvotes · 10 months ago

        Reducing the attack surface by not using well established and battle tested crates but reinventing the wheel inside this closed source project 🤣

      • SorteKanin@feddit.dk
        3 upvotes · edited · 10 months ago

        Well, that’s a bit of a double-edged sword. Libraries also include lots of built-in failsafes that you’ll then need to implement yourself. And you’ll need to be confident that you don’t introduce security issues in your own code instead of relying on widely used libraries. But it makes sense if you’re worried about supply chain attacks.

  • tanja@lemmy.blahaj.zone
    6 upvotes · edited · 10 months ago

    That’s a neat project.
    You can be proud of your work 😊

    But I for one won’t donate to your cause, as the software seems to be closed-source, and I already have DuckDuckGo & Google for my searching needs.

    I genuinely believe that the only viable niches for new search engines are environmentally-friendly (e.g. Ecosia) or open-source.

    Literally no one will pay for a closed-source search engine.

    But I like your tech stack, and your project’s looking good.

    One more thing: You claim to be against censorship; how will you combat spam & SEO farming?