• mesamune@lemmy.worldOP
      6 months ago

      All we can do is make something better. Reddit will do their thing and we will do ours.

  • ThyTTY@lemmy.world
    6 months ago

    Can’t wait to see an AI chatbot in my Google searches that behaves like a typical redditor.

  • stanleytweedle@lemmy.world
    6 months ago

    I deleted my comment history after the API exodus. I’m sure they could dig it up if they wanted but at least they’ll have to click like 3 more buttons if they want to train AI on my nonsense.

    • kholby@lemmy.world
      6 months ago

      Before:

      SELECT * FROM `comments` WHERE is_deleted=0;
      

      After:

      SELECT * FROM `comments`;
      
    • Ebby@lemmy.ssba.com
      6 months ago

      Perhaps, but not worth buying if you can’t make a profit or keep it from your competition.

      60M covers almost 20 years of data, but once it’s ingested, Google will only want new content. Next year, it’ll be more like 3M if the dataset isn’t poisoned by bots or the AI fad hasn’t collapsed. Reddit will struggle with finances again and users will suffer. At least that’s my prediction.

      • empireOfLove2@lemmy.dbzer0.com
        6 months ago

        Spez has already grifted his money out of the initial stock pump so it literally doesn’t matter. Reddit could shut down tomorrow and he’d be happy as a clam.

        • Barbarian@sh.itjust.works
          6 months ago

          It currently looks very much like a bubble. After the dot com bubble, the internet didn’t go away, but most companies died off and all the stupid monetisation went bankrupt.

          We may be seeing something similar.

        • Ebby@lemmy.ssba.com
          6 months ago

          Haha! Wow, I guess so. I’ll keep some shelf space available in the geezer museum next to 3D TVs, deep fakes, fidget spinners, and my pogs. :D

    • qjkxbmwvz@startrek.website
      6 months ago

      I wonder if Google’s unlimited legal budget plays a role. Not a lawyer, so probably way off here…

      But, for example, reddit’s success in part depends on Google ingesting their data — reddit shows up in Google searches all the time, which can only happen if Google uses reddit’s content. So reddit telling Google “you can’t use our content” doesn’t work, and they need to say something like, “you can use our content for search results but you can’t consume it as training data.”

      This is a pretty straightforward statement/request/demand, but one could imagine Google lawyers maliciously complying and throwing their hands up dramatically, claiming “well we use some amount of AI in our search results, so if we can’t use your content for AI training then we can’t risk using it for search results.” Which would, I imagine, really, really hurt reddit (no Google results would be catastrophic I suspect).

      So, perhaps the “low” 60M figure is just Google using their leverage.

      Or not. As a random person on the Internet, I can say I’m probably not contributing anything meaningful here…

    • Zaktor@sopuli.xyz
      6 months ago

      I’m personally curious whether Reddit actually has any ability to protect that database. I don’t remember Reddit’s TOS, but usually those things give them a license to use and copy the data, maybe even to sell it, but not the actual copyright on it. So if someone made a Reddit scraper and copied the comments, wouldn’t only the actual commenter be able to sue?

      $60M may be reflecting that, in that it’s more a convenience fee to shield Google against individual Redditors going after them than something that Reddit itself could actually sue over.

    • bobburger@fedia.io
      6 months ago

      To be fair, it’s a pretty terrible dataset. The AI is just going to say “this” to every question you ask.

      • lol@discuss.tchncs.de
        6 months ago

        You’re exaggerating of course, but I don’t think it’s terrible at all; the opposite really. It’s likely incredibly useful for creating LLMs with specific knowledge or behavior.

        The categorization into subreddits alone opens up so many possible applications. Imagine, for example, training a conversational AI with data from specific subreddits like science, askscience, biology, physics, astronomy, … or with posts by users that frequent such subreddits, in order to create a sort of academic AI.

        You could do the same for all sorts of topics. Want a sports commentator AI? Use sports-related subreddits. Want an AI that supports you in writing a novel? Use creative writing subreddits, etc. Don’t want your AI to spew political opinions? Exclude political subreddits from your data. Don’t want it to use offensive language? Only use well-moderated subreddits, etc.
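
        A rough sketch of that kind of per-subreddit filtering, just to make the idea concrete. Everything here is an assumption for illustration: the `Comment` record, its field names, and the `filter_comments` helper are hypothetical and say nothing about how Reddit's data is actually structured or licensed.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    # Hypothetical record layout; real dumps will have different fields.
    subreddit: str
    body: str


def filter_comments(comments, allow=None, deny=None):
    """Keep comments from allowed subreddits and drop denied ones.

    allow=None means "no allow-list"; deny always wins over allow.
    Subreddit names are compared case-insensitively.
    """
    allow_set = None if allow is None else {s.lower() for s in allow}
    deny_set = {s.lower() for s in (deny or ())}
    kept = []
    for comment in comments:
        name = comment.subreddit.lower()
        if name in deny_set:
            continue
        if allow_set is not None and name not in allow_set:
            continue
        kept.append(comment)
    return kept


if __name__ == "__main__":
    sample = [
        Comment("askscience", "Photons have no rest mass."),
        Comment("politics", "Hot take incoming."),
        Comment("Physics", "Entropy tends to increase."),
    ]
    academic = filter_comments(
        sample, allow={"science", "askscience", "physics"}, deny={"politics"}
    )
    for comment in academic:
        print(comment.subreddit, "|", comment.body)
```

        Filtering on the subreddit name only selects topics. It does nothing to check whether the answers in those subreddits are actually any good.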

        • Adderbox76@lemmy.ca
          6 months ago

          This presumes that Reddit is populated by so-called experts answering questions and posting in those subs.

          But the overwhelming truth is that most people pretending to be experts are just regurgitating answers they heard from another reddit post, and so on, and so on.

          You might as well just train your AI on the “confidently incorrect” sub and call it a day.

          • MBM@lemmings.world
            6 months ago

            It’s always an eye-opener when you look at an ELI5 thread where you’re actually knowledgeable about the topic.

    • trolololol@lemmy.world
      6 months ago

      Considering it’s all full of Nazis and bots, and if you get to filter all of them out you’re left with reposts and low quality memes followed by comments that represent the hostile side of each of us… I’d say anything over $5 is a good deal for spez.

      Now, I hope Google uses this data exclusively for detecting inappropriate answers. Can you imagine it giving answers based on the endless threads of “I’m not your mate, bro; I’m not your bro, dude…”?

    • GBU_28@lemm.ee
      6 months ago

      How quickly you forget that half of it is just “I also choose this guy’s wife” and “the narwhal bacons at midnight”.

  • Endorkend@kbin.social
    6 months ago

    And this is how Skynet was born.

    That one Microsoft Twitter bot turned into a full-blown Nazi in just one day.

    I can’t even imagine how fucked up and depraved one trained on Reddit data will get.

  • Ultragigagigantic@lemmy.world
    6 months ago

    It’s self-hosted plus open source, or barbarism.

    You think… “Y sucks, I’m switching to Z”.

    But Z is just biding its time till it can be Y. The most principled owners in existence have a finite life span. It is only a matter of time till the vultures put their claws into anything.

    Gabe is gonna die, Steam is gonna be fucked. We should have all started with Good Old Games. But we didn’t. But it’s not too late to switch. To be free with your software. Don’t be like me, wishing you had been smarter before.

    It’s okay, better now than never.

    • yeehaw@lemmy.ca
      6 months ago

      In the case of Steam and GOG, Steam was around long before GOG. I only purchase what I must on Steam, and if it’s available on both I always purchase from GOG. They align more with my values. But Valve, from my point of view, is such a pro-consumer business that I don’t mind. But you’re right. When Gabe goes, I’m willing to bet some cock sucker CEO from Oracle or IBM will swoop in and fuck the whole thing up.