A site similar to 12ft.io, but self-hosted, that also works with websites 12ft.io doesn’t handle.

How does it work?

It pretends to be Googlebot (Google’s web crawler) and receives the same content that Google would get. Sites serve Google the whole page so that the article can be indexed properly, and this takes advantage of that.
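To make the idea concrete, here is a minimal sketch of a request that identifies itself as Googlebot. It is not 13ft's actual code; the helper names `build_request` and `fetch_html` are made up for illustration, and the user agent is the Googlebot smartphone string with `W.X.Y.Z` left as a Chrome version placeholder.

```python
import urllib.request

# Googlebot smartphone user agent; "W.X.Y.Z" is a placeholder for a Chrome
# version and is left as-is here.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request whose User-Agent header claims to be Googlebot.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page with the spoofed header and return its decoded body.

    Requires network access; sites that verify the crawler's IP address
    will still serve the paywalled version.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com/article")` would then return whatever HTML the site serves to a client that presents this header.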

link: https://github.com/wasi-master/13ft

  • redcalcium@lemmy.institute
    1 year ago

    It amazes me that all it takes is changing the user agent to Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) and it can bypass paywalls on many sites. I thought those sites would try harder (e.g. checking whether the IP address truly belongs to Google), but apparently not.

    • andrew@radiation.party
      1 year ago

      Checking IP ownership is a moving target and is more likely to result in outcomes these sites don’t want (accidentally blocking Google’s bots and keeping their pages out of Google’s results).

      Checking the user agent is cheap, easy, and unlikely to break (for this purpose, anyway), and the percentage of people who know how to bypass this check is relatively small, so the financial impact is minor.
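For comparison, the two checks discussed in this thread can be sketched side by side. This is an illustrative sketch, not what any particular site runs: the function names are hypothetical, the stricter check uses the reverse-then-forward DNS lookup that Google documents for verifying its crawlers, and the forward lookup here only handles IPv4.

```python
import socket

# Domains Google documents for its crawlers' reverse DNS names.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".googleusercontent.com")


def claims_googlebot(user_agent: str) -> bool:
    """Cheap check: trust whatever the User-Agent header says."""
    return "googlebot" in user_agent.lower()


def is_verified_googlebot(ip: str, reverse_lookup=None, forward_lookup=None) -> bool:
    """Stricter check: reverse DNS on the IP, then a confirming forward lookup.

    The lookup functions are injectable (an assumption made here so the logic
    can be exercised without network access); by default they use the socket
    module.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        # The reverse name must sit under one of Google's crawler domains.
        if not host.lower().rstrip(".").endswith(GOOGLE_CRAWLER_DOMAINS):
            return False
        # The name must resolve back to the same IP, otherwise the reverse
        # record could have been set by anyone who controls that address.
        return ip in forward_lookup(host)
    except OSError:
        # DNS failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

A spoofed request passes `claims_googlebot` but fails `is_verified_googlebot`, at the cost of DNS lookups per client and the risk of rejecting a real crawler when a lookup fails.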

    • Aniki 🌱🌿@lemm.ee
      1 year ago

      Same. I thought there would be more happening in the background, but when I saw it’s just hijacking the Googlebot headers to display the HTML, I was a bit disappointed that it’s so stupidly easy.