I had a discussion about Grass that prompted me to dig a little deeper into the app. Here’s a TL;DR and some reflections on the overall concept:
TL;DR - GetGrass
Machine learning models such as LLMs require two things to come to life: 1) data and 2) compute. We’ve worked on compute for the past decade, and it shows: there have been a ton of advances in how we train models and in the compute/infrastructure used to train them
Data on the internet from sites such as Twitter, Reddit, eBay, etc. is becoming increasingly valuable as compute and ML training infra become more and more advanced
As such, many of these sites are putting more resources into detecting web scraping bots, and some even spoof their data (e.g. eBay mispricing certain items) if a bot is detected
Grass has packaged a network that utilizes individual devices (e.g. your Solana Saga). Individual IPs + device types make “scraping” look much more organic and harder to detect
Grass plans to use other crypto-native primitives: ZK tech to prove that the data being scraped is legitimate, and governance tokens to give bandwidth providers the right incentives for providing this service
If you would like to give GetGrass a try, here’s a referral link of mine. I’d love to hear your thoughts or complaints.
There are two general takeaways that come from this:
“Data is the new oil”
Decentralized networks via browser extensions, mobile apps
“Data is the new oil”
Numerous wars have been fought over access to actual oil. Roughly 18 years after that phrase was coined, there are “wars” being fought over data access: big tech companies put more and more effort into gating their data off from web scraping bots (and even feeding them spoofed data), while the bots do all they can to find ways around those defenses.
As GPUs and compute become more of a commodity, the data access and verification layer of the machine learning stack will become increasingly valuable. Semiconductors are a multi-trillion-dollar market today. Could data and data access potentially be bigger?
Just as NVIDIA and the semiconductor industry experienced exponential growth over the last decade, we will see the same for data + data access.
Decentralized networks via browser extensions, mobile apps
Grass makes things super easy to run: download an app or a Chrome extension, configure the properties, and turn it on periodically.
Why aren’t other blockchain networks (L1s, L2s, etc.) doing this? This doesn’t need to be done for the full-fledged validator clients that are constantly indexing, but it should be possible for lite clients with low(er) hardware requirements to run on mobile and/or other recreational, day-to-day computing devices (laptops, iPads, etc.).
Is it just an issue with getting permissions from Chrome and/or the app store? As more eyeballs shift to digging into Grass, this should be an obvious move for other networks. There is also an opportunity for an aggregator to build something that works across the variety of networks.
For those interested in learning more, this was a great interview with the founders of Grass.
re: Grass-related activity could be detected and banned
Today, fraud and bot detection is typically based on your internet “fingerprint”: a combination of things like your device type, IP address, browser type + version, etc. Grass effectively decentralizes the network so that these traditional signals for detecting web scraping won’t work. With each device presenting a unique fingerprint, spread across a variety of geographies, I can’t see what these big tech companies could do. The false positives, aka banning a large number of legitimate users, would be too detrimental.
If I’m missing something here, would love to understand why I’m wrong!
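To make the fingerprint argument a bit more concrete, here is a minimal sketch of a naive detector that counts requests per fingerprint. Everything in it is an assumption for illustration: the function names, the threshold, the hashing scheme, and the sample IPs are made up, and real detection systems use many more signals than the four mentioned above.

```python
import hashlib
from collections import Counter


def fingerprint(device_type, ip_address, browser, browser_version):
    """Hash the signals into one identifier (hypothetical scheme)."""
    raw = "|".join([device_type, ip_address, browser, browser_version])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def flag_suspects(requests, max_requests_per_fingerprint):
    """Return the fingerprints that sent more requests than the threshold."""
    counts = Counter(fingerprint(*r) for r in requests)
    return {fp for fp, n in counts.items() if n > max_requests_per_fingerprint}


# One bot sending 1,000 requests from a single machine and IP.
centralized = [("server", "203.0.113.7", "HeadlessChrome", "120")] * 1000

# The same 1,000 requests spread across 500 distinct devices (2 each).
# The addresses are private-range placeholders, not real nodes.
distributed = [
    ("phone", f"10.0.{i // 250}.{i % 250 + 1}", "Chrome", "120")
    for i in range(500)
] * 2

THRESHOLD = 100  # assumed per-fingerprint request limit
centralized_flags = flag_suspects(centralized, THRESHOLD)
distributed_flags = flag_suspects(distributed, THRESHOLD)

print(len(centralized_flags), len(distributed_flags))
```

In this toy setup the single-machine bot trips the threshold while the distributed traffic does not, since each device only sends two requests. Lowering the threshold enough to catch the distributed case would also catch ordinary users, which is the false-positive problem described above.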
grassssss