Algorithmic Influencing on RedBubble

For a while at IPM, we’ve been daydreaming of a beautiful experiment: if we typed the same fake product into a marketplace (e.g. “Digital Goober”) over and over, would the machines, and the people keyed into them, eventually produce our fake product, given enough false signal? Could we will a product into being simply by searching for it?

Eventually, we’ll do that project, but for now, we wanted to establish that such search systems could be gamed in the first place. In multi-sided, adversarial marketplaces, knowing which products are being sought, and in what relative volume, is crucial for staying ahead of competitors. A site like RedBubble captures these dynamics perfectly: thousands of sellers compete with one another to top the rankings for the products that buyers are searching for.

To help sellers gain that edge, a cottage industry has sprung up around the site to divine the intent of users: InsightFactory, TopBubbleIndex, BubbleTrends, and Bubblesear.ch, to name a few. In fact, these tools have become so prevalent that others within this niche industry have published entire manifestos decrying the practice, and Redditors have spilled plenty of ink trying to unpack the trending algorithms themselves.

Clearly, sellers use these tools enough to warrant their existence. Additionally, these tools provide fine-grained data about search intents that we can use to determine what is currently trending, what could trend higher, and what competes with those trends. One of these services, TopBubbleIndex, does not do a particularly thorough job of protecting its undocumented API from automated data requests. On May 17, 2022, we captured a list of 9,573 trending topics on RedBubble, each containing fields like the search term in question, its week-over-week change, the current ranking index, and so forth.
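To make the shape of that data concrete, here is a minimal sketch of how one captured record might be modeled and how the biggest week-over-week risers could be pulled out. The field names (`term`, `wow_change`, `rank_index`) and the helper `top_risers` are our own hypothetical labels for illustration, not the service's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TrendingTopic:
    """One captured record. Field names are assumptions, not the real schema."""
    term: str          # the search term in question
    wow_change: float  # week-over-week change (units assumed)
    rank_index: int    # current ranking index


def top_risers(topics: Iterable[TrendingTopic], n: int = 10) -> List[TrendingTopic]:
    """Return the n topics with the largest week-over-week change."""
    return sorted(topics, key=lambda t: t.wow_change, reverse=True)[:n]
```

Sorting by week-over-week change is only one way to surface what could trend higher; the ranking index would serve equally well for what is currently trending.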

From this initial set, we chose one case at semi-random for a test: can we send a meaningful amount of traffic to a search term and alter the number of results? Our hypothesis was that a significant bump in traffic would induce more search results. In our first pass, we submitted 10,000 searches for “axlotl stickers”. We then ran another 10,000 searches for “axlotl stickers”, this time clicking on one of the search results after each search to simulate more thorough search sessions.

As with any test, we attempt the simplest, cheapest, fastest forms of attack first in order to determine the “minimum viable fraud” for any particular attack vector we analyze. In this case, traffic to Redbubble from IPs known to be associated with cloud providers is quickly swept up in an IP-based reputation assessment from Cloudflare:

Oh no, Cloudflare caught us, whatever shall we do!

The base cost of running our servers is about $0.00009 per minute, assuming we do not run multiple processes on each server simultaneously. The page we are attempting to access is 3.9 megabytes in total, on average. Incorporating a paid proxy network to bounce our requests through forces our hypothetical attackers to raise their engineering costs by a few lines of code and some time researching the right configs. With our current proxy provider, we pay $0.60 per GB of traffic on the cheapest tier of IPs we rotate through. That works out to roughly $0.002 per request for the full contents of the page, assuming we don’t pare down the request (and there are ways to significantly reduce that cost, such as blocking image responses). While that is a significant relative increase, sending 1,000 requests to a given search still costs only about $2, and it completely defeats Cloudflare’s protection. In nearly 100% of our tests, we got through just fine with only a marginal increase in our attack cost (from basically free to a few bucks per thousand hits):

That’s better - a full page of results by adding a few lines of code, bypassing the efforts of a $19 billion company
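The proxy arithmetic above can be checked with a few lines of Python. This is a rough sketch using only the figures quoted in this post (3.9 MB per page, the per-GB proxy price); the function name `proxy_cost` and the decimal conversion of 1 GB = 1,000 MB are our assumptions, and the extra pages loaded by click-throughs are ignored.

```python
PAGE_SIZE_MB = 3.9          # average search page size, from the text
PROXY_PRICE_PER_GB = 0.60   # cheapest proxy tier, USD per GB, from the text
MB_PER_GB = 1000            # assumption: provider bills in decimal gigabytes


def proxy_cost(requests: int,
               page_mb: float = PAGE_SIZE_MB,
               price_per_gb: float = PROXY_PRICE_PER_GB) -> float:
    """Estimated proxy bandwidth cost in USD for a number of full page loads."""
    return requests * page_mb / MB_PER_GB * price_per_gb


for n in (1, 1_000, 20_000):
    print(f"{n:>6} requests: ${proxy_cost(n):.5f}")
```

Under these assumptions a single request comes to about $0.0023 and a thousand requests to about $2.34, in line with the rounded figures above. The 20,000 search-page loads in our test would come to roughly $47 before any savings from blocking images.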

After our test, we waited a week to give sellers time to notice the traffic. We then visited the search results again, and to our surprise, they had changed radically:

4x the results after a one-week period in which no larger, society-wide trends around axlotls appeared to occur, though we did send a significant amount of traffic to these specific search results

All of the ≈150 new products on offer were created by a single seller, and all of them were sourced from readily available digital imagery of axlotls and appeared to be easy to generate computationally. Their store showed that the account was brand new, and that they had other products tied to trending topics such as “Ultra Maga”, which emerged in the news as a novel political term that same week, and the video game “The Stanley Parable”, which was re-released that same week and was thus also heavily trending. We reached out to the seller to understand why they had begun selling products related to axlotls, but did not receive a clear answer indicating a direct causal tie.

While we can’t be sure of the tie, it is suspicious: the only distinguishing characteristic of axlotls we could identify that would lead a brand-new seller to list them alongside other trending topics was the traffic we had sent. To conclusively prove the tie, a randomized controlled trial of product hits would be required, and is now in the works. Until that work is concluded, it’s fair to say that at least some suspicions were raised by this initial test into the Redbubble marketplace.
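As a rough illustration of what such a trial could look like, the sketch below randomly splits candidate search terms into a treatment group (which would receive traffic) and a control group (which would not), then runs a one-sided permutation test on the change in result counts. This is an assumed design for illustration only, not the protocol we have settled on; the function names and parameters are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple


def assign_groups(terms: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split terms into (treatment, control) halves."""
    rng = random.Random(seed)
    shuffled = list(terms)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(treated: Sequence[float],
                        control: Sequence[float],
                        n_permutations: int = 10_000,
                        seed: int = 0) -> float:
    """One-sided p-value for mean(treated) exceeding mean(control).

    Inputs are per-term changes in result counts over the waiting period.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel terms at random
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

Random assignment is what separates our traffic from whatever else might be driving a term, which a single hand-picked case like axlotls cannot do.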
