This is demonstrated by a research paper by Isaac Johnson et al., published in the Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, which finds that geolocation inference algorithms locate rural Twitter users less accurately than urban Twitter users.
The researchers investigate social media population biases between rural and urban communities by focusing on Twitter geolocation inference algorithms. The study found that the gap in performance is largely the result of algorithms that disproportionately disadvantage rural populations, rather than of natural population biases in the data.
Two types of algorithms are assessed: text-based, which geotags individual tweets, and network-based, which geotags individual users. The researchers found that text-based algorithms are more subject to structural bias (bias inherent in the algorithm’s design) and improved less than network-based algorithms when population bias was removed.
These two algorithms were chosen because they had published descriptions, open-source code, and accessible data, rather than being black boxes. Priedhorsky’s text-based algorithm is representative of many text-based algorithms because it is trained on tokens from the text of the tweet, the user’s time zone, the self-reported location field, and the specified language. Jurgens’ bi-directional network-based algorithm was chosen because it links users who mention each other, on the principle that interaction decreases with distance.
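The idea of linking only users who mention each other can be illustrated with a minimal sketch. This is not Jurgens’ implementation; the function name `bidirectional_mention_edges` and the input format (a list of author and mentioned-user pairs) are assumptions made for illustration.

```python
def bidirectional_mention_edges(mentions):
    """Return undirected edges between users who have each mentioned the other.

    `mentions` is an iterable of (author, mentioned_user) pairs
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every directed mention, ignoring self-mentions.
    directed = set()
    for author, mentioned in mentions:
        if author != mentioned:
            directed.add((author, mentioned))

    # Keep a pair only when the mention also exists in the reverse direction.
    edges = set()
    for a, b in directed:
        if (b, a) in directed:
            edges.add(tuple(sorted((a, b))))
    return edges


# Made-up example data: only ann and bob mention each other.
sample = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("dan", "dan")]
edges = bidirectional_mention_edges(sample)
```

Requiring mentions in both directions filters out one-way mentions (for example, of celebrities), which presumably say little about where a user lives.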
For the text-based algorithm, the researchers used a reverse geocoding operation to label each tweet with the county in which it was located, and then assigned the tweet that county’s ordinal urban/rural code from the U.S. National Center for Health Statistics’ Urban-Rural Classification Scheme. For the network-based algorithm, the researchers defined a user’s home as the geometric median of that user’s geotagged tweets that fell within 50 miles of each other.
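The geometric median is the point that minimizes the total distance to a set of points, which makes it less sensitive to outliers than a simple average. The sketch below approximates it with Weiszfeld’s iteration. It is an illustration only, not the researchers’ code: it assumes the tweet locations have already been projected to planar (x, y) coordinates, and the function name, tolerance, and iteration limit are choices made for this example.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000, eps=1e-12):
    """Approximate the geometric median of 2D points with Weiszfeld's iteration.

    Assumes planar coordinates; raw latitude/longitude would first need
    a projection (an assumption of this sketch).
    """
    if not points:
        raise ValueError("need at least one point")

    # Start from the centroid (arithmetic mean) of the points.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)

    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            # Clamp tiny distances to avoid dividing by zero. This is a
            # common simplification, not the exact treatment of the case
            # where the estimate lands on a data point.
            d = max(math.hypot(px - x, py - y), eps)
            w = 1.0 / d
            num_x += w * px
            num_y += w * py
            denom += w
        new_x, new_y = num_x / denom, num_y / denom
        step = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if step < tol:
            break
    return x, y


# Made-up coordinates: three tweets at one spot and one far-away outlier.
home = geometric_median([(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (100.0, 100.0)])
```

In this made-up example the estimate stays at the cluster of three tweets, whereas the arithmetic mean would be pulled a quarter of the way toward the outlier.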
The authors hypothesized that there would be a notable population bias across the urban-rural divide, and that this divide could also be influenced by algorithmic bias. Indeed, they found that urban users were overrepresented by 130% in the text-based datasets and by 210% in the network-based datasets, relative to their share of the overall population. Correspondingly, the text-based algorithm located urban users within 100 km at a rate that was on average 2.3 times that of rural users; for the network-based algorithm, the rate was 1.3 times.
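To make the two kinds of figures concrete, the sketch below shows one plausible way such numbers could be computed. The exact definitions used in the paper are not given here, so both formulas are assumptions, and all input values are made up for illustration rather than taken from the study.

```python
def overrepresentation_pct(dataset_share, population_share):
    """Percent by which a group's share of a dataset exceeds its share
    of the population (one plausible reading of 'overrepresented by N%')."""
    return (dataset_share / population_share - 1.0) * 100.0


def accuracy_within(errors_km, threshold_km=100.0):
    """Fraction of predictions whose error is at most threshold_km."""
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


# Made-up shares: a group that is 30% of the population but 60% of the dataset.
pct = overrepresentation_pct(0.6, 0.3)

# Made-up prediction errors in kilometres for two groups of users.
urban_errors = [5, 40, 80, 150]
rural_errors = [30, 120, 300, 500]
ratio = accuracy_within(urban_errors) / accuracy_within(rural_errors)
```

Under these definitions, a ratio above 1 means the algorithm places urban users within the threshold more often than rural users.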
The implications of this problem are only now being recognized. Even when population biases are controlled for, geotagging does not work equally well for everyone. This is just one example of how technology can be inequitable when such biases go unaccounted for and the technology is deployed on a massive scale.
Source: Johnson, I., McMahon, C., Schöning, J., & Hecht, B. (2017, May). The Effect of Population and "Structural" Biases on Social Media-based Algorithms: A Case Study in Geolocation Inference Across the Urban-Rural Spectrum. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, Denver, CO. Retrieved from https://doi-org.ezproxy.rit.edu/10.1145/3025453.3026015