Replication data for “The Hidden Costs of Requiring Accounts:
Quasi-Experimental Evidence from Peer Production” by Benjamin Mako
Hill and Aaron Shaw to be published in Communication Research.
Replicating the analysis presented in the paper
Replicating this analysis involves a number of steps. We have
attempted to include the most “raw” versions of the data to allow
replication of our full data pipeline.
This includes three sources of data:
MediaWiki XML dump files: We have included full XML history data
for all wikis we have access to (including those excluded from our
final analysis) in XZ compressed MediaWiki dump XML format. Together,
these files amount to more than 330GB uncompressed. All of these files
are made available in the following GNU Tar archive:
hidden_costs-wiki_xml_dumps.tar
Deployment Dates: A file with a list of the dates on which Wikia
wikis included in our analysis transitioned to requiring accounts
from would-be editors. We received this list from Wikia
staff/administrators and it is provided in the following file:
hidden_costs-login_only_wikis.csv
Data on administrators: Because data on user rights (which user
accounts have administrative rights for a given wiki) are not
included in the XML dumps, we had to collect this data after the
fact via the Wikia API. We have included the user rights data we
collected as well as the code used to collect it in the following
file: hidden_costs-admin_list.tar.xz
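The XML dumps described above can be inspected without unpacking the
full 330GB to disk. The following is a minimal sketch, not part of the
original pipeline, that streams each XZ compressed member out of the
tar archive and counts its <page> elements. The function name and the
assumption that dump members end in ".xz" are ours; check the actual
member names in the archive before relying on it.

```python
import lzma
import tarfile
import xml.etree.ElementTree as ET


def count_pages_in_archive(tar_path):
    """Return {member_name: number of <page> elements} for each XZ member."""
    counts = {}
    # "r:" opens a plain (uncompressed) tar archive.
    with tarfile.open(tar_path, mode="r:") as archive:
        for member in archive:
            # Assumption: dump files inside the archive end in ".xz".
            if not member.isfile() or not member.name.endswith(".xz"):
                continue
            raw = archive.extractfile(member)
            # Decompress on the fly instead of writing the XML to disk.
            with lzma.open(raw, mode="rb") as xml_stream:
                pages = 0
                for _event, elem in ET.iterparse(xml_stream, events=("end",)):
                    # MediaWiki dumps are namespaced; compare the local name only.
                    if elem.tag.rsplit("}", 1)[-1] == "page":
                        pages += 1
                        elem.clear()  # drop the finished page's children
                counts[member.name] = pages
    return counts
```

For example, count_pages_in_archive("hidden_costs-wiki_xml_dumps.tar")
would return one entry per dump file found in the archive.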
In addition to the code included in this archive, you will need access
to wikiq, a Python program created by the Community Data Science
Collective to parse the MediaWiki XML dump files included in this
dataset. The output of wikiq is a set of TSV files with revision
metadata, which are used by the rest of this analysis.
The wikiq code is available here:
https://code.communitydata.science/mediawiki_dump_tools.git
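As a quick sanity check on wikiq output, the sketch below tallies how
many rows of a TSV file carry each value of a given column. It is an
illustration only: the function name is hypothetical, and because the
column names wikiq emits are not documented here, the column is passed
in by the caller and checked against the file's header row.

```python
import csv
from collections import Counter


def tally_column(tsv_path, column):
    """Count how many rows carry each distinct value of `column`."""
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # Assumption: the TSV has a header row naming its columns.
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not in header: {reader.fieldnames}")
        for row in reader:
            counts[row[column]] += 1
    return counts
```

Calling tally_column on one output file with the name of its editor
column, for instance, would give a rough count of revisions per editor.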
License
The documentation provided for this project is released under a
Creative Commons Attribution Share-Alike 4.0 license (CC-BY-SA
4.0). Details of the license are available at:
http://creativecommons.org/licenses/by-sa/4.0/.
All code provided for this project is released under the GNU GPLv3
(available in plain text here:
https://www.gnu.org/licenses/gpl-3.0.txt).
The data were collected from Wikia.com (now largely rebranded as
Fandom.com). Most or all of these data have been published as free
cultural works by Wikia/Fandom under the CC-BY-SA 3.0 (unported)
license. Details on Wikia/Fandom licensing are available on this page:
https://www.fandom.com/licensing
Contact
Please be in touch with the authors with any questions.