You may want to glance at the workflow before arriving at the event, or at least think through what role you’d like to play. Guides at Data Rescue Houston will help orient and train participants. Join the Data Refuge Slack to ask questions and get help.
Seeders canvass the resources of a given government agency, identifying important URLs and whether those URLs can be crawled by the Internet Archive’s web crawler. They use the EDGI Nomination Chrome extension to nominate URLs to the End of Term (EOT) Web Archive if they are crawlable or to the Archivers app if they require manual archiving.
Consider this path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task.
Researchers review “uncrawlables” identified during Seeding, confirm the URL/dataset is indeed uncrawlable, and investigate how the dataset could be best harvested. Researchers need to have a good understanding of harvesting goals and have some familiarity with datasets. Researchers and harvesters typically work together.
Consider this path if you have strong front-end web experience and enjoy research. An understanding of how federal data is organized (e.g. where “master” datasets are) would be valuable.
Harvesters take the “uncrawlable” data and determine how to actually capture it, based on the recommendations of the Researchers. This is a complex task that can require substantial technical expertise, and different kinds of data call for different techniques. Harvesters can use tools and methods developed by EDGI and other groups.
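Harvesting methods vary widely with the dataset, but at minimum a capture amounts to downloading the resource and recording its provenance so Checkers and Baggers can verify it later. Here is a minimal sketch using only the Python standard library; the function name and the URL in the comment are hypothetical, not part of any official EDGI tooling:

```python
import hashlib
import json
import urllib.request

def harvest(url, out_path):
    """Fetch one resource and write a small provenance record
    (source URL and SHA-256 checksum) alongside the downloaded file."""
    with urllib.request.urlopen(url) as resp:
        content = resp.read()
    with open(out_path, "wb") as f:
        f.write(content)
    record = {"source": url, "sha256": hashlib.sha256(content).hexdigest()}
    with open(out_path + ".json", "w") as f:
        json.dump(record, f, indent=2)
    return record

# Example (hypothetical URL):
# harvest("https://www.example.gov/some-dataset.csv", "some-dataset.csv")
```

Real harvests are rarely this simple (many require scripted form submission, API pagination, or scraping), but recording the source URL and a checksum at capture time is what lets later stages confirm the data is complete and untampered.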
Note: This role is currently performed by the Baggers, and does not exist separately.
Checkers inspect a harvested dataset and make sure that it is complete. The main question a checker needs to answer is “will the bag make sense to a scientist?” Checkers need to have an in-depth understanding of harvesting goals and potential content variations for datasets.
Baggers perform quality assurance on the dataset to make sure the content is correct and corresponds to what was described in the spreadsheet. Then they package the data into a BagIt file (or “bag”), which includes basic technical metadata, and upload it to the final DataRefuge destination.
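A bag is just a directory in the BagIt layout: the payload under `data/`, a checksum manifest, and a `bagit.txt` declaration. In practice baggers use an existing tool such as the Library of Congress `bagit-python` library, but the structure itself is simple enough to sketch with only the standard library (the file names passed in below are illustrative):

```python
import hashlib
import os

def make_bag(bag_dir, files):
    """Create a minimal BagIt bag: payload files under data/, a
    SHA-256 manifest, and the bagit.txt version declaration."""
    data_dir = os.path.join(bag_dir, "data")
    os.makedirs(data_dir, exist_ok=True)

    manifest_lines = []
    for name, content in files.items():
        with open(os.path.join(data_dir, name), "wb") as f:
            f.write(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")

    with open(os.path.join(bag_dir, "manifest-sha256.txt"), "w") as f:
        f.write("\n".join(manifest_lines) + "\n")

    with open(os.path.join(bag_dir, "bagit.txt"), "w") as f:
        f.write("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")

make_bag("example-bag", {"readme.txt": b"rescued dataset"})
```

The manifest is what makes a bag self-validating: anyone receiving it can recompute the checksums and confirm nothing was lost or altered in transit.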
Consider this path if you have data or web archiving experience, or have strong tech skills and attention to detail.
Describers create a descriptive record in the DataRefuge CKAN repository for each bag. Then they link the record to the bag and make the record public.
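CKAN exposes record creation through its action API (`package_create`). The sketch below builds such a request with the standard library; the CKAN URL, API key, and dataset values are hypothetical placeholders, and a real event would supply the DataRefuge repository URL and credentials:

```python
import json
import urllib.request

def describe_bag(ckan_url, api_key, title, bag_url):
    """Build a CKAN package_create request that creates a descriptive
    record and links it to the uploaded bag as a resource.
    Send it with urllib.request.urlopen(req)."""
    payload = {
        "name": title.lower().replace(" ", "-"),   # CKAN slug
        "title": title,
        "resources": [{"url": bag_url, "name": "BagIt archive"}],
    }
    return urllib.request.Request(
        ckan_url + "/api/3/action/package_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_key,
                 "Content-Type": "application/json"},
    )

# Hypothetical values for illustration only:
req = describe_bag("https://ckan.example.org", "MY-API-KEY",
                   "NOAA Tide Gauge Data",
                   "https://storage.example.org/bags/noaa-tides.zip")
```

Making the record public is a separate step in CKAN (datasets can be created as private and released once reviewed), which matches the two-stage describe-then-publish flow above.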
Consider this path if you have experience working with scientific data (particularly climate or environmental data) or with creating metadata.
Consider this path if you have domain expertise or experience conducting research using federal data.
Storytellers employ a variety of narrative forms and media formats to document data rescue events and their many volunteers. They also explore the surprising variety of uses that local communities make of public, federal environmental and climate data: from climate adaptation, to emergency management, to individuals’ “lifehacks”, resilient responses to our changing planet.
Consider this path if you have expertise in filmmaking, writing, photography or social media.