Workflow

Data Rescue Houston will follow the workflow developed by the DataRefuge project and EDGI (the Environmental Data & Governance Initiative).

Before the event, you may want to skim the workflow, or at least think about which role you'd like to play. Guides at Data Rescue Houston will help orient and train participants. Join the Data Refuge Slack to ask questions and get help.

Roles / Tracks

[adapted from https://datarefuge.github.io/workflow/ and http://www.ppehlab.org/datarefugegetinvolved]

1. Seeding

Seeders canvass the resources of a given government agency, identifying important URLs and whether those URLs can be crawled by the Internet Archive’s web crawler. They use the EDGI Nomination Chrome extension to nominate URLs to the End of Term (EOT) Web Archive if they are crawlable or to the Archivers app if they require manual archiving.

Recommended Skills
Consider this path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task.

2. Researching

Researchers review the “uncrawlables” identified during Seeding, confirm that each URL or dataset is indeed uncrawlable, and investigate how the dataset could best be harvested. Researchers need a good understanding of the harvesting goals and some familiarity with how datasets are structured and published. Researchers and Harvesters typically work together.

Recommended Skills
Consider this path if you have strong front-end web experience and enjoy research. An understanding of how federal data is organized (e.g. where “master” datasets are) would be valuable.

3. Harvesting

Harvesters take the “uncrawlable” data and try to figure out how to actually capture it based on the recommendations of the Researchers. This is a complex task which can require substantial technical expertise, and which requires different techniques for different tasks. Harvesters can use tools and methods developed by EDGI and other groups.

Recommended Skills
Consider this path if you’re skilled in a programming language of your choice (e.g., Python, JavaScript, or C), are comfortable with the command line (bash, shell, PowerShell), or have experience working with structured data. Experience in front-end web development is a plus.

4. Checking

Note: This role is currently performed by the Baggers, and does not exist separately.

Checkers inspect a harvested dataset and make sure that it is complete. The main question Checkers need to answer is: “Will the bag make sense to a scientist?” Checkers need an in-depth understanding of the harvesting goals and of how the content of datasets can vary.

5. Bagging

Baggers perform quality assurance on the dataset to make sure the content is correct and matches what was described in the spreadsheet. Then they package the data into a BagIt file (or “bag”), which includes basic technical metadata, and upload it to the final DataRefuge destination.

Recommended Skills
Consider this path if you have data or web archiving experience, or if you have strong tech skills and attention to detail.
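For Baggers new to the format, it may help to see what a bag contains. The sketch below is a simplified illustration based on the BagIt specification (RFC 8493). It is not the DataRefuge tooling. It copies a harvested folder into a bag's data/ directory, writes bagit.txt, and writes a SHA-256 manifest, and it can re-check the manifest later. The function names make_bag and verify_bag are hypothetical. The sketch omits optional pieces such as bag-info.txt and tag manifests, and it does not percent-encode unusual filenames. For real work, an established tool such as the Library of Congress bagit-python library is a safer choice.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_bag(source_dir: str, bag_dir: str) -> Path:
    """Hypothetical helper: build a minimal BagIt bag from source_dir.

    Simplified sketch: writes only bagit.txt and manifest-sha256.txt.
    """
    bag = Path(bag_dir)
    data = bag / "data"
    # Payload files live under data/ (copytree also creates bag_dir).
    shutil.copytree(source_dir, data)
    # Bag declaration required by the BagIt spec.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    # One manifest line per payload file: "<checksum>  <path relative to bag>".
    lines = []
    for f in sorted(p for p in data.rglob("*") if p.is_file()):
        lines.append(f"{sha256_of(f)}  {f.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def verify_bag(bag_dir: str) -> bool:
    """Hypothetical helper: True if every payload file matches the manifest
    and no payload file is missing from it or extra."""
    bag = Path(bag_dir)
    manifest = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    listed = set()
    for line in manifest.splitlines():
        digest, rel = line.split(maxsplit=1)
        listed.add(rel)
        target = bag / rel
        if not target.is_file() or sha256_of(target) != digest:
            return False
    actual = {
        p.relative_to(bag).as_posix()
        for p in (bag / "data").rglob("*")
        if p.is_file()
    }
    return listed == actual


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "harvest"
        src.mkdir()
        (src / "readings.csv").write_text("station,value\nA,1.5\n", encoding="utf-8")
        bag = make_bag(str(src), str(Path(tmp) / "bag"))
        print(verify_bag(str(bag)))  # True
```

The manifest is what lets anyone downstream confirm that a bag arrived intact: if any payload file is altered, added, or removed after bagging, verify_bag returns False.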

6. Describing

Describers create a descriptive record in the DataRefuge CKAN repository for each bag. Then they link the record to the bag and make the record public.

Recommended Skills
Consider this path if you have experience working with scientific data (particularly climate or environmental data) or with creating metadata.

7. Surveying

Surveyors are responsible for identifying key programs, datasets, and documents on federal agency websites that are vulnerable to change and loss.

Recommended Skills
Consider this path if you have domain expertise or experience conducting research using federal data.

8. Storytelling

Storytellers use a variety of narrative forms and media formats to document data rescue events and their many volunteers. They also explore the surprising variety of ways local communities use public federal environmental and climate data, from climate adaptation to emergency management to individuals’ “lifehacks”: resilient responses to our changing planet.

Recommended Skills
Consider this path if you have expertise in filmmaking, writing, photography or social media.