Field
Value
Language
dc.contributor.author
Ruest, Nick
datacite.creator.affiliationIdentifier
https://ror.org/05fq50484
en_US
datacite.creator.affiliation
York University
en_US
datacite.creator.nameIdentifier
https://orcid.org/0000-0003-1891-1112
en_US
dc.contributor.author
Milligan, Ian
datacite.creator.affiliationIdentifier
https://ror.org/01aff2v68
en_US
datacite.creator.affiliation
University of Waterloo
en_US
datacite.creator.nameIdentifier
https://orcid.org/0000-0002-1470-7723
en_US
dc.contributor.author
Lin, Jimmy
datacite.creator.affiliationIdentifier
https://ror.org/01aff2v68
en_US
datacite.creator.affiliation
University of Waterloo
en_US
datacite.creator.nameIdentifier
en_US
dc.contributor.author
Deschamps, Ryan
datacite.creator.affiliationIdentifier
https://ror.org/01aff2v68
en_US
datacite.creator.affiliation
University of Waterloo
en_US
datacite.creator.nameIdentifier
en_US
dc.contributor.author
Fritz, Samantha
datacite.creator.affiliationIdentifier
https://ror.org/01aff2v68
en_US
datacite.creator.affiliation
University of Waterloo
en_US
datacite.creator.nameIdentifier
en_US
dc.date.accessioned
2018-08-24T12:25:59Z
dc.date.available
2018-08-24T12:25:59Z
dc.date.issued
2018-08-24
dc.identifier.uri
https://www.frdr-dfdr.ca/repo/dataset/c0f0bef4-a754-154a-c0a9-8053d8ef3704
dc.identifier.uri
https://doi.org/10.20383/101.036
dc.description
These are derivative files generated by the Web Archives for Longitudinal Knowledge (WALK) project, which ran between 2016 and 2018. WALK was an interdisciplinary project spearheaded by scholars at York University, the University of Waterloo, and the University of Alberta. The project's goal was to bring together major Canadian web archive holdings and give researchers access to search indexes and derivative files, including plain text, network diagrams, and domain frequency information. These files will be useful to digital humanists who want to work with text at scale or with the hyperlink networks of large parts of the archived Web.
Six universities participated: the University of Toronto, University of Alberta, University of Victoria, University of Winnipeg, Dalhousie University, and Simon Fraser University. These files reflect the state of their public web archives from late 2017 to mid-2018.
Each xz file contains the derivative files for a given collection: a GraphML file, which you can load with Gephi (no layout or transformations have been applied to it, so you will need to apply them manually); a csv file that describes the distribution of domains within the web archive; and a txt file that contains the plain text extracted from HTML documents within the web archive, including the crawl date, full URL, and plain text of each page. The xz file may also contain a GEXF file, which you can likewise load with Gephi. The GEXF file has a basic layout courtesy of our GraphPass program, allowing you to see major nodes and communities in the network.
This project has evolved into the Archives Unleashed Project. Information on Archives Unleashed and the WALK project can be found at https://archivesunleashed.org and on our blog at https://news.archivesunleashed.org.
en_US
dc.publisher
Federated Research Data Repository / dépôt fédéré de données de recherche
dc.rights
Creative Commons Attribution 4.0 International (CC BY 4.0)
en_US
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
en_US
dc.subject
web archives
en_US
dc.title
Derivative data for Web Archives for Longitudinal Knowledge (WALK)
en_US
globus.shared_endpoint.name
f163c1b3-9c88-42f6-a7bb-5839ed6c4063
globus.shared_endpoint.path
/1/published/publication_36/
frdr.preservation.status
AIP generation and transfer successful
frdr.preservation.datetime
2018-09-25 17:16:06
datacite.publicationyear
2018
datacite.contributor.Sponsor
Dalhousie University
datacite.contributor.Sponsor
Simon Fraser University
datacite.contributor.Sponsor
University of Alberta
datacite.contributor.Sponsor
University of Toronto
datacite.contributor.Sponsor
University of Victoria
datacite.contributor.Sponsor
University of Winnipeg
datacite.resourcetype
Dataset
en_US
datacite.fundingReference.funderName
Social Sciences and Humanities Research Council of Canada (SSHRC)
en_US
datacite.fundingReference.awardNumber
en_US
datacite.fundingReference.awardTitle
en_US
datacite.fundingReference.funderName
Andrew W. Mellon Foundation (AWMF)
en_US
datacite.fundingReference.awardNumber
en_US
datacite.fundingReference.awardTitle
en_US
datacite.fundingReference.funderName
Compute Canada
en_US
datacite.fundingReference.awardNumber
en_US
datacite.fundingReference.awardTitle
en_US
frdr.crdc.code
RDF1020707
frdr.crdc.group_en
Computer and information sciences
en_US
frdr.crdc.class_en
Library science and information studies
en_US
frdr.crdc.field_en
Archival, repository and related studies
en_US
frdr.crdc.group_fr
Informatique et systèmes d'information
fr_CA
frdr.crdc.class_fr
Bibliothéconomie et études de l'information
fr_CA
frdr.crdc.field_fr
Archivistique, gestion de référentiels et études connexes
fr_CA