---
title: "A Permanent Record: Creating Web Snapshots for Data for Canada / Data for the Universe"
summary:
date: 2026-05-04T09:00:00-04:00
authors:
tags:
excludeSearch: false
draft: true
---
- Using [ArchiveBox](https://github.com/ArchiveBox/ArchiveBox) to archive web pages can produce high-quality snapshots that complement the Internet Archive's captures.
{{< cards >}} {{< card link="https://web.archive.org/web/20260503194906/https://phys.org/news/2026-04-usindian-space-mission-extreme-subsidence.html" title="Internet Archive Snapshot" image="/blog/2026/2026-04-usindian-space-mission-extreme-subsidence-internet-archive-snapshot.webp" subtitle="Click on the image to preview page" >}} {{< card link="https://s3.datafortheuniverse.org/tigris/d4u-datapkg-web-corpus/archive/1777838776.472139/singlefile.html" title="ArchiveBox Snapshot" image="/blog/2026/2026-04-usindian-space-mission-extreme-subsidence-archivebox-snapshot.webp" subtitle="Click on the image to preview page" >}} {{< /cards >}}
In the example below, the Internet Archive's snapshot is superior because it also saves every URL listed in the "Letter Text" section.

{{< cards >}} {{< card link="https://web.archive.org/web/20260503194906/https://phys.org/news/2026-04-usindian-space-mission-extreme-subsidence.html" title="Internet Archive Snapshot" image="/blog/2026/200-journalists-applaud-internet-archive-internet-archive-snapshot.webp" subtitle="Click on the image to preview page" >}} {{< card link="https://s3.datafortheuniverse.org/tigris/d4u-datapkg-web-corpus/archive/1777842968.455868/singlefile.html" title="ArchiveBox Snapshot" image="/blog/2026/200-journalists-applaud-internet-archive-archivebox-snapshot.webp" subtitle="Click on the image to preview page" >}} {{< /cards >}}
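When comparing an Internet Archive capture with an ArchiveBox capture of the same page, it helps to check when each was taken. The Python sketch below pulls the capture time out of both kinds of URL. It rests on two assumptions. First, Wayback Machine capture URLs have the form `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>` and use a UTC timestamp. Second, the folder after `/archive/` in an ArchiveBox snapshot URL is the Unix timestamp ArchiveBox recorded for that snapshot. The helper names (`parse_wayback_url`, `archivebox_snapshot_time`, `to_wayback_timestamp`, `capture_gap`) are hypothetical and are not part of any ArchiveBox or Internet Archive API.

```python
"""Line up an Internet Archive capture with an ArchiveBox capture of the same page.

Sketch only. Assumes (1) Wayback capture URLs look like
https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL> with a UTC timestamp, and
(2) the folder after /archive/ in an ArchiveBox snapshot URL is a Unix timestamp.
Requires Python 3.9+ for the tuple[...] annotation.
"""
import re
from datetime import datetime, timedelta, timezone

# Wayback capture URL: 14-digit UTC timestamp, then the original URL.
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")
# ArchiveBox snapshot folder (assumed): Unix seconds, optionally with a fraction.
ARCHIVEBOX_RE = re.compile(r"/archive/(\d+(?:\.\d+)?)/")


def parse_wayback_url(url: str) -> tuple[datetime, str]:
    """Return (capture time in UTC, original URL) for a Wayback capture URL."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {url}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, match.group(2)


def archivebox_snapshot_time(url: str) -> datetime:
    """Return the snapshot time (UTC) encoded in an ArchiveBox snapshot URL."""
    match = ARCHIVEBOX_RE.search(url)
    if match is None:
        raise ValueError(f"no /archive/<timestamp>/ segment in: {url}")
    return datetime.fromtimestamp(float(match.group(1)), tz=timezone.utc)


def to_wayback_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a 14-digit Wayback-style timestamp (UTC)."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def capture_gap(wayback_url: str, archivebox_url: str) -> timedelta:
    """Time between the two captures; positive if ArchiveBox captured later."""
    wayback_time, _ = parse_wayback_url(wayback_url)
    return archivebox_snapshot_time(archivebox_url) - wayback_time


if __name__ == "__main__":
    ia = ("https://web.archive.org/web/20260503194906/"
          "https://phys.org/news/2026-04-usindian-space-mission-extreme-subsidence.html")
    ab = ("https://s3.datafortheuniverse.org/tigris/d4u-datapkg-web-corpus/"
          "archive/1777838776.472139/singlefile.html")
    print(parse_wayback_url(ia)[1])                           # original page URL
    print(to_wayback_timestamp(archivebox_snapshot_time(ab)))  # ArchiveBox time, Wayback format
    print(capture_gap(ia, ab))                                 # how far apart the captures are
```

If you run it on the phys.org pair above, ArchiveBox's capture (2026-05-03 20:06:16 UTC) comes out roughly 17 minutes after the Internet Archive's capture (19:49:06 UTC). That gap is small enough to treat the two snapshots as records of the same version of the page.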
- Talk about needing a customized build of yt-dlp, since recent changes to YouTube have made its videos harder to archive.