Sunday, January 29, 2017

Software for saving web pages and/or whole websites (open source), and related file formats

ScrapBook and ScrapBook X

ScrapBook is a Firefox extension that helps you save Web pages and easily manage collections. Its key strengths are lightness, speed, accuracy and multi-language support. Major features are:
* Save Web page
* Save snippet of Web page
* Save Web site
* Organize the collection in the same way as Bookmarks
* Full text search and quick filtering search of the collection
* Editing of the collected Web page
* Text/HTML edit feature resembling Opera's Notes
The last updates were only minor ones, made for compatibility with new versions of Firefox and to add a Ukrainian localization. Development had been almost dead for years.
Fortunately, this award-winning add-on is now developed by a new open project, ScrapBook X, built on the source code[4] of ScrapBook, ScrapBook Plus, ScrapBook Plus 2 and ScrapBook Lite, none of which is developed any longer. ScrapBook X mostly keeps the architecture of ScrapBook, but adds many features and fixes many bugs.[5][6] The project also takes over the development of several Firefox add-ons that extend the power of ScrapBook X.

These add-ons are:
  • ScrapBook X MAF Creator converts ScrapBook X data items into the .maff format, an open format that saves a whole web page in a single file and can be opened with Firefox's MAF add-on. Renaming the extension from .maff to .zip makes the archive accessible even in web browsers that do not support the .maff format.
  • ScrapBook X CopyPageInfo copies the information of one or more ScrapBook X data items to the clipboard, formatted in a pre-defined or custom format (very useful for creating formatted bibliography references, for example in BibTeX).
  • ScrapBook X AutoSave automatically captures web pages as you browse them.
  • ScrapBook X File Converter converts other formats (.enex, .maff, .html+files_directory, .epub, .zip, etc.) into the ScrapBook X export format and back; the result can then be imported into ScrapBook or ScrapBook X. It can also back up the whole ScrapBook or ScrapBook X data folder.
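
Since a .maff file is a ZIP container under another extension, it can also be inspected from a script without any browser add-on. The following is only a minimal sketch using Python's standard zipfile module; the function names list_maff_entries and extract_maff are made up for this illustration and are not part of ScrapBook X or MAF.

```python
import zipfile
from pathlib import Path


def list_maff_entries(maff_path):
    """Return the names of the files stored inside a .maff archive."""
    # A .maff file is an ordinary ZIP container, so zipfile can open it
    # directly; renaming it to .zip is not needed when reading from code.
    with zipfile.ZipFile(maff_path) as archive:
        return archive.namelist()


def extract_maff(maff_path, target_dir):
    """Unpack a .maff archive into target_dir and return that directory."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(maff_path) as archive:
        archive.extractall(target)
    return target
```

Calling extract_maff("page.maff", "page_unpacked") (both names are placeholders) should leave the saved page's files in a plain folder that any browser can open.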


Zotero - another Firefox extension with similar functions
See the "zotero" tag on this blog.


The WARC File Format (ISO 28500)
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns.
backing up websites

Mac and other platforms:

Darcy Ripper

Mac (≥10.11) and iOS only
SiteSucker is a Macintosh application that automatically downloads websites from the Internet. It does this by asynchronously copying the site's webpages, images, PDFs, style sheets, and other files to your local hard drive, duplicating the site's directory structure. Just enter a URL (Uniform Resource Locator), press return, and SiteSucker can download an entire website.

non-interactive command line tool

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols. It is a non-interactive command line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
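
As an illustration of calling Wget from a script, here is a small Python sketch that assembles a site-mirroring command and runs it. The helper names build_wget_command and mirror_site, the one-second wait and the example URL are assumptions made for this sketch; the Wget options themselves (--mirror, --convert-links, --page-requisites, --no-parent, --wait, --directory-prefix) are standard ones, but the right combination depends on the site being saved.

```python
import subprocess


def build_wget_command(url, dest_dir):
    """Build an argument list that mirrors url into dest_dir with GNU Wget."""
    return [
        "wget",
        "--mirror",                          # recursive download with timestamping
        "--convert-links",                   # rewrite links so the copy browses offline
        "--page-requisites",                 # also fetch images, CSS and similar files
        "--no-parent",                       # never ascend above the starting directory
        "--wait=1",                          # pause one second between requests
        f"--directory-prefix={dest_dir}",    # where the local copy is written
        url,
    ]


def mirror_site(url, dest_dir):
    """Run Wget (it must be installed) and return its exit status; 0 means success."""
    return subprocess.run(build_wget_command(url, dest_dir)).returncode
```

For example, mirror_site("https://example.org/", "backup") would try to save a browsable copy of that placeholder site under a backup folder; the same two lines could be put in a script run by cron.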
