Tuesday, February 14, 2017

A scraper site (rendered literally in French as "site rebut", that is, a junk site) is a website that contains no information useful to a visitor.
Some scraper sites copy the content of one or more other sites using a technique called web scraping.

Web scraping (sometimes called harvesting) is a technique for extracting the content of websites with a script or program, in order to transform it for use in another context, for example search engine optimization.
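As a rough illustration of the "script or program" part of that definition, here is a minimal sketch that pulls link targets and link text out of an HTML string using only Python's standard library. The sample markup and the names `LinkExtractor`, `extract_links` and `SAMPLE_HTML` are made up for this example; a real scraper would first download the page and would usually need to cope with messier markup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page content, inlined so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="/page-1">First page</a>
  <a href="/page-2">Second <b>page</b></a>
</body></html>
"""


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


links = extract_links(SAMPLE_HTML)
for href, text in links:
    print(href, text)
```

The extracted pairs can then be transformed or stored however the other context requires.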



List of tools

Extensions

Scraper gets data out of web pages and into spreadsheets.
Scraper is a very simple (but limited) data mining extension for facilitating online research when you need to get data into spreadsheet form quickly. It is intended as an easy-to-use tool for intermediate to advanced users who are comfortable with XPath.
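For readers less familiar with XPath, the sketch below shows the kind of expression such a tool relies on, applied to a small made-up table. It uses the limited XPath subset supported by Python's `xml.etree.ElementTree`, which requires well-formed markup; this is only an illustration of XPath selection, not of how the Scraper extension works internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed table fragment standing in for a web page.
SAMPLE = """<table>
  <tr><td>Alice</td><td>30</td></tr>
  <tr><td>Bob</td><td>25</td></tr>
</table>"""

root = ET.fromstring(SAMPLE)

# ".//tr" selects every row at any depth; each row becomes a list of cell texts,
# which maps naturally onto spreadsheet rows.
rows = [[cell.text for cell in tr.findall("td")] for tr in root.findall(".//tr")]

# "td[1]" selects the first <td> child of each row (XPath positions start at 1).
names = [td.text for td in root.findall(".//tr/td[1]")]

print(rows)
print(names)
```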

Export data in CSV format or store it in CouchDB
Web Scraper is a standalone Chrome extension. Sitemap building, data extraction and export are all done within the browser. After scraping your site you can download the data in CSV format. For advanced use cases you might want to try saving the data into CouchDB.
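To make the CSV export step concrete, here is a minimal sketch of turning scraped records into CSV text with Python's `csv` module. The records, the field names and the helper `rows_to_csv` are invented for the example; the extension performs its own export, and this is not its code.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical scraped records.
scraped = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget, large", "price": "19.50"},  # the comma forces quoting
]

csv_text = rows_to_csv(scraped, ["name", "price"])
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file object to `csv.DictWriter` instead would produce a downloadable file.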

Unlike other scraping tools that extract data only from HTML, Web Scraper can also extract data that is loaded or generated dynamically with JavaScript. Web Scraper can:
  • Wait for dynamic data to be loaded in the page
  • Click on pagination buttons that load data via AJAX
  • Click on buttons to load more data
  • Scroll down the page to load more data
https://github.com/martinsbalodis/web-scraper-chrome-extension (last commit: dec 2014)
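The "load more" and pagination behaviours listed above boil down to a loop: request the next batch, collect it, and stop when nothing more is available. The sketch below shows that loop against a fake in-memory data source. `FAKE_PAGES`, `fetch_page` and `scrape_all` are hypothetical names for this illustration; a real tool would drive a browser or issue AJAX requests instead.

```python
# Hypothetical paginated data, standing in for content loaded via AJAX.
FAKE_PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}


def fetch_page(page_number):
    """Stand-in for one 'click next / load more' step.

    Returns the items on that page and whether another page exists.
    """
    items = FAKE_PAGES.get(page_number, [])
    has_more = (page_number + 1) in FAKE_PAGES
    return items, has_more


def scrape_all(fetch, max_pages=100):
    """Collect items page by page until the source reports no more data."""
    collected = []
    page = 1
    while page <= max_pages:  # safety cap so a misbehaving site cannot loop forever
        items, has_more = fetch(page)
        collected.extend(items)
        if not has_more:
            break
        page += 1
    return collected


all_items = scrape_all(fetch_page)
print(all_items)
```

The `max_pages` cap is a design choice for the sketch: sites with infinite scrolling may never report an end, so a bound keeps the loop finite.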

Data Miner

Public Recipes are data extraction rules used to scrape website data.
Data Miner has over 40,000 Public Recipes.
Recipes are filtered for the site you are on.
Recipes are built by the Data Miner community.
Save recipes for quick access later.

