get-set, Fetch! web scraper v0.4.1
Open-source data scraper with CSV and ZIP export capabilities.
This extension has a modular architecture and provides a series of scraping scenarios with predefined default values, so you can start scraping with minimal configuration.
Binary data (images, PDF files, ...) can be exported as ZIP archives. Text-based data can be exported as CSV files.
Take a look at the "Examples" section within the extension to see what's possible.
Create a new project
- Fill in the project name, start URL, scrape scenario and various plugin options.
- There are two built-in scenarios: scrape-static-content and scrape-dynamic-content, responsible for scraping regular and JavaScript-based HTML pages respectively.
- You can install additional community-based scenarios from the scenario list page.
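
As a rough illustration of the choices made at project creation, the sketch below models a project as a small data structure and picks one of the two built-in scenarios. The `ProjectConfig` class, its field names, the `pick_builtin_scenario` helper and the http/https check are all hypothetical and are not part of the extension; only the two scenario names come from the text above.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


def pick_builtin_scenario(needs_javascript: bool) -> str:
    """Return the built-in scenario name for regular or JavaScript-based pages."""
    return "scrape-dynamic-content" if needs_javascript else "scrape-static-content"


@dataclass
class ProjectConfig:
    """Hypothetical container for the fields entered when creating a project."""
    name: str
    start_url: str
    scenario: str
    # Illustrative only: the real plugin option keys are not listed here.
    plugin_options: dict = field(default_factory=dict)

    def problems(self) -> list[str]:
        """List human-readable problems; an empty list means the config looks usable."""
        found = []
        if not self.name.strip():
            found.append("project name is empty")
        # Assumption for this sketch: a start URL uses http or https.
        if urlparse(self.start_url).scheme not in ("http", "https"):
            found.append("start URL must use http or https")
        return found
```

For example, `ProjectConfig("docs", "https://example.com", pick_builtin_scenario(False))` describes a project that scrapes regular HTML pages with the static scenario.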
Start scraping
- Click the corresponding "scrape" button from the project list.
- URLs to be scraped will open sequentially in a new tab, with a delay defined at project creation.
- You can end the scraping process at any time by closing the newly opened tab. Next time you start scraping, the process will resume from where it was interrupted.
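
The sequential, resumable behaviour described above can be sketched as a loop over a URL queue that remembers which URLs were already handled. This is a minimal model under stated assumptions, not the extension's implementation: `Project`, `scrape_project`, `fetch` and `should_stop` are hypothetical names, and `should_stop` stands in for the user closing the scraping tab.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical project state: a URL queue plus the URLs already scraped."""
    urls: list[str]
    delay_seconds: float = 0.0
    scraped: set[str] = field(default_factory=set)


def scrape_project(project, fetch, should_stop=lambda: False):
    """Visit pending URLs in order; return the URLs scraped during this run."""
    visited = []
    for url in project.urls:
        if url in project.scraped:
            continue  # already handled in an earlier run
        if should_stop():
            break  # stands in for the user closing the scraping tab
        fetch(url)
        project.scraped.add(url)
        visited.append(url)
        time.sleep(project.delay_seconds)  # delay between consecutive URLs
    return visited
```

Because progress is stored on the project rather than in the loop, calling `scrape_project` again after an interruption skips the URLs that were already scraped and continues with the rest.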
Export the results
- Depending on the project settings, you can export text data as CSV and binary data as ZIP.
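
To show what the two export formats amount to, here is a minimal sketch using Python's standard `csv` and `zipfile` modules. The function names and the shape of the input data (a list of dicts for text, a name-to-bytes mapping for binary resources) are assumptions made for illustration; the extension's own export code may be organised differently.

```python
import csv
import io
import zipfile


def export_csv(rows, fieldnames):
    """Serialize scraped text rows (dicts) to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def export_zip(files):
    """Pack scraped binary resources ({name: bytes}) into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

Both helpers build the result in memory and return it, so the caller decides where to save the file.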
Troubleshooting
- Look for warning or error entries in the logs page.
- You can adjust the log level from the settings page.
- If you find a bug, please open an issue on GitHub and attach any relevant log entries in a comment.