mysql - How to query an S3 public dataset using Redshift


The Amazon AWS documentation is awful and totally unhelpful. It feels like I can't get down to the actual issue.

I am using SQL Workbench to connect to my Redshift cluster. I am able to connect fine, but I can't run any commands...

How can I query the Common Crawl S3 dataset?

The Common Crawl corpus dataset provided in Amazon S3 is apparently formatted as WARC files. However, Amazon Redshift loads delimited text files such as CSV (uncompressed, gzip or lzop), not WARC.
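To make the loading side concrete, here is a minimal sketch in Python that only builds the text of a Redshift COPY statement for CSV files stored in S3. The helper build_copy_statement, the table name, the S3 prefix and the IAM role ARN are all hypothetical placeholders, not values from the question. The sketch does no escaping of its arguments, so treat it as an illustration rather than something to run against untrusted input.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str,
                         gzipped: bool = True) -> str:
    """Return the text of a Redshift COPY statement that loads CSV files.

    All arguments are hypothetical placeholders supplied by the caller;
    nothing here connects to Redshift or checks that the objects exist.
    """
    options = ["CSV"]
    if gzipped:
        # GZIP tells COPY that the input files are gzip-compressed.
        options.append("GZIP")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"{' '.join(options)};"
    )


if __name__ == "__main__":
    # Placeholder names only; replace with your own table, bucket and role.
    print(build_copy_statement(
        "crawl_pages",
        "s3://my-bucket/crawl-csv/",
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
    ))
```

The printed statement could then be pasted into SQL Workbench, assuming the target table already exists and its columns match the CSV files.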

Therefore, you would need to pre-process the Common Crawl files into an appropriate format for loading into Amazon Redshift. One way of doing this is to use Amazon Elastic MapReduce (EMR). The page says:

Common Crawl provides the glue code required to launch Hadoop jobs on Amazon Elastic MapReduce that can run against the crawl corpus residing here in the Amazon Public Data Sets.

Please note that this is a rather complex process (as is anything involving Hadoop).
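For a feel of what the pre-processing step involves, here is a rough single-machine sketch in Python. It is not the EMR/Hadoop route described above and not Common Crawl's glue code: it is a simplified, standard-library-only reader that walks the records of a WARC stream and writes one CSV row per response record. The function names and the choice of columns (target URI, date, payload size) are assumptions made for illustration. It expects an uncompressed stream, so a gzip-compressed file would first be opened with gzip.open.

```python
import csv
import io
from typing import BinaryIO, Dict, Iterator, TextIO, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, body) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line where a record should start: {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        yield headers, stream.read(length)


def warc_to_csv(warc_stream: BinaryIO, csv_stream: TextIO) -> int:
    """Write one CSV row per 'response' record; return the number of rows."""
    writer = csv.writer(csv_stream)
    rows = 0
    for headers, body in iter_warc_records(warc_stream):
        if headers.get("warc-type") != "response":
            continue  # skip warcinfo, request, metadata records
        writer.writerow([
            headers.get("warc-target-uri", ""),
            headers.get("warc-date", ""),
            len(body),
        ])
        rows += 1
    return rows


if __name__ == "__main__":
    # Tiny hand-made record, standing in for a real crawl file.
    body = b"HTTP/1.1 200 OK\r\n\r\nhello"
    sample = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: http://example.com/\r\n"
        b"WARC-Date: 2020-01-01T00:00:00Z\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n" + body + b"\r\n\r\n"
    )
    out = io.StringIO()
    warc_to_csv(io.BytesIO(sample), out)
    print(out.getvalue(), end="")
```

The resulting CSV would still have to be uploaded to your own S3 bucket before Redshift could load it, and at the scale of the full corpus a distributed job such as EMR is the more realistic choice.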

