Big Data Geospatial SQL Without SQL Geospatial Support

In geospatial queries, we often need to quickly find all the points of interest (POIs) within a certain distance of an anchor point. In this post, we present a simple method that scales well to billions of data points and is implemented in plain SQL, so it can be deployed on massive data processing systems such as Redshift or Hive/SparkSQL on Hadoop without any geospatial support components.

Debugging Hadoop 2.x on Amazon EMR

Hadoop 2.x upgrades the previous web UI with a detailed ResourceManager. Having previously browsed the simpler JobTracker UI of Hadoop 1.x using lynx on the master node, we found that locating things in the new interface took a bit of experimentation.

Understanding the Decision to Move From AWS EMR/Hive to Redshift

At Thinknear, we always want to make sure we are using the right tool for the job. So when Redshift came out, we decided to evaluate our reporting and analytics pipeline and see whether Redshift could help us improve it. At the time, we were using Hive/Hadoop on EMR for all our reporting and analytics. We saw Redshift as a way to speed up our reporting infrastructure without completely rearchitecting it, and to give our business team a much easier way to access the data. Given these goals, we evaluated Redshift against our existing Hive/Hadoop solution and found the following pros and cons.
