In choosing a data warehouse for your company, you’ll be faced with three main options: Redshift, Google BigQuery, or Snowflake. Snowflake is the newest option on the market, and it’s being hailed as “the next big thing.” Google BigQuery has a free tier that covers the first 1TB of query processing each month. Redshift has been around for many years and offers enterprise-level services with its pricing structure. Which one should you choose?
In this blog post, we’re going to discuss the pros and cons of each option so that you can make an informed decision on which one is best for your company’s needs. Let’s get started!
Which is better for your company, Redshift, Google BigQuery, or Snowflake?
Redshift –
If there is one thing you should know about Amazon Web Services, it’s that they are constantly updating and adding new features. One of its best-known analytics services is a data warehouse called Redshift. The best thing about Redshift is that it offers enterprise-level capabilities at a price point far below what you would pay for traditional enterprise-level warehouses. Redshift is a petabyte-scale warehouse, which means you can store very large amounts of data in it. Another selling point is scan speed, reportedly up to 10GB/s on over 2TB of data. Comparable speeds are possible with Google BigQuery and Snowflake as well, so raw speed alone shouldn’t decide the matter in most situations.
Because Redshift is built on top of a relational database, it can perform complex queries and join multiple tables into one dataset, typically with the tables arranged in a star schema (a central fact table surrounded by dimension tables). This makes Redshift very easy to use for people who are already familiar with SQL. The downside of this is that you need someone who knows SQL to be able to manage the data warehouse. It’s not impossible for non-SQL people to use Redshift, but there is a steeper learning curve on this one compared to the other options listed here.
While Redshift is great because of what it can do, it can cost you more than Google BigQuery and Snowflake. The costs are lower than a traditional enterprise-level warehouse, but you pay for the cluster by the hour whether or not it is running queries. The great thing is that Redshift’s pricing is very competitive and will most likely be cheaper than an enterprise-level alternative.
If your company has been using Amazon Web Services for some time, then it makes sense to move forward with Redshift. The pricing model is simple and well thought out, which will make your transition from other AWS services a lot smoother. If you’re starting a new warehouse, then you should consider using Redshift. It’s fast, efficient and safe for storing your data in the cloud.
Redshift – Pros:
Very easy to use for SQL users
Doesn’t require a lot of setup if you have used AWS before
Redshift – Cons:
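You’ll be paying for the cluster by the hour, even when it only runs simple queries or sits idle (per-terabyte query pricing applies only to optional add-ons such as Redshift Spectrum)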
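
To make the star schema idea from the Redshift section more concrete, here is a minimal sketch. It uses Python’s built-in `sqlite3` module as a stand-in for a warehouse, since the join pattern is the same in any SQL engine. The table names (`fact_sales`, `dim_customer`, `dim_product`) and the sample rows are made up for illustration and are not from any real dataset.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; only the SQL pattern matters here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who" and "what" (hypothetical names).
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
    -- The fact table holds the measurements and points at each dimension.
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        amount      REAL
    );
    INSERT INTO dim_customer VALUES (1, 'EU'), (2, 'US');
    INSERT INTO dim_product  VALUES (10, 'books'), (11, 'games');
    INSERT INTO fact_sales VALUES
        (100, 1, 10, 20.0),
        (101, 1, 11, 35.0),
        (102, 2, 10, 15.0),
        (103, 2, 10, 5.0);
""")


def revenue_by_region_and_category(connection):
    """Join the fact table to both dimensions and aggregate the amounts."""
    query = """
        SELECT c.region, p.category, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_product  AS p ON p.product_id  = f.product_id
        GROUP BY c.region, p.category
        ORDER BY c.region, p.category
    """
    return connection.execute(query).fetchall()


rows = revenue_by_region_and_category(conn)
for region, category, revenue in rows:
    print(region, category, revenue)
```

The fact table sits in the middle and every join goes outward to a dimension table, which is what gives the schema its star shape.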
Google BigQuery –
The biggest issue is that BigQuery limits the size of each file you can load (2TB at the time of writing). This is an issue for a lot of companies because they have much larger files that can’t easily be broken down into smaller parts. The upside is that there is no restriction on the total amount of data you can store in BigQuery, so the limit is not as critical as it may seem at first. Also, if you’re already using Google’s services for other tasks, then it may make sense to integrate BigQuery into your workflow. Other than that, there isn’t a good reason to pick this option over Redshift or Snowflake unless you need the processing power of BigQuery and your files fit within that limit.
Google BigQuery – Pros:
Google integration makes it easier for companies that use other Google services like AdWords or Analytics (better UI)
Google BigQuery – Cons:
The free tier only covers the first 1TB of query processing per month, which is too low for most datasets
Snowflake –
Snowflake is an ideal option if you’re looking to start from scratch. It’s very simple to set up and get your data into. It’s also very cheap (in the sense that you pay for only what you use, since compute and storage are billed by usage). You can start on the free tier, but most companies will want to move up to a paid plan almost immediately after signing up.
Snowflake – Pros:
Free tier available (but very restricted)
Setup is fast and simple, but data loading is a bit tricky to get the hang of
Snowflake – Cons:
No free option after your first 50MB of data
As you can see, there are three different options that any company has when it comes to storing their data. Redshift is the best fit if you’re a company that wants to stick with Amazon’s ecosystem. BigQuery is another good option, especially if you would rather not manage any infrastructure yourself or have already built up some sort of integration with Google services.
Snowflake makes sense if you want something very easy to start off with and don’t feel like paying much for it. After doing this research, I personally decided to go with Snowflake because it was easy to get started with and doesn’t have a large number of limitations once you get past the free tier.
If this article helped you in any way or you learned something new from it, then please let me know by commenting below. Also, if there’s an option that I missed, let me know and I can write about it in the future.
The biggest mistake companies make when picking a data warehouse
One of the biggest mistakes companies make when picking a data warehouse is that they keep their old database and use it as the data warehouse. The reasons for this are simple. A lot of companies (especially startups) don’t feel like moving to a new tool can be justified because of how expensive it is, even if it would ultimately help them outperform other competitors in the same industry.
Another issue that a lot of companies have is that they don’t know what exactly they want out of their data warehouse and thus end up not making any changes to the way they were working before. This causes issues because it’s very hard to get the exact same level of detail in your new data warehouse as you did in your old database, which happens because there are limitations to the structure of a relational database.
Working with files in BigQuery is very neat now that it supports JSON as input.
How do you know if your current data warehouse is bad?
There’s no standard way to go from one data warehouse to another because each company is in a different situation and has a different idea of what they want to achieve. However, there are a few things you should look out for that indicate it’s time to move to something new.
One big issue that I noticed when looking at the data warehouse of my previous company was that loading data from multiple sources took a very long time compared to other companies. This causes a lot of problems, especially when you need to load new datasets frequently.
How can this be solved? Simply switching to a service like Redshift or BigQuery does not fix slow loads on its own. Snowflake does have built-in data loading capabilities, but by themselves they were not ideal for a company the size of ours. With that said, we ended up setting up regular ETL jobs in Apache Airflow to load and transform data from other datasets/sources into our Snowflake database. This is much better than having one big ETL that takes hours to finish and makes it hard to add new datasets.
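
The idea of many small ETL jobs instead of one big one can be sketched in plain Python. This is not Airflow code and not the pipeline we actually ran; it only illustrates the pattern, using the standard library. The source names (`orders`, `events`), the `extract_*` functions and the sample rows are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_orders():
    # Hypothetical source: in practice this would read from an API or database.
    return [{"id": 1, "total": "19.90"}, {"id": 2, "total": "5.00"}]


def extract_events():
    # Hypothetical source that happens to be down right now.
    raise ConnectionError("events API unavailable")


def transform(rows):
    # Example transform: cast string totals to floats where the field exists.
    return [{**r, "total": float(r["total"])} if "total" in r else r for r in rows]


def run_job(name, extract):
    """Run one small extract-transform job and report its own status."""
    try:
        rows = transform(extract())
        # A real job would load `rows` into the warehouse at this point.
        return name, "ok", len(rows)
    except Exception as exc:
        return name, f"failed: {exc}", 0


JOBS = {"orders": extract_orders, "events": extract_events}


def run_all(jobs):
    # Each source is its own job, so jobs run in parallel and fail independently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_job, name, fn) for name, fn in jobs.items()]
        finished = [f.result() for f in futures]
    return {name: (status, count) for name, status, count in finished}


results = run_all(JOBS)
print(results)
```

Because each source is a separate job, the broken `events` source does not stop `orders` from loading, and adding a new dataset means adding one more small job rather than touching a monolithic script. A scheduler such as Airflow adds retries, scheduling and monitoring on top of the same structure.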
How do you choose an option to match your exact needs? (Monthly Cost, Data Storage Limits, Database Processing Limit & Scalability, Consistency of the service provider, etc.)
To choose an option, you first need to decide how much money you’re willing to spend on it. If your monthly cost is similar across the different providers, then look for other factors that are important to you and compare those. For instance, if you want a service where you can easily load datasets from multiple sources without spending too much time on parallelization and preprocessing, then you should choose Snowflake. On the other hand, if the monthly cost is similar and you can afford to spend more time and effort on loading your data from multiple sources, then Redshift will be perfect for you (although it might be overkill for most companies).
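
One way to make this comparison less of a gut feeling is a small weighted scoring table. The sketch below is only an illustration: the criteria come from the heading above, but the weights, the 1 to 5 scores and the option names (`Option A`, `Option B`, `Option C`) are placeholders, not ratings of any real product. Fill in your own numbers.

```python
def rank_options(scores, weights):
    """Return (name, weighted score) pairs sorted best-first."""
    total_weight = sum(weights.values())
    ranked = []
    for name, criteria in scores.items():
        weighted = sum(criteria[c] * w for c, w in weights.items()) / total_weight
        ranked.append((name, round(weighted, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Placeholder weights: how much each criterion matters to your company.
weights = {
    "monthly_cost": 3,
    "storage_limits": 1,
    "scalability": 2,
    "provider_consistency": 1,
}

# Placeholder scores from 1 (poor) to 5 (great); these are NOT real product ratings.
scores = {
    "Option A": {"monthly_cost": 4, "storage_limits": 3, "scalability": 2, "provider_consistency": 5},
    "Option B": {"monthly_cost": 2, "storage_limits": 5, "scalability": 5, "provider_consistency": 4},
    "Option C": {"monthly_cost": 5, "storage_limits": 2, "scalability": 3, "provider_consistency": 3},
}

ranking = rank_options(scores, weights)
for name, score in ranking:
    print(name, score)
```

With these placeholder numbers, cost carries the most weight, so the cheapest option ends up first even though it scores lower elsewhere. Changing the weights is the quickest way to see how sensitive your choice is.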
My personal favorite at the moment is BigQuery. As I stated above, it’s simple to use and has a lot of powerful commands that are very easy to implement compared to other tools out there. Also, if you’re working with data in JSON files, then you can’t go past the new feature that allows users to load them into BigQuery.
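
One practical detail when loading JSON: BigQuery expects newline-delimited JSON, meaning one object per line rather than a single large array. Here is a minimal sketch of that conversion using only the standard library. The sample records and the `to_ndjson` helper are made up for illustration.

```python
import json


def to_ndjson(records):
    """Serialize a list of dicts as newline-delimited JSON (one object per line)."""
    lines = [json.dumps(record, separators=(",", ":")) for record in records]
    return "".join(line + "\n" for line in lines)


# Hypothetical export: a regular JSON array, as many APIs produce.
raw = '[{"user": "ana", "clicks": 3}, {"user": "bo", "clicks": 7}]'

records = json.loads(raw)
ndjson = to_ndjson(records)
print(ndjson, end="")
```

The resulting text can be written to a file and used as the input of a load job that is configured for the newline-delimited JSON source format.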
At the time of writing, BigQuery has a free preview period up until mid-August 2018. Using it is easy, so I would suggest testing it out first if you’re interested in working with their service. There are also tutorials that can help you get started (although they might be outdated now).
If you’re looking for a service with BigQuery-like capabilities that supports multiple file types, then I would suggest going with Redshift.
I recently had to move our data warehouse from Snowflake to Redshift and it was very easy.
After seeing the pros and cons of each service, it seems that Redshift is the best solution for most companies. It provides a more affordable alternative to BigQuery while still having all the features you need in an analytical database like Snowflake. If your company needs something with even higher performance or larger capacity than Redshift, then maybe Google’s product would be a better fit. But if not, we recommend going with Redshift today!