100+ public data sets for data scientists and founders

Public data sets can be a blessing. If you work with big data or are a data scientist, public data sets probably excite you, as they do me. But if you are planning to start a business or want to accelerate your existing one, public data sets might also be exactly what you are looking for.


Data can fuel your business. Photo by Carlos Muza on Unsplash

What can you do with public data sets?

Well, you can learn from them. Or rather, algorithms can learn from them, and you can use what they learn to make predictions. Yes, that’s a bit like looking into the future. If you want to start a business, the insights you find can become your new product. If you already have a business, you might find new insights to improve it.
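To make "learning" and "predicting" a bit more concrete, here is a minimal sketch in Python. It assumes a tiny, made-up data set of ad spend and sales, and it uses a simple least-squares line as the "algorithm". The numbers and the names `fit_line` and `predict` are my own illustration, not taken from any real data set or library.

```python
# Hypothetical toy data: ad spend (x) and resulting sales (y).
# These numbers are made up for illustration only.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 6.0, 8.0, 10.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the co-variation of x and y divided by the variation of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new, unseen x value."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(ad_spend, sales)  # "learning" from past data
forecast = predict(model, 6.0)     # "predicting" an unseen case
print(forecast)
```

Real data sets are rarely this tidy, and a straight line is often too simple, but the two steps stay the same: fit a model on known data, then apply it to new cases.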

Which domain should that data set come from?

That really depends on the business you are in or want to be in. One type of public data set that I am looking for is medical data. Why? Because I love to research diseases and their cures and treatments. I love discovering potential remedies that haven’t been marketed to the masses. Currently, I research those qualitatively, which means I look them up by hand. This approach is a great way to start understanding the data and the domain. Using public data sets and analyzing them quantitatively takes this approach to the next level.

Be aware of the quality of the public data set

Public data sets vary widely in quality. Some are very noisy; others are of high quality. It is not easy to assess the quality at first sight. Even though data can be cleaned to some degree, you will get better results with better data. Another aspect is that data sets come in all kinds of formats. Some formats are proprietary and harder to work with. Understanding the format is a large part of the work.
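A quick first look at quality can be as simple as counting missing values and duplicate rows. The following is a rough sketch, assuming the data set is a well-formed CSV file with a header row. The sample rows and the helper name `profile_rows` are hypothetical and only stand in for a real download.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded public data set.
SAMPLE_CSV = """patient_id,age,diagnosis
1,34,flu
2,,cold
3,51,
1,34,flu
"""


def profile_rows(rows):
    """Count missing values per column and exact duplicate rows."""
    missing = {}
    seen = set()
    duplicates = 0
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            # Treat absent and whitespace-only cells as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[column] = missing.get(column, 0) + 1
        # Assumes every row has the same columns as the header.
        key = tuple(row.items())
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": total, "missing": missing, "duplicates": duplicates}


report = profile_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(report)
```

For a real file, you would pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample. Counts like these will not tell you whether the values are correct, but they give a first hint of how much cleaning lies ahead.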

100+ public data sets

If you have read this far, you probably love data sets as much as I do. So without further ado, here are the public data sets I have found so far.

  1. Awesome Public Datasets: https://github.com/awesomedata/awesome-public-datasets
  2. Registry of Open Data on AWS. Many of the data sets come with a usage example, which is very helpful for getting the hang of them: https://registry.opendata.aws/
  3. Kaggle is a website focusing on data science and has an extensive list of available data sets: https://www.kaggle.com/datasets
  4. This article comprises 50 public data sets for machine learning: https://medium.com/datadriveninvestor/the-50-best-public-datasets-for-machine-learning-d80e9f030279
  5. This list is especially for students and comprises 19 data sets: https://www.springboard.com/blog/free-public-data-sets-data-science-project/
  6. Here you can find a list of 7 public data sets: https://www.tableau.com/learn/articles/free-public-data-sets
  7. This article lists 33 data sources you can use for free: https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/#43ca441fb54d

Please send me an email or leave a comment if you know of other good-quality data sets.

Dr. Michaela Greiler

I help companies improve their software development processes, like code reviewing or software testing. I work for corporations such as Microsoft, but also help smaller businesses and start-ups to ensure a productive, satisfying and efficient software engineering process.
