{"id":325,"date":"2019-01-10T20:58:42","date_gmt":"2019-01-10T20:58:42","guid":{"rendered":"https:\/\/www.michaelagreiler.com\/blog\/?p=325"},"modified":"2019-02-17T19:09:45","modified_gmt":"2019-02-17T18:09:45","slug":"100-public-data-sets-for-data-science","status":"publish","type":"post","link":"https:\/\/www.michaelagreiler.com\/100-public-data-sets-for-data-science\/","title":{"rendered":"100+ public data sets for data scientists and founders"},"content":{"rendered":"\n

Public data sets can be a blessing. If you work with big data or are a data scientist, public data sets probably excite you, as they do me. But if you are planning to start a business or accelerate an existing one, public data sets might also be exactly what you are looking for.<\/p>\n\n\n\n


Data can fuel your business. Photo by Carlos Muza<\/a> on Unsplash<\/a>
<\/figcaption><\/figure>\n\n\n\n

What can you do with public data sets?<\/strong> <\/h2>\n\n\n\n

Well, you can learn from them. Or rather, algorithms can learn from them, and you can use what they learn to make predictions. Yes, that’s a bit like looking into the future. If you want to start a business, the insights you find can become your new product. If you already have a business, you might find new insights to improve it.<\/p>\n\n\n\n
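To make "learning" and "predicting" a bit more concrete, here is a minimal sketch in Python. It fits a straight line to a handful of points with ordinary least squares and then uses that line to predict a new value. The function names `fit_line` and `predict` and the numbers are made up for illustration; a real project would use a real data set and most likely a dedicated library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up toy data: the "algorithm" learns the trend from these points.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)
print(predict(model, 5))
```

The idea is the same with larger data sets and more capable models: the algorithm extracts a pattern from past data, and you apply that pattern to cases you have not seen yet.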

Which domain should that data set come from?<\/strong><\/h2>\n\n\n\n

That really depends on the business you are in or want to be in. One type of public data set I look for is medical data. Why? Because I love to research diseases and their cures and treatments. I love discovering potential remedies that haven’t been marketed to the masses. Currently, I research those qualitatively, meaning I look them up by hand. This approach is a great way to start understanding the data and the domain. Using public data sets and analyzing them quantitatively takes this approach to the next level.<\/p>\n\n\n\n

Be aware of the quality of the public data set<\/strong><\/h2>\n\n\n\n

Public data sets vary widely in quality. Some are very noisy; others are of high quality. It is not easy to assess the quality at first sight. Even though data can be cleaned to some degree, you will get better results with better data. Another aspect is that data sets come in all kinds of formats. Some formats are proprietary and harder to work with. Understanding the format is a large part of the work.<\/p>\n\n\n\n
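A quick first look at quality does not need much. The sketch below, written in Python with only the standard library, counts rows, empty cells per column, and exact duplicate rows in a CSV file. The helper name `profile_rows` and the sample data are hypothetical and only serve as an illustration; it assumes a well-formed CSV where every row has the same columns as the header.

```python
import csv
import io


def profile_rows(rows):
    """Return simple quality indicators for a list of dict rows."""
    missing = {}
    seen = set()
    duplicates = 0
    for row in rows:
        # Count empty or absent cells per column.
        for column, value in row.items():
            if value is None or not value.strip():
                missing[column] = missing.get(column, 0) + 1
        # Count rows that repeat an earlier row exactly.
        key = tuple(row.items())
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


# Made-up sample data standing in for a downloaded CSV file.
SAMPLE = """patient_id,age,diagnosis
1,54,flu
2,,cold
2,,cold
3,41,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile_rows(rows)
print(report)
```

Numbers like these will not tell you whether the values are correct, but they give a first impression of how much cleaning a data set needs before you invest more time in it.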

100+ Public data sets<\/strong><\/h2>\n\n\n\n

If you have read this far, you probably love data sets as much as I do. So without further ado, here are the public data sets I have found so far.<\/p>\n\n\n\n

  1. Awesome Public Datasets: https:\/\/github.com\/awesomedata\/awesome-public-datasets<\/a><\/li>
  2. Many of the AWS data sets come with a usage example, which of course is very helpful to get the hang of it: https:\/\/registry.opendata.aws\/<\/a><\/li>
  3. Kaggle is a website focusing on data science and has an extensive list of available data sets: https:\/\/www.kaggle.com\/datasets<\/a><\/li>
  4. This article lists 50 public data sets for machine learning: https:\/\/medium.com\/datadriveninvestor\/the-50-best-public-datasets-for-machine-learning-d80e9f030279<\/a><\/li>
  5. This list is aimed especially at students and comprises 19 data sets: https:\/\/www.springboard.com\/blog\/free-public-data-sets-data-science-project\/<\/a><\/li>
  6. Here you can find a list of 7 public data sets: https:\/\/www.tableau.com\/learn\/articles\/free-public-data-sets<\/a><\/li>
  7. This article lists 33 data sources you can use for free: https:\/\/www.forbes.com\/sites\/bernardmarr\/2016\/02\/12\/big-data-35-brilliant-and-free-data-sources-for-2016\/#43ca441fb54d<\/a><\/li><\/ol>\n\n\n\n

    Please send me an email or leave a comment if you know of other good-quality data sets. <\/p>\n

    Updated on February 17th, 2019 <\/p>","protected":false},"excerpt":{"rendered":"

    Public data sets are a blessing when you want to start a new business or accelerate your existing one. With the help of algorithms you can extract insights from the data sets that can be used to solve problems or to make predictions.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"spay_email":"","jetpack_publicize_message":""},"categories":[38],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_shortlink":"https:\/\/wp.me\/paHqvV-5f","jetpack-related-posts":[{"id":2616,"url":"https:\/\/www.michaelagreiler.com\/we-are-10x-engineers\/","url_meta":{"origin":325,"position":0},"title":"We are 10x engineers","date":"July 18, 2019","format":false,"excerpt":"Engineers come in all shapes and sizes. And so do 10x engineers. But what's even more important is to build a team of diverse people, with diverse backgrounds, skills, strengths and weaknesses. This way we are prepares for the manifold challenges and tasks we have to solve in order for\u2026","rel":"nofollow","context":"In \"Productivity\"","img":{"src":"https:\/\/i1.wp.com\/www.michaelagreiler.com\/wp-content\/uploads\/2019\/07\/we-are-10x-engineers.jpg?fit=1024%2C512&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":539,"url":"https:\/\/www.michaelagreiler.com\/keynote-code-reviews\/","url_meta":{"origin":325,"position":1},"title":"Keynote about Code reviews: abstract","date":"February 9, 2019","format":false,"excerpt":"Abstract of keynote: Four eyes see more than two. Following this well-known principle, within Microsoft, code reviews are a part of the backbone of Microsoft\u2019s quality culture. Not only Microsoft bets on code reviews. 
Over the past decade, both open source and commercial software projects adopted code review practices as\u2026","rel":"nofollow","context":"In \"Research\"","img":{"src":"","width":0,"height":0},"classes":[]},{"id":6287,"url":"https:\/\/www.michaelagreiler.com\/year-in-review-2023\/","url_meta":{"origin":325,"position":2},"title":"A Year in Review - 2023","date":"January 5, 2024","format":false,"excerpt":"Let's recap what happened in 2023 and how this year treated me. Let's spoil it that much: It was a real roller coaster. With steep lows and a few highs! It started super productive At the beginning of 2023, I was, like so often in my life, caught up in\u2026","rel":"nofollow","context":"In \"Year Review\"","img":{"src":"https:\/\/i0.wp.com\/www.michaelagreiler.com\/wp-content\/uploads\/2024\/01\/IMG_20230713_173940.jpg?fit=1111%2C833&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/posts\/325"}],"collection":[{"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/comments?post=325"}],"version-history":[{"count":3,"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/posts\/325\/revisions"}],"predecessor-version":[{"id":486,"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/posts\/325\/revisions\/486"}],"wp:attachment":[{"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/media?parent=325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/categories?post=325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.michaelagreiler.com\/wp-json\/wp\/v2\/tags?post=325"}],"curies":[{"name":"wp","hr
ef":"https:\/\/api.w.org\/{rel}","templated":true}]}}