Hi!
Time for a very interesting member of the Azure family: Azure Open Datasets. OK, when you read the name you probably get 95% of the idea, but let's dig into the official definition (see References).
Azure Open Datasets are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. Open Datasets are in the cloud on Microsoft Azure and are integrated into Azure Machine Learning and readily available to Azure Databricks and Machine Learning Studio (classic). You can also access the datasets through APIs and use them in other products, such as Power BI and Azure Data Factory.
Datasets include public-domain data for weather, census, holidays, public safety, and location that help you train machine learning models and enrich predictive solutions. You can also share your public datasets on Azure Open Datasets.
This is amazing per se, but the feature becomes really useful when you start to work with the new Azure Machine Learning Studio (Preview). Now, in the [Assets / Datasets] section, we can create:
- Datasets from local files
- Datasets from a datastore
- Datasets from web files
- Datasets from the Open Datasets repository

The last one is awesome because we can work with free sample data such as:
- US Population by ZIP Code, https://azure.microsoft.com/en-us/services/open-datasets/catalog/us-decennial-census-zip/
- NYC Taxi & Limousine Commission – yellow taxi trip records, https://azure.microsoft.com/en-us/services/open-datasets/catalog/nyc-taxi-limousine-commission-yellow-taxi-trip-records/
- And, of course, The MNIST database of handwritten digits, https://azure.microsoft.com/en-us/services/open-datasets/catalog/mnist/
All the datasets in the repository are optimized to be used in Machine Learning workflows. We also have the chance to request datasets, or to submit and contribute our own data. So cool!
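If you prefer code over the Studio UI, the same datasets can be pulled from Python. Here is a minimal sketch, assuming the `azureml-opendatasets` package is installed (`pip install azureml-opendatasets`) and that its `NycTlcYellow` class works as I remember it. The helper names `month_window` and `load_yellow_taxi_sample` are mine, just for illustration, and the import lives inside the function so the helpers load even without the SDK.

```python
from datetime import datetime


def month_window(year, month):
    """Return (start, end) datetimes: the first day of the given month
    and the first day of the following month."""
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)
    else:
        end = datetime(year, month + 1, 1)
    return start, end


def load_yellow_taxi_sample(year=2018, month=5, rows=1000):
    """Load one month of NYC yellow taxi trips and keep the first `rows` rows.

    Assumption: the azureml-opendatasets package is installed and the
    machine can reach the public Open Datasets storage.
    """
    # Imported lazily so this module can be loaded without the Azure SDK.
    from azureml.opendatasets import NycTlcYellow

    start, end = month_window(year, month)
    trips = NycTlcYellow(start_date=start, end_date=end)
    df = trips.to_pandas_dataframe()  # downloads the data for the window
    return df.head(rows)
```

Calling `load_yellow_taxi_sample(2018, 5)` should give you a pandas DataFrame to play with. A full month of taxi trips is a lot of data, so expect the download to take a while.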
Happy coding!
Greetings @ Toronto
El Bruno
References
- Azure Open Datasets, https://azure.microsoft.com/en-us/services/open-datasets/
- Tutorial: Use automated machine learning to predict taxi fares, https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models
- Azure Machine Learning Studio (Preview), https://ml.azure.com