robertken

Week 9

This week was mostly about data. I populated the bundle with seven popular, openly available datasets for examples and experimentation. They are widely used in the Python neural-network community and ship with many of the libraries (Keras, for example), so I thought it would be valuable to provide them in an HPCC format for use with the bundle.


I also included the Python code (as Jupyter notebooks) that I used to convert each dataset from its original format into a form that can easily be sprayed onto an HPCC cluster.


For example, MNIST consists of 28x28 grayscale images of handwritten digits, each labeled with the digit it shows (0-9). It is originally distributed in the IDX ubyte format (see the source for details: http://yann.lecun.com/exdb/mnist/ ). The converted files, when sprayed as fixed-length records (size=785), produce two HPCC datasets: 60,000 rows for training and 10,000 rows for testing. Each 785-byte record holds the label as a one-byte integer followed by the 784 (28 x 28) pixel bytes, which are stored as DATA.
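To make the 785-byte layout concrete, here is a minimal, stdlib-only sketch of that kind of conversion. It is not the notebook from the bundle; it assumes the gzip-compressed IDX files as distributed on the MNIST site, and the function names (`parse_idx_images`, `to_fixed_records`, `convert_mnist_split`, etc.) are hypothetical. The IDX header is big-endian, with magic number 2051 for image files and 2049 for label files.

```python
import gzip
import struct

RECORD_SIZE = 785  # 1 label byte + 28 * 28 pixel bytes


def parse_idx_images(raw: bytes) -> list[bytes]:
    """Split an IDX3-ubyte image file (magic 2051) into one bytes object per image."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:
        raise ValueError(f"not an IDX image file (magic={magic})")
    size = rows * cols
    body = raw[16:]
    if len(body) != count * size:
        raise ValueError("image data length does not match header")
    return [body[i * size:(i + 1) * size] for i in range(count)]


def parse_idx_labels(raw: bytes) -> bytes:
    """Return the labels of an IDX1-ubyte label file (magic 2049), one byte each."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:
        raise ValueError(f"not an IDX label file (magic={magic})")
    labels = raw[8:]
    if len(labels) != count:
        raise ValueError("label data length does not match header")
    return labels


def to_fixed_records(image_raw: bytes, label_raw: bytes) -> bytes:
    """Interleave each label byte with its pixel bytes into fixed 785-byte records."""
    images = parse_idx_images(image_raw)
    labels = parse_idx_labels(label_raw)
    if len(images) != len(labels):
        raise ValueError("image and label counts differ")
    out = bytearray()
    for label, pixels in zip(labels, images):  # iterating bytes yields ints
        if len(pixels) + 1 != RECORD_SIZE:
            raise ValueError("expected 28x28 images")
        out.append(label)
        out += pixels
    return bytes(out)


def read_fixed_records(blob: bytes, record_size: int = RECORD_SIZE) -> list[tuple[int, bytes]]:
    """Split a flat file the way a fixed-length spray would: (label, pixels) per record."""
    if len(blob) % record_size:
        raise ValueError("file size is not a multiple of the record size")
    return [(blob[i], blob[i + 1:i + record_size]) for i in range(0, len(blob), record_size)]


def convert_mnist_split(images_gz: str, labels_gz: str, out_path: str) -> int:
    """Convert one gzipped MNIST split to a flat file; returns the record count."""
    with gzip.open(images_gz, "rb") as f:
        image_raw = f.read()
    with gzip.open(labels_gz, "rb") as f:
        label_raw = f.read()
    blob = to_fixed_records(image_raw, label_raw)
    with open(out_path, "wb") as f:
        f.write(blob)
    return len(blob) // RECORD_SIZE
```

Calling `convert_mnist_split("train-images-idx3-ubyte.gz", "train-labels-idx1-ubyte.gz", "mnist_train.bin")` should yield 60,000 records, and the `t10k-*` pair 10,000 (the output filename is only an example). On the HPCC side, the matching record layout would presumably be something like an `UNSIGNED1` label followed by a `DATA784` field.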


The seven datasets are: CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, Reuters, IMDB, and Boston Housing.
