Sebastin Santy: GSoC Blog [julia]

DataDepsGenerators.jl: Access to over a Million datasets in Julia

3 minute read Published:

Introduction: I am Sebastin, a Google Summer of Code student working on DataDepsGenerators.jl, a support package for DataDeps.jl. DataDeps.jl is a tool for reproducible data science: anyone running your code later, in a different environment, isn’t left faffing around trying to work out where to download the data from and how to connect it to your scripts. DataDepsGenerators.jl helps DataDeps.jl by creating the register blocks it needs as input.

Week 4: Implement CKAN API and change RegBlock structure

1 minute read Published:

Hi everyone! The fourth week has come to an end. This week went smoothly, thanks to the experience I had gained handling issues in earlier weeks. Work: PR #26 implemented the CKAN API setup. CKAN is used by many government data agencies. Hence, adding CKAN brings us to at least 1 million datasets (800,000 from DataOne, 281,000 from Data.Gov, 28,000 from …). CKAN was the most successful API we have integrated so far: the API structure is uniform across multiple data sources, which is a boon because we didn’t need a separate constructor for every dataset.
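To make the "uniform API structure" point concrete, here is a minimal sketch of how a single function could turn CKAN metadata into register-block text. It assumes the metadata has already been fetched and parsed into a dictionary shaped like the `result` object of CKAN's `package_show` action (`title`, `notes`, and a `resources` list with `url` fields). The function name `ckan_register_block` and the example dataset are hypothetical, and this is not the code from PR #26.

```julia
# Hypothetical helper (not part of DataDepsGenerators.jl): render register-block
# text from an already-parsed CKAN `package_show` result.
function ckan_register_block(pkg::AbstractDict)
    # Every CKAN dataset lists its files under "resources", each with a "url".
    urls = [res["url"] for res in pkg["resources"]]
    lines = String[
        "register(DataDep(",
        "    " * repr(pkg["title"]) * ",",           # dataset name
        "    " * repr(get(pkg, "notes", "")) * ",",  # description shown to the user
        "    [" * join(repr.(urls), ", ") * "],",    # remote paths to download
        "))",
    ]
    return join(lines, "\n")
end

# Made-up metadata, used only for illustration.
pkg = Dict(
    "title" => "Example Rainfall Records",
    "notes" => "Hypothetical dataset used only for illustration.",
    "resources" => [
        Dict("url" => "https://example.org/rainfall/2017.csv"),
        Dict("url" => "https://example.org/rainfall/2018.csv"),
    ],
)

println(ckan_register_block(pkg))
```

Because every CKAN-backed portal returns metadata in the same shape, one function like this could in principle serve all of them, which is the reason a separate constructor for each was not needed.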

Week 3: Implement DataOneV2 Abstraction

2 minute read Published:

Hi everyone! The third week has come to an end. It was a roller-coaster week of failing tests and attempts to fix them. Anyway, one of the major pieces of work in the project, integrating the DataOneV2 abstraction, was completed. This is the summary of what I have done over the previous week: Work: PR #21 implemented two APIs, TERN (Terrestrial Ecosystem Research Network, Australia) (which features TERN()) and KNB (Knowledge Network for Biocomplexity) (which features ArcticDataCenter() and KnowledgeNetworkforBiocomplexity()), as part of the DataOneV2 API.

Week 2: Work on the REST API interface using DataOne

2 minute read Published:

Hi everyone! The second week has come to an end. This is the summary of what I have done over the previous week: Work: PR #18 added provision for checksums in Register Blocks. Until now, checksums didn’t have a separate slot in the Register Block structure. A new function was created to handle checksums, along with tests. Overall, this PR helped me a lot in learning Julia, thanks to the lengthy review it went through.

Week 1: Making DataDepsGenerators.jl ready

2 minute read Published:

Hi everyone! The first week has come to an end. Including the previous work, this is the summary of what I have done so far: Work: PR #17 added DataDeps-based integration tests. These were needed to check whether the Register Blocks we generate using DataDepsGenerators.jl are in the format DataDeps.jl expects. These tests were crucial because they verify that DataDeps.jl is able to understand the Register Block output generated by DataDepsGenerators.jl.

Accepted to GSoC 2018

2 minute read Published:

Hi everyone! I am Sebastin Santy, a third-year undergrad at BITS Pilani K.K. Birla Goa Campus. My proposal for GSoC 2018 has been accepted under NumFOCUS [Julia Computing]. I’ll be working on adding support for more data repositories to DataDepsGenerators.jl. DataDepsGenerators.jl is a support package for DataDeps.jl which helps in creating Register Blocks for DataDeps.jl. A Register Block is a chunk of Julia code which describes the data to be downloaded.
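As a rough illustration, a Register Block might look like the sketch below. The dataset name, description, and URL are hypothetical placeholders rather than a real dataset, and the optional checksum argument is left out.

```julia
using DataDeps

# Hypothetical example: the name, message, and URL below are placeholders.
register(DataDep(
    "ExampleDataset",                          # name used to refer to the data in code
    """
    Dataset: Example Dataset
    Website: https://example.org/dataset
    Please cite the original authors if you use this data.
    """,                                       # message shown before downloading
    "https://example.org/dataset/data.csv",    # remote path to fetch
))
```

Once a block like this has been evaluated, `datadep"ExampleDataset"` resolves to the local folder holding the data and triggers the download on first use. With the placeholder URL above, that download would of course fail.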