AstroGrid was the UK contribution to the European Virtual Observatory, tasked with building beta-level, grid-based, distributed astronomy data integrators.
This photographic print was digitized as part of the SuperCOSMOS project, creating massive raster libraries referenced by spherical coordinates. Later digital-camera sky surveys such as the Sloan Digital Sky Survey scanned the universe at increasing resolution and sensitivity.
Bespoke data pipelines pulled these pixel images into a process that cleaned them of satellite and camera artifacts and extracted the characteristics of stars, galaxies and nebulae. Such pipelines were highly sensitive to configuration and would be re-run with different properties for different end analyses.
As instruments improved and proliferated, data size and complexity rocketed – but internet bandwidth trailed in relative terms. The data was not just ‘big’ but rapidly growing, poorly described, and difficult to associate across sensors and research domains. Data transfer often had to be physical: data was copied to a disk drive, and then the drive, or the complete PC, was parcel-posted to international collaborations.
Task
As a Senior Software Engineer for the Royal Observatory Edinburgh, an Unusual Systems staff member was tasked with building the data publishing and querying components of a globally distributed ‘grid’ based on web services.
Activities
- Typical architect tasks: defining and designing large-scale message-based transaction services, including message definitions, data exchange standards, performance analysis, failover mechanisms, versioning, deployment dependencies, preserving compatibility over time, etc.
- Investigated visualisation and data mining techniques for application to the Virtual Observatory.
- Design authority for the ‘Publishers AstroGrid Library’, a Java codeset to help data owners publish their data. Deployed real-world data centers to ROE, European Southern Observatory, Leicester and Rutherford Appleton Laboratory.
- Java used to create web services; XML and SOAP for messaging and metadata; XSLT; relational databases (e.g. SQL Server, Oracle, PostgreSQL) accessed through JDBC; bespoke data formats; collaborative tools (wiki, forums, CVS, Bugzilla, etc.); designed using