Written by Masayuki Tanaka (NAOJ)
The PFS survey will collect a huge amount of data. Thanks to the high multiplexing of its 2400 fibers, we expect to collect as many as 67,000 spectra in a single night under the current survey plan. In addition, we will measure a variety of physical parameters from the individual spectra, such as redshifts of galaxies, strengths of emission and absorption lines, and stellar spectral types. The PFS data will certainly be “big data”.
The classical way for scientists to analyze data is to read a file, analyze its contents, and calculate some numbers. This process does not work at all for a data set as massive as the one we are going to obtain. Instead, we need a database that stores all the information in an ordered fashion and lets us retrieve the required information efficiently. In addition, we require a set of sophisticated tools to analyze the data from the database.
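To make the contrast with file-by-file analysis concrete, here is a minimal sketch of what a database selection looks like, using Python's built-in `sqlite3` module. The table name `spectra`, its columns, and the values in it are hypothetical and chosen only for illustration; they do not describe the actual PFS database layout.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical 'spectra' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spectra ("
        "object_id INTEGER PRIMARY KEY, object_type TEXT, redshift REAL)"
    )
    # Made-up rows for illustration; stars carry no redshift here (NULL).
    rows = [
        (1, "galaxy", 0.8),
        (2, "galaxy", 1.4),
        (3, "star", None),
        (4, "galaxy", 2.1),
    ]
    conn.executemany("INSERT INTO spectra VALUES (?, ?, ?)", rows)
    # An index lets the database find matching rows without scanning every record.
    conn.execute("CREATE INDEX idx_redshift ON spectra (redshift)")
    return conn


def select_galaxies(conn, z_min, z_max):
    """Return (object_id, redshift) for galaxies with z_min <= redshift <= z_max."""
    cur = conn.execute(
        "SELECT object_id, redshift FROM spectra "
        "WHERE object_type = 'galaxy' AND redshift BETWEEN ? AND ? "
        "ORDER BY object_id",
        (z_min, z_max),
    )
    return cur.fetchall()


conn = build_demo_db()
selected = select_galaxies(conn, 1.0, 2.5)
print(selected)
```

The point of the sketch is that the selection is expressed once as a query and the database does the searching, rather than each user opening and parsing individual files.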
We are working on the science database for PFS. For the last couple of years, the National Astronomical Observatory of Japan (NAOJ) has developed the database and tools for the HSC (Hyper Suprime-Cam) survey, and Johns Hopkins University (JHU) has developed SciServer, an integrated environment with various tools. We are combining the expertise of NAOJ and JHU to develop the PFS database, and we have recently released a prototype database to the PFS collaboration.
Members from the National Astronomical Observatory of Japan and Johns Hopkins University discussing the PFS database.
Science users in the PFS collaboration are going to test the prototype system and check whether any functions are missing. We are asking them to send this feedback to the database team so that we can improve the system for the actual survey. We plan to make several iterations over the next few years so that the system is ready for science by the time we start the PFS survey.
Front page of the prototype database opened to the collaboration