Your machine can think like you and work faster than you
Over the last few weeks, I have been juggling a huge amount of spatial data (temporal extent: 20 years of weekly data; spatial extent: the whole world) from different sources (e.g., ESA, DLR, VITO, USGS, USDA) in different formats (e.g., *.img, *.tif, *.txt, *.ascii, GRID) and in different projections. It was a complete mess. My task was to bring all of this data into one common format and projection system, calculate the long-term average, and deliver the averaged values in spreadsheet format for each administrative unit. Oops! That was a nightmare.
What I tried on the way to a solution:
1. First, I tried both commercial and open-source GIS software (e.g., ArcGIS, QGIS). Trust me, that was ‘THE’ dumbest idea I have had for this kind of processing.
2. Second, I tried a Java library (GeoTools). That wasn’t bad, but I still had to convert the different formats to GeoTIFF and, of course, deal with projection and transformation problems; that’s usual (some providers always use a user-defined projection system, and woohooo!!! that’s some crap you have to spend more time on than you might expect).
3. Third, I tried Python with the GDAL library. That was better, but not the best, because there were still some problems, such as a shortage of built-in functions (e.g., for dealing with the file system and network security, or with complex file naming conventions).
4. Fourth, I tried to develop a procedure using Oracle Spatial inside the Oracle database / Geomatica. That was effective and convenient, but not the fastest, because it has its own processing format (*.pix), and that is time-consuming.
5. Finally, I found THE best solution: the simplest (about 70 lines of code), fastest (the processing took only 15 minutes), convenient (you can save the *.sh file and run it any time for any of the data described above), time-saving (no more redoing the same task over and over again) and effective (my boss got exactly what he wanted, before the deadline…). “Shell scripting” is the solution. Write just a few dozen lines of basic code and it will do the whole task described above for you.
Shell Scripting for Geospatial Data Processing/Analysis:
Let me give you some examples of what I used to solve the problem above. Do you want to bring data in many different formats and projections into one common format and projection, or clip a raster to a *.shp file or a defined extent? Use the GDAL utilities ‘gdalwarp’ and ‘gdal_translate’ inside the shell script. They are really handy because they take null/NoData values into account in raster analysis and computation instead of ignoring them. Then, if you need the average pixel value within some boundary, one solution is the ‘gdalinfo’ utility: ‘gdalinfo -stats’ computes and prints the statistics and usually caches them in an *.aux.xml sidecar file, and you can redirect the printed report to an *.info file. After that you can simply use ‘grep’ or ‘xpath’ to extract the statistics you need from the *.xml/*.info file and print the output per boundary. If you need to do some mathematical operations, go for ‘gdal_calc.py’. There are many other useful utilities that will make your problem easier and let you go to bed early.
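A minimal sketch of these steps might look like the following. It is not the original script: the common projection (EPSG:4326), the NoData value (-9999) and the function names are assumptions chosen for illustration. It reprojects a raster to GeoTIFF with ‘gdalwarp’, clips it to a shapefile with ‘-cutline’, and pulls the mean out of the ‘gdalinfo -stats’ report with a grep-style ‘sed’ filter. GDAL leaves NoData pixels out of these statistics.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, assuming EPSG:4326 as the common
# projection, GeoTIFF as the common format and -9999 as the NoData value.

TARGET_SRS="EPSG:4326"
NODATA="-9999"

# Reproject any GDAL-readable raster (*.img, *.tif, ASCII grid, ...) to GeoTIFF.
to_common_tif() {
    local in="$1" out="$2"
    gdalwarp -q -overwrite -t_srs "$TARGET_SRS" -dstnodata "$NODATA" \
        -of GTiff "$in" "$out"
}

# Clip a raster to the polygon(s) of a shapefile and crop to their extent.
clip_to_shape() {
    local in="$1" shp="$2" out="$3"
    gdalwarp -q -overwrite -cutline "$shp" -crop_to_cutline \
        -dstnodata "$NODATA" "$in" "$out"
}

# Read `gdalinfo -stats` output on stdin and print the first band's mean,
# which gdalinfo lists in the band metadata as STATISTICS_MEAN=<value>.
extract_mean() {
    sed -n 's/^[[:space:]]*STATISTICS_MEAN=//p' | head -n 1
}

# Mean of the valid (non-NoData) pixels of a raster inside one boundary.
# A fresh temporary folder per call avoids reusing stale cached statistics.
zonal_mean() {
    local raster="$1" shp="$2" work
    work="$(mktemp -d)"
    clip_to_shape "$raster" "$shp" "$work/clip.tif"
    gdalinfo -stats "$work/clip.tif" | extract_mean
    rm -rf "$work"
}
```

For example, ‘to_common_tif input.img common.tif’ followed by ‘zonal_mean common.tif district.shp’ would print one number for that district (the file names are placeholders).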
Then again, if you are dealing with temporal data, you might want to put time stamps in the file names. That is easy to do with shell commands like ‘rename’ or ‘sed’. The ‘awk’ command is also very useful for finding a matching name in a spreadsheet (exported as plain text, e.g. CSV), extracting the corresponding value, and using it to rename a file or folder. I have also found the ‘date’ command very useful for converting dates between formats.
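As a sketch of the renaming step, suppose (hypothetically) that the input files are called ‘<product>_<YYYY>_<DOY>.tif’ (year plus day of year) and that a two-column CSV maps product codes to the names you want. The helpers below use ‘date -d’ to turn the day of year into an ISO date and ‘awk’ to look up the new name. The file pattern, the CSV layout and the function names are all assumptions, and ‘date -d’ requires GNU coreutils (BSD/macOS ‘date’ does not support it).

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical naming scheme <product>_<YYYY>_<DOY>.tif and a
# CSV lookup table with rows "product_code,new_name". Needs GNU date.

# Convert a year and day-of-year to an ISO date, e.g. 2003 097 -> 2003-04-07.
# 10#$doy forces base 10 so that a leading zero is not read as octal.
doy_to_date() {
    local year="$1" doy="$2"
    date -u -d "${year}-01-01 +$((10#$doy - 1)) days" +%Y-%m-%d
}

# Print column 2 of the first CSV row whose column 1 equals the key.
lookup_name() {
    local key="$1" csv="$2"
    awk -F',' -v key="$key" '$1 == key { print $2; exit }' "$csv"
}

# Print the new name for one file (falls back to the product code if the
# lookup finds nothing). Does not rename anything itself.
new_name() {
    local file="$1" csv="$2" base product year doy label
    base="$(basename "$file" .tif)"
    IFS=_ read -r product year doy <<< "$base"
    label="$(lookup_name "$product" "$csv")"
    printf '%s_%s.tif\n' "${label:-$product}" "$(doy_to_date "$year" "$doy")"
}

# Rename every matching file in a folder; mv -n never overwrites a file.
rename_all() {
    local dir="$1" csv="$2" f
    for f in "$dir"/*_*_*.tif; do
        [ -e "$f" ] || continue
        mv -n -- "$f" "$dir/$(new_name "$f" "$csv")"
    done
}
```

With a lookup row ‘NDVI,ndvi_vito’, a file named ‘NDVI_2003_097.tif’ would become ‘ndvi_vito_2003-04-07.tif’.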
All the tasks discussed above can be developed and tested separately, and once you are satisfied with each one, you can merge them into a single *.sh script. Next time, you just put the retrieved data (from the different sources) in a $source folder and you get the expected data in a $destination folder. So your machine can think like you and work faster than you, if you train and teach it.
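Putting the pieces together, a merged script could be sketched as follows. Again, this is not the original script: it assumes one shapefile per administrative unit in a boundaries folder, EPSG:4326 as the common projection and -9999 as NoData, and it takes the long-term average as the mean of the per-file zonal means. That is a simplification; a pixel-wise average over all dates would weight pixels differently when NoData coverage varies between weeks. Files that GDAL cannot open produce an empty mean and are skipped.

```shell
#!/usr/bin/env bash
# Sketch only. Hypothetical layout: raw rasters in $source, one shapefile per
# administrative unit in $boundaries, results written to $destination.
# Assumes EPSG:4326 as the common projection and -9999 as NoData.

# Reproject + clip one raster to one unit and print the mean of valid pixels
# (prints nothing if GDAL cannot open the file or no valid pixels remain).
unit_mean() {
    local raster="$1" shp="$2" work="$3"
    # Remove the previous clip and its cached statistics so they are not reused.
    rm -f "$work/clip.tif" "$work/clip.tif.aux.xml"
    gdalwarp -q -overwrite -t_srs EPSG:4326 -cutline "$shp" -crop_to_cutline \
        -dstnodata -9999 "$raster" "$work/clip.tif" || return 0
    gdalinfo -stats "$work/clip.tif" | sed -n 's/^[[:space:]]*STATISTICS_MEAN=//p' | head -n 1
}

# Read "unit,file,mean" rows on stdin and print the average mean per unit
# as CSV. Rows with an empty mean are skipped.
long_term_average() {
    printf 'unit,mean\n'
    awk -F',' '$3 != "" { sum[$1] += $3; n[$1]++ }
        END { for (u in sum) printf "%s,%.4f\n", u, sum[u] / n[u] }' | sort
}

run_pipeline() {
    local source="$1" boundaries="$2" destination="$3" work raster shp
    work="$(mktemp -d)"
    mkdir -p "$destination"
    for raster in "$source"/*; do
        [ -f "$raster" ] || continue
        for shp in "$boundaries"/*.shp; do
            [ -f "$shp" ] || continue
            printf '%s,%s,%s\n' "$(basename "$shp" .shp)" "$(basename "$raster")" \
                "$(unit_mean "$raster" "$shp" "$work")"
        done
    done > "$work/per_file.csv"
    long_term_average < "$work/per_file.csv" > "$destination/long_term_average.csv"
    rm -rf "$work"
}

# Only run when given the three folders, e.g.
#   ./pipeline.sh ./source ./boundaries ./destination
if [ "$#" -eq 3 ]; then
    run_pipeline "$@"
fi
```

The resulting ‘long_term_average.csv’ has one row per administrative unit and opens directly in a spreadsheet program.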