GEOM90018 Spatial Database. A4: Putting it all together...
Objectives
By the end of this practical you should be able to:
1. design your own solutions to realistic spatial and spatiotemporal analysis problems;
2. combine multiple spatial operations together to support an investigation of a large spatial data set;
3. present the results of your investigation using a spatial database clearly and carefully.
Overview
Where previous practicals have asked you to find the solution to well-specified problems, this practical asks you to
design a solution to investigate more open-ended problems - as a data analyst. You will use as the focus for your
investigation the table "checkins", possibly n conjunction with any other data in the "spatial" schema. The table
contains the time (date, time), location (geom), user id (uid), location id (poi) of approximately 6M checkins made
by Gowalla (social network) users between Feb 2009 and Oct 2010. The table "friends" also contains the friend
associations between Gowalla users identified in the check-ins.
OSM and hstore
For this assignment, you may also use the newly imported OpenStreetMap (OSM) dataset of Melbourne region,
downloaded from bbbike.org on 1 May 2019. The data was imported into the geom90018 PostgreSQL database
using the osm2pgsql tool. The data is stored into 7 tables whose names start with "melbourne_osm_"
(melbourne_osm_nodes, melbourne_osm_point, melbourne_osm_polygon, melbourne_osm_rels,
melbourne_osm_roads, melbourne_osm_ways), all of which are located in schema "spatial". It is advisable that
you read about the OSM elements in order to understand how OSM models real world objects with nodes, ways,
and polygons, and how it uses tags to record their attributes. Since tags are in fact key-value pairs, they cannot be
represented with the data types we have previously used in postgresql (e.g. integer, varchar, or geometry). Instead,
they can be stored into hstore data type, which is available as an extension to the PostgreSQL database. Hstore
allows users to store multiple key-value pairs into a single column, and do some operations on them (examples of
some operations are shown below, and you can see the whole list in the hstore documentation website). When
working with these tables keep in mind the following things:
• Nodes, ways, and rels tables are the direct representation of what is stored in the OSM databases. These
tables are stored in our database without hstore columns for tags, and without geometry columns. You
may still use these tables in your analysis if you want to.
• Point, line, polygon, and roads tables are abstracted from data in the nodes, ways and rels tables by the
osm2pgsql tool (there has been some processing done on these tables in order to adapt them to and make
them usable in a PostgreSQL database). These tables are stored in our database with hstore "tags"
columns, and geometry "way" columns. Also keep in mind that these geometries are stored in the WGS
84 / Pseudo-Mercator (EPSG:3857).
Here are some examples of how you can get familiar with the
new OSM data:
SELECT ST_SRID(way) FROM
spatial.melbourne_osm_point LIMIT 1;
SELECT tags->'building', way FROM
spatial.melbourne_osm_polygon WHERE
tags?'building';
SELECT tags->'building' as building, count(*) FROM
spatial.melbourne_osm_polygon WHERE
tags?'building' GROUP BY tags->'building' ORDER BY
count DESC;
SELECT osm_id, (each(tags)).key, (each(tags)).value
FROM spatial.melbourne_osm_polygon WHERE
tags?'building' LIMIT 100;
Assessment A4: Advanced Spatial and Spatiotemporal Analysis
Preliminaries
1. This assignment is worth 16% of your final class mark.
2. The due date for the assignment is clearly stated on course web site, accessible via Canvas.
3. NO LATE ASSIGNMENTS WILL BE ACCEPTED. It is your responsibility to ensure you are aware of
the assignment due date posted on the course web site.
Assignment
Your task is to perform some analysis of this data set: to pose - and answer - some question of your choosing or
reveal some interesting patterns in the data. Examples of possible analyses might include (but are not limited to):
• Identifying interesting spatial clusters of check-ins, such as the pattern of "footprints" of check-ins for
each user, for a given region;
• Examining the relationship between the friend network and the spatial location of check ins, such as
answering the question of whether friend network distance and spatial distance between check-ins are
correlated; or
• Looking for spatiotemporal trends in the patterns of check-ins, such as identifying whether certain
locations or points of interest are tending to increase or decrease in popularity over time.
You are strongly encouraged to develop your own analyses however, and not rely on these indicative suggestions.
You can support your motivation by literature or current events (such as COVID-19) and then, reference these
properly.
Submission
You must submit two files in one zipped archive as your assignment:
1. Your plain text SQL script, which performs all of the steps necessary to perform your analysis. Your
SQL script must be named .txt (or .sql) where is your student number. This
should also be the name of your zipped archive.
2. A one-page PDF summary of your analysis. Your PDF summary must be named .pdf where
is your student number. The format of this PDF summary is flexible, but it must at least
include: o A clear, short statement of the problem your script solves or analysis your script performs;
o A clear, short motivation for your problem or analysis, why your analysis is interesting;
o A clear, short summary of the key steps used to complete your analysis, highlighting in
particular the most important, challenging, or original parts of your solution;
o A clear, short summary of the key results, i.e., what your analysis shows; and
o One or two images that summarizes your results (e.g., a map or perhaps a data graphic or chart,
depending on your chosen analysis - note the usual requirements for data presentation apply).
You must submit your files online using the course web site (accessible via Canvas. No other submission method
is acceptable.
Marking
• Your submission will be marked out of 16. • Marks will be deducted for any incorrect submissions (e.g., using a Word or HTML document instead
of a plain text SQL file or PDF summary; SQL scripts that contain non-SQL parts, or fail to execute
directly when marked, or require adaptation or editing to run).
• Marks will be allocated in four areas:
o Presentation (up to 3 marks): reflecting the clarity and care of presentation of the submission
(of both the SQL script and the PDF submission, and taking care of good practices in data
(geo)visualization).
o Technical correctness (up to 3 marks): reflecting the technical correctness of your SQL script
and the design of your analysis procedure.
o Technical challenge (up to 5 marks): reflecting the level of technical challenge of the analysis,
for example that combine geometric, topological, thematic, and even network
operations. Creativity in your tools’ selection is highly encouraged, as is ’thinking outside the
box’. In other words, if you acquired appropriate skills from other subjects, you can
incorporate them in your work, but the emphasis still needs to be on the operations learned in
this subject. You are encouraged to explore functions of PostGis we have not explored,
including advanced ones (i.e., Raster data handling, etc)
o Originality and significance (up to 5 marks): reflecting the care and attention that has gone into
developing an interesting and novel question to answer, problem to solve, or pattern to reveal.
Some other points to note:
• You may have more than one SQL statement in your script (e.g., you may have multiple SELECT,
UPDATE, INSERT, or CREATE statements, if perhaps your script creates new tables which later parts
of your analysis use). However, your SQL script must execute as a single process without any interaction
or editing by the marker.
• You may also include other data, in addition to the check-ins and friend network data, in your analysis,
on condition it is either already in the class database (e.g., US states data sets) or you include additional
data as part of your SQL script (i.e., using CREATE TABLE and INSERT INTO statements). If you
include additional data or create new tables, you must also include in your script statements to DROP
any tables created at the end of the script (if necessary, commented out so the marker can still easily
remove new tables after marking your script).
• You are not required to use the entire check-ins/friend network (indeed, you may prefer not to use all 6M
records). It is only required that you use the checkin data set to discover something new or answer an
important problem. The "checkins_small" table has been created with a subset of 100,000 records from
the "checkins" table, in case you wish to try out ideas on a smaller table. Your entire script must execute
in a reasonable amount of time (i.e., less than 2-3 minutes).
• You need to include comments in your SQL script (correctly formatted, preceded by the "--" comment ).
Good quality comments increase the clarity of your script and will improve your marks in the
"Presentation" category.
• If you are stuck in thinking of a good analysis, spend a little time just looking at (some of) the data
(perhaps in QGIS). What sorts of patterns suggest themselves? You might also approach the assignment
from the opposite direction by thinking first about the technical challenge (the most important marking
criterion): what combinations of geometric, topological, thematic and network operations (and
potentially others) could you imagine might lead to interesting results.