Info: This article is intended for data creators and publishers using data.wa.gov.au’s geospatial Data Upload Tool to publish their data. If you are not publishing geospatial data, or you are but have decided not to use the Data Upload Tool, please refer to the articles Publishing a dataset and Preparing data for publishing for general guidance on preparing and publishing datasets on data.wa.gov.au.
For discussion on the pros and cons of each approach to publishing geospatial data please refer to An introduction for geospatial data publishers.
Introduction
This article will cover the key technical requirements that geospatial data must meet in order to be uploaded and published through the geospatial Data Upload Tool (abbreviated as DUT), including:
- the supported file formats,
- a data preparation checklist,
- a detailed list of the caveats and requirements for each supported file format that must be met for data to be loaded through DUT.
All data loaded into DUT will go through a series of validation and repair steps to ensure it meets the data and quality requirements. Any issues encountered during validation will cause the data load to fail, and will be displayed on your publishing dashboard in DUT. They will need to be addressed, and the data re-uploaded, before it can be loaded into the data store.
Supported data formats
DUT supports two file formats for data supply: Esri File Geodatabases and Shapefiles.
Wherever possible, it is recommended to use File Geodatabases instead of Shapefiles. The Shapefile is a legacy format that is supported for backwards compatibility with the old SLIP Enabler FTP Inbox, and to provide a non-proprietary, non-Esri data format.
File Geodatabases are a more modern and robust data format and don’t suffer any of the limitations that are inherent to Shapefiles. Further, all data loaded into DUT is stored as a geodatabase, so Shapefiles carry the additional overhead of conversion between formats.
For a more detailed discussion of the differences between File Geodatabases and Shapefiles please see Shapefiles vs. Geodatabases [Duke University Library].
Feedback: Please provide feedback about other data formats that DUT should support. Longer-term, Shapefiles may be phased out and additional non-proprietary data formats will be introduced (such as GeoPackages). Please get in touch and outline your requirements.
Data Preparation Checklist
A handful of common issues cause about 90% of the problems encountered during data loads. Use the checklist below when preparing data for loading, or if you’ve received a notification about a failed data load.
Checklist
- Feature Class name mismatches. The Feature Class Name that you provided in DUT (on the Attributes tab) must match the name of the Feature Class (for File Geodatabases) or filename (for Shapefiles). The Feature Class Name is case-sensitive.
- Data schemas must match. The data being uploaded must exactly match the data schema provided in DUT (on the Attributes tab). It must have the same number of fields, with the same attribute/column names, data types, and sizes.
- Don’t mix NULL and empty/blank values. If you’ve chosen to allow an attribute to be Nullable in DUT (on the Attributes tab), then all empty or blank values must be set to NULL. If you’ve left Nullable unchecked, then NULL values must not be included.
- Dates must use a consistent format. Any attributes that use the date data type must use a single consistent format, i.e. there can’t be some rows using 11/12/2018 and others using 2018-12-11.
- Dates must not include a time component. Due to limitations in the data store, attributes using the date data type may not include a time as well as the date. If you need to store time and date information together, it is suggested to use the text/string data type instead.
- Dates and FME. If you’re preparing data using FME, please avoid using the inbuilt fme_date data type, or the DateFormatter transformer, as these use a special date data type that cannot be properly read by the data load tools. Instead, please ensure any date attributes are converted to a string/text data type first.
- One Feature Class per File Geodatabase. If you’re supplying data as a File Geodatabase ensure that it only contains a single Feature Class.
If the above or additional issues are encountered, you’ll receive a notification about a failed data load. Please refer to the “Receiving your data upload notifications” section of the Managing Data Loads article for more detailed information about data load notifications.
Tip: If your data load issue is caused by a data schema mismatch (e.g. a different Feature Class name, or different attribute names) and that change is intentional, then you need to raise a change request for the dataset. Please refer to the Managing and changing your data article where we discuss changing a published dataset.
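The schema-matching item in the checklist can be illustrated with a short Python sketch. This is not how DUT itself validates data. It simply compares two hand-written schema descriptions so that differences can be spotted before uploading. The function name compare_schemas, the (data type, size) layout, and the example field names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: attribute name -> (data type, size).
# Size is None where it does not apply.
Schema = Dict[str, Tuple[str, Optional[int]]]


def compare_schemas(expected: Schema, actual: Schema) -> List[str]:
    """Return a list of differences between the expected and actual schema."""
    problems = []
    for name in expected:
        if name not in actual:
            problems.append(f"missing field: {name}")
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    for name, spec in expected.items():
        # Names are compared exactly; types and sizes must also be identical.
        if name in actual and actual[name] != spec:
            problems.append(f"field {name}: expected {spec}, found {actual[name]}")
    return problems


# Illustrative data only: the schema entered in DUT versus the data on disk.
expected = {"name": ("text", 50), "sort_order": ("integer", None)}
actual = {"name": ("text", 100), "order_no": ("integer", None)}

for problem in compare_schemas(expected, actual):
    print(problem)
```

An empty result means the two descriptions agree on field names, data types, and sizes.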
Geometry Repair
Once the data load process has successfully validated your data, a set of standard geometry validation and repair rules is applied. In practice, it is rare to encounter issues with invalid geometries that need to be repaired.
GIS software and data formats have advanced to the point where creating invalid geometry is very difficult. In the unusual event that invalid geometry is encountered in a dataset, Esri’s standard geometry repair tools are applied. For more information about this, including what sorts of issues we will repair (and how), please refer to Checking and repairing geometries [Esri].
Any issues encountered and fixes that are applied will be noted as warnings in the DUT console. When you load a dataset for the first time, it is recommended that you check to see if the issues have been repaired (even if the load was successful).
If the geometry repair tools are unable to automatically repair the issue/s they will cause the data load to fail. In this extremely rare case, you will need to resolve the identified issues in the data first before re-uploading. Both the Esri and QGIS GIS packages come with geometry repair tools that can be applied:
- ArcMap Repair Geometry tool,
- ArcGIS Pro Repair Geometry tool,
- QGIS Geometry Checker Plugin.
Help: If you believe that issues with your geometry are being incorrectly raised, or you have any questions about the geometry repair process, please get in touch and the team will be happy to discuss it.
File Geodatabases
As part of preparing a File Geodatabase for loading through DUT there are important tips and key requirements that you should be aware of.
Feature Class name
Your Feature Class name is used as the name for your dataset in DUT. As such, it’s important that Feature Class names are:
- easily recognisable (both to yourself and Landgate),
- human-readable (e.g. “lg-poly-34” is a poor Feature Class name because it doesn’t convey any information about the dataset),
- unique within your organisation (a Feature Class name can’t be reused if a dataset with that name has already been loaded into DUT by your organisation).
One Feature Class per File Geodatabase
A File Geodatabase supplied to DUT may only contain a single Feature Class.
Dataset extents and precision
This only applies when exporting data directly from ArcSDE. Fixed extents that are precision enabled are not supported by DUT. Feature Classes need to use ‘dynamic extents’ that are recalculated as features are added, modified, and removed.
Reserved words
There are certain reserved words that may not be used for attribute names due to technical constraints imposed by the data store. If you are using any of these reserved words as attribute names, please rename them as part of your data preparation process. For example, if you’ve used “order” as an attribute name it can be renamed to “sort_order”.
For the list of reserved words, please refer to What are the reserved words for Esri's file geodatabase? [Esri FAQs].
M (measure) or Z (elevation) values are now supported
DUT now supports datasets containing Z (elevation) and M (linear referencing) values. Note that for existing Active datasets (datasets that are currently live), a change request will be required to include M and/or Z values.
Blank records
As part of data preparation, you will need to remove any blank records from your data. If data is submitted that contains blank records it will become corrupted when loading into our SDE database.
You can check if you have any blank records as follows:
- Open the attribute table for your dataset.
- Select all features.
- If your feature count summary shows that you have fewer features selected than you have in the dataset, then you have blank records.
- To remove blank records, reverse the selection to view your blank records, and then delete them.
Special characters
There are certain special characters that must not be used as part of attribute names. If you’re using any of these, please remove them as part of the data preparation process. As a guide, attribute names must not:
- Contain characters that are not alphanumeric or an underscore.
- Start with an underscore or a number.
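The naming rules above lend themselves to a quick check before uploading. The following Python sketch is illustrative only: the function name check_attribute_names is hypothetical, “alphanumeric” is assumed to mean the ASCII letters A to Z and digits 0 to 9, and reserved words are assumed to be matched without regard to case. The reserved word list is not reproduced here; supply it yourself from the Esri FAQ referenced under Reserved words.

```python
import re
from typing import Iterable, List

# Letters, digits and underscores only, and the first character must be a
# letter (so a name cannot start with an underscore or a number).
# Assumption: "alphanumeric" means ASCII letters and digits.
VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def check_attribute_names(names: Iterable[str],
                          reserved: Iterable[str] = ()) -> List[str]:
    """Return a description of each attribute name that breaks the rules."""
    # Assumption: reserved words are compared case-insensitively.
    reserved_lower = {word.lower() for word in reserved}
    problems = []
    for name in names:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"{name!r}: invalid character or invalid first character")
        elif name.lower() in reserved_lower:
            problems.append(f"{name!r}: reserved word")
    return problems


# "order" is the one reserved word named in this article.
names = ["sort_order", "_id", "2nd_name", "road-name", "order"]
for problem in check_attribute_names(names, reserved=["order"]):
    print(problem)
```

In this example only “sort_order” passes; the other four names are reported.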
Date fields
If your dataset contains an attribute using the date data type, it must meet the following requirements. It must:
- be formatted as DD/MM/YYYY
- not include a time component,
- use NULL to indicate ‘no date’, rather than leaving the field empty/ blank.
These requirements don’t apply if the date attribute is using the string/text data type. If your data can’t be altered to suit these requirements, it is suggested to use the text/string data type instead.
If you’re using FME to prepare the data, please avoid using the inbuilt fme_date data type, or the DateFormatter transformer. These use a special date data type that cannot be properly read by the data load tools. Instead, please ensure any date attributes are converted to a string/text data type first.
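As a rough illustration of the date requirements, the Python sketch below checks a column of date values that has been exported as text. It assumes NULL is represented by Python’s None and a blank by an empty string. The function name check_dates is hypothetical, and this is not the validation that DUT itself performs.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional

# Two-digit day, two-digit month, four-digit year, and nothing else.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def check_dates(values: Iterable[Optional[str]]) -> List[str]:
    """Return a description of each value that breaks the date requirements."""
    problems = []
    for row, value in enumerate(values):
        if value is None:
            continue  # NULL is the accepted way to indicate 'no date'
        if value.strip() == "":
            problems.append(f"row {row}: blank value, use NULL instead")
        elif not DATE_PATTERN.fullmatch(value):
            # Catches other formats and values that include a time component.
            problems.append(f"row {row}: {value!r} is not DD/MM/YYYY")
        else:
            try:
                datetime.strptime(value, "%d/%m/%Y")
            except ValueError:
                problems.append(f"row {row}: {value!r} is not a real calendar date")
    return problems


values = ["11/12/2018", None, "2018-12-11", "", "11/12/2018 10:30", "31/02/2018"]
for problem in check_dates(values):
    print(problem)
```

The first two values pass; the remaining four are reported as a wrong format, a blank, a time component, and an impossible date respectively.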
Domains
Domains and sub-types are not fully supported by the data load process. As part of loading your data, domain codes will be loaded, but not their descriptions.
If you’re using domains and they are important to understanding the data, it is recommended that you prepare your data to store the attribute as a string/text field containing the domain description, not the domain codes. In either case, make sure that you provide this information as part of the data dictionary.
If you’re using FME to prepare the data, you can easily drop domain types using the approach to resolving domains and sub-types detailed in Working with Geodatabase Subtypes and Domains [FME Knowledgebase].
Zipping
Lastly, as part of preparing the data you will need to zip the File Geodatabase before uploading it to DUT. There are a few technical gotchas here – so please refer to the Zipping your data for uploading section of the guide on Scheduling data updates.
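If you prepare data with a script, the zipping step can be automated. The sketch below uses only Python’s standard library. It does not cover the technical gotchas described in the Scheduling data updates guide, so check that guide for the required layout. The function name zip_geodatabase is hypothetical, and keeping the .gdb folder as the top-level entry inside the zip is an assumption.

```python
import zipfile
from pathlib import Path


def zip_geodatabase(gdb_path: str, zip_path: str) -> None:
    """Zip a File Geodatabase folder, keeping the .gdb folder name inside the zip."""
    gdb = Path(gdb_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(gdb.rglob("*")):
            if file.is_file():
                # Assumption: paths inside the zip start at the .gdb folder,
                # e.g. "roads.gdb/a00000001.gdbtable".
                archive.write(file, file.relative_to(gdb.parent).as_posix())


# Example call (paths are placeholders):
# zip_geodatabase("C:/data/roads.gdb", "C:/data/roads.zip")
```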
Shapefiles
Wherever possible, it is recommended to use File Geodatabases instead of Shapefiles. The Shapefile is a legacy format that is supported for backwards compatibility with the old SLIP Enabler FTP Inbox, and to provide a non-proprietary, non-Esri data format.
File Geodatabases are a more modern and robust data format and don’t suffer any of the limitations that are inherent to Shapefiles. Further, all data loaded into DUT is stored as a geodatabase, so Shapefiles carry the additional overhead of conversion between formats.
For a more detailed discussion of the differences between File Geodatabases and Shapefiles please see Shapefiles vs. Geodatabases [Duke University Library].
- For Shapefiles: fid will change to objectid (system-generated).
- For Shapefiles: SHAPE/GEOM will become st_length(shape) / st_area(shape).
Attribute name truncation
Due to the age of the Shapefile format, attribute names are limited to no more than 10 characters. Any attribute names that are longer than 10 characters are automatically truncated and altered to fit within this limit (e.g. “description” becomes “descriptio”).
Exactly how this truncation is applied will depend on the software you’re using to export and create the Shapefiles. In some cases – such as ArcMap – the attribute names that are created will be different each time you create a new Shapefile.
If this issue is encountered and you’re using ArcMap, it is strongly suggested to switch to supplying data as File Geodatabases (for reasons discussed in more detail above). If you are currently unable to switch to using File Geodatabases, then please get in touch to discuss workarounds.
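To see in advance which attribute names are at risk, the following Python sketch lists names that exceed the 10 character limit and flags any that would collide once shortened. It assumes the simplest possible rule, keeping only the first 10 characters. As noted above, real software may alter names differently, so treat the output as a warning list rather than a prediction of the final names. The function name find_truncation_problems is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

SHAPEFILE_NAME_LIMIT = 10


def find_truncation_problems(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each shortened name to the original names that would be affected."""
    by_short_name = defaultdict(list)
    for name in names:
        # Assumption: truncation keeps the first 10 characters unchanged.
        by_short_name[name[:SHAPEFILE_NAME_LIMIT]].append(name)
    return {
        short: originals
        for short, originals in by_short_name.items()
        # Keep entries where a name was shortened or several names collide.
        if len(originals) > 1 or originals[0] != short
    }


names = ["name", "description", "description_long"]
for short, originals in find_truncation_problems(names).items():
    print(short, "<-", originals)
```

Here “name” is unaffected, while “description” and “description_long” would both be shortened to “descriptio”.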
Fid and objectid issue
If you’re using a field with the name “objectid” to track an internal corporate id for each record in your dataset, it’s recommended that you rename it to something more specific – such as “lg_corp_id”. As part of the DUT load process fields with the name “objectid” will be concatenated together to form a single system-generated object id field. As a result, this will reset your id values.
Shape_length and Shape_area issue
As part of the DUT data load process the “shape_length” and “shape_area” fields will be renamed to “st_length(shape)” and “st_area(shape)”.
Getting help
If you need help preparing your data, or have any questions about this article, please get in touch.
The explanations provided here are intended to be concise and to-the-point. The team are happy to discuss the technical specifics of individual requirements in detail. Please provide feedback on the requirements if they cause issues for your organisation.