Reference: This article should be read alongside our Scheduling data updates article, where we cover many more important concepts and decisions about scheduling updates to data.
We have prepared a series of guides for data publishers that provide a step-by-step technical walkthrough of scheduling updates to your data through data.wa.gov.au’s Data Upload Tool.
Each guide covers:
- Preparing your data for ingest through DUT
- The software requirements for uploading data
- Step-by-step instructions detailing how to perform an upload
- Guidance about scheduling uploads to automate the process of uploading data
The purpose of this document is to provide FME users with a step-by-step guide to building an FME workbench that can be used to automate Data Upload Tool data updates. This document also contains work instructions for modifying an existing template workbench that we have created.
This guide and the associated template workbench are compatible with all versions of FME from 2015 onwards. Where possible, we recommend using FME 2017, as changes to the S3Uploader transformer greatly simplify the workbench.
Our example workbench was designed and built with FME 2016.1.0.1 (2016-0516 – Build 16494 – WIN32).
Installing the boto3 Python module in FME
Boto is the Amazon Web Services (AWS) SDK for Python, which allows Python developers to write software that makes use of Amazon services like S3 and EC2.
Why do I need boto3 when FME already has an S3Uploader transformer?
All FME AWS transformers prior to version 2017 are hardcoded to work only with the us-east-1 AWS region. As our Data Upload Tool is hosted only in the Sydney AWS region, these transformers will not work. The workaround is to install boto3 and use custom Python code to handle uploads to the Data Upload Tool's S3 bucket.
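As a sketch of what the custom Python approach looks like, the snippet below uses boto3 to target the Sydney (ap-southeast-2) region explicitly — the thing the pre-2017 transformers cannot do. The function name and its arguments are illustrative placeholders, not part of the Data Upload Tool's API; substitute the values shown on your dataset's Upload tab.

```python
def upload_to_dut(zip_path, bucket, key, access_key, secret_key):
    """Upload a zip archive to an S3 bucket in the Sydney region.

    All argument values are placeholders - substitute the credentials,
    bucket, and bucket key shown on your dataset's Upload tab.
    """
    import boto3  # installed into FME's Python as described below

    session = boto3.session.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="ap-southeast-2",  # Sydney - unavailable to pre-2017 FME transformers
    )
    session.resource("s3").Bucket(bucket).upload_file(zip_path, key)
```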
Tip: If you have multiple versions of FME installed, you will need to repeat the boto3 install process for each version of FME that you will be using to upload data.
You’ll need to install boto3 within your FME-specific Python install, not your system Python or your ArcMap/QGIS Python installs.
Info: The installer will need to download some files from the internet, so make sure your IT security policy allows downloads.
- Download https://bootstrap.pypa.io/ez_setup.py
- Open a DOS or PowerShell command prompt
- Change directory (cd command) to the folder you downloaded ez_setup.py to
- FME should already be on your system path, so you should be able to run fme python ez_setup.py:
Info: ez_setup will need to download the setup files for boto3 from pypi.python.org. Depending on how your corporate proxy is set up, you may need to supply some additional configuration here to point Python to your proxy and provide your logon credentials.
- Install the boto3 package by running fme python -m easy_install boto3:
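Once the install finishes, you can confirm that boto3 is visible to FME's Python with a short check. Save the snippet below as, say, check_boto3.py (the filename is just an example) and run it with fme python check_boto3.py:

```python
def boto3_available():
    """Return True if boto3 can be imported from the running Python."""
    try:
        import boto3  # noqa: F401 - we only care that the import succeeds
        return True
    except ImportError:
        return False

print("boto3 installed:", boto3_available())
```

If this prints False when run through fme python, repeat the install steps above for that FME version.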
Creating an FME workbench from scratch
If you prefer to use an existing template rather than go over the steps to build your own, go to the “Using our example workbench” section of this article.
Retrieve your Data Upload Tool upload credentials
Login to the Data Upload Tool, navigate to one of your datasets, and select the “Upload” tab. Here you’ll find your AWS S3 credentials.
Start a new workbench
Open a new blank workspace in FME:
Right-click on the Published Parameters icon and select Add Parameter. When adding the parameters, it is important that the names match the ones used in this document exactly.
Add a parameter to store your AWS access key:
Repeat and add a parameter to store the secret key:
Add two more parameters to store the AWS S3 Bucket and the S3 Bucket Key values.
Note: The Bucket name and Bucket Key are also displayed in the Upload tab.
Finally, add a parameter to enable or disable uploading your data to S3. This toggle is useful for debugging, or for preparing a new dataset ready to register in the Data Upload Tool.
At this point your workbench should now have the following published parameters displayed in the navigator panel:
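Inside FME, a Python shutdown script reads these published parameters from the fme.macroValues dictionary. The sketch below shows how a script might check that everything it needs is present before running; apart from Upload_To_S3, the parameter names are illustrative, so substitute the exact names you created above. The lookup is wrapped in a plain function so it can be exercised outside FME with an ordinary dict.

```python
# Illustrative parameter names - use the exact names from your workbench.
REQUIRED_PARAMETERS = [
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "S3_BUCKET",
    "S3_BUCKET_KEY",
    "Upload_To_S3",
]

def read_upload_parameters(macro_values):
    """Return the upload settings, raising KeyError if any are missing.

    Inside an FME shutdown script you would call this with fme.macroValues.
    """
    missing = [name for name in REQUIRED_PARAMETERS if name not in macro_values]
    if missing:
        raise KeyError("Missing published parameters: " + ", ".join(missing))
    return {name: macro_values[name] for name in REQUIRED_PARAMETERS}
```

Within FME itself the call would be read_upload_parameters(fme.macroValues); failing fast like this makes a missing or misnamed parameter obvious in the translation log.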
Creating a Python shutdown script
Next double click on the Shutdown Python Script icon:
This will open the Python editing dialog.
Grab the Python shutdown script from GitHub, and paste it into this dialog. https://github.com/datawagovau/fme-workbenches/blob/master/upload-geospatial-data/fme/fme-python-shutdown-script.py
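At a high level, a shutdown script like this zips the writer's output and, when the toggle parameter is set to Yes, uploads the archive to S3. The sketch below illustrates that flow under assumed parameter names; the authoritative version is the script linked above.

```python
import os
import zipfile

def zip_output(folder, zip_path):
    """Compress the exported FileGDB/Shapefile folder into one zip file."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, folder))
    return zip_path

def maybe_upload(zip_path, params):
    """Upload the zip to S3 only when the Upload_To_S3 toggle is "Yes"."""
    if params.get("Upload_To_S3") != "Yes":
        return False  # debugging / dry-run mode - skip the upload
    import boto3  # deferred so the zip step also works without boto3
    session = boto3.session.Session(
        aws_access_key_id=params["AWS_ACCESS_KEY"],      # illustrative names -
        aws_secret_access_key=params["AWS_SECRET_KEY"],  # match your workbench
        region_name="ap-southeast-2",  # the Data Upload Tool's Sydney region
    )
    session.resource("s3").Bucket(params["S3_BUCKET"]).upload_file(
        zip_path, params["S3_BUCKET_KEY"])
    return True
```

Keeping the upload behind the toggle is what lets you run the workbench end to end for testing without touching S3.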
Testing the workbench
The FME workbench modifications are now complete. The next stage is to add in your own data workflow.
Tip: Make sure there is only one writer exporting the data into either Shapefile or FileGDB format.
Note: It is important that the FME parameter name used to define the destination writer is left as the default name. If the workbench has been configured to export an Esri FileGDB, the parameter name should be DestDataset_FILEGDB; a Shapefile writer has an equivalent default destination parameter.
To provide a simple demonstration of how this script works, add a data reader and set the source to an existing datasource. Next, add an Esri FileGDB or Esri Shapefile writer. Join the reader and writer together.
Ensure all the FME parameters have been configured correctly. The parameters should look similar to the below:
Now run the workbench. If the test ran successfully, the log will include details of the compressed file and the S3 bucket the data was uploaded to.
As a final check, go back to the Data Upload Tool and check the activity logs for any recently detected uploads.
Scheduling your workbench
Scroll to the top of the log and take note of the command used to run the workbench. This command can be used in conjunction with Windows scheduled tasks to regularly update your datasets in the Data Upload Tool.
Using our example workbench
Download our template workbench from GitHub: https://github.com/datawagovau/fme-workbenches/blob/master/upload-geospatial-data/fme/datauploadtool_uploader_v3.fmw.zip?raw=true
When opened, the template displays basic instructions summarising this section of the document:
Note: If you have not already done so, follow the steps earlier in this article to install boto3 within FME’s Python install.
Adding a reader
Add a reader to your workbench and configure it to point to your agency’s internal dataset:
Next, we’ll demonstrate how to use the workbench template to export an ArcSDE feature class to the Data Upload Tool.
Click the Parameters button to open up the SDE connection window.
If you have an existing ArcSDE connection file, use it to quickly populate the connection properties. (You may need to add the port to the instance value.) Next, select the feature class by clicking the Table List button. Click OK to add the selected feature class to the workbench.
Add either a FileGDB or Shapefile output writer to your workbench:
For this example, we will use the Esri file geodatabase format. Set the output path; this is also the path where the zip file will be created. This zip file is what will be uploaded to the Data Upload Tool.
Click the Parameters button and tick the Overwrite Existing Geodatabase option:
Click OK in the writer dialogs to add the file geodatabase writer to the workbench.
Next, click on the writer’s settings icon:
Enter the feature class name. Note: The name must match the feature class name used to register the original dataset in the Data Upload Tool. Change the geometry to the relevant geometry type.
Check that the schema matches what has already been registered in the Data Upload Tool. If the schema doesn’t match, the upload will fail the Data Upload Tool’s schema validation checks.
Optionally, at this stage you can add further transformers to the workbench to manipulate your data ready for the Data Upload Tool. In this example we have simply connected the reader to the writer.
Check that the Upload_To_S3 parameter is set to No and that the destination dataset parameter is named DestDataset_FILEGDB. Run the workbench.
Since the Upload_To_S3 parameter is set to No, the AWS parameters can be set to any value.
Check the FME log to see if the translation completed successfully. Take note of the location of the zip file.
The destination path should contain both a file geodatabase and a zip file.
Next, proceed to test uploading the zip file to the Data Upload Tool AWS S3 bucket.
Log in to the Data Upload Tool to retrieve the AWS parameters for your agency.
Ensure that your dataset Stage is Active. Click on the Upload Tab to show the upload details:
Enter these values into the corresponding workbench parameters and set Upload_To_S3 to Yes:
Test your workbench again. The FME log will now display the AWS S3 URL the zip file was uploaded to.
Finally, check that the upload has been detected by viewing the Upload Events of this dataset. It may take a few minutes to show up.
If the status shown in the upload event displays “succeeded”, the workbench is ready to be scheduled.
FME Server users can use the FME Server scheduler to control how often this workbench runs.
Users who don’t have access to FME Server can create a batch file and paste in the commands shown at the top of the FME log. http://www.wikihow.com/Write-a-Batch-File
To schedule this job to run at regular intervals open Windows Task Scheduler and select create basic task:
Type in a meaningful task name and description:
Select how often you want the task to run:
Enter what time you want to execute the task:
Select "Start a program":
Navigate to your batch file:
Your newly created task will now be shown in the task scheduler library:
Right click on your task and select properties:
Configure the security options. Note: if your account password expires, the task will not execute. It is recommended to use a service account whose password never expires.
Need more help?
Python documentation: https://www.python.org/
boto3 documentation: https://boto3.readthedocs.io/en/latest/