Data

Dataset

The full processed dataset used in this visualisation is publicly archived on Zenodo. It includes article-level metadata, publication dates, extracted placenames, and the approximate geolocations used to generate the map and timeline views.

View dataset on Zenodo

DOI: 10.5281/zenodo.6622328

Understanding the dataset


This web application visualises a subset of a larger bushfire dataset. Both datasets are archived on Zenodo: the subset used here is listed first, and detailed information about the second, more comprehensive dataset is provided there.

This section describes the fields in the dataset and explains how to interpret each one. Placename coordinates are derived from the Gazetteer of Historical Australian Placenames (GHAP), hosted by the Time Layered Cultural Map of Australia: https://www.tlcmap.org/ghap/.

Key fields

Field	Description
Index	Index value for each record.
article_id	Unique identifier for a newspaper article. The same identifier indicates the same article.
article_placename	Placename extracted from the article in reference to bushfires. Each placename can be treated as a reported bushfire occurrence. Articles may contain multiple placenames.
filename_2	Trove-generated identifier for each article extract, including segmentation markers such as “(1 of 2)”. These reflect internal text chunking during processing.
Longitude, Latitude	Approximate point coordinates assigned to the extracted placename.
State_2	State in which the placename is located (for example, VIC or NSW). “No best estimate” indicates that the gazetteer evidence was insufficient to assign a state.
n_results	Number of gazetteer entries returned for the placename.
winnerPct	Confidence rating for the selected state assignment, based on the distribution of results across states.
searchType	Indicates whether placename matching was exact or fuzzy.
Threshold	Levenshtein similarity threshold used for fuzzy matching.
Date	Publication date of the article.
url, page_url	Links to the article record and the specific newspaper page.
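
The field descriptions above can be illustrated with a minimal Python sketch. It assumes the dataset is read from a CSV export; the sample rows, their values, and the date format are invented for illustration and are not taken from the archived data. Only the column names come from the table above. The sketch groups placenames by article_id, since one article may report several placenames, and sets aside rows where State_2 is “No best estimate”.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the documented column names. The values and
# the date format are invented for illustration, not taken from the dataset.
SAMPLE_CSV = """article_id,article_placename,State_2,Date
1001,Ballarat,VIC,1898-02-01
1001,Gippsland,VIC,1898-02-01
1002,Springfield,No best estimate,1905-01-14
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def placenames_by_article(rows):
    """Group extracted placenames under the article they were found in."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["article_id"]].append(row["article_placename"])
    return dict(grouped)


def with_state_estimate(rows):
    """Keep only rows whose State_2 holds an actual state assignment."""
    return [row for row in rows if row["State_2"] != "No best estimate"]


rows = load_rows(SAMPLE_CSV)
grouped = placenames_by_article(rows)
located = with_state_estimate(rows)

print(grouped)       # placenames listed per article_id
print(len(located))  # number of rows with a state assignment
```

To use the real data, the same functions could be applied to the text of the file downloaded from Zenodo, provided its delimiter and column names match the assumptions made here.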

Quality control fields

These fields were generated for internal validation and are not required for most users.

Mean_median_dist	Internal measure used to check coordinate consistency across returned gazetteer entries.
Median_median_dist	Median distance across returned gazetteer entries. Values greater than 1 indicate ambiguity due to multiple distinct locations sharing the same name.
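
As a rough sketch of how the ambiguity rule above might be applied, the following Python snippet flags records whose Median_median_dist exceeds 1. The records and their values are hypothetical, the helper name is_ambiguous is introduced here for illustration, and treating a missing value as "not flagged" is an assumption of this sketch rather than documented behaviour.

```python
# Hypothetical records carrying the documented quality control field.
# Values are invented and are stored as text, as they would be after
# reading a CSV file.
records = [
    {"article_placename": "Ballarat", "Median_median_dist": "0.2"},
    {"article_placename": "Springfield", "Median_median_dist": "3.7"},
    {"article_placename": "Richmond", "Median_median_dist": ""},
]


def is_ambiguous(record, cutoff=1.0):
    """Return True when Median_median_dist is greater than the cutoff."""
    raw = (record.get("Median_median_dist") or "").strip()
    if not raw:
        # Assumption: a missing value is not flagged in this sketch.
        return False
    return float(raw) > cutoff


ambiguous = [r["article_placename"] for r in records if is_ambiguous(r)]
print(ambiguous)
```

Records flagged in this way are candidates for manual checking, because the same name may refer to several distinct locations.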

Citation

If you use this dataset in research, publications, or derivative works, please cite it as follows:

Fiannuala Morgan. (2022). Finnoscarmorgan/Historical_Fires_Near_Me (v1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6622328

Use and limitations

The dataset is provided for research and exploratory purposes. It reflects historical newspaper reporting practices rather than a complete or authoritative record of all bushfire events. Geographic locations are approximate, and publication dates do not always correspond to the precise timing of fire activity.

Rights and licence

This dataset is licensed under the Creative Commons Attribution 4.0 International licence (CC BY 4.0). You are free to share and adapt the material for any purpose, including commercial use, provided appropriate credit is given.

Code

All software developed for the collection, processing, analysis, and visualisation of the data is openly available. The codebase includes scripts for harvesting newspaper articles, extracting placenames using Named Entity Recognition, applying the disambiguation heuristic, and generating the spatial and temporal representations used in this site.

View software on GitHub

Software citation

Morgan, F. (2021). Historical_Fires_Near_Me (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.6622328