Data
Dataset
The full processed dataset used in this visualisation is publicly archived on Zenodo. It includes article-level metadata, publication dates, extracted placenames, and the approximate geolocations used to generate the map and timeline views.
DOI: 10.5281/zenodo.6622328
Understanding the dataset
This web application visualises a subset of a larger bushfire dataset. Both datasets are archived on Zenodo, where the subset used here is listed first. The Zenodo record also provides detailed information about the second, more comprehensive dataset.
This section outlines the fields contained within the dataset and explains how each variable should be interpreted. Placename coordinates are derived from the Gazetteer of Historical Australian Placenames (GHAP), hosted by the Time Layered Cultural Map of Australia: https://www.tlcmap.org/ghap/.
Key fields
| Field | Description |
|---|---|
| Index | Index value for each record. |
| article_id | Unique identifier for a newspaper article. The same identifier indicates the same article. |
| article_placename | Placename extracted from the article in reference to bushfires. Each placename can be treated as a reported bushfire occurrence. Articles may contain multiple placenames. |
| filename_2 | Trove-generated identifier for each article extract, including segmentation markers such as “(1 of 2)”. These reflect internal text chunking during processing. |
| Longitude, Latitude | Approximate point coordinates assigned to the extracted placename. |
| State_2 | State in which the placename is located (for example VIC, NSW). “No best estimate” indicates insufficient gazetteer evidence. |
| n_results | Number of gazetteer entries returned for the placename. |
| winnerPct | Confidence rating for the selected state assignment, based on the distribution of results across states. |
| searchType | Indicates whether placename matching was exact or fuzzy. |
| Threshold | Levenshtein similarity threshold used for fuzzy matching. |
| Date | Publication date of the article. |
| url, page_url | Links to the article record and the specific newspaper page. |
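As a rough illustration of how these fields fit together, the sketch below groups extracted placenames under their article and skips records without a state assignment. It assumes the dataset is available as CSV with the column names listed above. The sample rows, the date format, and the helper name `placenames_by_article` are invented for illustration and are not taken from the archived files.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows for illustration only; not real dataset records.
# Column names follow the field table; the date format is an assumption.
SAMPLE_CSV = """article_id,article_placename,State_2,Date
1001,Town A,VIC,1898-01-14
1001,Town B,VIC,1898-01-14
1002,Town C,No best estimate,1905-02-03
"""


def placenames_by_article(rows):
    """Group placenames with a state assignment under their article_id."""
    grouped = defaultdict(list)
    for row in rows:
        # "No best estimate" marks insufficient gazetteer evidence.
        if row["State_2"] == "No best estimate":
            continue
        grouped[row["article_id"]].append(row["article_placename"])
    return dict(grouped)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
occurrences = placenames_by_article(rows)
print(occurrences)
```

Because each placename can be treated as a reported bushfire occurrence, the length of each list gives a simple per-article count of reported locations.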
Quality control fields
These fields were generated for internal validation and are not required for most users.
| Field | Description |
|---|---|
| Mean_median_dist | Internal measure used to check coordinate consistency across returned gazetteer entries. |
| Median_median_dist | Median distance across returned gazetteer entries. Values greater than 1 indicate ambiguity due to multiple distinct locations sharing the same name. |
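Users who do want to screen out ambiguous placenames could apply the threshold described above. The following is a minimal sketch, assuming records are held as dictionaries with `Median_median_dist` stored as text. The helper name `is_ambiguous`, the sample values, and the choice to treat a missing value as not ambiguous are all assumptions made for illustration.

```python
def is_ambiguous(record, cutoff=1.0):
    """Return True when Median_median_dist is greater than the cutoff."""
    value = record.get("Median_median_dist")
    if value in (None, ""):
        # Assumption: a missing value is treated as "not flagged".
        return False
    return float(value) > cutoff


# Invented records for illustration only; not real dataset values.
records = [
    {"article_placename": "Town A", "Median_median_dist": "0.2"},
    {"article_placename": "Town B", "Median_median_dist": "3.7"},
    {"article_placename": "Town C", "Median_median_dist": ""},
]

ambiguous = [r["article_placename"] for r in records if is_ambiguous(r)]
print(ambiguous)
```

Depending on the analysis, it may be safer to review records with a missing value by hand than to pass them through silently.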
Citation
If you use this dataset in research, publications, or derivative works, please cite it as follows:
Fiannuala Morgan. (2022). Finnoscarmorgan/Historical_Fires_Near_Me(v1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6622328
Use and limitations
The dataset is provided for research and exploratory purposes. It reflects historical newspaper reporting practices rather than a complete or authoritative record of all bushfire events. Geographic locations are approximate and publication dates do not always correspond to the precise timing of fire activity.
Rights and licence
This dataset is licensed under the Creative Commons Attribution 4.0 International licence (CC BY 4.0). You are free to share and adapt the material for any purpose, including commercial use, provided appropriate credit is given.
Code
All software developed for the collection, processing, analysis, and visualisation of the data is openly available. The codebase includes scripts for harvesting newspaper articles, extracting placenames using Named Entity Recognition, applying the disambiguation heuristic, and generating the spatial and temporal representations used in this site.
Software citation
Morgan, F. (2021). Historical_Fires_Near_Me (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.6622328