The phenomenon of unidentified flying objects (UFOs) has captivated people around the world for decades. While I don’t particularly admire Donald Trump, his expressed interest in releasing information about UFOs and extraterrestrial intelligence, evidenced most clearly by the Pentagon’s release of UAP videos in 2020, was intriguing. While many sightings remain unexplained, steps have been taken to allow individuals to document their sightings, recording details such as location, duration, shape, and date. For example, the National UFO Reporting Center (NUFORC) is a widely recognized organization dedicated to collecting and analyzing reports of UFOs from across the United States and beyond. However, the reliability of these sightings is often called into question because of misidentifications, hoaxes, and the subjective nature of eyewitness testimony. As such, I wanted to investigate whether there were any correlations between the locations of these sightings and the locations of major airports in the USA.
To Determine Spatial Patterns: Investigate whether there are spatial correlations between the locations of major airports and reported UFO sightings in mainland USA.
To Assess the Duration of Sightings: Investigate the duration of UFO sightings in regard to the shape recorded.
The raw UFO data were sourced from the NUFORC website and processed by Sigmond Axel into geolocated, time-standardized reports; the processed dataset is publicly accessible on GitHub.
The locations of major US airports were acquired from data.world and sourced from OurAirports. This dataset includes all North American airports and is publicly accessible.
Additionally, the R package maps was used to draw a geographical map of mainland USA. The package provides access to a diverse collection of geographical data, particularly maps of the world, countries, and states/provinces of various countries.
The project directory includes two main folders: “/data” and “/images”. The “/data” folder contains essential datasets for the project, while the “/images” folder houses both images necessary for the project and visual outputs generated during visualization processes.
Additionally, a comprehensive codebook detailing all labels and abbreviations utilized within the project for data, variables, functions, and other relevant entities is available at the location “/codebook.xlsx”.
The project used the ‘renv’ package to preserve the versions of the libraries used. These package versions are listed in the file /renv.lock.
#load packages with renv
if (!require('renv'))
{
install.packages('renv');
}
library(renv)
renv::restore()
#import packages
library(tidyverse)
library(ggplot2)
library(dplyr)
library(tidyr)
library(sf)
library(maps)
#load UFO data
ufo_data <- read.csv("https://raw.githubusercontent.com/tomhbird/PSY6422_UFO_AIRPORTS/main/Raw/complete.nosync.csv")
#load US airport data
airport_data <- read.csv("https://raw.githubusercontent.com/tomhbird/PSY6422_UFO_AIRPORTS/main/Raw/list-of-airports-in-united-states-of-america-hxl-tags-1.nosync.csv")
#trimming UFO data to only include mainland USA, removing Hawaii, Alaska, and Puerto Rico
ufo_data_usa <- ufo_data %>%
filter(country == "us" & !(state %in% c("hi", "ak", "pr")))
#remove columns "comments," "duration..hours.min.," and "date.posted"
ufo_data_trim <- subset(ufo_data_usa,
select = -c(comments, duration..hours.min., date.posted))
#separate the datetime column, splitting it into individual variables "date" and "time"
ufo_data_trim <- separate(ufo_data_trim, datetime, into = c("date", "time"), sep = " ")
#checking if latitude and longitude columns are in the correct data type
str(ufo_data_trim)
#as the latitude was a character column, it was converted into a numeric column
ufo_data_trim$latitude <- as.numeric(ufo_data_trim$latitude)
#checking to see if it was successfully converted
str(ufo_data_trim$latitude)
# Convert duration column to numerical
ufo_data_trim$duration..seconds. <- as.numeric(ufo_data_trim$duration..seconds.)
# Verify the change
str(ufo_data_trim$duration..seconds.)
# Replace blank recordings of shape with "unknown"
ufo_data_trim <- ufo_data_trim %>%
mutate(shape = ifelse(shape == "", "unknown", shape))
#display table of first 6 UFO sightings data
knitr::kable(head(ufo_data_trim))
date | time | city | state | country | shape | duration..seconds. | latitude | longitude |
---|---|---|---|---|---|---|---|---|
10/10/1949 | 20:30 | san marcos | tx | us | cylinder | 2700 | 29.88306 | -97.94111 |
10/10/1956 | 21:00 | edna | tx | us | circle | 20 | 28.97833 | -96.64583 |
10/10/1961 | 19:00 | bristol | tn | us | sphere | 300 | 36.59500 | -82.18889 |
10/10/1965 | 23:45 | norwalk | ct | us | disk | 1200 | 41.11750 | -73.40833 |
10/10/1966 | 20:00 | pell city | al | us | disk | 180 | 33.58611 | -86.28611 |
10/10/1966 | 21:00 | live oak | fl | us | disk | 120 | 30.29472 | -82.98417 |
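The conversions above rely on as.numeric(), which turns any value it cannot parse into NA and only raises a warning. The following is a minimal sketch, on a made-up vector rather than the project data, of how the number of values lost in such a conversion could be counted. The helper name count_coercion_na and the toy values are assumptions for illustration only.

```r
# Hypothetical helper: count values that were present before the conversion
# but became NA because as.numeric() could not parse them
count_coercion_na <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  sum(is.na(converted) & !is.na(x))
}

# Toy example (not project data): the second value is malformed
toy_latitude <- c("29.88306", "33q.200088", "41.11750")
count_coercion_na(toy_latitude)
```

Running a check like this on the latitude and duration columns before overwriting them would show how many reports end up without a usable value.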
# Filter the dataset to include only entries where the type is "large_airport" and remove rows where both longitude and latitude are 0
airport_data_filtered <- airport_data %>%
filter(type == "large_airport") %>%
filter(longitude_deg != 0 | latitude_deg != 0)
#checking if latitude and longitude columns are in the correct data type
str(airport_data_filtered$longitude_deg)
str(airport_data_filtered$latitude_deg)
#as the longitude and latitude columns were character columns, they were converted into numeric columns
airport_data_filtered$latitude_deg <- as.numeric(airport_data_filtered$latitude_deg)
airport_data_filtered$longitude_deg <- as.numeric(airport_data_filtered$longitude_deg)
#checking to see if it was successfully converted
str(airport_data_filtered$longitude_deg)
str(airport_data_filtered$latitude_deg)
#removing unnecessary columns from airport dataframe
airport_data_filtered <- airport_data_filtered %>%
select(-id, -ident, -elevation_ft, -continent,
-iso_country, -iso_region, -scheduled_service,
-gps_code, -iata_code, -home_link, -wikipedia_link,
-keywords, -score, -last_updated)
#display table of first 6 USA major airport data
knitr::kable(head(airport_data_filtered))
type | name | latitude_deg | longitude_deg | municipality | local_code |
---|---|---|---|---|---|
large_airport | Los Angeles International Airport | 33.9425 | -118.4080 | Los Angeles | LAX |
large_airport | John F Kennedy International Airport | 40.6398 | -73.7789 | New York | JFK |
large_airport | Chicago O’Hare International Airport | 41.9786 | -87.9048 | Chicago | ORD |
large_airport | Hartsfield Jackson Atlanta International Airport | 33.6367 | -84.4281 | Atlanta | ATL |
large_airport | San Francisco International Airport | 37.6190 | -122.3750 | San Francisco | SFO |
large_airport | Newark Liberty International Airport | 40.6925 | -74.1687 | Newark | EWR |
# Plot map of USA
usa <- map("state", fill = TRUE, col = "transparent", plot = FALSE)
# Convert map data to data frame
usa_df <- fortify(usa)
# define data to plot
ufo_map <- ggplot() +
# Add the USA map polygons
geom_polygon(data = usa_df, aes(x = long, y = lat, group = group), fill = "beige", color = "black") +
# Add UFO sightings as points; colour is mapped inside aes() so that the legend shows it
geom_point(data = ufo_data_trim, aes(x = longitude, y = latitude, color = "ufo sightings"), size = 0.01) +
# Add airport locations as points
geom_point(data = airport_data_filtered, aes(x = longitude_deg, y = latitude_deg, color = "major airports"), size = 1.5) +
# Assign the point colours (UFO sightings in dark green, airports in magenta)
scale_color_manual(values = c("ufo sightings" = "darkgreen", "major airports" = "magenta")) +
# Set the aspect ratio and limit the plot to the boundaries of the USA map
coord_fixed(xlim = range(usa_df$long), ylim = range(usa_df$lat)) +
# Add plot title
labs(title = "UFO Sightings in relation to Airports in the USA (1910-2013)", x = "Longitude", y = "Latitude",
color = "Data",
fill = "Key") + # Change legend title
# Set minimal theme with white background and remove grid lines
theme_minimal() +
theme(panel.background = element_rect(fill = "white", color = NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
# Customize x and y axes
scale_x_continuous("Longitude", limits = range(usa_df$long)) +
scale_y_continuous("Latitude", limits = range(usa_df$lat)) +
# Position the legend at the bottom of the plot
theme(legend.position = "bottom")
# Save the graph as an image
ggsave("images/ufo_map.png", plot = ufo_map, width = 6, height = 4, dpi = 300)
To Determine Spatial Patterns: Investigate whether there are spatial correlations between the locations of major airports and reported UFO sightings in mainland USA.
This visualization shows the distribution of UFO sightings across mainland USA alongside the locations of major airports. Sightings appear to cluster around airports, which points to a potential correlation between reported “UFO” sightings and the presence of airports. This suggests that individuals may be mistaking aircraft for UFOs.
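The map gives a visual impression only. One way to put a number on proximity would be to compute the great-circle distance from each sighting to its nearest large airport. The sketch below uses the haversine formula in base R; the function names haversine_km and nearest_airport_km and the Earth radius of 6371 km are assumptions made for illustration, not part of the project code.

```r
# Great-circle (haversine) distance in kilometres between points in decimal degrees
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: distance from each sighting to its closest airport
nearest_airport_km <- function(sight_lat, sight_lon, air_lat, air_lon) {
  vapply(seq_along(sight_lat), function(i) {
    min(haversine_km(sight_lat[i], sight_lon[i], air_lat, air_lon))
  }, numeric(1))
}

# Toy example using coordinates from the tables above:
# the San Marcos (TX) and Norwalk (CT) sightings against LAX and JFK
nearest_airport_km(c(29.88306, 41.11750), c(-97.94111, -73.40833),
                   c(33.9425, 40.6398), c(-118.4080, -73.7789))
```

Applied to ufo_data_trim and airport_data_filtered, a summary of these distances (for example with median()) would give a figure to set against the visual impression from the map.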
#categorising the top 20 most reported ufo shapes
# Count the frequency of each UFO shape
ufo_shape_counts <- table(ufo_data_trim$shape)
# Convert the table to a data frame
ufo_shape_df <- as.data.frame(ufo_shape_counts)
names(ufo_shape_df) <- c("shape", "count")
# Sort the data frame by count in descending order
ufo_shape_df <- ufo_shape_df[order(-ufo_shape_df$count), ]
# Select top 20 shapes
ufo_shape_top20 <- head(ufo_shape_df, 20)
# Calculate quantiles of duration
quantiles <- quantile(ufo_data_trim$duration..seconds., probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
# Define upper and lower bounds
Q1 <- quantiles["25%"]
Q3 <- quantiles["75%"]
IQR <- Q3 - Q1
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
# Filter data to remove outliers; filter() also drops rows with a missing duration, which base subsetting would keep as all-NA rows
ufo_data_trim_filtered <- ufo_data_trim %>%
filter(duration..seconds. >= lower_bound, duration..seconds. <= upper_bound)
# Create a new variable for duration categories (one label per interval)
ufo_data_trim_filtered$duration_category <- cut(ufo_data_trim_filtered$duration..seconds., breaks = c(-Inf, 300, 600, Inf), labels = c("Short: 0-300 seconds", "Medium: 301-600 seconds", "Long: > 600 seconds"))
# Combine both data frames
combined_data <- merge(ufo_shape_top20, ufo_data_trim_filtered, by = "shape")
# Set count to 1 for every data point
combined_data <- combined_data %>%
mutate(count = 1)
# Create a stacked bar plot to visualize the frequency of top 20 UFO shapes and their durations
# Define data to plot
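# Order shapes by total frequency: every row has count = 1, so the sum per shape is its frequency
stacked_barchart <- ggplot(combined_data, aes(x = reorder(shape, count, FUN = sum), y = count, fill = duration_category)) +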
# Add stacked bars representing the frequency of each UFO shape and duration category
geom_bar(stat = "identity", position = "stack", width = 0.7) +
# Add plot title and axis labels
labs(title = "Most Common Shapes of UFOs and their Durations in Mainland USA",
x = "Shape",
y = "Frequency",
fill = "Duration") +
# Manually set fill colors for each duration category
scale_fill_manual(values = c("Short: 0-300 seconds" = "darkgreen", "Medium: 301-600 seconds" = "forestgreen", "Long: > 600 seconds" = "lightgreen")) +
# Flip the x and y coordinates to create a horizontal bar plot
coord_flip() +
# Set breaks for the y-axis to ensure proper scale
scale_y_continuous(breaks = c(0, 2000, 4000, 6000, 8000, 10000, 12000, 14000)) +
# Apply a minimal theme with white background and bottom legend
theme_minimal() +
theme(axis.title = element_text(size = 12), # Customize axis title font size
plot.title = element_text(size = 16, face = "bold", vjust = 1.5), # Customize plot title, font size, and style
legend.position = "bottom") # Position legend at the bottom of the plot
# Save the graph as an image
ggsave("images/stacked_barchart.png", plot = stacked_barchart, width = 12, height = 8, dpi = 350)
To Assess the Duration of Sightings: Investigate the duration of UFO sightings in regard to the shape recorded.
This visualization highlights a notable trend: the most frequently reported shape is “light”. Other light-related shapes such as “flash” and “fireball” are also mentioned frequently. This suggests that aircraft may account for these sightings, given their characteristic appearance, especially at night, when their blinking lights are visible. This further emphasizes the influence of airport locations on recorded UFO sightings: visualization 1 shows that reports tend to congregate around airports, and visualization 2 shows that light-related terms are among the most commonly recorded shapes.
Furthermore, the duration of these sightings is particularly relevant. As the majority of sightings fall into the short (0-300 seconds) category, individuals may simply be observing airplanes flying overhead and mistaking them for UFOs.
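The outlier rule and duration categories behind this chart can be illustrated on a small example. The sketch below applies the same 1.5 × IQR rule and the same three bins to a made-up vector of durations; the vector toy_duration and the variable names are assumptions for illustration, not project data.

```r
# Toy durations in seconds (not project data); 7200 is an obvious outlier
toy_duration <- c(5, 20, 120, 180, 300, 600, 900, 7200)

# Tukey's rule: keep values within 1.5 * IQR of the first and third quartiles
q <- quantile(toy_duration, probs = c(0.25, 0.75), na.rm = TRUE)
iqr_width <- q[["75%"]] - q[["25%"]]
lower_bound <- q[["25%"]] - 1.5 * iqr_width
upper_bound <- q[["75%"]] + 1.5 * iqr_width
kept <- toy_duration[!is.na(toy_duration) &
                       toy_duration >= lower_bound &
                       toy_duration <= upper_bound]

# Bin the remaining durations into the three categories used in the chart
duration_category <- cut(kept, breaks = c(-Inf, 300, 600, Inf),
                         labels = c("Short: 0-300 seconds",
                                    "Medium: 301-600 seconds",
                                    "Long: > 600 seconds"))
table(duration_category)
```

Because cut() uses right-closed intervals by default, a duration of exactly 300 seconds is counted as short and exactly 600 seconds as medium.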
Overall, the visualizations presented shed light on the intriguing relationship between reported UFO sightings, major airport locations, and the characteristics of observed objects. The concentration of sightings around airports suggests a potential correlation, raising the possibility that individuals may misinterpret aircraft as UFOs. Moreover, the prevalence of “light” related shapes in sightings, coupled with the typical duration of these events, reinforces this notion. It appears that many reported sightings align with typical aircraft activity, especially considering the prominence of blinking lights at night. These findings underscore the importance of considering environmental factors, such as airport proximity, when analyzing UFO reports. Ultimately, while the mystery of UFOs continues to captivate public interest, a critical examination of the data suggests that many sightings may have terrestrial explanations rooted in human perception and environmental context.
A limitation of this project is that I focused only on mainland USA; it would be interesting to see whether this trend persists on a global scale, especially in countries that may not have as many large airports. Additionally, this project only looked at the relationship between UFO sightings and major US airports. It could add value to investigate smaller airports as well and see whether UFO sighting reports also pool around them, even if it is on a smaller scale. Furthermore, the project did not investigate the time of day of these sightings. Future research could investigate the correlation between the time of day of sightings and airport locations, to see whether factors such as nighttime may affect the number of sightings reported.
A further limitation comes from the data used. The UFO data set ranged from 1910 to 2013, which may have had some confounding effects on the data presentation, considering many of the airports investigated were built after 1910. As such, it may have been more beneficial to use a more up-to-date data set, with more recent sightings included and older sightings removed. Despite this, the project still highlights a relationship between airport locations and UFO sightings, suggesting these “unidentified flying objects” may just be identifiable flying objects: planes!
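As a starting point for the time-of-day analysis, the time column created during data preparation could be used to flag night-time reports. The sketch below is illustrative only: the helper name is_night and the 20:00 to 05:59 night window are assumptions, not values used in this project.

```r
# Hypothetical helper: flag sightings reported during night-time hours.
# The default night window (20:00 to 05:59) is an illustrative assumption.
is_night <- function(time_string, night_start = 20, night_end = 6) {
  # Take everything before the colon as the hour, so "9:00" and "09:00" both work
  hour <- as.integer(sub(":.*", "", time_string))
  hour >= night_start | hour < night_end
}

# Toy example using times from the sightings table shown earlier
toy_time <- c("20:30", "21:00", "19:00", "23:45")
mean(is_night(toy_time))  # proportion of toy sightings flagged as night-time
```

The resulting logical vector could then be added to ufo_data_trim as a column and used to compare night-time and daytime sightings.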
The citations provided in the “Data Origins” section of this project are not permanent references as the websites utilized lack a DOI for inclusion in this project. Consequently, these web pages are subject to change or removal, potentially affecting the reproducibility of certain project components.