2. Loc 3: Geocoding and Great-Circle Distances¶

ISE 754, Fall 2024

Package Used: Functions from the following package are used in this notebook for the first time:

  • DataFrames: Getting Started

  • GeoMakie: Geographic plots for Makie

  • Logjam: https://github.com/mgkay/Logjam

To install Logjam, use the following command in your Julia REPL:

using Pkg

Pkg.add(url=("https://github.com/mgkay/Logjam.git")

1. DataFrames¶

The DataFrame package can be used to work with structured tabular (i.e., row-column) data. To create a single-row table:

In [1]:
using DataFrames
df = DataFrame(Name = "Mike", Age = 44)
Out[1]:
1×2 DataFrame
RowNameAge
StringInt64
1Mike44

To add a row to the table:

In [2]:
push!(df,("Bill",40))
Out[2]:
2×2 DataFrame
RowNameAge
StringInt64
1Mike44
2Bill40

To create a two-row table:

In [3]:
df = DataFrame(Name = ["Mike","Bill"], Age = [44, 40])
Out[3]:
2×2 DataFrame
RowNameAge
StringInt64
1Mike44
2Bill40

To access portions of a table:

In [4]:
name = df.Name
Out[4]:
2-element Vector{String}:
 "Mike"
 "Bill"
In [5]:
df.Age
Out[5]:
2-element Vector{Int64}:
 44
 40
In [6]:
df[2,:]
Out[6]:
DataFrameRow (2 columns)
RowNameAge
StringInt64
2Bill40
In [7]:
df[2,"Name"]
Out[7]:
"Bill"

2. U.S. Places¶

Use Logjam to create a DataFrame with geographic and population data on U.S. places (city, town, or census-designated place (CDP)):

In [8]:
using Logjam.DataTools, DataFrames

df = usplace()
Out[8]:
31617×13 DataFrame
31592 rows omitted
RowSTFIPPLFIPNAMESTLATLONPOPALANDAWATERLSADFUNCSTATCBSAISCUS
Int64Int64StringString3Float64Float64Int64Float64Float64String3String1Int64?Bool
11988AlbertvilleAL34.2631-86.21072238626.9390.125A10700true
211132Alexander CityAL32.9272-85.93711484343.7010.2925A10760true
311852AnnistonAL33.6735-85.81092156445.8250.0725A11500true
413076AuburnAL32.6077-85.48957614360.6511.02825A12220true
517000BirminghamAL33.5272-86.797200733147.0282.52225A13820true
6118976CullmanAL34.1769-86.84071821321.6261.33125A18980true
7119648DaphneAL30.6277-87.8842746219.0010.11825A19300true
8120104DecaturAL34.573-86.98995793854.3586.50225A19460true
9121184DothanAL31.2337-85.40687107289.8270.32625A20020true
10124184EnterpriseAL31.3269-85.8442871130.9580.06725A21460true
11124568EufaulaAL31.9099-85.15271288259.51513.9625A21640true
12125240FairhopeAL30.5209-87.87912247714.4750.05625A19300true
13126896FlorenceAL34.8303-87.66614018426.5180.21125A22520true
⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮⋮
316065681300WamsutterWY41.668-107.9792031.6270.043Amissingtrue
316075681640Warren AFB CDPWY41.1471-104.86228634.8110.04957Smissingtrue
316085681703Washam CDPWY41.0056-109.699393.1120.057Smissingtrue
316095682967Westview Circle CDPWY42.0616-105.068412.30.03757Smissingtrue
316105683040WheatlandWY42.0517-104.95935884.0980.043Amissingtrue
316115683100Whiting CDPWY42.0089-104.971941.1590.057Smissingtrue
316125683765Wilson CDPWY43.4809-110.921156723.0240.41657Smissingtrue
316135684852Woods Landing-Jelm CDPWY41.1015-106.02911816.7050.057Smissingtrue
316145684925WorlandWY44.0204-107.96247734.5880.05425Amissingtrue
316155685015WrightWY43.7492-105.49516443.0390.043Amissingtrue
316165686665YoderWY41.917-104.2951310.2130.043Amissingtrue
316175686737Y-O Ranch CDPWY42.0352-104.9231722.4020.057Smissingtrue
In [9]:
st2fips("NC")   # State to FIPS code
MethodError: no method matching st2fips(::String)

Closest candidates are:
  st2fips(::Symbol)
   @ Logjam C:\Users\kay\.julia\packages\Logjam\GxF7Z\src\DataTools.jl:316


Stacktrace:
 [1] top-level scope
   @ In[9]:1
In [10]:
@doc st2fips
Out[10]:
st2fips(state::Symbol) -> Integer

Convert a two-character symbol for US states and territories to its corresponding FIPS code.

Valid two-character symbols of the state or territory are AK, AL, AR, AS, AZ, CA, CO, CT, DC, DE, FL, GA, GU, HI, IA, ID, IL, IN, KS, KY, LA, MA, MD, ME, MI, MN, MO, MP, MS, MT, NC, ND, NE, NH, NJ, NM, NV, NY, OH, OK, OR, PA, PR, RI, SC, SD, TN, TX, UT, VA, VI, VT, WA, WI, WV, WY

Examples¶

julia> st2fips(:NC)
37

julia> st2fips.([:NC, :NY])
2-element Vector{Int64}:
37
36

Filter U.S. place data for cities in North Carolina with populations over 100,000:

In [11]:
df = filter(r -> (r.STFIP == st2fips(:NC)) && (r.POP > 100_000), usplace())
Out[11]:
10×13 DataFrame
RowSTFIPPLFIPNAMESTLATLONPOPALANDAWATERLSADFUNCSTATCBSAISCUS
Int64Int64StringString3Float64Float64Int64Float64Float64String3String1Int64?Bool
13710740CaryNC35.7814-78.81917472159.2331.08343A39580true
23712000CharlotteNC35.209-80.831874579308.2851.97725A16740true
33714100ConcordNC35.3921-80.635510524063.4790.03525A16740true
43719000DurhamNC35.98-78.901283506112.7910.82125A20500true
53722920FayettevilleNC35.0828-78.9735208501148.2581.82825A22180true
63728000GreensboroNC36.0951-79.827299035129.5845.23625A24660true
73731400High PointNC35.9908-79.992711405956.3651.52325A24660true
83755000RaleighNC35.8302-78.6415467665147.1161.08925A39580true
93774440WilmingtonNC34.2092-77.885811545151.4051.56425A48900true
103775000Winston-SalemNC36.1029-80.2608249545132.6781.18825A49180true
In [12]:
select!(df, :NAME, :LAT, :LON, :POP)
Out[12]:
10×4 DataFrame
RowNAMELATLONPOP
StringFloat64Float64Int64
1Cary35.7814-78.819174721
2Charlotte35.209-80.831874579
3Concord35.3921-80.6355105240
4Durham35.98-78.901283506
5Fayetteville35.0828-78.9735208501
6Greensboro36.0951-79.827299035
7High Point35.9908-79.9927114059
8Raleigh35.8302-78.6415467665
9Wilmington34.2092-77.8858115451
10Winston-Salem36.1029-80.2608249545
In [13]:
dfR = filter(r -> r.NAME == "Raleigh", df)
Out[13]:
1×4 DataFrame
RowNAMELATLONPOP
StringFloat64Float64Int64
1Raleigh35.8302-78.6415467665
In [14]:
xyR = dfR[1,[:LON, :LAT]]
Out[14]:
DataFrameRow (2 columns)
RowLONLAT
Float64Float64
1-78.641535.8302
In [15]:
xyR = filter(r -> r.NAME == "Raleigh", df)[1,[:LON, :LAT]]
Out[15]:
DataFrameRow (2 columns)
RowLONLAT
Float64Float64
1-78.641535.8302
In [16]:
name2lonlat(name, df) = filter(r -> r.NAME == name, df)[1,[:LON, :LAT]]
xyC = name2lonlat("Charlotte", df)
Out[16]:
DataFrameRow (2 columns)
RowLONLAT
Float64Float64
1-80.83135.209

3. Great-Circle Distances¶

It uses Haversine's formula instead of the direct spherical trigonometry formula to avoid numerical issues.

In [17]:
function dgc(xy₁, xy₂; unit=:mi)
    length(xy₁) == length(xy₂) == 2 || error("Inputs must have length 2.")
    unit in [:mi, :km] || error("Unit must be :mi or :km")

    Δx, Δy = xy₂[1] - xy₁[1], xy₂[2] - xy₁[2]
    a = sind(Δy / 2)^2 + cosd(xy₁[2]) * cosd(xy₂[2]) * sind(Δx / 2)^2
    2 * asin(min(sqrt(a), 1.0)) * (unit == :mi ? 3958.75 : 6371.00)
end
Out[17]:
dgc (generic function with 1 method)
In [18]:
dgc(xyR, xyC)   # Great circle distance Raleigh to Charlotte
Out[18]:
130.3903764072013
In [19]:
pt = collect(zip(df.LON, df.LAT))   # lon-lats 10 largest NC cities
Out[19]:
10-element Vector{Tuple{Float64, Float64}}:
 (-78.819011, 35.78145)
 (-80.83099, 35.209045)
 (-80.63551, 35.392094)
 (-78.900976, 35.98005)
 (-78.973515, 35.082766)
 (-79.826996, 36.095148)
 (-79.992713, 35.990803)
 (-78.641493, 35.830204)
 (-77.885767, 34.209225)
 (-80.260829, 36.102863)
In [20]:
d = dgc.([xyR], pt)   # Distance Raleigh to all NC cities
Out[20]:
10-element Vector{Float64}:
  10.502099246467056
 130.3903764072013
 116.02351130294406
  17.834712647712013
  54.91952500796353
  68.77840336073692
  76.42474095411053
   0.0
 119.88324448080039
  92.492983886023
In [21]:
argmax(d)   # Find furthest city from Raleigh
Out[21]:
2
In [22]:
df[argmax(d), :NAME]
Out[22]:
"Charlotte"
In [23]:
sortperm(d)   # Find the three closest cities to Raleigh
Out[23]:
10-element Vector{Int64}:
  8
  1
  4
  5
  6
  7
 10
  3
  9
  2
In [24]:
sortperm(d)[2:4]
Out[24]:
3-element Vector{Int64}:
 1
 4
 5
In [25]:
df[sortperm(d)[2:4], :NAME]
Out[25]:
3-element Vector{String}:
 "Cary"
 "Durham"
 "Fayetteville"

4. Mercator Projection Map¶

In [26]:
using Logjam.DataTools, Logjam.MapTools
using GeoMakie, DataFrames

fig, ax = makemap()   # Create a map figure and axis
fig
Out[26]:
No description has been provided for this image
In [27]:
fig, ax = makemap(region=:CUS)   # Continental U.S. 
fig
Out[27]:
No description has been provided for this image
In [28]:
fig, ax = makemap(df.LON, df.LAT)   # Region where NC cities located
fig
Out[28]:
No description has been provided for this image
In [29]:
scatter!(ax, df.LON, df.LAT)   # Plot cities
fig
Out[29]:
No description has been provided for this image
In [30]:
text!(ax, df.LON, df.LAT, text=df.NAME; aligntext(df.LON, df.LAT)...)  # City names
fig
Out[30]:
No description has been provided for this image

5. Find population-weighted optimal NC location¶

In [31]:
df = filter(r -> (r.STFIP == st2fips(:NC)), usplace())
select!(df, :NAME, :LAT, :LON, :POP)
Out[31]:
776×4 DataFrame
751 rows omitted
RowNAMELATLONPOP
StringFloat64Float64Int64
1Albemarle35.36-80.192116432
2Anderson Creek CDP35.2638-78.958313636
3Asheville35.571-82.552794589
4Boone36.2138-81.665219092
5Brevard35.2431-82.72697744
6Burlington36.0761-79.468357303
7Cary35.7814-78.819174721
8Chapel Hill35.9259-79.038361960
9Charlotte35.209-80.831874579
10Concord35.3921-80.6355105240
11Durham35.98-78.901283506
12Elizabeth City36.294-76.235418631
13Fayetteville35.0828-78.9735208501
⋮⋮⋮⋮⋮
765Winterville35.5287-77.400110462
766Winton36.3894-76.9347629
767Woodfin35.6461-82.59447936
768Woodland36.3306-77.2149557
769Woodlawn CDP36.118-79.2959912
770Wrightsboro CDP34.2876-77.93015526
771Wrightsville Beach34.213-77.79792473
772Yadkin College CDP35.8721-80.3861377
773Yadkinville36.1316-80.65932995
774Yanceyville36.4102-79.32941937
775Youngsville36.0211-78.48732016
776Zebulon35.8322-78.3176903
In [32]:
using Optim, Statistics

pt = collect.(zip(df.LON, df.LAT))      # Array of 2-vectors
TC(xy) = sum(df.POP .* dgc.([xy], pt))
xy° = optimize(TC, mean(pt)).minimizer
Out[32]:
2-element Vector{Float64}:
 -79.70921229804314
  35.64959800439122
In [33]:
d = dgc.([xy°], pt)
println("Opt loc ", round(minimum(d)), " miles from ", df[argmin(d), :NAME])
Opt loc 6.0 miles from Franklinville
In [34]:
df.D2OPT = d
df
Out[34]:
776×5 DataFrame
751 rows omitted
RowNAMELATLONPOPD2OPT
StringFloat64Float64Int64Float64
1Albemarle35.36-80.19211643233.7353
2Anderson Creek CDP35.2638-78.95831363649.9659
3Asheville35.571-82.552794589159.814
4Boone36.2138-81.665219092116.164
5Brevard35.2431-82.72697744172.154
6Burlington36.0761-79.46835730332.411
7Cary35.7814-78.81917472150.7629
8Chapel Hill35.9259-79.03836196042.1742
9Charlotte35.209-80.83187457970.1074
10Concord35.3921-80.635510524055.0449
11Durham35.98-78.90128350650.7143
12Elizabeth City36.294-76.235418631199.271
13Fayetteville35.0828-78.973520850157.0267
⋮⋮⋮⋮⋮⋮
765Winterville35.5287-77.400110462130.01
766Winton36.3894-76.9347629163.25
767Woodfin35.6461-82.59447936161.987
768Woodland36.3306-77.2149557147.162
769Woodlawn CDP36.118-79.295991239.7802
770Wrightsboro CDP34.2876-77.93015526137.847
771Wrightsville Beach34.213-77.79792473146.874
772Yadkin College CDP35.8721-80.386137740.9442
773Yadkinville36.1316-80.6593299562.7476
774Yanceyville36.4102-79.3294193756.6766
775Youngsville36.0211-78.4873201673.1009
776Zebulon35.8322-78.317690379.0887
In [35]:
df[sortperm(d)[1:5], [:D2OPT, :POP, :NAME]]
Out[35]:
5×3 DataFrame
RowD2OPTPOPNAME
Float64Int64String
16.36281197Franklinville
26.771531774Ramseur
37.3829827156Asheboro
48.61908235Seagrove
511.0704284Bennett CDP
In [36]:
kwargs2 = (; color = :red, marker = :star5, markersize = 14)
scatter!(xy°...; kwargs2...)
fig
Out[36]:
No description has been provided for this image

6. Cities in NC and VA¶

All cities in NC and VA with a population greater than 200,000 and east of longitude -80.

In [37]:
df = filter!(r -> any(i -> r.STFIP == i, st2fips.([:NC,:VA])) &&
        r.POP > 200000 && r.LON > -80.0, usplace())
select!(df, :NAME, :LAT, :LON, :POP)
Out[37]:
9×4 DataFrame
RowNAMELATLONPOP
StringFloat64Float64Int64
1Durham35.98-78.901283506
2Fayetteville35.0828-78.9735208501
3Greensboro36.0951-79.827299035
4Raleigh35.8302-78.6415467665
5Arlington CDP38.8783-77.1007238643
6Chesapeake36.6794-76.3018249422
7Norfolk36.923-76.2446238005
8Richmond37.5314-77.476226610
9Virginia Beach36.7795-76.0291459470
In [38]:
df[!, :NAME] = replace.(df[!, :NAME], r" CDP" => "")
df
Out[38]:
9×4 DataFrame
RowNAMELATLONPOP
StringFloat64Float64Int64
1Durham35.98-78.901283506
2Fayetteville35.0828-78.9735208501
3Greensboro36.0951-79.827299035
4Raleigh35.8302-78.6415467665
5Arlington38.8783-77.1007238643
6Chesapeake36.6794-76.3018249422
7Norfolk36.923-76.2446238005
8Richmond37.5314-77.476226610
9Virginia Beach36.7795-76.0291459470
In [39]:
fig, ax = makemap(df.LON, df.LAT)

scatter!(ax, df.LON, df.LAT, markersize=10, color=:red)
text!(df.LON, df.LAT; text=df.NAME, aligntext(df.LON, df.LAT)..., fontsize=12)
fig
Out[39]:
No description has been provided for this image
In [40]:
fig, ax = makemap(df.LON, df.LAT, xexpand=0.5)

scatter!(ax, df.LON, df.LAT, markersize=10, color=:red)
text!(df.LON, df.LAT; text=df.NAME, aligntext(df.LON, df.LAT)..., fontsize=12)
fig
Out[40]:
No description has been provided for this image