2. Loc 3: Geocoding and Great-Circle Distances¶
ISE 754, Fall 2024
Package Used: Functions from the following package are used in this notebook for the first time:
DataFrames: Getting StartedGeoMakie: Geographic plots for MakieLogjam: https://github.com/mgkay/Logjam
To install Logjam, use the following command in your Julia REPL:
using Pkg
Pkg.add(url=("https://github.com/mgkay/Logjam.git")
1. DataFrames¶
The DataFrame package can be used to work with structured tabular (i.e., row-column) data. To create a single-row table:
using DataFrames
df = DataFrame(Name = "Mike", Age = 44)
| Row | Name | Age |
|---|---|---|
| String | Int64 | |
| 1 | Mike | 44 |
To add a row to the table:
push!(df,("Bill",40))
| Row | Name | Age |
|---|---|---|
| String | Int64 | |
| 1 | Mike | 44 |
| 2 | Bill | 40 |
To create a two-row table:
df = DataFrame(Name = ["Mike","Bill"], Age = [44, 40])
| Row | Name | Age |
|---|---|---|
| String | Int64 | |
| 1 | Mike | 44 |
| 2 | Bill | 40 |
To access portions of a table:
name = df.Name
2-element Vector{String}:
"Mike"
"Bill"
df.Age
2-element Vector{Int64}:
44
40
df[2,:]
| Row | Name | Age |
|---|---|---|
| String | Int64 | |
| 2 | Bill | 40 |
df[2,"Name"]
"Bill"
2. U.S. Places¶
Use Logjam to create a DataFrame with geographic and population data on U.S. places (city, town, or census-designated place (CDP)):
using Logjam.DataTools, DataFrames
df = usplace()
| Row | STFIP | PLFIP | NAME | ST | LAT | LON | POP | ALAND | AWATER | LSAD | FUNCSTAT | CBSA | ISCUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Int64 | Int64 | String | String3 | Float64 | Float64 | Int64 | Float64 | Float64 | String3 | String1 | Int64? | Bool | |
| 1 | 1 | 988 | Albertville | AL | 34.2631 | -86.2107 | 22386 | 26.939 | 0.1 | 25 | A | 10700 | true |
| 2 | 1 | 1132 | Alexander City | AL | 32.9272 | -85.9371 | 14843 | 43.701 | 0.29 | 25 | A | 10760 | true |
| 3 | 1 | 1852 | Anniston | AL | 33.6735 | -85.8109 | 21564 | 45.825 | 0.07 | 25 | A | 11500 | true |
| 4 | 1 | 3076 | Auburn | AL | 32.6077 | -85.4895 | 76143 | 60.651 | 1.028 | 25 | A | 12220 | true |
| 5 | 1 | 7000 | Birmingham | AL | 33.5272 | -86.797 | 200733 | 147.028 | 2.522 | 25 | A | 13820 | true |
| 6 | 1 | 18976 | Cullman | AL | 34.1769 | -86.8407 | 18213 | 21.626 | 1.331 | 25 | A | 18980 | true |
| 7 | 1 | 19648 | Daphne | AL | 30.6277 | -87.884 | 27462 | 19.001 | 0.118 | 25 | A | 19300 | true |
| 8 | 1 | 20104 | Decatur | AL | 34.573 | -86.9899 | 57938 | 54.358 | 6.502 | 25 | A | 19460 | true |
| 9 | 1 | 21184 | Dothan | AL | 31.2337 | -85.4068 | 71072 | 89.827 | 0.326 | 25 | A | 20020 | true |
| 10 | 1 | 24184 | Enterprise | AL | 31.3269 | -85.844 | 28711 | 30.958 | 0.067 | 25 | A | 21460 | true |
| 11 | 1 | 24568 | Eufaula | AL | 31.9099 | -85.1527 | 12882 | 59.515 | 13.96 | 25 | A | 21640 | true |
| 12 | 1 | 25240 | Fairhope | AL | 30.5209 | -87.8791 | 22477 | 14.475 | 0.056 | 25 | A | 19300 | true |
| 13 | 1 | 26896 | Florence | AL | 34.8303 | -87.6661 | 40184 | 26.518 | 0.211 | 25 | A | 22520 | true |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 31606 | 56 | 81300 | Wamsutter | WY | 41.668 | -107.979 | 203 | 1.627 | 0.0 | 43 | A | missing | true |
| 31607 | 56 | 81640 | Warren AFB CDP | WY | 41.1471 | -104.862 | 2863 | 4.811 | 0.049 | 57 | S | missing | true |
| 31608 | 56 | 81703 | Washam CDP | WY | 41.0056 | -109.699 | 39 | 3.112 | 0.0 | 57 | S | missing | true |
| 31609 | 56 | 82967 | Westview Circle CDP | WY | 42.0616 | -105.068 | 41 | 2.3 | 0.037 | 57 | S | missing | true |
| 31610 | 56 | 83040 | Wheatland | WY | 42.0517 | -104.959 | 3588 | 4.098 | 0.0 | 43 | A | missing | true |
| 31611 | 56 | 83100 | Whiting CDP | WY | 42.0089 | -104.971 | 94 | 1.159 | 0.0 | 57 | S | missing | true |
| 31612 | 56 | 83765 | Wilson CDP | WY | 43.4809 | -110.921 | 1567 | 23.024 | 0.416 | 57 | S | missing | true |
| 31613 | 56 | 84852 | Woods Landing-Jelm CDP | WY | 41.1015 | -106.029 | 118 | 16.705 | 0.0 | 57 | S | missing | true |
| 31614 | 56 | 84925 | Worland | WY | 44.0204 | -107.962 | 4773 | 4.588 | 0.054 | 25 | A | missing | true |
| 31615 | 56 | 85015 | Wright | WY | 43.7492 | -105.495 | 1644 | 3.039 | 0.0 | 43 | A | missing | true |
| 31616 | 56 | 86665 | Yoder | WY | 41.917 | -104.295 | 131 | 0.213 | 0.0 | 43 | A | missing | true |
| 31617 | 56 | 86737 | Y-O Ranch CDP | WY | 42.0352 | -104.923 | 172 | 2.402 | 0.0 | 57 | S | missing | true |
st2fips("NC") # State to FIPS code
MethodError: no method matching st2fips(::String) Closest candidates are: st2fips(::Symbol) @ Logjam C:\Users\kay\.julia\packages\Logjam\GxF7Z\src\DataTools.jl:316 Stacktrace: [1] top-level scope @ In[9]:1
@doc st2fips
st2fips(state::Symbol) -> Integer
Convert a two-character symbol for US states and territories to its corresponding FIPS code.
Valid two-character symbols of the state or territory are AK, AL, AR, AS, AZ, CA, CO, CT, DC, DE, FL, GA, GU, HI, IA, ID, IL, IN, KS, KY, LA, MA, MD, ME, MI, MN, MO, MP, MS, MT, NC, ND, NE, NH, NJ, NM, NV, NY, OH, OK, OR, PA, PR, RI, SC, SD, TN, TX, UT, VA, VI, VT, WA, WI, WV, WY
Examples¶
julia> st2fips(:NC)
37
julia> st2fips.([:NC, :NY])
2-element Vector{Int64}:
37
36
Filter U.S. place data for cities in North Carolina with populations over 100,000:
df = filter(r -> (r.STFIP == st2fips(:NC)) && (r.POP > 100_000), usplace())
| Row | STFIP | PLFIP | NAME | ST | LAT | LON | POP | ALAND | AWATER | LSAD | FUNCSTAT | CBSA | ISCUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Int64 | Int64 | String | String3 | Float64 | Float64 | Int64 | Float64 | Float64 | String3 | String1 | Int64? | Bool | |
| 1 | 37 | 10740 | Cary | NC | 35.7814 | -78.819 | 174721 | 59.233 | 1.083 | 43 | A | 39580 | true |
| 2 | 37 | 12000 | Charlotte | NC | 35.209 | -80.831 | 874579 | 308.285 | 1.977 | 25 | A | 16740 | true |
| 3 | 37 | 14100 | Concord | NC | 35.3921 | -80.6355 | 105240 | 63.479 | 0.035 | 25 | A | 16740 | true |
| 4 | 37 | 19000 | Durham | NC | 35.98 | -78.901 | 283506 | 112.791 | 0.821 | 25 | A | 20500 | true |
| 5 | 37 | 22920 | Fayetteville | NC | 35.0828 | -78.9735 | 208501 | 148.258 | 1.828 | 25 | A | 22180 | true |
| 6 | 37 | 28000 | Greensboro | NC | 36.0951 | -79.827 | 299035 | 129.584 | 5.236 | 25 | A | 24660 | true |
| 7 | 37 | 31400 | High Point | NC | 35.9908 | -79.9927 | 114059 | 56.365 | 1.523 | 25 | A | 24660 | true |
| 8 | 37 | 55000 | Raleigh | NC | 35.8302 | -78.6415 | 467665 | 147.116 | 1.089 | 25 | A | 39580 | true |
| 9 | 37 | 74440 | Wilmington | NC | 34.2092 | -77.8858 | 115451 | 51.405 | 1.564 | 25 | A | 48900 | true |
| 10 | 37 | 75000 | Winston-Salem | NC | 36.1029 | -80.2608 | 249545 | 132.678 | 1.188 | 25 | A | 49180 | true |
select!(df, :NAME, :LAT, :LON, :POP)
| Row | NAME | LAT | LON | POP |
|---|---|---|---|---|
| String | Float64 | Float64 | Int64 | |
| 1 | Cary | 35.7814 | -78.819 | 174721 |
| 2 | Charlotte | 35.209 | -80.831 | 874579 |
| 3 | Concord | 35.3921 | -80.6355 | 105240 |
| 4 | Durham | 35.98 | -78.901 | 283506 |
| 5 | Fayetteville | 35.0828 | -78.9735 | 208501 |
| 6 | Greensboro | 36.0951 | -79.827 | 299035 |
| 7 | High Point | 35.9908 | -79.9927 | 114059 |
| 8 | Raleigh | 35.8302 | -78.6415 | 467665 |
| 9 | Wilmington | 34.2092 | -77.8858 | 115451 |
| 10 | Winston-Salem | 36.1029 | -80.2608 | 249545 |
dfR = filter(r -> r.NAME == "Raleigh", df)
| Row | NAME | LAT | LON | POP |
|---|---|---|---|---|
| String | Float64 | Float64 | Int64 | |
| 1 | Raleigh | 35.8302 | -78.6415 | 467665 |
xyR = dfR[1,[:LON, :LAT]]
| Row | LON | LAT |
|---|---|---|
| Float64 | Float64 | |
| 1 | -78.6415 | 35.8302 |
xyR = filter(r -> r.NAME == "Raleigh", df)[1,[:LON, :LAT]]
| Row | LON | LAT |
|---|---|---|
| Float64 | Float64 | |
| 1 | -78.6415 | 35.8302 |
name2lonlat(name, df) = filter(r -> r.NAME == name, df)[1,[:LON, :LAT]]
xyC = name2lonlat("Charlotte", df)
| Row | LON | LAT |
|---|---|---|
| Float64 | Float64 | |
| 1 | -80.831 | 35.209 |
3. Great-Circle Distances¶
It uses Haversine's formula instead of the direct spherical trigonometry formula to avoid numerical issues.
function dgc(xy₁, xy₂; unit=:mi)
length(xy₁) == length(xy₂) == 2 || error("Inputs must have length 2.")
unit in [:mi, :km] || error("Unit must be :mi or :km")
Δx, Δy = xy₂[1] - xy₁[1], xy₂[2] - xy₁[2]
a = sind(Δy / 2)^2 + cosd(xy₁[2]) * cosd(xy₂[2]) * sind(Δx / 2)^2
2 * asin(min(sqrt(a), 1.0)) * (unit == :mi ? 3958.75 : 6371.00)
end
dgc (generic function with 1 method)
dgc(xyR, xyC) # Great circle distance Raleigh to Charlotte
130.3903764072013
pt = collect(zip(df.LON, df.LAT)) # lon-lats 10 largest NC cities
10-element Vector{Tuple{Float64, Float64}}:
(-78.819011, 35.78145)
(-80.83099, 35.209045)
(-80.63551, 35.392094)
(-78.900976, 35.98005)
(-78.973515, 35.082766)
(-79.826996, 36.095148)
(-79.992713, 35.990803)
(-78.641493, 35.830204)
(-77.885767, 34.209225)
(-80.260829, 36.102863)
d = dgc.([xyR], pt) # Distance Raleigh to all NC cities
10-element Vector{Float64}:
10.502099246467056
130.3903764072013
116.02351130294406
17.834712647712013
54.91952500796353
68.77840336073692
76.42474095411053
0.0
119.88324448080039
92.492983886023
argmax(d) # Find furthest city from Raleigh
2
df[argmax(d), :NAME]
"Charlotte"
sortperm(d) # Find the three closest cities to Raleigh
10-element Vector{Int64}:
8
1
4
5
6
7
10
3
9
2
sortperm(d)[2:4]
3-element Vector{Int64}:
1
4
5
df[sortperm(d)[2:4], :NAME]
3-element Vector{String}:
"Cary"
"Durham"
"Fayetteville"
4. Mercator Projection Map¶
using Logjam.DataTools, Logjam.MapTools
using GeoMakie, DataFrames
fig, ax = makemap() # Create a map figure and axis
fig
fig, ax = makemap(region=:CUS) # Continental U.S.
fig
fig, ax = makemap(df.LON, df.LAT) # Region where NC cities located
fig
scatter!(ax, df.LON, df.LAT) # Plot cities
fig
text!(ax, df.LON, df.LAT, text=df.NAME; aligntext(df.LON, df.LAT)...) # City names
fig
5. Find population-weighted optimal NC location¶
df = filter(r -> (r.STFIP == st2fips(:NC)), usplace())
select!(df, :NAME, :LAT, :LON, :POP)
| Row | NAME | LAT | LON | POP |
|---|---|---|---|---|
| String | Float64 | Float64 | Int64 | |
| 1 | Albemarle | 35.36 | -80.1921 | 16432 |
| 2 | Anderson Creek CDP | 35.2638 | -78.9583 | 13636 |
| 3 | Asheville | 35.571 | -82.5527 | 94589 |
| 4 | Boone | 36.2138 | -81.6652 | 19092 |
| 5 | Brevard | 35.2431 | -82.7269 | 7744 |
| 6 | Burlington | 36.0761 | -79.4683 | 57303 |
| 7 | Cary | 35.7814 | -78.819 | 174721 |
| 8 | Chapel Hill | 35.9259 | -79.0383 | 61960 |
| 9 | Charlotte | 35.209 | -80.831 | 874579 |
| 10 | Concord | 35.3921 | -80.6355 | 105240 |
| 11 | Durham | 35.98 | -78.901 | 283506 |
| 12 | Elizabeth City | 36.294 | -76.2354 | 18631 |
| 13 | Fayetteville | 35.0828 | -78.9735 | 208501 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 765 | Winterville | 35.5287 | -77.4001 | 10462 |
| 766 | Winton | 36.3894 | -76.9347 | 629 |
| 767 | Woodfin | 35.6461 | -82.5944 | 7936 |
| 768 | Woodland | 36.3306 | -77.2149 | 557 |
| 769 | Woodlawn CDP | 36.118 | -79.2959 | 912 |
| 770 | Wrightsboro CDP | 34.2876 | -77.9301 | 5526 |
| 771 | Wrightsville Beach | 34.213 | -77.7979 | 2473 |
| 772 | Yadkin College CDP | 35.8721 | -80.3861 | 377 |
| 773 | Yadkinville | 36.1316 | -80.6593 | 2995 |
| 774 | Yanceyville | 36.4102 | -79.3294 | 1937 |
| 775 | Youngsville | 36.0211 | -78.4873 | 2016 |
| 776 | Zebulon | 35.8322 | -78.317 | 6903 |
using Optim, Statistics
pt = collect.(zip(df.LON, df.LAT)) # Array of 2-vectors
TC(xy) = sum(df.POP .* dgc.([xy], pt))
xy° = optimize(TC, mean(pt)).minimizer
2-element Vector{Float64}:
-79.70921229804314
35.64959800439122
d = dgc.([xy°], pt)
println("Opt loc ", round(minimum(d)), " miles from ", df[argmin(d), :NAME])
Opt loc 6.0 miles from Franklinville
df.D2OPT = d
df
| Row | NAME | LAT | LON | POP | D2OPT |
|---|---|---|---|---|---|
| String | Float64 | Float64 | Int64 | Float64 | |
| 1 | Albemarle | 35.36 | -80.1921 | 16432 | 33.7353 |
| 2 | Anderson Creek CDP | 35.2638 | -78.9583 | 13636 | 49.9659 |
| 3 | Asheville | 35.571 | -82.5527 | 94589 | 159.814 |
| 4 | Boone | 36.2138 | -81.6652 | 19092 | 116.164 |
| 5 | Brevard | 35.2431 | -82.7269 | 7744 | 172.154 |
| 6 | Burlington | 36.0761 | -79.4683 | 57303 | 32.411 |
| 7 | Cary | 35.7814 | -78.819 | 174721 | 50.7629 |
| 8 | Chapel Hill | 35.9259 | -79.0383 | 61960 | 42.1742 |
| 9 | Charlotte | 35.209 | -80.831 | 874579 | 70.1074 |
| 10 | Concord | 35.3921 | -80.6355 | 105240 | 55.0449 |
| 11 | Durham | 35.98 | -78.901 | 283506 | 50.7143 |
| 12 | Elizabeth City | 36.294 | -76.2354 | 18631 | 199.271 |
| 13 | Fayetteville | 35.0828 | -78.9735 | 208501 | 57.0267 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 765 | Winterville | 35.5287 | -77.4001 | 10462 | 130.01 |
| 766 | Winton | 36.3894 | -76.9347 | 629 | 163.25 |
| 767 | Woodfin | 35.6461 | -82.5944 | 7936 | 161.987 |
| 768 | Woodland | 36.3306 | -77.2149 | 557 | 147.162 |
| 769 | Woodlawn CDP | 36.118 | -79.2959 | 912 | 39.7802 |
| 770 | Wrightsboro CDP | 34.2876 | -77.9301 | 5526 | 137.847 |
| 771 | Wrightsville Beach | 34.213 | -77.7979 | 2473 | 146.874 |
| 772 | Yadkin College CDP | 35.8721 | -80.3861 | 377 | 40.9442 |
| 773 | Yadkinville | 36.1316 | -80.6593 | 2995 | 62.7476 |
| 774 | Yanceyville | 36.4102 | -79.3294 | 1937 | 56.6766 |
| 775 | Youngsville | 36.0211 | -78.4873 | 2016 | 73.1009 |
| 776 | Zebulon | 35.8322 | -78.317 | 6903 | 79.0887 |
df[sortperm(d)[1:5], [:D2OPT, :POP, :NAME]]
| Row | D2OPT | POP | NAME |
|---|---|---|---|
| Float64 | Int64 | String | |
| 1 | 6.3628 | 1197 | Franklinville |
| 2 | 6.77153 | 1774 | Ramseur |
| 3 | 7.38298 | 27156 | Asheboro |
| 4 | 8.61908 | 235 | Seagrove |
| 5 | 11.0704 | 284 | Bennett CDP |
kwargs2 = (; color = :red, marker = :star5, markersize = 14)
scatter!(xy°...; kwargs2...)
fig
6. Cities in NC and VA¶
All cities in NC and VA with a population greater than 200,000 and east of longitude -80.
df = filter!(r -> any(i -> r.STFIP == i, st2fips.([:NC,:VA])) &&
r.POP > 200000 && r.LON > -80.0, usplace())
select!(df, :NAME, :LAT, :LON, :POP)
| Row | NAME | LAT | LON | POP |
|---|---|---|---|---|
| String | Float64 | Float64 | Int64 | |
| 1 | Durham | 35.98 | -78.901 | 283506 |
| 2 | Fayetteville | 35.0828 | -78.9735 | 208501 |
| 3 | Greensboro | 36.0951 | -79.827 | 299035 |
| 4 | Raleigh | 35.8302 | -78.6415 | 467665 |
| 5 | Arlington CDP | 38.8783 | -77.1007 | 238643 |
| 6 | Chesapeake | 36.6794 | -76.3018 | 249422 |
| 7 | Norfolk | 36.923 | -76.2446 | 238005 |
| 8 | Richmond | 37.5314 | -77.476 | 226610 |
| 9 | Virginia Beach | 36.7795 | -76.0291 | 459470 |
df[!, :NAME] = replace.(df[!, :NAME], r" CDP" => "")
df
| Row | NAME | LAT | LON | POP |
|---|---|---|---|---|
| String | Float64 | Float64 | Int64 | |
| 1 | Durham | 35.98 | -78.901 | 283506 |
| 2 | Fayetteville | 35.0828 | -78.9735 | 208501 |
| 3 | Greensboro | 36.0951 | -79.827 | 299035 |
| 4 | Raleigh | 35.8302 | -78.6415 | 467665 |
| 5 | Arlington | 38.8783 | -77.1007 | 238643 |
| 6 | Chesapeake | 36.6794 | -76.3018 | 249422 |
| 7 | Norfolk | 36.923 | -76.2446 | 238005 |
| 8 | Richmond | 37.5314 | -77.476 | 226610 |
| 9 | Virginia Beach | 36.7795 | -76.0291 | 459470 |
fig, ax = makemap(df.LON, df.LAT)
scatter!(ax, df.LON, df.LAT, markersize=10, color=:red)
text!(df.LON, df.LAT; text=df.NAME, aligntext(df.LON, df.LAT)..., fontsize=12)
fig
fig, ax = makemap(df.LON, df.LAT, xexpand=0.5)
scatter!(ax, df.LON, df.LAT, markersize=10, color=:red)
text!(df.LON, df.LAT; text=df.NAME, aligntext(df.LON, df.LAT)..., fontsize=12)
fig