

Searching for Two Cities Farthest Apart in a Dataset

This problem requires you to write a short program that finds the distance between the two cities that are farthest apart in a data file containing 100 cities. Read this page and check your code. If your code passes all the tests mentioned there, you can be sure that you will get a perfect score.


Rounding Instruction: For all LON-CAPA problems, when you are asked to round a number to, e.g., 4 significant figures, it means you should round it to at least 4 significant figures. For example, if the correct answer is 314.159265358979 and you are asked to round it to 4 significant figures, numbers such as 314.2, 314.16, 314.159 and 314.159265358979 will be marked correct. Even 314.17 will be accepted, since it is the same as 314.2 when rounded to 4 significant figures. If you are asked to round the number to 4 decimal places, 314.1593, 314.15927, 314.159265, etc. will be accepted, but 314.159 will be marked wrong.
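In R, the two kinds of rounding described above correspond to signif() and round(). A minimal sketch using the example value from the rounding instruction (the variable name x is only for illustration):

```r
x <- 314.159265358979

# Round to 4 significant figures
signif(x, 4)   # 314.2

# Round to 4 decimal places
round(x, 4)    # 314.1593

# 314.17 agrees with the answer at 4 significant figures, so it is accepted
signif(314.17, 4) == signif(x, 4)   # TRUE
```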


Finding the distances between points is a common task in computer science, statistics and data science. This kind of problem arises in some clustering algorithms, for instance, and in genomics applications. Here, we will look at the common example of finding distances between cities, as it is easier to describe than, say, finding distances between DNA strands.

The position of a point on Earth's surface can be specified by two coordinates: longitude λ and latitude φ. It can be shown, from spherical trigonometry or vector algebra, that the distance between two cities can be calculated by the formula

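$$d = R \cos^{-1}\left[\sin\phi_1 \sin\phi_2 + \cos\phi_1 \cos\phi_2 \cos(\lambda_1 - \lambda_2)\right],$$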

where R = 6371 km is Earth's radius, λ1, φ1 are the longitude and latitude of the first city, and λ2, φ2 are the longitude and latitude of the second city. Here distance is defined as the length of the shortest path between the two cities on Earth's surface. It is the minimum distance an airplane has to travel between the two cities.

We adopt a convention that a positive (negative) longitude means that the city is to the east (west) of Greenwich; a positive (negative) latitude means that the city is in the northern (southern) hemisphere.

In the R language, sin() and cos() are the sine and cosine functions. The inverse cosine function cos⁻¹ is acos(). Note that all angles in the sin(), cos() and acos() functions are in radians. For example, acos(0) returns 1.570796 (radians), which is π/2. The formula above also requires cos⁻¹ to return an angle in radians. However, the data you will be analyzing have the longitude and latitude in degrees. You will have to convert them to radians before using the sin() and cos() functions. Recall that 1° = π/180 radians.
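The conversion and the distance formula can be sketched together in R. This sketch assumes the formula is the spherical law of cosines, d = R cos⁻¹[sin φ1 sin φ2 + cos φ1 cos φ2 cos(λ1 − λ2)]. The helper names deg2rad and gc_dist are hypothetical, and the clamping step is an added safeguard, not part of the problem statement.

```r
# Hypothetical helper: convert degrees to radians
deg2rad <- function(deg) deg * pi / 180

# Hypothetical helper: great-circle distance in km between two points whose
# longitude (lon) and latitude (lat) are given in degrees
gc_dist <- function(lon1, lat1, lon2, lat2, R = 6371) {
  lam1 <- deg2rad(lon1); phi1 <- deg2rad(lat1)
  lam2 <- deg2rad(lon2); phi2 <- deg2rad(lat2)
  x <- sin(phi1) * sin(phi2) + cos(phi1) * cos(phi2) * cos(lam1 - lam2)
  # Added safeguard: rounding can push x slightly outside [-1, 1],
  # which would make acos() return NaN
  x <- pmin(1, pmax(-1, x))
  R * acos(x)
}

# Sanity check: from the equator to the north pole is a quarter of a great
# circle, so the two values below should agree
gc_dist(0, 0, 0, 90)
6371 * pi / 2
```

A check against a case with a known answer, like the quarter circle here, is a quick way to catch a missing degree-to-radian conversion before moving on to real city coordinates.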

a. (2 points) The longitude and latitude of Urbana, IL, USA are (in degrees) λUrbana = -88.20418746° and φUrbana = 40.10999229°. The longitude and latitude of Stockholm, Sweden are λStockholm = 18.09733473° and φStockholm = 59.35075995°. Calculate the distance between Urbana and Stockholm. Give your answer to 4 significant figures.

Note: The trickiest part is the conversion between radians and degrees. Do the calculation first with R, and then with a calculator. Make sure you get the same answer. In R, you need to convert degrees to radians. On a calculator, you can enter angles in degrees for the sin() and cos() functions, but the cos⁻¹ function probably returns an angle in degrees, which you will need to convert to radians. Make sure you have checked that your distance calculations are correct for the cities on this page.
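The calculator route can be rehearsed inside R by mimicking a calculator in degree mode. The sketch below is only illustrative: the helper names sind, cosd and acosd are hypothetical, the test points are two equator points 90° apart rather than real cities, and the distance formula is assumed to be the spherical law of cosines.

```r
# Hypothetical degree-mode helpers that mimic a calculator set to degrees
sind  <- function(deg) sin(deg * pi / 180)
cosd  <- function(deg) cos(deg * pi / 180)
acosd <- function(x) acos(x) * 180 / pi   # returns an angle in degrees

R_earth <- 6371  # km

# Illustrative test points: two points on the equator, 90 degrees apart
lon1 <- 0;  lat1 <- 0
lon2 <- 90; lat2 <- 0

# Calculator route: the angle comes out in degrees ...
theta_deg <- acosd(sind(lat1) * sind(lat2) +
                   cosd(lat1) * cosd(lat2) * cosd(lon1 - lon2))
# ... so convert it to radians before multiplying by the radius
d_calc <- R_earth * theta_deg * pi / 180

# R route: convert the inputs to radians first
to_rad <- pi / 180
d_r <- R_earth * acos(sin(lat1 * to_rad) * sin(lat2 * to_rad) +
                      cos(lat1 * to_rad) * cos(lat2 * to_rad) *
                      cos((lon1 - lon2) * to_rad))

all.equal(d_calc, d_r)   # TRUE
```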

Distance between Urbana and Stockholm = km
 Tries 0/5

The longitudes and latitudes of major cities around the world can be found on the internet. This CSV file contains the information for 100 cities around the world, randomly selected from the data set downloaded from simplemaps. The 100 selected cities are plotted as red points on the following map.

[Figure: 100 cities plotted on a world map]

You can load the data directly to R using the following command.

cities <- read.csv("https://ytliu0.github.io/Stat390EF-R-Independent-Study-archive/data/world_cities/cities819.csv", as.is=TRUE)

The "as.is=TRUE" option tells R not to convert strings to factors. The column names are self-explanatory. The longitude and latitude are given in degrees.
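The effect of as.is=TRUE can be seen without downloading anything by reading a tiny CSV from a string. In this sketch the two rows reuse the coordinates quoted in part a. The column names "lng" and "lat" and the country spellings are assumptions for illustration, so check names(cities) for the real ones.

```r
# Two illustrative rows; "lng", "lat" and the country spellings are assumed
csv_text <- "city,country,lng,lat
Urbana,United States,-88.20418746,40.10999229
Stockholm,Sweden,18.09733473,59.35075995"

toy <- read.csv(text = csv_text, as.is = TRUE)

str(toy)          # city and country are character, lng and lat are numeric
class(toy$city)   # "character"

# Useful first looks at the real data frame once it is loaded:
# names(cities); head(cities); nrow(cities)
```

Since R 4.0.0, read.csv() no longer converts strings to factors by default, so as.is=TRUE is harmless there and only changes the result in older versions.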

Write code to find the two cities that are farthest apart in the data set. Calculate the distance and identify the pair of cities and their countries.

Hint: There is more than one way to carry out the calculation. One possible way is to calculate dij, the distance between city i and city j, for all pairs (i,j) and find the maximum. Note that dji = dij, so each pair needs to be visited only once. You can do this with two nested for-loops: loop over i from 1 to 99 and j from i+1 to 100. There are 100×99/2 = 4950 pairs of cities to loop over. Update the maximum distance and the corresponding pair of cities inside the loop. The answer is obtained after the loop is over. Here are the step-by-step instructions if you want to follow this approach.

  1. Set n <- nrow(cities) (number of cities in the data file) and dmax <- 0.
  2. Loop over i from 1 to (n-1), and loop over j from (i+1) to n.
  3. Inside the nested for-loops, calculate dij, the distance between city i and city j; if dij > dmax, update dmax <- dij and set pair <- cities[c(i,j), c("city","country")].
  4. When the loops exit, the value stored in dmax is the maximum distance. The corresponding pair of cities and countries is stored in the data frame pair.
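The four steps above can be sketched on a toy data frame. Everything in the toy data is made up for illustration: the labels, the coordinates (all on the equator so the result is easy to verify by hand), and the coordinate column names "lng" and "lat", which should be replaced by the names in the real file. The distance formula is assumed to be the spherical law of cosines.

```r
# Toy data frame with made-up labels and coordinates (degrees)
toy <- data.frame(
  city    = c("A", "B", "C", "D"),
  country = c("W", "X", "Y", "Z"),
  lng     = c(0, 90, 170, -45),
  lat     = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

R_earth <- 6371            # km
lam <- toy$lng * pi / 180  # longitudes in radians
phi <- toy$lat * pi / 180  # latitudes in radians

n <- nrow(toy)
dmax <- 0
npairs <- 0
for (i in 1:(n - 1)) {
  for (j in (i + 1):n) {
    npairs <- npairs + 1
    dij <- R_earth * acos(sin(phi[i]) * sin(phi[j]) +
                          cos(phi[i]) * cos(phi[j]) * cos(lam[i] - lam[j]))
    if (dij > dmax) {
      dmax <- dij
      pair <- toy[c(i, j), c("city", "country")]
    }
  }
}

npairs == choose(n, 2)  # TRUE: each unordered pair is visited once
dmax                    # distance for the farthest pair (A and C, 170 degrees apart)
pair
```

For the real data, the same counter should come out as choose(100, 2), i.e. 4950, which is a cheap check that the loop bounds are right.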

Note: The command 1:n-1 is not the same as 1:(n-1). 1:n-1 means (1:n)-1, which is a vectorized operation returning c(0,1,2,...,n-1) (try, e.g. 1:10-1 vs 1:(10-1) and you'll see). Similarly, i+1:n is not the same as (i+1):n (try, e.g., 5+1:10 vs (5+1):10 and you'll see).
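The precedence pitfall in the note can be checked directly at the R prompt. A short sketch, with n and i set to the example values from the note:

```r
n <- 10
i <- 5

1:n - 1      # 0 1 2 3 4 5 6 7 8 9          (same as (1:n) - 1)
1:(n - 1)    # 1 2 3 4 5 6 7 8 9

i + 1:n      # 6 7 8 9 10 11 12 13 14 15    (same as i + (1:n))
(i + 1):n    # 6 7 8 9 10
```

The colon operator binds more tightly than + and -, so the parentheses are what make the loop bounds come out as intended.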

b. (10 points) What is the distance between the two cities that are farthest apart in the data set? Enter your answer to 4 significant figures.

Distance = km
 Tries 0/5

Enter the corresponding pair of cities and countries. Enter the names exactly as in the R output without quotes. The names are case sensitive but the order is not important, i.e. you can interchange city 1 and 2.

City 1:           country:

City 2:           country:

 Tries 0/5