Basic Networking & Infrastructure

Everything you need to know to understand the steps between typing a URL into your browser and seeing a webpage

Structure of a web app

Traditional Structure

There are two parts of an app, the front end and the back end

Front end

This is the part people see. In a web app it's the HTML/CSS/JS that runs in their browsers when they go to your site

Backend

This is the name given to everything else: everything that processes data, delivers the front end, and supplies any data the front end needs

Traditional Structure example

User requests webpage --> Frontend is delivered through backend --> Frontend gets data from backend and shows user page

The problem with this view

The backend has way too many pieces in this view, and it's responsible for multiple jobs. If you want to discuss an issue with an app, saying it's a problem with the 'backend' is not helpful information

A more accurate Structure

A more accurate structure is that there's a front end, a back end and then infrastructure in-between the client and the app/site

Infrastructure

Infrastructure is the name given to the pieces that deliver the data to a user. It's also the name for pieces that deliver the data from the frontend to the backend, but aren't involved in any of the processing.

Why split out infrastructure?

This makes it a bit easier to intuitively tell which pieces go where, and lets you isolate which part of an app/site you're talking about. For example, if the program behind an app/site is running but people outside the building can't see it, that's an infrastructure problem, not a backend problem.

A more accurate Structure example

User requests a webpage --> Goes through infrastructure to retrieve front-end --> Front end goes back through infrastructure to get data from backend --> User sees page with all content

So how does infrastructure work?

For this presentation we don't care about the frontend or backend. As far as we're concerned there's a working server with all the webpages built and we're just hooking it up so people can access it from outside the developer's computer

Let's start with some terms

What is a browser?

A browser lets you connect to a server over a network and then visualizes the response it receives back (HTML/CSS/JS, images, PDFs etc.)

What is a network?

A network is a collection of computers that are connected together and can communicate with one another. Generally speaking the internet is a massive network of many computers connected together

What is a server/host vs client?

The server or host is the computer that is SENDING responses to the client (e.g. when you go to Google, the computer that gives you the page is the server and Google is the host)

The client is whatever is RECEIVING the responses from the server (e.g. the person trying to access Google and their browser)

Technically speaking both sides do both actions, but overall the goal of the client is to receive from the server and the goal of the server is to send to the client

What is a URL?

A URL is what you type in a browser to get a webpage, for example https://schulichignite.com/beginner. It has multiple parts and follows the form:

https://schulichignite.com/beginner
$Protocol://$domain.$tld/$slug

*Anything in angle brackets or starting with a $ is a variable
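
As a rough sketch of that template, here is how you could pull the same parts out of a URL with Python's standard library. The function name split_url is made up for this example, and the TLD split is deliberately naive: it assumes the TLD is whatever follows the last dot, which is wrong for multi-part suffixes like .co.uk.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the parts named in the template above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive assumption: everything after the last dot is the TLD.
    domain, _, tld = host.rpartition(".")
    return {
        "protocol": parts.scheme,
        "domain": domain,
        "tld": tld,
        "slug": parts.path,
    }


parts = split_url("https://schulichignite.com/beginner")
print(parts)
# {'protocol': 'https', 'domain': 'schulichignite', 'tld': 'com', 'slug': '/beginner'}
```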

What is a slug?

The slug is the bit at the end, so in https://schulichignite.com/beginner the /beginner is the slug. This is used to specify what you are looking for on the server. One site can have many pages so a slug is how the server knows what to look for

Slugs used to just be file paths. For example you would have https://schulichignite.com/beginner.html, and the server would just look for the beginner.html file and send it

What is a TLD?

The TLD (top-level domain) is the .com in https://schulichignite.com/beginner. The difference between TLDs is that some have rules. For example, .ca domains require you to provide your address and name, and .dev domains can only be served over HTTPS.

What is a protocol?

We'll come back to this in a bit. On the web this is basically always http:// or https://; if you're loading a local file it will be file://$filepath

What is a domain?

A domain is essentially a name you purchase. You buy a domain name (which includes one TLD, not all TLDs), and that indicates you own it and can do what you want with it. For example schulichignite.com (this is also sometimes called an FQDN, a fully qualified domain name)

How does your browser know who owns a domain?

This is where a domain name registrar comes in. You pay them to register and validate that you own a domain

Common Domain Name Registrars

These are a dime a dozen, but in Canada I would recommend GoDaddy, Webnames or Namecheap (remember you have to renew your ownership or other people can take your domain)

Great so we own the domain, but how do people talk to it?

Well they need to request a URL, and we need to give them a response

HTTP

Hypertext Transfer Protocol: this defines how two computers structure their data to talk to each other (it's the http or https at the beginning of a URL)

HTTP Diagram

Imagine Bobby wants to get the Schulich Ignite beginner sessions page (https://schulichignite.com/beginner)

This whole interaction and how it works is defined by the HTTP protocol

HTTP Headers

HTTP requests and responses both have headers. Headers are key-value pairs that tell the client & host the details they need to interact properly

Mail analogy

Imagine you're sending mail, you need 3 things

Your address
Their Address
The content of the letter

That is what HTTP needs

Address the communication was sent from
Address it's going to
The Content (text, html, image etc.)

Additionally like how you would add FRAGILE to mail to tell the post service how to handle your message, headers can also contain information about how to handle the request/response

How to examine headers in browser

First open your browser dev tools, go to the network tab and then either refresh or navigate to a webpage

HTTP Requests

The first part of HTTP communication is to request a URL

HTTP Request Example

We're going to request https://schulichignite.com/beginner

GET /beginner HTTP/1.1
Host: schulichignite.com
User-Agent:... Removed for readability
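
To make the structure of that request concrete, here is a minimal sketch that assembles the same text by hand. The helper build_request and the User-Agent value are invented for illustration; the parts worth noticing are that every line ends with a carriage return plus newline, and that a blank line marks the end of the headers.

```python
def build_request(method, slug, host, headers=None):
    """Assemble the text of an HTTP/1.1 request that has no body."""
    lines = [f"{method} {slug} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # Lines are separated by CRLF, and an empty line ends the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "GET",
    "/beginner",
    "schulichignite.com",
    {"User-Agent": "example-client/0.1"},  # hypothetical value
)
print(request)
```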

HTTP Request types (GET/HEAD)

A GET request asks for a response that includes headers and some content. A HEAD request asks for just the headers, without any content

HTTP Request types (POST)

A POST request POSTs some content to a server. This is usually done for forms to submit the data

HTTP Request types (PUT)

A PUT request tells the server to PUT whatever is being sent at the location specified. This might be used on a photo sharing service to indicate replacing a photo with a new version.

HTTP Request types (DELETE)

A DELETE request is used to ask a server to DELETE a resource (i.e. a photo)

Other HTTP Request types

This link contains more request types
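
If you want to see the request types side by side, one way is to build (but not send) a request of each type with Python's urllib. This is only a sketch: the URL is a placeholder, not a real photo API, and nothing here touches the network.

```python
from urllib.request import Request

URL = "https://example.com/photos/1"  # placeholder URL, assumed for the example

# Building a Request object does not send anything; it only describes the request.
requests = [
    Request(URL, method="GET"),                          # fetch headers and content
    Request(URL, method="HEAD"),                         # fetch headers only
    Request(URL, data=b"caption=hello", method="POST"),  # submit form-style data
    Request(URL, data=b"<new image bytes>", method="PUT"),  # replace the resource
    Request(URL, method="DELETE"),                       # remove the resource
]

methods = [request.get_method() for request in requests]
print(methods)
# ['GET', 'HEAD', 'POST', 'PUT', 'DELETE']
```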

Breaking down the headers

This won't cover everything, just the things that are useful

First line

The first line tells the server the request type, slug, protocol and version

GET /beginner HTTP/1.1

Host

This tells the server & browser the domain you're trying to access

Host: schulichignite.com

User Agent

This tells the server information about your browser. Developers often use this to patch compatibility if a browser doesn't support a feature (it's usually REALLY long)

User-Agent: Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Mobile Safari/537.36 Edg/103.0.1264.37

HTTP Response

The response is composed of a header and the response content (the HTML or file contents of the resource). Example:

HTTP/1.x 200 OK
Content-Type: text/html; charset=UTF-8
Cache-Control: max-age=3600, public
Content-Encoding: gzip

<title> Schulich Ignite </title><!--More HTML here-->
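
Here is a small sketch of how a client could take a response like that apart. The function parse_response is made up for this example and skips a lot that real clients handle (repeated headers, chunked bodies and so on). The sample leaves out Content-Encoding: gzip because its body is plain, uncompressed text.

```python
def parse_response(raw):
    """Split raw HTTP response bytes into status line parts, headers and body."""
    # The first blank line separates the header section from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"Cache-Control: max-age=3600, public\r\n"
    b"\r\n"
    b"<title> Schulich Ignite </title>"
)
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])
# 200 OK text/html; charset=UTF-8
```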

HTTP Response header example

Like requests the response headers provide additional details. Usually about the type of content being sent and other information

HTTP/1.x 200 OK
Content-Type: text/html; charset=UTF-8
Cache-Control: max-age=3600, public
Content-Encoding: gzip

Breaking down HTTP Response header

Again this isn't everything, just things that are often useful

First line

First line is always the response protocol, version and status code

HTTP/1.x 200 OK

HTTP Response status codes 2xx

These codes mean everything is working as intended. Examples:

200 OK; Successful response
201 Created; A new file was created successfully
202 Accepted; The request has been accepted, but isn't done yet

HTTP Response status codes 3xx

These codes are called redirect codes. Examples:

300 Multiple choices; Means there are a few options (like various image or file formats the browser can choose from)
301 Moved Permanently; This request and any future requests should look to a provided URL instead of the one they requested
302 Found (originally called Moved Temporarily); Temporarily go to a different provided URL, but next time check back here again

HTTP Response status codes 4xx

These are Client error codes (meaning you made a mistake). Examples:

400 Bad Request; You broke something, probably a syntax error or request size is too big
403 Forbidden; You need permissions the server thinks you don't have
404 Not found; The slug you provided has no valid response on the server
418 I'm a teapot; :)

HTTP Response status codes 5xx

These are server errors (meaning the host made a mistake)

500 Internal Server Error; Something broke, but the server won't give specifics (like a web segfault)
502 Bad Gateway; Server couldn't get a valid response when trying to generate one from another server's data
503 Service Unavailable; Server is borked and completely down (hopefully it's not yours)
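
Since the first digit tells you the family of a status code, a quick sketch like this can sort any code into its group. The describe helper and the family labels are my own naming; the phrases come from Python's built-in http.HTTPStatus.

```python
from http import HTTPStatus

# Labels for each family, keyed by the first digit of the code.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return (family, standard phrase) for a registered status code."""
    family = FAMILIES.get(code // 100, "unknown")
    return family, HTTPStatus(code).phrase


for code in (200, 301, 404, 503):
    print(code, *describe(code))
```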

Other HTTP Response codes

Wikipedia link

Content-Type

This is used to tell the browser the type of content it received. Not every response is going to be an HTML file (it can be images, PDFs, file downloads etc.). This will have the file's MIME type (the type of file it is) & encoding (the type of characters it supports)

Content-Type: text/html; charset=UTF-8

List of common MIME types Common encodings
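
As a sketch of how the two halves of that header can be separated, Python's email package (which uses the same header syntax) can do the parsing. The wrapper parse_content_type is a name invented for this example.

```python
from email.message import Message


def parse_content_type(value):
    """Return (mime_type, charset) from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = value
    # Both values come back lowercased; charset is None if it was not given.
    return message.get_content_type(), message.get_content_charset()


mime, charset = parse_content_type("text/html; charset=UTF-8")
print(mime, charset)
# text/html utf-8
```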

Cache Control

The cache control sets how long a page should be considered 'fresh' by the browser before asking the server to update it

Cache-Control: max-age=3600, public

Details about settings
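
To show what 'fresh' means in practice, here is a simplified sketch of the check a cache could make. The helper names are made up, and real browsers consider more than max-age (other directives, the Age and Expires headers, and so on).

```python
def parse_cache_control(value):
    """Turn 'max-age=3600, public' into {'max-age': '3600', 'public': True}."""
    directives = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            directives[name.lower()] = argument or True
    return directives


def is_fresh(age_seconds, cache_control):
    """True while a cached copy is younger than max-age (simplified rule)."""
    max_age = parse_cache_control(cache_control).get("max-age")
    return isinstance(max_age, str) and age_seconds < int(max_age)


header = "max-age=3600, public"
print(is_fresh(60, header))    # one minute old: True
print(is_fresh(7200, header))  # two hours old: False
```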

Content Encoding

Web pages aren't usually sent as human-readable plaintext. They are often compressed with an encoding to save space. This header tells you which encoding was used so you know how to decode it

Content-Encoding: gzip

More details
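
You can see the space saving for yourself with a quick round trip through Python's gzip module. The page content below is invented and deliberately repetitive, so it compresses much better than a typical page would.

```python
import gzip

# A made-up, repetitive page body so the saving is easy to see.
page = b"<p>Schulich Ignite</p>" * 100

compressed = gzip.compress(page)        # roughly what a server does before sending
restored = gzip.decompress(compressed)  # roughly what the browser does on receipt

print(len(page), "bytes before,", len(compressed), "bytes after")
```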

Great, so we know about domains & HTTP, let's go

So we should be able to just connect to the domain and send HTTP requests and get responses, right?

Public IP addresses and ports

HTTP connects to the PUBLIC IP address and port of a computer, not the domain!

Let's say that a computer is like an apartment building. You can think of the IP address as the coordinates to the building (computer), and the port as which apartment to go to in the building (computer)

192.0.2.146:80
[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:443
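
As a sketch, this is one way to split addresses written like that into the building (IP) and the apartment (port) in Python. The helper split_address is a made-up name, and putting // in front is just a trick so that urlsplit treats the text as a host and port.

```python
import ipaddress
from urllib.parse import urlsplit


def split_address(text):
    """Split 'ip:port' (or '[ipv6]:port') into an IP address object and a port."""
    parts = urlsplit("//" + text)
    return ipaddress.ip_address(parts.hostname), parts.port


v4_ip, v4_port = split_address("192.0.2.146:80")
v6_ip, v6_port = split_address("[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:443")

print(v4_ip.version, v4_port)  # 4 80
print(v6_ip.version, v6_port)  # 6 443
```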

How does a browser know a server's PUBLIC IP address and port based off the domain?

The domain name registrar tells us who owns it, but once you own the domain you need to configure it. That's where DNS comes in!

But wait, there's another server involved

Nameservers are the in-between step: the nameservers you list with your domain name registrar tell your browser (through its DNS resolver) where to go to find your DNS server and records

So to recap at this point

We have made a request to a domain, then looked up the nameservers that were set through the domain name registrar to find the DNS. Now we need to know how the DNS works

DNS is made up of records

Your domain is used for lots of things, so DNS has lots of different types of records to handle those situations. You can think of records like a contact list. It lets you look up where to communicate with a server.

Common DNS Record types

Here are the most common DNS records you should know about. Please note the records are listed in the following format

$record_type $domain_name $content

Also keep in mind that @ in place of the domain name is self-referential (i.e. using @ in the DNS of schulichignite.com would be the same as typing schulichignite.com)
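
Here is a rough sketch of reading a record written in that $record_type $domain_name $content format, including the @ shorthand. The function parse_record is invented for this example. Treating a bare name like spark as a subdomain of the zone is an assumption based on how most DNS dashboards behave.

```python
def parse_record(line, zone):
    """Parse '$record_type $domain_name $content' for the given zone."""
    record_type, name, content = line.split(maxsplit=2)
    if name == "@":
        name = zone                # @ means the zone itself
    elif not name.endswith(zone):
        name = f"{name}.{zone}"    # assumed: bare names are relative to the zone
    return record_type, name, content


zone = "schulichignite.com"
print(parse_record("A @ 185.199.111.153", zone))
# ('A', 'schulichignite.com', '185.199.111.153')
print(parse_record("CNAME spark schulich-ignite.github.io", zone))
# ('CNAME', 'spark.schulichignite.com', 'schulich-ignite.github.io')
```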

Address records (A or AAAA)

There are two types of address records, A and AAAA, which map a domain to an IP: A records for IPv4 addresses and AAAA records for IPv6

These are the primary records used to tell the browser which server to look for to get a site

A @ 185.199.111.153
AAAA @ 2001:0db8:85a3:0000:0000:8a2e:0370:7334

Canonical Name Records (CNAME)

These are used to point a domain at another domain. For example this would allow you to have www.schulichignite.com map to the same content as schulichignite.com, or spark.schulichignite.com map to the content of schulich-ignite.github.io

CNAME spark schulich-ignite.github.io
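
To show how a CNAME gets followed until an address turns up, here is a toy lookup over an in-memory table. It is not how a real resolver is implemented, and the table contents are illustrative only: in particular the A record for schulich-ignite.github.io is assumed for the sake of the example.

```python
# Toy record table keyed by (record type, name). Values are illustrative.
RECORDS = {
    ("A", "schulichignite.com"): "185.199.111.153",
    ("CNAME", "spark.schulichignite.com"): "schulich-ignite.github.io",
    ("A", "schulich-ignite.github.io"): "185.199.111.153",  # assumed value
}


def resolve(name, max_hops=5):
    """Follow CNAME records until an A record is found, or give up."""
    for _ in range(max_hops):
        if ("A", name) in RECORDS:
            return RECORDS[("A", name)]
        name = RECORDS.get(("CNAME", name))
        if name is None:
            return None  # no A record and no CNAME to follow
    return None  # too many hops, probably a CNAME loop


print(resolve("spark.schulichignite.com"))
# 185.199.111.153
```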

Mail Exchanger Record (MX)

These records say which mail (SMTP) server receives email for a domain

MX schulichignite.com mail.domain.com

TXT Records

These are just plain text. They are often used to prove to third-party services (like Google) that you own a domain

TXT @ $verification_code

NS Records

These records tell you which nameserver is used for a domain

NS @ $nameserver_domain

The rest

Wikipedia page of all DNS records

DNS providers

Sometimes your domain name registrar will provide DNS. It is usually better to go with a dedicated DNS provider. I highly recommend:

Cloudflare

Checking DNS records

There are some online tools:

dnschecker

Linux also typically includes a tool called dig. You can install it on Windows by installing BIND

I also wrote a Python tool you can use:

sws
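
If you just want the IP addresses for a name, Python's standard library can also ask the system resolver for you. This sketch is not a replacement for dig (it can't show individual record types), and the lookup helper is a made-up name. The example call uses an IP literal so it runs without network access.

```python
import socket


def lookup(host, port=443):
    """Return the sorted IP addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry ends with a socket address whose first item is the IP.
    return sorted({info[4][0] for info in infos})


# An IP literal resolves to itself, so this line needs no DNS traffic.
print(lookup("192.0.2.146"))
# ['192.0.2.146']
```

With network access, calling lookup("schulichignite.com") would go through DNS and return whatever addresses the records currently point to.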

Putting it all together

Let's go through a full example one step at a time

Step 1

Someone types https://schulichignite.com/beginner/ into their browser and hits enter

Step 2

Browser asks its DNS resolver for the domain's nameservers (the ones set through the domain name registrar the domain was bought with)

Step 3

Now the browser has the nameservers. Since we want the site, we're looking for an A, AAAA or CNAME record (in this case it's an A record pointing to GitHub Pages)

Step 4

The HTTP request is then passed to the IP address and port listed

Step 5

The site is hosted with GitHub Pages, which receives the request and looks for the corresponding HTML page for the /beginner slug

Step 6

The browser receives an HTTP response with a 200 status code that has the content of the webpage requested (additional HTTP requests will be made for assets in the HTML file like images and external CSS files)

Extra stuff

Here are some extra concepts that are useful to know

Subdomains

Along with your regular domains you can have subdomains, these allow you to have records that are controlled by the same DNS but point to different servers (i.e. https://spark.schulichignite.com/ vs https://schulichignite.com/)

We said before the protocol is called HTTP, then why do most URLs have HTTPS?

HTTPS (HTTP Secure) is an addition to HTTP that makes it safer to use. It adds in what is called a TLS certificate (TLS, Transport Layer Security, is the successor to SSL, Secure Sockets Layer, and these are still commonly called SSL certificates), which is used to encrypt web traffic. Most browsers show a security warning, and may block access, unless your site has a valid certificate.
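
Browsers aren't the only clients that insist on a valid certificate. As a small sketch, Python's default settings for encrypted connections do the same, which you can check without connecting to anything:

```python
import ssl

# The default context is what most Python HTTPS clients start from.
context = ssl.create_default_context()

# The server must present a certificate that chains to a trusted authority...
requires_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ...and the certificate must match the domain being requested.
checks_hostname = context.check_hostname

print(requires_certificate, checks_hostname)
# True True
```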

CDNs

CDNs (content delivery networks) are servers that operate at the infrastructure level. They store resources and cache them so you don't have to go to the server to retrieve them. For example, if you have a CDN for all your images then when your pages load you reduce the stress on your servers because they don't need to handle those requests.

Having CDN versions of CSS and JS files is also very common instead of keeping all your source files on one server

Ok but then how does HTTP REALLY communicate under the hood

I'm going to cover this very broadly. Essentially it relies on something called sockets (not to be confused with WebSockets, which are a separate protocol that starts from an HTTP request). Sockets allow your computer to communicate with other computers over a network. All HTTP requests and responses get passed over sockets (as do other protocols)

Example socket-based HTTP server integrated with ezcv

Socket servers/hosts have 4 steps

You can look into these with more detail yourself

Set socket options

You need to set the socket options such as what type of IP address (IPV4 or IPV6) to use, what connection type (TCP, UDP etc.) and any additional options

Code example
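
As a minimal sketch of this step using Python's socket module (the option choices here are mine, not necessarily the ones in the linked example): AF_INET picks IPv4, SOCK_STREAM picks TCP, and SO_REUSEADDR is a common extra option that lets you restart a server without waiting for the old port to free up.

```python
import socket

# AF_INET = IPv4 (AF_INET6 would be IPv6), SOCK_STREAM = TCP (SOCK_DGRAM = UDP).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Optional extra: allow the address to be reused right after a restart.
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

family, kind = server.family, server.type
reuse = server.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)
server.close()

print(family.name, kind.name, bool(reuse))
```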

Bind the socket

Binding a socket means you set the INTERNAL IP address (0.0.0.0 is the easiest since it allows anything with network access to talk to it) you want to use as well as the port (80 for unencrypted and 443 for encrypted is standard)

Code Example
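
Here is a small sketch of binding with Python's socket module. To keep it safe to run anywhere it makes two assumptions that differ from a real deployment: it binds to 127.0.0.1 (only reachable from your own machine) instead of 0.0.0.0, and it asks for port 0, which tells the operating system to pick any free port, since ports like 80 and 443 usually need admin rights.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# (address, port): 127.0.0.1 keeps the example private to this machine,
# and port 0 means "let the operating system choose a free port".
server.bind(("127.0.0.1", 0))

host, port = server.getsockname()  # what we actually got bound to
server.close()

print(host, port)
```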

Ask the socket to listen

Does what it says on the tin, it sets the server to wait for a request to come in (HTTP Requests in our case)

Code example (also line 38)

Process client request and send response

Take in the client request, determine the content and headers you want to send back

Code example
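
Putting the four steps together, here is a sketch of a tiny server that answers exactly one request, plus a client that talks to it so you can see the result. It is a teaching toy under stated assumptions, not the linked example: serve_once is a made-up helper, the server only listens on 127.0.0.1 with an OS-chosen port, and it assumes the whole request arrives in a single read.

```python
import socket
import threading


def serve_once(server):
    """Accept one connection, read the request, send back a small HTML page."""
    conn, _address = server.accept()
    with conn:
        # Assumption: the request is small enough to arrive in one recv call.
        request = conn.recv(4096).decode("iso-8859-1")
        request_line = request.split("\r\n", 1)[0]
        body = f"<p>You sent: {request_line}</p>".encode("utf-8")
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html; charset=UTF-8\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + body)


# Steps 1 to 3: set options, bind, listen.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))
server.listen()
port = server.getsockname()[1]

# Step 4 runs in a background thread so this same script can act as the client.
worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"GET /beginner HTTP/1.1\r\nHost: localhost\r\n\r\n")
    reply = b""
    while chunk := client.recv(4096):  # read until the server closes the connection
        reply += chunk

worker.join()
server.close()
print(reply.decode("utf-8"))
```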

Some handy tools/commands you can use

(ifconfig/ip)/ipconfig

Gives you the information about the LOCAL IP of your machine (for PUBLIC IP you need to google it or have it provided)

ping

ping can be used to check if a server is reachable. You can use a domain or an IP

ping schulichignite.com
ping 8.8.8.8
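
Python can't easily send a real ping without admin rights, so as a related sketch here is a different kind of check: trying to open a TCP connection to a specific port. This is a swapped-in technique, not what ping does (ping uses ICMP and doesn't involve ports), and can_connect is a made-up helper. The demo checks a throwaway listener on your own machine so it runs offline.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Throwaway local listener so the example does not depend on the internet.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

up = can_connect("127.0.0.1", port)    # something is listening
listener.close()
down = can_connect("127.0.0.1", port)  # nothing is listening any more

print(up, down)
```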

netcat/telnet

Netcat/telnet are used for many things, but one thing both can be used for is connecting to a server and sending requests and getting responses (example is just connecting)

linux

nc hostname port

windows

telnet hostname port