Thursday, November 17, 2016

HTTP Protocol







1.   Introduction

Before we dive into HTTP, let’s try to understand the meaning of the word “protocol”.
A protocol is a set of rules that we use for a specific purpose. In the current context, the purpose is communication: the way we talk to each other. For instance, I speak in English, and because you understand English, you can follow what I say. Here English is the protocol. The moment I start speaking in a language that you don’t understand, the protocol defeats its purpose. Thus, both parties need to agree on a set of rules for communication to take place.
On the web, multiple protocols are used to communicate. For end users, the most important and visible ones are HTTP and HTTPS. There are many other protocols as well, but HTTP and HTTPS serve most users.

2.   Now, what does HTTP mean?

HTTP stands for Hypertext Transfer Protocol. As we all know, computers work in a language of 1’s and 0’s, i.e. binary.
Let’s say I want to write ‘a’. If each letter is assigned its own pattern of 0’s and 1’s, then a combination of 0’s and 1’s can represent a whole word as well. In this case, the text is already constructed and is being sent on the wire, so what is being transferred is text (in the form of bytes). I am emphasising ‘text’ because this text is interpreted by the browser: it can contain links that point to other text, which makes it hypertext, and the protocol that transfers it is therefore called the Hypertext Transfer Protocol (HTTP).
NOTE: Hyper is also a prefix, from the Greek hyper-, meaning over, above, or excessive, used in such terms as hypertext (text that extends to point to or include other text).
3.   HTTP Overview

HTTP is a TCP/IP-based communication protocol that is used to deliver data (HTML files, image files, query results, etc.) on the World Wide Web. It is an application layer protocol. The default port is TCP 80, but other ports can be used as well. It provides a standardized way for computers to communicate with each other. The HTTP specification defines how a client's request is constructed and sent to the server, and how the server responds to that request.

 

Basic Features

There are three basic features that make HTTP a simple but powerful protocol:
·        HTTP is connectionless*: The HTTP client, e.g. a browser, opens a connection, sends a request, and receives the response. In the basic model, the connection is then closed instead of being kept open for the next request.
·        HTTP is media independent: Any type of data can be sent over HTTP as long as both the client and the server know how to handle the content. The client as well as the server must specify the content type using the appropriate MIME type.
·        HTTP is stateless: As mentioned above, HTTP is connectionless, and this is a direct result of HTTP being a stateless protocol. The server and client are aware of each other only during the current request. Afterwards, both forget about each other. Due to this nature of the protocol, neither the client nor the server retains information between different requests across web pages.

* HTTP/1.0 uses a new connection for each request/response exchange, whereas
   an HTTP/1.1 connection may be used for one or more request/response exchanges.


HTTP Version

HTTP uses a <major>.<minor> numbering scheme to indicate versions of the protocol.
Here is the general syntax for specifying the HTTP version number:
HTTP-Version   = "HTTP" "/" 1*DIGIT "." 1*DIGIT


Example

HTTP/1.0 or HTTP/1.1
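
The version syntax above can be checked mechanically. The following is a minimal Python sketch, not part of any HTTP library; the names HTTP_VERSION_RE and parse_http_version are made up for this illustration.

```python
import re

# Mirrors the grammar above: "HTTP" "/" 1*DIGIT "." 1*DIGIT
HTTP_VERSION_RE = re.compile(r"HTTP/([0-9]+)\.([0-9]+)")

def parse_http_version(text):
    """Return (major, minor) as integers, or None if text is not a valid version."""
    match = HTTP_VERSION_RE.fullmatch(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_http_version("HTTP/1.1"))
print(parse_http_version("HTTP/1"))   # the minor number is missing, so this is rejected
```

The pattern is case sensitive on purpose, since the grammar spells the prefix as the literal string "HTTP".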

4.   HTTPS
 4.1. Why do we need HTTPS?
 When you log into a site, your login credentials are easy to intercept if they are not encrypted with HTTPS. The "password" field may show only circles in your web browser, but your actual password is transmitted "in the clear" across the Internet for anyone to see. Sending data over HTTP is like sending a parcel by courier without any seal. Criminals can access that traffic in several ways, including monitoring WiFi connections, having an inside position at an Internet service provider or backbone network, or hacking into routers across the Internet so they can watch the traffic that flows across them.

So, a new protocol was introduced to protect valuable, sensitive data. This protocol is known as SSL (Secure Sockets Layer); its modern successor is called TLS (Transport Layer Security).

HTTPS = HTTP + SSL (the HTTP protocol working in tandem with SSL)

So, what is SSL? Before we understand SSL, first we need to understand Cryptography.

4.2. What is Cryptography?

It is basically a science of hiding information. It’s a method of storing and transmitting data in a form so that only those for whom it is intended can read and process it.

                             
                                           Fig 4.2

In fig 4.2, the text “Hello World” is encrypted by an algorithm, and the output of the encryption is called cipher text (encrypted text), as shown in the diagram above.

Encryption usually needs a key. Data encrypted with a key can be sent to the intended recipient, who decrypts it back to its normal form with the same key. Such a key, used for both encryption and decryption, is called a symmetric key.
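
To make the idea of one shared key concrete, here is a toy Python sketch. It uses a repeating XOR, which is NOT secure and is not what SSL uses; it only illustrates that the same key both encrypts and decrypts. The function name xor_with_key and the key value are made up for this example.

```python
from itertools import cycle

def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with the repeating key. Not secure."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"shared-secret"                            # hypothetical key known to both parties
cipher_text = xor_with_key(b"Hello World", key)   # "encryption"
plain_text = xor_with_key(cipher_text, key)       # the same key reverses the operation
print(cipher_text)
print(plain_text)
```

Real systems use vetted algorithms such as AES for this role; the sketch only shows the symmetric property.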

One key could be shared among all the users, but that is like every house having the same lock and every owner holding the same key, so anyone can open anyone else’s lock. Cryptography comes to the rescue again: there is a way to encrypt with one key and decrypt with another key. This is known as asymmetric (public-key) cryptography.

  
4.3. Communication between browser and server
Coming back to the SSL protocol, the following steps happen when data is sent to the server.
1.     When we type a URL with HTTPS, the browser connects to TCP port 443 (the default port for HTTPS) at the transport layer.

2.     After the connection is established, the SSL handshake starts: the browser sends a “Client Hello” message listing the protocol versions and cipher suites it supports.

3.     The server responds with a “Server Hello” message.

4.     The server sends its digital certificate, signed by a certificate authority (Verisign etc.).

5.     Then the server sends a “Server Hello Done” message, hinting the browser to start processing at its end.

6.     The browser verifies the server's certificate and responds with a “Client Key Exchange” message, which carries the key material from which both sides derive the symmetric secret key.

7.     Then the client sends “Change Cipher Spec”. It means the data sent by the browser from now on will be encrypted.

8.     After that the browser sends the “Finished” message, which contains a digest of all the handshake messages exchanged so far.

9.     Now the server sends its own “Change Cipher Spec” message.

10. Again, the server sends a “Finished” message, which also contains a digest of the communication held so far.

The purpose of the Finished messages is to confirm that none of the previous messages were altered or tampered with. At this point the SSL handshake is complete, and both sides use the symmetric secret key for encryption and decryption.

After that, the actual message is sent by the browser to the server, and this way our data is secured over the Internet via HTTPS.
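
In practice an application rarely drives these handshake messages by hand; a TLS library does it. The following Python sketch uses the standard ssl module to show where the handshake happens. The function names make_client_context and negotiated_tls_version are made up for this example, and the host you pass in is your own choice.

```python
import socket
import ssl
from typing import Optional

def make_client_context() -> ssl.SSLContext:
    # The default context requires a valid certificate and checks the host name.
    return ssl.create_default_context()

def negotiated_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Open a TCP connection, run the handshake, and report the protocol version."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as tcp_sock:
        # wrap_socket performs the handshake; it raises ssl.SSLError on failure,
        # for example when the server certificate cannot be verified.
        with context.wrap_socket(tcp_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()
```

Calling negotiated_tls_version with a real host name needs network access and returns a string naming the negotiated protocol version; the exact value depends on the server and on the local TLS library.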

 5.   Message Format
HTTP requests and HTTP responses use the generic message format of RFC 822 for transferring the required data. This generic message format consists of the following four items.
  • A start-line
  • Zero or more header fields, each followed by CRLF
  • An empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields
  • Optionally a message-body
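
The four items above can be pulled apart with a few lines of Python. This is a simplified sketch (it ignores repeated headers and folded header lines), and parse_http_message is a name invented for this example.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The empty line (CRLF CRLF) marks the end of the header fields.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return start_line, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"Hello, World!")
start_line, headers, body = parse_http_message(raw)
print(start_line)
print(headers)
print(body)
```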

 

5.1  Message Start-Line

 

A start-line will have the following generic syntax:
start-line = Request-Line | Status-Line
We will discuss Request-Line and Status-Line while discussing HTTP Request and HTTP Response messages respectively. For now, let's see the examples of start line in case of request and response:
GET /hello.htm HTTP/1.1      (This is Request-Line sent by the client)
 
HTTP/1.1 200 OK              (This is Status-Line sent by the server)

 

5.2  Header Fields


HTTP header fields provide required information about the request or response, or about the object sent in the message body.

 

The syntax of a header field is as follows:

message-header = field-name ":" [ field-value]

 

  

HTTP Request message

         

 

The image above explains all the components of a request. Separate examples of GET and POST follow.

 

GET Request example

GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Here we are not sending any request data to the server because we are fetching a plain HTML page from the server. Connection is a general-header, and the rest of the headers are request headers. 
The following example shows how to send form data to the server using request message body:
POST Request example
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: application/x-www-form-urlencoded
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
 
licenseID=string&content=string&/paramsXML=string
Here the given URL /cgi-bin/process.cgi is used to process the passed data, and a response is returned accordingly. The Content-Type header tells the server that the passed data is simple web form data, and length stands for the actual length of the data in the message body.
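
As a rough illustration of how the body and its Content-Length fit together, the Python sketch below builds such a request by hand. The helper name build_form_post and the field values are assumptions made for this example; a real client would normally use an HTTP library instead.

```python
from urllib.parse import urlencode

def build_form_post(path: str, host: str, fields: dict) -> bytes:
    """Build a raw POST request with a form-encoded body."""
    body = urlencode(fields).encode("ascii")   # e.g. a=1&b=2, special characters escaped
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"     # length of the body in bytes
        "\r\n"
    )
    return head.encode("ascii") + body

request = build_form_post(
    "/cgi-bin/process.cgi",
    "www.tutorialspoint.com",
    {"licenseID": "string", "content": "a b&c"},   # hypothetical form values
)
print(request.decode("ascii"))
```

Note that the space and the ampersand inside the value are escaped, so the receiver can still tell where one field ends and the next begins.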
After receiving and interpreting a request message, a server responds with an HTTP response message:
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
Content-Length: 88
Content-Type: text/html
Connection: close

Hello, World!




 
Header            Description
Accept            Content types that are acceptable for the response. See Content negotiation.
Accept-Encoding   List of acceptable encodings. See HTTP compression.
Accept-Language   List of acceptable human languages for the response. See Content negotiation.
Content-Length    The length of the request body in octets (8-bit bytes).
Content-Type      The MIME type of the body of the request (used with POST and PUT requests).

The following example shows an HTTP response message displaying error condition when the web server could not find the requested page:
HTTP/1.1 404 Not Found
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Connection: close
Content-Type: text/html; charset=iso-8859-1

<html>
<head>
   <title>404 Not Found</title>
</head>
<body>
   <h1>Not Found</h1>
   <p>The requested URL /t.html was not found on this server.</p>
</body>
</html>


Following is an example of HTTP response message showing error condition when the web server encountered a wrong HTTP version in the given HTTP request:
HTTP/1.1 400 Bad Request
Date: Sun, 18 Oct 2012 10:36:20 GMT
Server: Apache/2.2.14 (Win32)
Content-Length: 230
Content-Type: text/html; charset=iso-8859-1
Connection: close

<html>
<head>
   <title>400 Bad Request</title>
</head>
<body>
   <h1>Bad Request</h1>
   <p>Your browser sent a request that this server could not understand.</p>
   <p>The request line contained invalid characters following the protocol string.</p>
</body>
</html>



S.N.  Code and Description
1     1xx: Informational. It means the request was received and the process is continuing.
2     2xx: Success. It means the action was successfully received, understood, and accepted.
3     3xx: Redirection. It means further action must be taken in order to complete the request.
4     4xx: Client Error. It means the request contains incorrect syntax or cannot be fulfilled.
5     5xx: Server Error. It means the server failed to fulfill an apparently valid request.
HTTP status codes are extensible, and HTTP applications are not required to understand the meaning of all registered status codes. A list of the status codes is given in the Status Codes section later in this document.
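
Because the first digit alone identifies the class, a client can handle a code it has never seen before. The Python sketch below illustrates this; describe_status and STATUS_CLASSES are names made up for this example, while HTTPStatus comes from Python's standard http module.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe_status(code: int) -> str:
    """Name the class of a status code, plus its reason phrase when Python knows it."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unregistered"   # not in Python's table, but the class still applies
    return f"{code} {phrase} ({status_class})"

print(describe_status(200))
print(describe_status(404))
print(describe_status(499))
```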

6.   HTTP – Methods


The set of common methods for HTTP/1.1 is defined below and this set can be expanded based on requirements. These method names are case sensitive and they must be used in uppercase.
S.N.  Method and Description
1     GET: The GET method is used to retrieve information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data.
2     POST: A POST request is used to send data to the server, for example, customer information, file upload, etc. using HTML forms.
3     PUT: Replaces all current representations of the target resource with the uploaded content.
4     DELETE: Removes all current representations of the target resource given by a URI.
5     OPTIONS: Describes the communication options for the target resource.



GET Method

A GET request retrieves data from a web server by specifying parameters in the URL portion of the request. This is the main method used for document retrieval. The following example makes use of GET method to fetch hello.htm:
GET /hello.htm HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
The server response against the above GET request will be as follows:
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Vary: Authorization,Accept
Accept-Ranges: bytes
Content-Length: 88
Content-Type: text/html
Connection: close

Hello, World!




POST Method

The POST method is used when you want to send some data to the server, for example, file update, form data, etc. The following example makes use of POST method to send a form data to the server, which will be processed by a process.cgi and finally a response will be returned:
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: www.tutorialspoint.com
Content-Type: text/xml; charset=utf-8
Content-Length: 88
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive

<?xml version="1.0" encoding="utf-8"?>
<string xmlns="http://clearforest.com/">string</string>


The server side script process.cgi processes the passed data and sends the following response:
HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache/2.2.14 (Win32)
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Vary: Authorization,Accept
Accept-Ranges: bytes
Content-Length: 88
Content-Type: text/html
Connection: close

Request Processed Successfully




7.   HTTP - Status Codes


The status code classes were described briefly above; now we are going to see all the codes with their descriptions. The status code plays a role in the response, and the browser or user behaves according to the status code received from the server.






1XX Informational:
100 Continue
A status code of 100 indicates that (usually the first) part of a request has been received without any problems, and that the rest of the request should now be sent.
101 Switching Protocols
HTTP 1.1 is just one type of protocol for transferring data on the web, and a status code of 101 indicates that the server is changing to the protocol it defines in the "Upgrade" header it returns to the client. For example, when requesting a page, a browser might receive a status code of 101, followed by an "Upgrade" header showing that the server is changing to a different version of HTTP.

2XX Success:
  200 - OK
The 200 status code is by far the most common returned. It means, simply, that the request was received, understood, and processed successfully.
  201 - Created
A 201 status code indicates that a request was successful and as a result, a resource has been created (for example a new page).
  202 - Accepted
The status code 202 indicates that server has received and understood the request, and that it has been accepted for processing, although it may not be processed immediately.
  203 - Non-Authoritative Information
A 203 status code means that the request was received and understood, and that information sent back about the response is from a third party, rather than the original server. This is virtually identical in meaning to a 200 status code.
  204 - No Content
The 204 status code means that the request was received and understood, but that there is no need to send any data back.
  205 - Reset Content
The 205 status code is a request from the server to the client to reset the document from which the original request was sent. For example, if a user fills out a form, and submits it, a status code of 205 means the server is asking the browser to clear the form.
  206 - Partial Content
A status code of 206 is a response to a request for part of a document. This is used by advanced caching tools, when a user agent requests only a small part of a page, and just that section is returned.





3XX Redirection:

  300 - Multiple Choices
The 300 status code indicates that the requested resource is available in more than one representation or location. The response will also include a list of locations from which the user agent can select the most appropriate.
  301 - Moved permanently
A status code of 301 tells a client that the resource they asked for has permanently moved to a new location. The response should also include this location. It tells the client to use the new URL the next time it wants to fetch the same resource.
  302 - Found
A status code of 302 tells a client that the resource they asked for has temporarily moved to a new location. The response should also include this location. It tells the client that it should carry on using the same URL to access this resource.
  303 - See Other
A 303 status code indicates that the response to the request can be found at the specified URL, and should be retrieved from there. It does not mean that something has moved - it is simply specifying the address at which the response to the request can be found.
  304 - Not Modified
The 304 status code is sent in response to a request (for a document) that asked for the document only if it was newer than the one the client already had. Normally, when a document is cached, the date it was cached is stored. The next time the document is viewed, the client asks the server if the document has changed. If not, the client just reloads the document from the cache.
  305 - Use Proxy
A 305 status code tells the client that the requested resource has to be reached through a proxy, which will be specified in the response.
  307 - Temporary Redirect
307 is the status code that is sent when a document is temporarily available at a different URL, which is also returned. There is very little difference between a 302 status code and a 307 status code. 307 was created as another, less ambiguous, version of the 302 status code.

4XX Client Error:
  400 - Bad Request
A status code of 400 indicates that the server did not understand the request due to bad syntax.
  401 - Unauthorized
A 401 status code indicates that before a resource can be accessed, the client must be authorised by the server.
  402 - Payment Required
The 402 status code is not currently in use, being listed as "reserved for future use".
  403 - Forbidden
A 403 status code indicates that the client cannot access the requested resource. That might mean that the wrong username and password were sent in the request, or that the permissions on the server do not allow what was being asked.
  404 - Not Found
The best known of them all, the 404 status code indicates that the requested resource was not found at the URL given, and the server has no idea how long for.
  405 - Method Not Allowed
A 405 status code is returned when the client has tried to use a request method that the server does not allow. Request methods that are allowed should be sent with the response (common request methods are POST and GET).
  406 - Not Acceptable
The 406 status code means that, although the server understood and processed the request, the response is of a form the client cannot understand. A client sends, as part of a request, headers indicating what types of data it can use, and a 406 error is returned when the response is of a type not in that list.
  407 - Proxy Authentication Required
The 407 status code is very similar to the 401 status code, and means that the client must be authorised by the proxy before the request can proceed.
  408 - Request Timeout
A 408 status code means that the client did not produce a request quickly enough. A server is set to only wait a certain amount of time for requests from clients, and a 408 status code indicates that this time has passed.
  409 - Conflict
A 409 status code indicates that the server was unable to complete the request, often because a file would need to be edited, created or deleted, and that file cannot be edited, created or deleted.
  410 - Gone
A 410 status code is the 404's lesser known cousin. It indicates that a resource has permanently gone (a 404 status code gives no indication if a resource has gone permanently or temporarily), and no new address is known for it.
  411 - Length Required
The 411 status code occurs when a server refuses to process a request because a content length was not specified.
  412 - Precondition Failed
A 412 status code indicates that one of the conditions the request was made under has failed.
  413 - Request Entity Too Large
The 413 status code indicates that the request was larger than the server is able to handle, either due to physical constraints or to settings. Usually, this occurs when a file is sent using the POST method from a form, and the file is larger than the maximum size allowed in the server settings.
  414 - Request-URI Too Long
The 414 status code indicates that the URL requested by the client was longer than the server can process.
  415 - Unsupported Media Type
A 415 status code is returned by a server to indicate that part of the request was in an unsupported format.
  416 - Requested Range Not Satisfiable
A 416 status code indicates that the server was unable to fulfill the request. This may be, for example, because the client asked for the 800th-900th bytes of a document, but the document was only 200 bytes long.
  417 - Expectation Failed
The 417 status code means that the server was unable to properly complete the request. One of the headers sent to the server, the "Expect" header, indicated an expectation the server could not meet.

5XX Server Error:
  500 - Internal Server Error
A 500 status code (all too often seen by Perl programmers) indicates that the server encountered something it didn't expect and was unable to complete the request.
  501 - Not Implemented
The 501 status code indicates that the server does not support all that is needed for the request to be completed.
  502 - Bad Gateway
A 502 status code indicates that a server, while acting as a proxy, received a response from a server further upstream that it judged invalid.
  503 - Service Unavailable
A 503 status code is most often seen on extremely busy servers, and it indicates that the server was unable to complete the request due to a server overload.
  504 - Gateway Timeout
A 504 status code is returned when a server acting as a proxy has waited too long for a response from a server further upstream.
  505 - HTTP Version Not Supported
A 505 status code is returned when the HTTP version indicated in the request is not supported. The response should indicate which HTTP versions are supported.



8.   Redirection
HTTP allows servers to redirect a client request to a different location. Although this will usually result in another network round trip, it has some useful applications:
  • A web application may use redirection to navigate between parts of the application.
  • If content has moved to a different URL or domain name, redirection can be used to avoid breaking old URLs or bookmarks.
  • It is possible to convert a POST request to a GET request using redirection.
  • A client can be directed to use its local cache for content that has not changed.
A server specifies redirection by returning a 3xx status code:
301   This indicates that the content now resides permanently at the location specified by the Location header, and future requests should be directed to this location.
302   Same as 301, except that the new location is temporary and future requests should still be sent to the original location. Another feature of this status code is that, if the original request was a POST, most clients change to a GET when they re-issue the request.
303   This status code was intended to be the only status code that caused a POST to be converted to a GET. However, most browsers treat a 302 like a 303.
304   Used in response to a conditional request (one carrying an If-Modified-Since header) to direct the browser to its local cache.
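
The method handling described for these codes can be summarised as a small decision function. The Python sketch below reflects commonly observed browser behaviour rather than a strict reading of the specification, and method_after_redirect is a name invented for this example. The 307 branch is included for contrast, since 307 keeps the original method.

```python
from typing import Optional

def method_after_redirect(status: int, method: str) -> Optional[str]:
    """Return the method a typical browser uses for the follow-up request,
    or None when the status does not trigger a new request to Location."""
    method = method.upper()
    if status == 303:
        # 303 always asks the client to fetch the target with GET (HEAD stays HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302):
        # Most browsers turn a POST into a GET here, as with 303.
        return "GET" if method == "POST" else method
    if status == 307:
        return method   # the method must not change
    return None         # e.g. 304 is answered from the local cache

print(method_after_redirect(302, "POST"))
print(method_after_redirect(307, "POST"))
print(method_after_redirect(304, "GET"))
```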

The picture above shows how redirection happens depending on the status code. Apart from 304, all of these status codes require the URL of the redirect target to be given in the Location: header of the HTTP response.

Example HTTP response for a 301 redirect

An HTTP response with the 301 "moved permanently" redirect looks like this:
HTTP/1.1 301 Moved Permanently
Location: http://www.example.org/
Content-Type: text/html
Content-Length: 174
 
<html>
<head>
<title>Moved</title>
</head>
<body>
<h1>Moved</h1>
<p>This page has moved to <a href="http://www.example.org/">http://www.example.org/</a>.</p>
</body>
</html>



9.     Cross Domain
To understand what cross domain means, let’s first understand an Ajax request. Suppose you need to populate a list of states when a country is selected, and you don’t want to reload the page for each request. So we send a silent request (i.e. an Ajax request) in the background, fetch just that content, and use JavaScript to reload only the states dropdown. The dropdown gets loaded and the page doesn’t get refreshed. Now suppose a page from A.com wants to read data from B.com, and the two are different servers. Pulling data from another domain is called a cross-domain request.
The following is a pictorial representation of a cross-domain request. It shows the steps that happen when we send a request across domains.

Preflight / Preflight requests
  Unlike simple requests, "preflighted" requests first send an HTTP request with the OPTIONS method to the resource on the other domain, in order to determine whether the actual request is safe to send.  Cross-site requests are preflighted like this since they may have implications for user data.  In particular, a request is preflighted if:
  It uses methods other than GET, HEAD or POST.  Also, if POST is used to send request data with a Content-Type other than application/x-www-form-urlencoded, multipart/form-data, or text/plain, e.g. if the POST request sends an XML payload to the server using application/xml or text/xml, then the request is preflighted.
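
The rule above can be written down as a small predicate. This Python sketch is a simplification of what browsers actually implement, and needs_preflight is a name made up for this example. The custom_headers check is an addition that is not stated in the text above: under the CORS rules, requests carrying custom headers are preflighted as well.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}

def needs_preflight(method, content_type=None, custom_headers=()):
    """Decide whether a cross-domain request would be preflighted (simplified)."""
    if method.upper() not in SIMPLE_METHODS:
        return True
    if custom_headers:   # assumption added here: custom headers also trigger a preflight
        return True
    if method.upper() == "POST" and content_type is not None:
        # Ignore parameters such as "; charset=utf-8" when comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        return media_type not in SIMPLE_CONTENT_TYPES
    return False

print(needs_preflight("GET"))
print(needs_preflight("PUT"))
print(needs_preflight("POST", "text/xml"))
```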



An example:
When performing certain types of cross-domain AJAX requests, modern browsers that support CORS will insert an extra "preflight" request to determine whether they have permission to perform the action.


Preflight Request:

OPTIONS /cors HTTP/1.1
Origin: http://api.bob.com
Access-Control-Request-Method: PUT
Access-Control-Request-Headers: X-Custom-Header
Host: api.alice.com
Accept-Language: en-US
Connection: keep-alive
User-Agent: Mozilla/5.0...

Preflight Response:

Access-Control-Allow-Origin: http://api.bob.com
Access-Control-Allow-Methods: GET, POST, PUT
Access-Control-Allow-Headers: X-Custom-Header
Content-Type: text/html; charset=utf-8

In the above request and response, the “Access-Control-Request-Method” header carries the method the client intends to use, and in its response the server sends “Access-Control-Allow-Methods”, which lists the methods the server accepts.
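
On the server side, answering a preflight comes down to comparing the requested origin, method and headers against an allow list. The Python sketch below is one possible way to do that, not the API of any particular framework; the function name and the three ALLOWED_* lists are assumptions chosen to match the example exchange above.

```python
ALLOWED_ORIGINS = {"http://api.bob.com"}   # hypothetical server configuration
ALLOWED_METHODS = ["GET", "POST", "PUT"]
ALLOWED_HEADERS = ["X-Custom-Header"]

def preflight_response_headers(request_headers: dict) -> dict:
    """Return the CORS headers for a preflight, or {} when the request is refused."""
    origin = request_headers.get("Origin")
    requested_method = request_headers.get("Access-Control-Request-Method", "")
    requested_headers = [
        h.strip().lower()
        for h in request_headers.get("Access-Control-Request-Headers", "").split(",")
        if h.strip()
    ]
    allowed_headers = {h.lower() for h in ALLOWED_HEADERS}
    if (origin not in ALLOWED_ORIGINS
            or requested_method not in ALLOWED_METHODS
            or any(h not in allowed_headers for h in requested_headers)):
        return {}   # without these headers the browser will not send the real request
    return {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(ALLOWED_METHODS),
        "Access-Control-Allow-Headers": ", ".join(ALLOWED_HEADERS),
    }

print(preflight_response_headers({
    "Origin": "http://api.bob.com",
    "Access-Control-Request-Method": "PUT",
    "Access-Control-Request-Headers": "X-Custom-Header",
}))
```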

10.           Multipart Request

10.1.                  What is multipart?
An HTTP multipart request is an HTTP request that HTTP clients construct to send files and data over to an HTTP server. It is commonly used by browsers and HTTP clients to upload files to the server.
Thus, a typical multipart Content-Type header field might look like this:
Content-Type: multipart/mixed;
boundary=gc0p4Jq0M2Yt08jU534c0p
This indicates that the entity consists of several parts, each itself with a structure that is syntactically identical to an RFC 822 message, except that the header area might be completely empty, and that the parts are each preceded by the line
--gc0p4Jq0M2Yt08jU534c0p

Example:
File Upload form with two fields, File and Destination, and an Upload submit button.

This is what the data submitted from the file upload form looks like after selecting sample.txt as the file to be uploaded to the tmp directory on the local file system:
POST /fileupload/upload HTTP/1.1
Host: localhost:8080
Content-Type: multipart/form-data; boundary=---------------------------263081694432439
Content-Length: 441

-----------------------------263081694432439
Content-Disposition: form-data; name="file"; filename="sample.txt"
Content-Type: text/plain

Data from sample file
-----------------------------263081694432439
Content-Disposition: form-data; name="destination"

/tmp
-----------------------------263081694432439
Content-Disposition: form-data; name="upload"

Upload
-----------------------------263081694432439--

Therefore it is clear that:
  • Content-Type: multipart/form-data; boundary=---------------------------263081694432439 sets the content type to multipart/form-data and says that the fields are separated by the given boundary string.
  • Every field gets some sub-headers before its data: Content-Disposition: form-data; the field name; and, for a file, the filename. The data follows.
  • The server reads the data until the next boundary string. The browser must choose a boundary that will not appear in any of the fields, which is why the boundary may vary between requests.
  • Because the boundary is unique, no encoding of the data is necessary: binary data is sent as is.
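
To tie these points together, here is a minimal Python sketch that assembles a multipart/form-data body by hand. It is only an illustration under simple assumptions (no escaping of quotes in names, no check that the boundary is absent from the data); build_multipart and the fixed boundary value are made up for this example.

```python
import uuid

def build_multipart(fields, files, boundary=None):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict of name -> text value
    files:  dict of name -> (filename, content_type, data_bytes)
    Returns (content_type_header_value, body_bytes).
    """
    if boundary is None:
        boundary = uuid.uuid4().hex   # random, so unlikely to occur in the data
    delimiter = f"--{boundary}\r\n".encode("ascii")
    chunks = []
    for name, (filename, content_type, data) in files.items():
        chunks.append(delimiter)
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n".encode("utf-8")
        )
        chunks.append(data + b"\r\n")   # binary data is copied as is
    for name, value in fields.items():
        chunks.append(delimiter)
        chunks.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8"))
        chunks.append(value.encode("utf-8") + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))   # closing delimiter
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)

content_type, body = build_multipart(
    {"destination": "/tmp"},
    {"file": ("sample.txt", "text/plain", b"Data from sample file")},
    boundary="example-boundary-263081694432439",   # fixed here only to keep the output stable
)
print(content_type)
print(body.decode("utf-8"))
```

Each part is introduced by two dashes plus the boundary, and the final delimiter carries two extra trailing dashes, which matches the request shown earlier.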