The HTTP Series (Part 3): Client Identification

Posted 10 Jul 2017, updated 3 Aug 2017

So far, you have learned about the basic concepts and some of the architectural aspects of HTTP. This leads us to the next important HTTP subject: client identification.

In this article, you’ll learn why client identification is important and how Web servers can identify you (your Web client). You will also get to see how that information is used and stored.

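This is what we have learned so far, and where we are now:

  • Part 1: the basic concepts of HTTP
  • Part 2: the architectural aspects of HTTP
  • Part 3: client identification (this article)

In this article, you will learn more about:

  • Why client identification is important
  • The different ways a Web server can identify the client: HTTP request headers, IP addresses, long URLs, and cookies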

First, let’s see why websites would need to identify you.

Client Identification and Why It’s Extremely Important

As you are most likely aware, every website, or at least every website that cares about you and your actions, includes some form of content personalization.

What do I mean by that?

Well, that includes suggested items when you visit an e-commerce website, “people you may know/want to connect with” on social networks, recommended videos, ads that almost spookily know what you need, news articles that are relevant to you, and so on.

This effect is a double-edged sword. On one hand, it’s pretty nifty to have personalized, custom content delivered to you. On the other hand, it can lead to confirmation bias, which can result in all kinds of stereotypes and prejudice. There is an excellent Dilbert comic that touches upon confirmation bias.

Yet, how could we live without knowing how our favorite team did last night, or what the celebrities were up to?

Either way, content personalization has become part of our daily lives: we can’t do anything about it, and we probably don’t even want to.

Let’s see how Web servers can identify you to achieve this effect.

Different Ways to Identify the Client


There are several ways that a Web server can identify you:

  • HTTP request headers
  • IP address
  • Long URLs
  • Cookies
  • Login information (authentication)

Let’s go through each one, except for authentication, which is described in detail in part 4 of the HTTP series.

HTTP Request Headers Used for Identification

Web servers have a few ways to extract information about you directly from the HTTP request headers.

Those headers are:

  • From – Contains the user’s email address, if provided
  • User-Agent – Contains information about the Web client
  • Referer – Contains the address of the page the user came from
  • Authorization – Contains the user’s credentials, such as username and password
  • Client-ip – Contains the user’s IP address
  • X-Forwarded-For – Contains the user’s IP address (when the request goes through a proxy server)
  • Cookie – Contains a server-generated ID label
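To make this concrete, here is a minimal Python sketch (standard library only) that picks these identification headers out of a raw HTTP/1.1 request. The sample request and the `parse_headers` and `identification_headers` helpers are hypothetical, written only for illustration; in practice, your Web framework parses headers for you.

```python
# A minimal sketch: pull identification-related headers out of a raw
# HTTP/1.1 request. The request below is a made-up example.

IDENTIFYING_HEADERS = ["From", "User-Agent", "Referer", "Authorization",
                       "Client-ip", "X-Forwarded-For", "Cookie"]

RAW_REQUEST = (
    "GET /products HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)\r\n"
    "Referer: https://www.google.com/\r\n"
    "X-Forwarded-For: 203.0.113.7\r\n"
    "Cookie: session_id=abc123\r\n"
    "\r\n"
)


def parse_headers(raw: str) -> dict[str, str]:
    """Parse the header section of a raw request into a dict.

    Header names are case-insensitive, so they are stored lowercased.
    Repeated headers simply overwrite each other (a simplification).
    """
    head = raw.split("\r\n\r\n", 1)[0]   # drop the body, if any
    lines = head.split("\r\n")[1:]        # skip the request line
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return headers


def identification_headers(raw: str) -> dict[str, str]:
    """Return only the headers a server could use to identify the client."""
    headers = parse_headers(raw)
    return {name: headers[name.lower()]
            for name in IDENTIFYING_HEADERS
            if name.lower() in headers}


if __name__ == "__main__":
    for name, value in identification_headers(RAW_REQUEST).items():
        print(f"{name}: {value}")
```

Running it prints the User-Agent, Referer, X-Forwarded-For, and Cookie values; From, Authorization, and Client-ip are skipped because this sample request doesn’t carry them.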

In theory, the From header would be ideal for uniquely identifying the user, but in practice it is rarely used because of the privacy and security concerns around collecting email addresses.

The User-Agent header contains information such as the browser version and operating system. While this is important for customizing content, it doesn’t identify the individual user.

The Referer header tells the server where the user is coming from. This information helps the server understand user behavior, but it is less useful for identifying the user.

While these headers provide some useful information about the client, they are not enough to personalize content in a meaningful way.

The remaining headers offer more precise mechanisms of identification.

IP Address

Identifying clients by IP address was more common in the past, when IP addresses weren’t so easily faked or swapped. Although it can serve as an additional security check, it isn’t reliable enough to be used on its own.

Here are some of the reasons why:

  • It describes the machine, not the user
  • NAT firewalls – Many ISPs (Internet service providers) use NAT firewalls to enhance security and deal with the IP address shortage, so many users can share a single public IP address
  • Dynamic IP addresses – Users often get a dynamic IP address from their ISP, so the same user’s address can change over time
  • HTTP proxies and gateways – These can hide the original IP address. Some proxies add the Client-ip or X-Forwarded-For header to preserve the original IP address
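Proxies are a big part of why IP-based identification is tricky. Below is a rough Python sketch of how a server sitting behind its own reverse proxies might guess the original client address from X-Forwarded-For. The `client_ip` function, the `trusted_proxies` set, and all addresses (taken from private and documentation ranges) are hypothetical. The key assumption is that each proxy appends the address it received the request from, and that only your own proxies can be trusted to do so honestly; entries further to the left could have been written by the client itself.

```python
# A rough sketch of recovering the client IP behind reverse proxies.
# Assumption: every proxy appends the address it received the request from
# to X-Forwarded-For, so the right-most entries are the most trustworthy.


def client_ip(peer: str, headers: dict[str, str],
              trusted_proxies: set[str]) -> str:
    """Best-effort guess of the original client IP address.

    `peer` is the address of the TCP connection, `headers` uses lowercased
    names, and `trusted_proxies` lists proxies we operate ourselves.
    """
    if peer not in trusted_proxies:
        # Not coming through our proxies: any X-Forwarded-For is unverified.
        return peer

    hops = [hop.strip()
            for hop in headers.get("x-forwarded-for", "").split(",")
            if hop.strip()]
    # Walk from the closest hop outwards, skipping our own proxies. The first
    # address we don't control is the best guess; anything further left
    # could have been supplied by the client itself.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return peer  # nothing usable in the header: fall back to the peer


if __name__ == "__main__":
    proxies = {"10.0.0.1", "10.0.0.2"}
    # A client that tried to spoof its address by sending its own header.
    spoofed = {"x-forwarded-for": "198.51.100.66, 203.0.113.7, 10.0.0.1"}
    print(client_ip("10.0.0.2", spoofed, proxies))  # 203.0.113.7
```

Even when this guess is correct, the result is often a NAT or ISP address shared by many users, which is exactly the problem described in the list above.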

Long (fat) URLs

It is not uncommon for websites to use URLs to carry information about the user and improve the user experience. They add more information to the URL as the user browses the website, until the URLs look complicated and illegible.

You can see what a long URL looks like by browsing the Amazon store:

https://www.amazon.com/gp/product/1942788002/
ref=s9u_psimh_gw_i2?ie=UTF8&fpl=fresh&pd_rd_i=1942788002&pd_rd_r=70BRSEN2K19345MWASF0&pd_rd_w=KpLza&
pd_rd_wg=gTIeL&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=&pf_rd_r=RWRKQXA6PBHQG52JTRW2&pf_rd_t=36701&
pf_rd_p=1cf9d009-399c-49e1-901a-7b8786e59436&pf_rd_i=desktop
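To see how much state such a URL carries, you can take it apart with Python’s standard `urllib.parse` module. This is only an illustration: we can only guess what these Amazon parameters mean (some, such as `pd_rd_r` or `pf_rd_r`, look like per-request identifiers, but that is an assumption). The `url_state` and `without_state` helper names are hypothetical.

```python
# Illustration: split the long Amazon URL from the article into its parts.
from urllib.parse import parse_qs, urlsplit, urlunsplit

LONG_URL = (
    "https://www.amazon.com/gp/product/1942788002/"
    "ref=s9u_psimh_gw_i2?ie=UTF8&fpl=fresh&pd_rd_i=1942788002"
    "&pd_rd_r=70BRSEN2K19345MWASF0&pd_rd_w=KpLza&pd_rd_wg=gTIeL"
    "&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=&pf_rd_r=RWRKQXA6PBHQG52JTRW2"
    "&pf_rd_t=36701&pf_rd_p=1cf9d009-399c-49e1-901a-7b8786e59436"
    "&pf_rd_i=desktop"
)


def url_state(url: str) -> dict[str, str]:
    """Return the query parameters of a URL (first value of each)."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps empty parameters such as "pf_rd_s=".
    return {name: values[0]
            for name, values in parse_qs(query, keep_blank_values=True).items()}


def without_state(url: str) -> str:
    """Drop the query string and fragment to get a cleaner URL to share.

    Some state (the ref=... segment) lives in the path itself, so this
    only removes part of it.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    for name, value in url_state(LONG_URL).items():
        print(f"{name:10} = {value}")
    print(without_state(LONG_URL))
```

Twelve query parameters ride along with a single product page, which hints at why such URLs are hard to share and cache.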

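There are several problems with this approach:

  • It’s ugly
  • It’s not shareable, because the URL carries information specific to your session
  • It breaks caching, because each user gets a unique URL for the same content
  • It’s limited to that session; the state is lost as soon as the user leaves the site or starts from a fresh URL
  • It increases the load on the server, which has to rewrite the URLs of every page it serves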

Cookies

Excluding authentication, cookies are the best client identification method to date. They were developed by Netscape, but every browser supports them now.

There are two types of cookies: session cookies and persistent cookies. A session cookie is deleted when you close the browser, while a persistent cookie is saved to disk and can last longer. A cookie becomes persistent when the server sets its Max-Age or Expires attribute; without either, it is treated as a session cookie.
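In practice, the difference comes down to a single attribute. This minimal sketch uses Python’s standard `http.cookies` module to build two Set-Cookie header values; the cookie name and value and the `set_cookie_value` helper are hypothetical. Only the Max-Age attribute makes the second cookie persistent (an Expires date would work the same way).

```python
# Minimal sketch: a session cookie vs. a persistent cookie.
from http.cookies import SimpleCookie
from typing import Optional


def set_cookie_value(name: str, value: str,
                     max_age: Optional[int] = None) -> str:
    """Build the value of a Set-Cookie response header.

    Without max_age the browser treats the cookie as a session cookie and
    drops it when the session ends; with max_age (in seconds) it keeps the
    cookie until that time has passed.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


if __name__ == "__main__":
    # Session cookie: no Max-Age, no Expires.
    print("Set-Cookie:", set_cookie_value("session_id", "abc123"))
    # Persistent cookie: kept for 30 days.
    print("Set-Cookie:", set_cookie_value("session_id", "abc123",
                                          max_age=30 * 24 * 3600))
```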

Modern browsers like Chrome and Firefox can keep background processes running when you close them, so you can resume where you left off. As a result, session cookies may survive closing the browser, so be careful.

So how do the cookies work?

A cookie contains a list of name=value pairs that the server sets using the Set-Cookie response header (or the older, now obsolete Set-Cookie2 header). Usually, the information stored in a cookie is some kind of client ID, but some websites store other information as well.

The browser stores this information in its cookie database and sends it back in the Cookie request header when the user visits the page or website again. The browser can handle thousands of different cookies, and it decides which ones to send with each request based on each cookie’s domain, path, and expiration.
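Deciding which cookies to send mostly comes down to matching each cookie’s attributes against the request. The Python sketch below models only the path check, following the path-match rule from RFC 6265 (the current cookie specification). The `CookieJar` class and its methods are hypothetical; a real browser also checks the domain, the expiry time, and flags such as Secure.

```python
# Hypothetical sketch of a browser-side cookie jar that only models paths.
from dataclasses import dataclass, field


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Path-match rule from RFC 6265, section 5.1.4."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # "/acme" matches "/acme/login" but not "/acmex".
        return (cookie_path.endswith("/")
                or request_path[len(cookie_path)] == "/")
    return False


@dataclass
class CookieJar:
    # (name, path) -> value. A real browser also keys cookies on the domain.
    cookies: dict[tuple[str, str], str] = field(default_factory=dict)

    def store(self, name: str, value: str, path: str = "/") -> None:
        """Remember a cookie received in a Set-Cookie header."""
        self.cookies[(name, path)] = value

    def cookie_header(self, request_path: str) -> str:
        """Build the Cookie request header for a request to request_path.

        RFC 6265 recommends listing longer paths first; this sketch simply
        keeps insertion order.
        """
        return "; ".join(f"{name}={value}"
                         for (name, path), value in self.cookies.items()
                         if path_matches(request_path, path))


if __name__ == "__main__":
    jar = CookieJar()
    jar.store("Customer", "WILE_E_COYOTE", path="/acme")
    jar.store("theme", "dark")
    print(jar.cookie_header("/acme/pickitem"))  # both cookies
    print(jar.cookie_header("/blog"))           # only theme=dark
```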

Here is an example flow.

  1. User Agent -> Server
    POST /acme/login HTTP/1.1
    [form data]

    User identifies itself via form input

  2. Server -> User Agent
    HTTP/1.1 200 OK
    Set-Cookie2: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"

    The server sends the Set-Cookie2 response header to instruct the User Agent (browser) to set the information about the user in a cookie.

  3. User Agent -> Server
    POST /acme/pickitem HTTP/1.1
    Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
    [form data]

    The user adds an item to the shopping basket.

  4. Server -> User Agent
    HTTP/1.1 200 OK
    Set-Cookie2: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"

    Shopping basket contains an item.

  5. User Agent -> Server
    POST /acme/shipping HTTP/1.1
    Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme";
            Part_Number="Rocket_Launcher_0001"; $Path="/acme"
    [form data]

    The user selects the shipping method.

  6. Server -> User Agent
    HTTP/1.1 200 OK
    Set-Cookie2: Shipping="FedEx"; Version="1"; Path="/acme"

    New cookie reflects shipping method.

  7. User Agent -> Server
    POST /acme/process HTTP/1.1
    Cookie: $Version="1";
            Customer="WILE_E_COYOTE"; $Path="/acme";
            Part_Number="Rocket_Launcher_0001"; $Path="/acme";
            Shipping="FedEx"; $Path="/acme"
    [form data]

    The user chooses to process the order.
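On the server side, each request in the flow above carries everything the server needs in its Cookie header. Here is a hypothetical Python sketch of step 7, written against the modern RFC 6265 cookie syntax rather than the Set-Cookie2/$Version syntax shown above; `read_order` and `process_order` are made-up names. Keep in mind that the client can edit any cookie, so a real shop would more likely store only an opaque session ID in the cookie and keep the basket on the server.

```python
# Hypothetical server-side handling of step 7 (POST /acme/process).
from http.cookies import SimpleCookie

# Step 7's Cookie header, rewritten in modern RFC 6265 syntax.
STEP_7_COOKIE = ("Customer=WILE_E_COYOTE; "
                 "Part_Number=Rocket_Launcher_0001; Shipping=FedEx")

REQUIRED = ("Customer", "Part_Number", "Shipping")


def read_order(cookie_header: str) -> dict[str, str]:
    """Parse a Cookie request header into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}


def process_order(cookie_header: str) -> str:
    """Rebuild the order from the cookies set in steps 2, 4 and 6."""
    order = read_order(cookie_header)
    missing = [name for name in REQUIRED if name not in order]
    if missing:
        raise ValueError("incomplete order, missing: " + ", ".join(missing))
    return (f"Shipping {order['Part_Number']} to {order['Customer']} "
            f"via {order['Shipping']}")


if __name__ == "__main__":
    print(process_order(STEP_7_COOKIE))
```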

That’s it!

There is one more thing I want you to be aware of: cookies are not perfect either. Besides security concerns, cookies also clash with the REST architectural style (see the section about misusing cookies).

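You can learn more about cookies in RFC 2965, which defines the Set-Cookie2 syntax used in the example above. Note that RFC 2965 has since been obsoleted by RFC 6265, and the Set-Cookie2 header is no longer used; modern browsers rely on Set-Cookie and Cookie instead.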

Conclusion

This wraps it up for this part of the HTTP series.

You have learned about the strengths of content personalization as well as its potential pitfalls. You are also aware of the different ways servers can identify you. In part 4 of the series, we will talk about the most important type of client identification: authentication.

If you found some of the concepts in this part unclear, refer to part 1 and part 2 of the HTTP series.

Thank you for reading and feel free to leave your comments below.


The post The HTTP series (Part 3): Client identification appeared first on Code Maze.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Vladimir Pecanac
Software Developer
Serbia
Hi, my name is Vladimir Pecanac, and I am a full-time .NET developer and software development enthusiast. I use this tiny part of the internet to share the things I learn, in the hope of both helping others and deepening my own knowledge of the topics I write about.

The best way to learn is to teach.

I feel that many technical articles are written in an unnecessarily complicated way to sound more authoritative and serious. My goal is to change that trend and write down-to-earth, simple articles that are easy to read and understand by anyone.

Having said all that, I hope you will enjoy reading my articles and learn something new in the process.

If you liked my articles, you can read more at my blog Code Maze.

Last Updated 3 Aug 2017
Article Copyright 2017 by Vladimir Pecanac
Everything else Copyright © CodeProject, 1999-2017