What happens when you type “google.com” into your browser and press Enter?

Web Application Development · William

My favorite interview question that I’ve come across so far is “You type ‘google.com’ into a browser address bar and hit <Enter>, what happens afterwards?”

Someone could talk for days on end trying to answer that with some form of completeness. How deep will they go? Strictly for fun, I’m going to put my answer here. When I was asked this in an actual interview, I rambled on for a good 10 minutes before they stopped me. And then I kept remembering things I forgot to include even after the interview finished.

I’m going to keep this formatted as a wall of text because that’s how it felt to answer this question in conversation.

So What Happens?

The browser is going to analyze the input. Usually if it has a “.com” it won’t think you’re typing search terms. Once it decides the input must be a URL, it’ll check whether it has a scheme and, if not, add “http://” to the beginning. And since you left a number of HTTP details unspecified, it’ll assume defaults: port 80, the GET method, and no basic auth.
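To make the defaulting concrete, here is a minimal Python sketch of that first step. The function name `normalize_address_bar_input` and the "no spaces and contains a dot" heuristic are my own assumptions for illustration; real browsers use far more elaborate rules to tell URLs from search terms.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_address_bar_input(text):
    """Return (url, port) for URL-like input, or None for search terms.

    Hypothetical heuristic: anything without spaces that contains a dot
    is treated as a URL. Real browsers are much smarter than this.
    """
    text = text.strip()
    if " " in text or "." not in text:
        return None  # treat as search terms
    if "://" not in text:
        text = "http://" + text  # no scheme given, assume plain HTTP
    parts = urlsplit(text)
    # Fall back to the scheme's default port when none was typed.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.geturl(), port


print(normalize_address_bar_input("google.com"))
print(normalize_address_bar_input("cute cat pictures"))
```

The method (GET) and the lack of credentials are not modelled here; they would simply be the defaults of whatever request object gets built from the returned URL.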

Then it’ll create an HTTP request and send that. I’m not confident in my low-level networking knowledge, but if I were I’d say something about MAC addresses, TCP packet transfers, and dropped-packet handling. Anyway, before anything can be sent, a DNS lookup for “google.com” will happen, and if the answer isn’t already cached a DNS server will reply with a list of IP addresses, because “google.com” doesn’t have just a single IP address. Browsers will pick the first one by default, I believe. I’m not sure whether they’re regional or how the list works, but I know it’s there.
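The lookup step can be sketched with Python’s standard `socket.getaddrinfo`. The `resolve` helper and the `fake_lookup` stand-in are assumptions of mine so the example runs without network access, and the addresses come from the reserved documentation ranges rather than being Google’s real ones.

```python
import socket


def resolve(host, port=80, lookup=socket.getaddrinfo):
    """Return the distinct IP addresses for host, in the order returned.

    `lookup` defaults to the real resolver; it is injectable so the
    function can be tried without network access.
    """
    seen = []
    for _family, _type, _proto, _canon, sockaddr in lookup(
        host, port, proto=socket.IPPROTO_TCP
    ):
        address = sockaddr[0]
        if address not in seen:
            seen.append(address)
    return seen


def fake_lookup(host, port, proto=0):
    # Made-up answers from the documentation ranges, not Google's real IPs.
    return [
        (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("192.0.2.10", port)),
        (socket.AF_INET, socket.SOCK_STREAM, proto, "", ("192.0.2.11", port)),
        (socket.AF_INET6, socket.SOCK_STREAM, proto, "", ("2001:db8::1", port, 0, 0)),
    ]


addresses = resolve("google.com", lookup=fake_lookup)
print(addresses[0], "would be tried first out of", addresses)
```

Calling `resolve("google.com")` without the `lookup` argument would hit the system resolver, which consults its own cache before asking a DNS server.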

So the HTTP request jumps from node to node until it gets to the IP address of google.com’s load balancer. It wouldn’t last long: Google would respond that you need to be using HTTPS, I assume with a 301 permanent redirect. That response would go all the way back to your browser, which would change the scheme to HTTPS, use the default port 443 and resend. This time a TLS handshake would take place between the browser and the load balancer. I’m not 100% sure how that works, but I know the browser would tell Google which protocol versions it supports (TLS 1.0, 1.1, 1.2) and Google would respond with “Let’s use 1.2”. Then the request gets sent with TLS encryption.
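The “Let’s use 1.2” exchange can be modelled as picking the highest version both sides support. This is a deliberately simplified sketch: `negotiate_version` is a made-up helper, the server’s supported list is an assumption, and a real handshake also negotiates cipher suites and much more.

```python
def negotiate_version(client_versions, server_versions):
    """Pick the highest protocol version both sides support, or None."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


# Versions as (major, minor) tuples so they compare correctly.
TLS_1_0, TLS_1_1, TLS_1_2 = (1, 0), (1, 1), (1, 2)

client_offer = [TLS_1_0, TLS_1_1, TLS_1_2]
server_supports = [TLS_1_1, TLS_1_2]  # assumed server policy

chosen = negotiate_version(client_offer, server_supports)
print("Let's use %d.%d" % chosen)
```

If the two lists share nothing, the function returns `None`, which corresponds to the handshake failing.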

I think the next thing Google would do is run it through web application firewall rules on its load balancer to see whether it’s a malicious request. By the time it passes, the secure connection has probably been terminated (because PCI-DSS regulations say you don’t need to encrypt internal traffic). The request would then be assigned to a pool in their CDN, and the Google-side cached homepage would be returned in an HTTP response. Probably pre-gzipped.

Google’s response headers would be read by the browser, the response cached according to the caching policy in those headers, and then the body would be un-gzipped. And because it’s Google it’s probably ultra-optimized: minified, likely with a lot of pre-rendered content and inlined CSS, JavaScript and images to reduce network requests and the time-to-first-render. But that response will trigger a cascade of other requests, all concurrent because it should be running HTTP/2. While those requests are being made, the JavaScript would be parsed, probably without blocking because they used the defer attribute on their tags – or async; I never did read about what those did individually.
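Reading the caching policy and un-gzipping the body look roughly like this in Python. The headers and body are stand-ins I made up, not Google’s actual response, and `parse_cache_control` is a simplified helper that ignores edge cases such as commas inside quoted values.

```python
import gzip


def parse_cache_control(header_value):
    """Split a Cache-Control header into a {directive: value} dict."""
    directives = {}
    for part in header_value.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            # Directives without a value (e.g. "private") map to True.
            directives[name.lower()] = value.strip('"') or True
    return directives


# Stand-in response: made-up headers and body, not Google's actual output.
body_on_the_wire = gzip.compress(b"<html><body>search box</body></html>")
headers = {
    "Content-Encoding": "gzip",
    "Cache-Control": "private, max-age=0",
}

policy = parse_cache_control(headers["Cache-Control"])
if headers.get("Content-Encoding") == "gzip":
    body = gzip.decompress(body_on_the_wire)
else:
    body = body_on_the_wire

print(policy)
print(body.decode("utf-8"))
```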

But the browser has probably already rendered the search box and is working on the toolbar at the top, which is going to take some extra network requests – I probably already have a cookie or maybe local storage with an OAuth token – or maybe I’m using Chrome and it already knows who I am, and that request with auth gets sent to their Google+ API that tells the Google search page application who I am.

Another request would be sent to get my avatar image. At this point they’ve already browser-sniffed to see whether I’m using Chrome; if I weren’t, they would have popped in a tooltip to tell me that Chrome is awesome and I should be using that instead of anything else.

I think it would quiet down at that point. All taking place in a fraction of a second.

What is observably different?

Let’s look up the DNS:

  • I know I had previously seen google.com coming back with multiple IP addresses, but that doesn’t seem to be the case anymore. Seems that they used to use round-robin but don’t anymore. This StackOverflow question covers it. I had forgotten it was called round-robin.
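For reference, round-robin DNS just means the server hands back the same set of records in a rotated order on each query, so clients that take the first entry get spread across the addresses. A toy model, with a hypothetical `RoundRobinRecords` class and addresses from the documentation range:

```python
from collections import deque


class RoundRobinRecords:
    """Toy model of a DNS server rotating its A records per query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the current ordering, then rotate for the next query."""
        response = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return response


# Addresses from the 192.0.2.0/24 documentation range, not real Google IPs.
records = RoundRobinRecords(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
for _ in range(4):
    print(records.answer()[0])
```

The fourth query wraps around to the first address again.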

Network Layers…

In a formally structured answer, you’d probably reference the OSI Model, which I know of but am not well versed in. After looking it up, I take it the layers map roughly like this, from the top (layer 7) down to the wire (layer 1):

  7. Application – the logic initiating requests, and HTTP itself

  6. Presentation – data encoding and encryption (TLS is often placed here)

  5. Session – TLS, in other descriptions; it doesn’t map cleanly onto a single layer

  4. Transport – TCP

  3. Network – packet routing (IP)

  2. Data link – frames (which seem to be packet containers)

  1. Physical – bitstreams

  • I missed that in TLS the server sends its certificate after the two sides agree on a protocol version.

  • Networking isn’t my strongest arena.
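One way I picture the layering is as nesting: each lower layer wraps whatever the layer above handed it. Here is a toy sketch of that idea, where the bracketed labels are placeholders of my own and stand in for the real binary headers:

```python
def encapsulate(payload, layers):
    """Wrap payload in one labelled header per layer, innermost first."""
    for name in layers:
        payload = b"[" + name.encode("ascii") + b"]" + payload
    return payload


http_request = b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n"

# Innermost to outermost. TLS is left out because this models the first,
# unencrypted request; the bracketed labels stand in for real binary headers.
frame = encapsulate(http_request, ["TCP", "IP", "Ethernet"])
print(frame)
```

The receiving side does the reverse, peeling one header per layer until only the HTTP request is left.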

Open google.com in my browser, disable cache:

  • I missed the host name canonicalization (google.com to www.google.com) – which was a 301.

  • The correction from HTTP to HTTPS is a 307 Internal Redirect.

  • It then downloads fonts, the logo images, and my avatar image, all without an API call. That means they put my profile information into the page itself and bundled it with the response – so they’re doing actual data retrieval when you hit google.com and not just serving cached assets.
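As far as I can tell, Chrome labels a redirect “307 Internal Redirect” when it rewrites the URL itself because the host is on its HTTPS-only (HSTS) list, so the plain-HTTP request never leaves the machine. That step can be modelled as a pure URL rewrite; `HSTS_HOSTS` and `internal_upgrade` are hypothetical names and the host list is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the browser's list of HTTPS-only hosts.
HSTS_HOSTS = {"google.com", "www.google.com"}


def internal_upgrade(url, hsts_hosts=HSTS_HOSTS):
    """Return (status, url), rewriting http:// to https:// for known hosts.

    Status 307 models the browser-generated "Internal Redirect"; None
    means the URL was left alone and would be requested as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in hsts_hosts:
        return 307, urlunsplit(parts._replace(scheme="https"))
    return None, url


print(internal_upgrade("http://google.com/"))
print(internal_upgrade("http://example.com/"))
```

The 301 for host name canonicalization is different: that one is a real response from the server.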

The Response

Above is a file comparison of the IE 11 and Chrome responses – both logged out.

  • Not terribly different between IE 11 and Chrome. But it means they’re user-agent sniffing server-side and not client-side. Could have mentioned this in my answer.

  • Unexpectedly, the Chrome response is larger by 22kB. I wonder if it’s the search-by-voice feature, which is visibly absent from IE 11. I would have guessed the IE 11 response to be the bigger one, since it probably needs polyfills plus the Chrome advertisement, but it’s all obfuscated and I’m not going to torture myself any further.

  • Even after I clear my cookies in Chrome, it still sends cookies on first request. It does not do that in IE 11.
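Server-side user-agent sniffing boils down to branching on the User-Agent request header before the response is built. A minimal sketch, where `pick_variant` and the variant names are made up and the two header strings are just typical examples of what IE 11 and Chrome send:

```python
def pick_variant(user_agent):
    """Choose which page variant to serve based on the User-Agent header."""
    if "Trident/" in user_agent:
        return "ie"  # IE 11 identifies itself with a Trident token
    if "Chrome/" in user_agent:
        return "chrome"
    return "generic"


IE11_UA = "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
)

print(pick_variant(IE11_UA))
print(pick_variant(CHROME_UA))
```

Substring checks like these are fragile, since other browsers also include a `Chrome/` token, which is presumably why real sniffing code is a lot messier.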

Let’s dig into that rendering!

That pic above is the first screenshot Chrome will give you.

  • There aren’t any async or defer attributes on the script tags, just nonce attributes. I’m learning about nonce as of this minute, and it seems to be security related. I guess they want those blocking scripts. I’m sure they fiddled around with/without async/defer at some point and decided against it.

  • Note to self: the full response is a mess of mixed JavaScript, CSS, and HTML. They aren’t following any rules about keeping the three separate.
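From what I’ve read since, the nonce attribute pairs with a Content-Security-Policy header: the server generates a fresh random value per response, puts it in the header and on each inline script tag, and the browser only runs inline scripts whose nonce matches. A minimal sketch of the server side, with `build_page` as a hypothetical helper:

```python
import secrets


def build_page(script_body):
    """Return (csp_header_value, html) sharing one freshly generated nonce."""
    nonce = secrets.token_urlsafe(16)  # must be unpredictable and per-response
    csp = "script-src 'nonce-%s'" % nonce
    html = '<script nonce="%s">%s</script>' % (nonce, script_body)
    return csp, html


csp, html = build_page("console.log('hello');")
print("Content-Security-Policy:", csp)
print(html)
```

An injected script tag without the right nonce would be refused by the browser, which is the security benefit.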

What about the question itself?

You know what? Maybe it’s not that great of an interview question for a developer since the answer has so much networking involved. It’s the format of the question I like, something open ended, that includes some guessing. That gives the interviewer the opportunity to follow up with questions like “How do you think TLS is established?” to see how the candidate thinks, see how creative they are, see what their limit is (how patient?).

What’s your favorite interview question?

via:oschina
