Sunday, February 26, 2023

2023: Ending the Past, Starting the Last Leg of Life's Journey

Another year endured, and now it's 2023.

We began living apart on January 19, 2022, when I moved into the smallest study to sleep on a mat on the floor; it has now been a full year and more.
That also satisfies Canada's separation requirement for divorce.

The kids are grown now, so there is nothing left to worry about.

What happened ten years ago should finally be brought to a close.

Today I formally printed and filled out the divorce application, and I hope she will sign it. The disgusting thing she did ten years ago was impossible to accept, and I wanted a divorce right then, but she used the children to threaten me. So that their university applications wouldn't be affected, I agreed to put the divorce off. Now that the kids are grown, the matter should be settled. From here on we each live our own lives, with no more entanglement in the last years of life, and I can spend my days the quiet, safe, peaceful way I want.

No matter how she tries to weasel out of it this time, I will not give in again.

In the evening my younger son called and told me that as long as I'm happy, he supports me.


Friday, January 14, 2022

Just like that~ drifted into the new year of 2022

Last night, and for several nights now, I kept dreaming of it....

 I found this Blogger again...
 Well then, thank you, dear old Google Blogger, for somehow still keeping my old posts.

 That must be heaven's will, so I'll keep writing a little...

 While writing, what fills my mind most, to my surprise, is my two children.

 What will become of them? If they won't listen to what I tell them now, perhaps reading what I've written for them later will do some good.

After all, I've already walked most of the road of this life.

 I hope that when I'm gone, they will still have a place to read the messages their father left for them in this world.

My own father (your grandfather) left this world suddenly, not long after I graduated from university. He left me very little. The things of his that I once didn't understand, and even somewhat disliked, I now understand completely, and thinking back I feel sorry for having let his painstaking care down.

Before he left, he told me he had wanted to write a book. All right then, I'll keep writing...

 Kids, you can see that this Blogger went quiet for a long stretch in the middle.
 What was I doing during that time?

 First, some things happened at home, and Dad spent three months reading a hundred books, shutting himself away to study and reflect. Dad made up a little of the life lessons he had missed and learned a great deal.

Second, Dad realized that many people around him were just like him and needed help, so Dad went to help them. Those Chinese men working hard to provide for their families are so pitiable. After all these years of effort in online communities, you no longer see bloody domestic cases between Chinese couples in the news.

Third, Dad spent another three years at the project management association and got the 三人行 mentoring program up and running, helping 50 people. They wanted me to be president, but I had no interest in something that dull, so I stepped away to do other things that are interesting and can still help people.

Fourth, Dad felt this society has serious ideological problems in education, so he joined a political party and helped bring down the extreme Kathleen Wynne government, clearing away some of the extreme ideology polluting children's school education. When it comes to teaching values, it is better to stay a bit conservative.

Fifth, Dad helped Member of Parliament Bob keep drugs and brothels out of our community; Dad is one of his core EDA members.

Dad knows none of these things make money, but they are very important.

If Dad doesn't do them now, then by the time you're grown, these problems corroding society will already be set in stone, and you will have neither the time nor the chance to fix them. When that day comes, I would surely blame myself for not having tried my best.

These are the things Dad can do for you, for your future families, and for your children. Dad is happy doing them.

Kids, don't let your lives hold things you'll regret.
The things you want to do and the things you ought to do: never mind what others say, just go and do them,
and leave that headache of a question, how it all turns out, to Abba Father.

Wednesday, June 04, 2008

YouTube Architecture

Platform


  • Apache (http://httpd.apache.org/) is the most popular web server in use today because it is free, runs everywhere, performs well, and can be configured to handle most needs.

  • Python
  • Linux (SuSE) (http://www.linux.org/) is a very popular OS in data centers because it is free, runs on a lot of hardware, has tons of available software, performs well, virtualizes easily, and is flexible. All good attributes when you are starting a web site and hoping to grow with demand.

    Some popular versions of Linux used in data centers are CentOS, Red Hat, and Ubuntu.

  • MySQL (http://www.mysql.com/)

  • psyco, a dynamic python->C compiler (a usage sketch follows this list)
  • lighttpd (http://highscalability.com/lighttpd) for video instead of Apache
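
    A minimal usage sketch of psyco, assuming the Python 2-era psyco package is installed. The checksum function is a made-up example of the kind of hot inner loop psyco was typically bound to, not anything from YouTube's code.

    import psyco

    def checksum(data):
        total = 0
        for ch in data:                # tight inner loop, the kind psyco speeds up
            total = (total + ord(ch)) & 0xFFFFFFFF
        return total

    psyco.bind(checksum)               # JIT-specialize just this function
    # psyco.full() is the blunt alternative: try to compile everything.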

    What's Inside?

    The Stats

  • Supports the delivery of over 100 million videos per day.
  • Founded 2/2005
  • 3/2006 30 million video views/day
  • 7/2006 100 million video views/day
  • 2 sysadmins, 2 scalability software architects
  • 2 feature developers, 2 network engineers, 1 DBA

    Recipe for handling rapid growth



    while (true)
    {
        identify_and_fix_bottlenecks();
        drink();
        sleep();
        notice_new_bottleneck();
    }

    This loop runs many times a day.

    Web Servers

  • NetScaler is used for load balancing and caching static content.
  • Run Apache with mod_fast_cgi.
  • Requests are routed for handling by a Python application server.
  • The application server talks to various databases and other information sources to get all the data and formats the HTML page.
  • Can usually scale web tier by adding more machines.
  • The Python web code is usually NOT the bottleneck, it spends most of its time blocked on RPCs.
  • Python allows rapid flexible development and deployment. This is critical given the competition they face.
  • Usually less than 100 ms page service times.
  • Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to optimize inner loops.
  • For CPU-intensive activities like encryption, they use C extensions.
  • Some expensive-to-render blocks are pre-generated and cached as HTML.
  • Row level caching in the database.
  • Fully formed Python objects are cached.
  • Some data are calculated and sent to each application so the values are cached in local memory. This is an underused strategy. The fastest cache is in your application server and it doesn't take much time to send precalculated data to all your servers. Just have an agent that watches for changes, precalculates, and sends.
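
    Below is a minimal sketch of that local-memory cache idea: a hypothetical LocalCache plus an agent loop that recalculates and swaps in fresh values. The names and refresh interval are illustrative assumptions, not YouTube's code.

    import threading
    import time

    class LocalCache(object):
        """In-process cache of precalculated values; reads never leave this server."""
        def __init__(self):
            self._data = {}
            self._lock = threading.Lock()

        def get(self, key, default=None):
            with self._lock:
                return self._data.get(key, default)

        def replace_all(self, snapshot):
            with self._lock:               # swap in a fresh snapshot atomically
                self._data = dict(snapshot)

    cache = LocalCache()

    def agent_loop(recalculate, interval=30):
        """Watches for changes, precalculates, and pushes into the local cache."""
        while True:
            cache.replace_all(recalculate())   # e.g. top videos, per-region settings
            time.sleep(interval)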

    Video Serving

  • Costs include bandwidth, hardware, and power consumption.
  • Each video is hosted by a mini-cluster. Each video is served by more than one machine.
  • Using a cluster means:
    - More disks serving content which means more speed.
    - Headroom. If a machine goes down others can take over.
    - There are online backups.
  • Servers use the lighttpd web server for video:
    - Apache had too much overhead.
    - Uses epoll (http://linux.die.net/man/4/epoll) to wait on multiple fds (see the sketch after this list).
    - Switched from a single-process to a multi-process configuration to handle more connections.

  • Most popular content is moved to a CDN (content delivery network, http://en.wikipedia.org/wiki/Content_Delivery_Network):
    - A CDN is a system of computers networked together across the Internet that cooperate transparently to deliver content (especially large media content) to end users. The first web-content CDNs were Sandpiper and Skycache, followed by Akamai and Digital Island; the first video CDN was iBEAM Broadcasting.
    - CDN nodes are deployed in multiple locations, often over multiple backbones. The nodes cooperate to satisfy requests for content, transparently moving content behind the scenes to optimize delivery: reducing bandwidth costs, improving end-user performance, or both. The number of nodes and servers making up a CDN varies with the architecture, some reaching thousands of nodes with tens of thousands of servers.
    - CDNs replicate content in multiple places. There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
    - CDN machines mostly serve out of memory because the content is so popular there's little thrashing of content into and out of memory.

  • Less popular content (1-20 views per day) uses YouTube servers in various colo sites (http://en.wikipedia.org/wiki/Colocation).
    - There's a long tail effect. A video may have a few plays, but lots of videos are being played. Random disk blocks are being accessed.
    - Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product, caching won't always be your performance savior.
    - Tune the RAID controller (http://en.wikipedia.org/wiki/RAID) and pay attention to other lower-level issues to help.
    - Tune memory on each machine so there's not too much and not too little.
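
    A minimal sketch of the epoll pattern mentioned in the lighttpd bullet above: one epoll instance waits on many connections at once. This is illustrative Python, not lighttpd's actual C code, and the port number is arbitrary.

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(("0.0.0.0", 8080))
    server.listen(128)
    server.setblocking(False)

    ep = select.epoll()
    ep.register(server.fileno(), select.EPOLLIN)
    connections = {}

    while True:
        for fd, event in ep.poll(1):               # one wait covers every socket
            if fd == server.fileno():              # new connection arriving
                conn, _addr = server.accept()
                conn.setblocking(False)
                connections[conn.fileno()] = conn
                ep.register(conn.fileno(), select.EPOLLIN)
            elif event & select.EPOLLIN:           # data ready on an existing fd
                data = connections[fd].recv(4096)  # a real server would parse and serve here
                if not data:                       # client closed the connection
                    ep.unregister(fd)
                    connections.pop(fd).close()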

    Serving Video Key Points

  • Keep it simple and cheap.
  • Keep a simple network path. Not too many devices between content and users. Routers, switches, and other appliances may not be able to keep up with so much load.
  • Use commodity hardware. The more expensive the hardware, the more expensive everything else gets too (support contracts). You are also less likely to find help on the net.
  • Use simple common tools. They mostly use tools built into Linux and layer on top of those.
  • Handle random seeks well (SATA, tweaks).

    Serving Thumbnails

  • Surprisingly difficult to do efficiently.
  • There are about 4 thumbnails for each video, so there are a lot more thumbnails than videos.
  • Thumbnails are hosted on just a few machines.
  • Saw problems associated with serving a lot of small objects:
    - Lots of disk seeks and problems with inode caches and page caches at OS level.
    - Ran into the per-directory file limit, with ext3 in particular. Moved to a more hierarchical directory structure. Recent improvements in the 2.6 kernel may improve ext3's large-directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
    - A high number of requests/sec, as web pages can display 60 thumbnails on a page.
    - Under such high loads Apache performed badly.
    - Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
    - Tried using lighttpd, but it stalled in single-threaded mode. Ran into problems with multi-process mode because each process kept a separate cache.
    - With so many images setting up a new machine took over 24 hours.
    - Rebooting machine took 6-10 hours for cache to warm up to not go to disk.
  • To solve all their problems they started using Google's BigTable (http://labs.google.com/papers/bigtable.html), a distributed data store:
    - Avoids the small-file problem because it clumps files together.
    - Fast and fault tolerant. Assumes it's working on an unreliable network.
    - Lower latency because it uses a distributed multilevel cache. This cache works across different colocation sites.
    - For more information on BigTable take a look at Google Architecture, GoogleTalk Architecture, and BigTable.
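
    As a rough illustration of the "clumps files together" point, here is a sketch of packing many thumbnails into one large file with an in-memory offset index. This is not BigTable's API; it only shows why trading many small files for one big one avoids per-file seeks and inode-cache churn.

    import os

    def pack_thumbnails(thumb_paths, pack_path):
        """Append each thumbnail to one big file; return {name: (offset, length)}."""
        index = {}
        with open(pack_path, "wb") as pack:
            for path in thumb_paths:
                with open(path, "rb") as f:
                    data = f.read()
                index[os.path.basename(path)] = (pack.tell(), len(data))
                pack.write(data)
        return index

    def read_thumbnail(pack_path, index, name):
        """One seek and one read instead of a per-file open and inode lookup."""
        offset, length = index[name]
        with open(pack_path, "rb") as pack:
            pack.seek(offset)
            return pack.read(length)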

    Databases

  • The Early Years
    - Use MySQL to store metadata like users, tags, and descriptions.
    - Served data off a monolithic RAID 10 volume with 10 disks.
    - Living off credit cards so they leased hardware. When they needed more hardware to handle load it took a few days to order and get delivered.
    - They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.
    - Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single threaded and usually run on lesser machines and replication is asynchronous, so the slaves can lag significantly behind the master.
    - Updates cause cache misses, which go to disk, where slow I/O causes slow replication.
    - With a replicating architecture you need to spend a lot of money for incremental bits of write performance.
    - One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video, so that function should get the most resources. The social networking features of YouTube are less important, so they can be routed to a less capable cluster.
  • The later years:
    - Went to database partitioning.
    - Split into shards with users assigned to different shards (see the sketch after this list).
    - Spreads writes and reads.
    - Much better cache locality which means less IO.
    - Resulted in a 30% hardware reduction.
    - Reduced replica lag to 0.
    - Can now scale database almost arbitrarily.
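
    A minimal sketch of the user-to-shard routing described above. The shard names, the CRC32 hashing choice, and the get_connection helper are illustrative assumptions, not YouTube's schema.

    import zlib

    SHARDS = ["shard0", "shard1", "shard2", "shard3"]    # hypothetical shard names

    def shard_for_user(user_id):
        # Deterministic mapping: a user's rows always live on the same shard,
        # which spreads reads and writes and keeps each user's data cache-local.
        return SHARDS[zlib.crc32(str(user_id).encode("utf-8")) % len(SHARDS)]

    def save_video_metadata(user_id, title, get_connection):
        db = get_connection(shard_for_user(user_id))     # route the write to one shard
        db.execute("INSERT INTO videos (owner_id, title) VALUES (%s, %s)",
                   (user_id, title))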

    Data Center Strategy (http://www.possibility.com/epowiki/Wiki.jsp?page=DatacenterSystemChoiceAnalysis)

  • Used managed hosting providers at first. They were living off credit cards, so it was the only way.
  • Managed hosting can't scale with you. You can't control hardware or make favorable networking agreements.
  • So they went to a colocation arrangement. Now they can customize everything and negotiate their own contracts.
  • Use 5 or 6 data centers plus the CDN.
  • Videos come out of any data center. Not closest match or anything. If a video is popular enough it will move into the CDN.
  • Video bandwidth dependent, not really latency dependent. Can come from any colo.
  • For images latency matters, especially when you have 60 images on a page.
  • Images are replicated to different data centers using BigTable. Code
    looks at different metrics to know who is closest.
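
    A toy sketch of that "who is closest" decision, assuming the metric is a measured latency per data center; the names and numbers are made up.

    def closest_data_center(latency_ms_by_dc):
        """Return the data center with the lowest measured latency for this user."""
        return min(latency_ms_by_dc, key=latency_ms_by_dc.get)

    # Example: closest_data_center({"dc-east": 85, "dc-west": 22}) -> "dc-west"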

    Lessons Learned

  • Stall for time. Creative and risky tricks can help you cope in the short term while you work out longer term solutions.
  • Prioritize. Know what's essential to your service and prioritize your resources and efforts around those priorities.
  • Pick your battles. Don't be afraid to outsource some essential services. YouTube uses a CDN to distribute their most popular content. Creating their own network would have taken too long and cost too much. You may have similar opportunities in your system. Take a look at Software as a Service for more ideas.
  • Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.
  • Shard. Sharding helps to isolate and constrain storage, CPU, memory, and IO. It's not just about getting more write performance. Some advantages are:
    * faster backup
    * faster recovery
    * data can fit into memory
    * data is easier to manage
    * more write bandwidth, because you aren't writing to a single master; in a single-master architecture write bandwidth is throttled

    This technique is used by many large websites, including eBay, Yahoo, LiveJournal, and Flickr.

  • Constant iteration on bottlenecks:
    - Software: DB, caching
    - OS: disk I/O
    - Hardware: memory, RAID
  • You succeed as a team. Have a good cross discipline team that understands the whole system and what's underneath the system. People who can set up printers, machines, install networks, and so on. With a good team all things are possible.