About half a year ago, I worked on a coupon aggregation platform aimed at overseas users. A brief background: the website aggregates the latest coupons from many brands, and users can browse and search them by brand. Coupons come in two types: those that reveal a discount code when clicked, and those that redirect to the brand's discount page. The site attracts traffic by ranking on Google for brand discount keywords. The coupons come from two kinds of channels. One is affiliate networks; our website is connected to three of them: ShareASale, LinkShare, and CJ. The other is social platforms, mainly Twitter. For the first kind, the data is pulled automatically through the networks' APIs, stored in our database, and published on our website. We can also apply manually for coupon offers from individual brands on these affiliate networks. With an affiliate coupon, the user clicks through to the brand's official website to make the purchase, and we earn a commission. The commission rate differs from product to product. The second kind, such as Twitter, pays no commission, but the volume is very large. It expands our coupon inventory and increases both the number of pages on our website and the number of pages Google indexes.
At first our website was connected only to the three affiliate networks. When the first version went online, it had about thirty thousand valid coupons across three thousand brands. The site is designed to aggregate coupons on brand pages, so the first version had only about three thousand pages. To get more pages indexed by Google, we started on SEO. The basic SEO work included HTML optimization, Sitemap, Title, Keywords, Description, Meta tags, a brand index page, long-tail keyword pages, link building, URL optimization, site search, and so on. In a short time we built more than ten thousand backlinks and created 7,000 long-tail keyword pages, so the page count quickly grew to about twelve thousand. The long-tail keyword pages were also optimized for SEO, with recommendations for similar long-tail keywords and similar brands. At that time, our log monitoring showed that Google's crawler was visiting two or three thousand times a day, some brand keywords had started to rank, and everything was going well. There was one hidden problem we did not anticipate at launch, and it planted a landmine for the trouble that came later. We had launched on a new, ordinary domain name, but the team happened to own an old domain with roughly ten years of history. The old domain still had some backlinks that brought in a little traffic, probably left over from sites that had used it before, and we assumed it would carry good authority. So a week after launch we switched to the old domain. We now had two versions: the old version on the new domain, and the new version on the old domain (with its own existing traffic). Because the project team was short on people, new features went only to the site on the old domain. The site on the new domain was no longer maintained and was left in place for a while with crawlers blocked by robots.txt.
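The Sitemap and robots steps mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code we actually ran: the domain `https://example.com`, the `/brand/<slug>` URL pattern, and the helper names `build_sitemap` and `is_blocked` are all assumptions made for the example. It uses only the Python standard library.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# robots.txt content that asks every crawler to stay away from the whole
# site, as was done for the unmaintained copy on the new domain.
ROBOTS_BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def build_sitemap(base_url, brand_slugs):
    """Return sitemap XML listing one URL per brand page.

    The /brand/<slug> URL pattern is hypothetical.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for slug in brand_slugs:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url}/brand/{slug}"
        ET.SubElement(url, "changefreq").text = "daily"
    return ET.tostring(urlset, encoding="unicode")


def is_blocked(robots_txt, url, agent="Googlebot"):
    """Check whether the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(build_sitemap("https://example.com", ["nike", "adidas"]))
    print(is_blocked(ROBOTS_BLOCK_ALL, "https://example.com/brand/nike"))
```

A single sitemap file is limited to 50,000 URLs by the sitemap protocol, so a site that grows to tens of thousands of pages may need to split its sitemap and add a sitemap index.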
Soon we noticed that something was wrong with how Google was indexing our pages. First, Google's cached copies of our pages kept returning 404. Second, the website loaded slowly and scored only about 60 in Google PageSpeed. So we spent a week on performance: we added a CDN, optimized the page rendering logic, reduced the size of the front-end JS and CSS files, optimized image sizes, and so on. After that the site was noticeably faster, with pages loading within 2 seconds and server response time kept under 200 ms. But Google's cached pages were still broken. Eventually the SEO staff looked into it and reported that it was a problem on Google's side: Google had officially said it was an internal server issue and that it did not matter as long as indexing was normal. So we left it alone. The problem that kept bothering us, however, was that the number of pages indexed by Google was unstable, a few hundred in the morning and a few thousand in the afternoon, while the SEO traffic from Google hovered around a few dozen visits a day.
The second phase of the project focused on getting more pages indexed by Google. The SEO staff suspected that our pages were too similar to one another, so we spent a week or two investigating whether that was the case. For pages that might be too similar, we added some random variation, so that part of each page changed on every refresh. This is not friendly to crawlers, but it does lower page similarity. I wrote a Python program to examine the tens of thousands of pages on the whole site. First a crawler spent two days fetching every page and automatically saving a rendered screenshot. Then the program extracted the text from the HTML and computed cosine similarity on it. Finally, it randomly selected one thousand pages, compared them pairwise, and plotted the results. We found that overall similarity was not high, and that our similarity levels were close to those of competitor sites. That makes sense, because everyone gets their data from the same affiliate networks, so the data should be about the same. After that, Google's cache finally returned to normal, but it showed the mobile layout, which was strange. Google has two crawlers, one for mobile and one for desktop, and our website checks the User-Agent and serves a different HTML template to each. Google simply ended up caching the mobile version. The mobile cached page did not display correctly; the cause turned out to be a front-end component that used a technique Google's renderer did not support. We fixed that and observed for another week, but indexing was still abnormal.
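The similarity check described above can be sketched like this. It is a simplified reconstruction under stated assumptions, not the original program: it uses plain term-frequency vectors, a crude regex-based tag stripper instead of a real HTML parser, and hypothetical names (`term_counts`, `sample_pair_similarities`). Crawling, screenshots, and plotting are left out.

```python
import math
import random
import re
from collections import Counter
from itertools import combinations


def strip_tags(html):
    """Crudely remove scripts, styles, and tags; a real HTML parser is more robust."""
    text = re.sub(r"(?is)<(script|style).*?>.*?</\1>", " ", html)
    return re.sub(r"(?s)<[^>]+>", " ", text)


def term_counts(html):
    """Turn a page into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", strip_tags(html).lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def sample_pair_similarities(pages, sample_size, seed=0):
    """Randomly sample pages and score every pair within the sample.

    `pages` maps URL -> HTML. Returns {(url_a, url_b): similarity}.
    """
    rng = random.Random(seed)
    urls = rng.sample(sorted(pages), min(sample_size, len(pages)))
    vectors = {u: term_counts(pages[u]) for u in urls}
    return {
        (u, v): cosine_similarity(vectors[u], vectors[v])
        for u, v in combinations(urls, 2)
    }


if __name__ == "__main__":
    demo = {
        "/brand/a": "<h1>Brand A coupons</h1><p>20% off shoes</p>",
        "/brand/b": "<h1>Brand B coupons</h1><p>Free shipping on bags</p>",
    }
    for pair, score in sample_pair_similarities(demo, 2).items():
        print(pair, round(score, 3))
```

Note that a sample of 1,000 pages produces 499,500 pairs, which is still cheap to compute. Shared page chrome such as navigation and footers inflates the score, so it may be worth stripping that before comparing.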
In the third phase we were still trying to solve the indexing problem, and this time we turned our attention to content. Because our content came from the affiliate networks' APIs, it was barely different from what competitors had. To enrich the site's content, we decided to crawl social networking sites and generate coupons from what we found, since many brands post their own coupons on social platforms. After some research, we settled on crawling tweets from Twitter. I built a distributed Twitter crawler that searched Twitter for coupon-related keywords once per second and stored the matching tweets in our database. A Python program in the background then ran every day, fetched the previous day's tweets from the database, and processed them into coupons. Another scheduled script automatically published the generated coupons to our website after deduplication. Once we added Twitter coupons, the number of coupons quickly doubled and the site reached 80,000 pages. Our earlier analysis had suggested that we had too many long-tail keyword pages, so I took those pages offline, and the number of backlinks also began to decline. Even so, indexing was still unstable. At this point Bing had indexed about twenty thousand pages, while Google's count hovered around 23,000 but was far from stable: sometimes it was sixteen thousand in the afternoon and had dropped to about two thousand by the next morning. Compared with competitor sites, this was really strange.
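The daily step that turns stored tweets into coupons can be illustrated with a small sketch. The original processing algorithm is not described here, so everything below is an assumption: the regex heuristics (an uppercase code following the word "code", and an "N% off" phrase), the `Coupon` fields, and deduplication by brand plus code are stand-ins chosen for illustration. Fetching tweets and writing to the database are left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Heuristic: the word "code" (any case) followed by an uppercase alphanumeric token.
CODE_RE = re.compile(r"(?i:\bcode)\s*:?\s*([A-Z0-9]{4,15})\b")
# Heuristic: a one- or two-digit percentage followed by "off".
DISCOUNT_RE = re.compile(r"(\d{1,2})\s*%\s*off", re.IGNORECASE)


@dataclass(frozen=True)
class Coupon:
    brand: str
    code: str
    discount_percent: Optional[int]
    source_text: str


def tweets_to_coupons(tweets):
    """Convert (brand, tweet_text) pairs into deduplicated coupons.

    Tweets without a recognizable code are skipped; repeated
    (brand, code) pairs, such as retweets, are kept only once.
    """
    seen = set()
    coupons = []
    for brand, text in tweets:
        code_match = CODE_RE.search(text)
        if not code_match:
            continue
        code = code_match.group(1)
        key = (brand.lower(), code)
        if key in seen:
            continue
        seen.add(key)
        discount_match = DISCOUNT_RE.search(text)
        discount = int(discount_match.group(1)) if discount_match else None
        coupons.append(Coupon(brand, code, discount, text))
    return coupons


if __name__ == "__main__":
    sample = [
        ("Nike", "Use code SAVE20 for 20% off at Nike!"),
        ("Nike", "RT: code SAVE20 still works"),
        ("Nike", "I love my new shoes"),
    ]
    for coupon in tweets_to_coupons(sample):
        print(coupon.brand, coupon.code, coupon.discount_percent)
```

Heuristics like these will miss lowercase codes and will occasionally pick up ordinary uppercase words, so a real pipeline would likely need extra filtering before publishing.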
In the end we concluded that the problem most likely lay with the domain name. Every other check had failed to solve the problem, and the domain was the one thing we had never tested. Changing the domain would mean that much of our earlier work had been wasted, but we still wanted to find out whether the domain was to blame. So we re-enabled the earlier new domain and deployed a simplified version of the coupon system on it, to compare it with the site on the old domain. After observing for a while, we found that indexing on the new domain was also abnormal. The SEO staff believed that the two domains had become associated with each other when we published the system earlier, that Google might have detected this, and that the test was therefore invalid. We prepared to register a completely fresh domain for another test, and we also planned to build a traffic site to direct visitors to the coupon platform. But none of that happened. Because the project had taken too long, the company eventually stopped developing new features and left the site to wait for indexing.