# How many of these data analysis written-test questions can you answer?

Last time, we gave you some tips on data analyst interviews. This time, we are going to share some typical written-test questions. Hopefully, you will be able to handle them with ease.

## 1. Estimate the number of births this year without any public reference.

For this class of question, before answering, break the problem down into its parts and then summarize how the pieces you come up with fit together.

First, recognize that this is a Fermi estimation problem, so a two-tier model (population portrait × population conversion) can serve as the main thread of the answer: number of births = Σ (number of women of childbearing age in each age group × birth rate of that age group).

Then, work from number to number: if the number of newborns in previous years is available, build a time series model to forecast this year's figure. (PS: a sudden shock such as the relaxation of the second-child policy also needs to be taken into account here.)

Next, look for leading indicators. For example, let X be the number of newly active users of baby products, standing in for households with a newborn. Then X_n / newborns_n is the conversion rate of newborn households in year n; for example, X_2007 / newborns_2007 is the conversion rate in 2007. Of course, the conversion rate will evolve as the platform develops, so we can only give a rough conversion rate for this year based on the rates of previous years.

Finally, we can derive this year's estimated number of births by dividing this year's number of new newborn-household users by this year's approximate conversion rate.

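
The two estimates in question 1 can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the age groups, and every number are hypothetical placeholders, and averaging past conversion rates is just one simple assumption for getting this year's rate.

```python
# All figures below are made-up placeholders for illustration, not real statistics.

def births_from_age_groups(women_by_group, birth_rate_by_group):
    """Two-tier model: sum over age groups of (women in group * birth rate of group)."""
    return sum(women_by_group[g] * birth_rate_by_group[g] for g in women_by_group)


def births_from_leading_indicator(new_users_this_year, past_new_users, past_births):
    """Estimate births from a leading indicator X (new baby-product users)."""
    # Conversion rate in year n = X_n / newborns_n.
    rates = [past_new_users[year] / past_births[year] for year in past_new_users]
    # Assumption: this year's rate is the plain average of the past rates.
    rate_this_year = sum(rates) / len(rates)
    # births = X_this_year / conversion rate
    return new_users_this_year / rate_this_year


if __name__ == "__main__":
    women = {"20-29": 1_000_000, "30-39": 1_200_000}   # hypothetical
    rates = {"20-29": 0.08, "30-39": 0.05}             # hypothetical
    print(births_from_age_groups(women, rates))

    past_users = {2021: 30_000, 2022: 36_000}          # hypothetical
    past_births = {2021: 150_000, 2022: 144_000}       # hypothetical
    print(births_from_leading_indicator(35_000, past_users, past_births))
```

In an answer, the two results would be used to cross-check each other, together with the time series forecast if historical birth counts are available.
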
## 2. What is PCA? Why does PCA center the data? What are its principal components?

In statistics, principal component analysis (PCA) is a technique for simplifying data sets. It is a linear transformation that maps the data into a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance lies on the second coordinate (the second principal component), and so on.

The idea is to recombine the original variables, which usually have some correlation with one another (say, p indicators), into a new set of mutually uncorrelated composite variables. Depending on actual needs, only a few of these composite variables are kept, chosen to reflect as much of the information in the original variables as possible. This makes PCA a mathematical way of handling dimensionality reduction.

As for centering: the principal components are computed from the covariance of the variables, which measures variation around the mean, so each variable's mean is subtracted first. Without centering, the first component tends to point toward the mean of the data rather than along the direction of greatest variance.

Usually, the mathematical treatment is to take linear combinations of the original p indicators as the new composite indicators. The classic approach measures information by variance: the larger Var(F1) is, where F1 is the first linear combination selected (the first composite indicator), the more information F1 contains. F1 is therefore chosen to have the largest variance among all linear combinations (with coefficient vectors of unit length), and it is called the first principal component. If the first principal component is not enough to represent the information in the original p indicators, we choose a second linear combination, F2. To reflect the original information efficiently, the information already in F1 should not appear again in F2; in mathematical terms, Cov(F1, F2) = 0. F2 is then called the second principal component, and the third, fourth, …, p-th principal components can be constructed in the same way.

In addition, PCA is often used to reduce the dimensionality of a data set while retaining the features that contribute most to its variance. This is done by keeping the lower-order principal components and ignoring the higher-order ones. The lower-order components tend to retain the most important aspects of the data, although this is not guaranteed and depends on the specific application.

## 3. How to calculate the revenue of the first advertisement? (No need to calculate, just give the idea of the answer.)

First, we need to know: revenue = bid × traffic × CTR × effective conversion rate.

Increasing the number of ads will, to some extent, drive up traffic, but it also makes the ads match users less well, which lowers the click-through rate. From this point of view, maximizing revenue means finding the maximum of this product, which is an optimization problem under constraints.

In addition, we can borrow from price discrimination schemes and show different users different amounts of advertising.

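
The trade-off in question 3 can be illustrated with a small search over the number of ads. This is a sketch under assumed functional forms, not a real ad model: it supposes that traffic grows linearly with the number of ads and that CTR shrinks by a fixed factor with each extra ad. The names `best_ad_count`, `page_views`, and `ctr_decay` are hypothetical.

```python
def revenue(bid, traffic, ctr, conversion_rate):
    """Revenue = bid * traffic * CTR * effective conversion rate."""
    return bid * traffic * ctr * conversion_rate


def best_ad_count(max_ads, bid, page_views, base_ctr, ctr_decay, conversion_rate):
    """Try every ad count from 1 to max_ads (the constraint) and keep the best one."""
    best_n, best_rev = 0, 0.0
    for n in range(1, max_ads + 1):
        # Assumption: each extra ad adds one more impression per page view.
        traffic = page_views * n
        # Assumption: each extra ad multiplies CTR by ctr_decay (worse matching).
        ctr = base_ctr * ctr_decay ** (n - 1)
        rev = revenue(bid, traffic, ctr, conversion_rate)
        if rev > best_rev:
            best_n, best_rev = n, rev
    return best_n, best_rev


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show an interior maximum.
    print(best_ad_count(max_ads=5, bid=1.0, page_views=1000,
                        base_ctr=0.1, ctr_decay=0.6, conversion_rate=0.5))
```

With a different decay factor, or a different cap per user segment, the best ad count changes, which is the point of showing different users different amounts of advertising.
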
## 4. How do we analyze a drop in next-day retention?

First of all, use the "two-layer model" to structure the answer. Segment users along dimensions such as new versus returning, channel, campaign, and user profile, and calculate the next-day retention rate of each user group in each dimension. Then use the data to identify which group is causing the decline in retention.

Of course, a declining retention rate needs to be analyzed case by case. You can also look at the problem in terms of internal and external factors.

1) Internal factors: customer acquisition (low channel quality, campaigns that bring in non-target users); demand satisfaction (new features or changes that leave certain types of users dissatisfied); activation mechanisms (check-ins and other activation measures did not achieve their goal, or the product's natural usage frequency is low, so that many of the users just acquired have no need to come back within a short period).

2) External factors: the macro environment, which can be analyzed with the PEST method: politics (policy), economy (in the short term, the competitive environment, such as competitors' activities), society (public opinion pressure, changes in users' lifestyles, consumer psychology, values, and preferences), and technology (the emergence of innovative solutions, changes in distribution channels, etc.).

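
The segmentation step in question 4 can be sketched as follows. This is a minimal illustration on an assumed data layout: each user is a dict holding the dimension values plus a hypothetical boolean field `retained` that marks whether the user came back the next day.

```python
from collections import defaultdict


def retention_by_segment(users, dimension):
    """Next-day retention rate for each value of one dimension (e.g. channel)."""
    totals = defaultdict(int)
    kept = defaultdict(int)
    for user in users:
        segment = user[dimension]
        totals[segment] += 1
        if user["retained"]:
            kept[segment] += 1
    return {segment: kept[segment] / totals[segment] for segment in totals}


def largest_drop(before, after, dimension):
    """Compare two periods and return the segment whose retention fell the most."""
    rate_before = retention_by_segment(before, dimension)
    rate_after = retention_by_segment(after, dimension)
    # Only segments present in both periods can be compared.
    changes = {s: rate_after[s] - rate_before[s] for s in rate_before if s in rate_after}
    worst = min(changes, key=changes.get)
    return worst, changes


if __name__ == "__main__":
    # Hypothetical toy data: retention in the "ads" channel halves.
    before = ([{"channel": "ads", "retained": r} for r in (True, True, False, False)]
              + [{"channel": "organic", "retained": r} for r in (True, True, False, False)])
    after = ([{"channel": "ads", "retained": r} for r in (True, False, False, False)]
             + [{"channel": "organic", "retained": r} for r in (True, True, False, False)])
    print(largest_drop(before, after, "channel"))
```

Running the same comparison for each dimension in turn narrows down where the decline comes from. A shift in the relative size of the segments can also move the overall rate, so segment sizes are worth checking alongside the rates.
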
## 5. What can be done to improve profits when selling soybeans? At what level will a price increase benefit the most?

We all know that revenue = unit price × sales volume. So, to answer this question, we should start from two directions: increasing the unit premium, or increasing the sales volume.

Increasing the unit premium: 1) If we have sufficient marketing funds in the early stage, we can gain long-term premium pricing power through brand building; 2) if the initial capital is insufficient, we can raise the premium by increasing the added value of the product. For example, add a processing step and turn soybeans into soy milk, dried beans, soy protein powder, etc. Or reposition the product: turn it into a gift item, or build an image around organic soybeans and similar products.

Increasing sales volume: we all know that sales = traffic × conversion rate. If we raise revenue by increasing the unit premium as above, this will affect the conversion rate as well as the traffic. Therefore, in the sales process, we can also use price discrimination, pricing differently according to customers' price sensitivity, to expand sales volume and so increase profits. Of course, to promote sales we should also control the timing of the price discrimination, the number of launches, and the pricing strategy. For example, soybean prices differ across time periods and business districts, so we can set a high list price and offer coupons to price-sensitive users, making full use of price discrimination to maximize profits.

You may not come across these exact questions in an interview, but you will very likely be able to apply the same ideas to other questions. So I hope you prepare well in advance and do well on the written test, paving the way for the interview that follows.

Finally, if you run into some "weird" data analysis questions in an interview, feel free to leave a comment. We will go through your questions and share ideas for how to answer them. So please don't hold back on your comments.