ETL Architect Interview Questions (Reposted)
By Jerome, posted 2006-8-5 21:20:00

Analysis

1. What is a logical data mapping and what does it mean to the ETL team?

2. What are the primary goals of the data discovery phase of the data warehouse project?

3. How is the system-of-record determined?

Architecture

4. What are the four basic Data Flow steps of an ETL process?

5. What are the permissible data structures for the data staging area? Briefly describe the pros and cons of each.

6. When should data be written to disk for safekeeping during the ETL process?

Extract

7. Describe techniques for extracting from heterogeneous data sources.

8. What is the best approach for handling ERP source data?

9. Explain the pros and cons of communicating with databases natively versus ODBC.

10. Describe three change data capture (CDC) practices and the pros and cons of each.
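
As a starting point for question 10, here is a minimal sketch of one commonly used CDC practice: comparing the previous full extract with the current one. It is not presented as the expected answer. The function names, the `id` key column, and the in-memory lists of dicts are all assumptions made for illustration; a real job would usually compare staged files or tables.

```python
import hashlib


def row_hash(row):
    """Return a stable digest of a row (hypothetical helper)."""
    # Sort the columns so the digest does not depend on dict ordering.
    payload = "|".join(f"{col}={row[col]}" for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_snapshots(previous, current, key="id"):
    """Classify keys as inserted, updated, or deleted between two full extracts."""
    prev = {r[key]: row_hash(r) for r in previous}
    curr = {r[key]: row_hash(r) for r in current}
    return {
        "inserted": sorted(k for k in curr if k not in prev),
        "updated": sorted(k for k in curr if k in prev and curr[k] != prev[k]),
        "deleted": sorted(k for k in prev if k not in curr),
    }


yesterday = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]
today = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "B"},
    {"id": 4, "name": "d"},
]
changes = diff_snapshots(yesterday, today)
```

The trade-off a candidate might discuss: this approach needs no cooperation from the source system and detects deletes, but it requires extracting and retaining a full copy of the source on every run.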

Data Quality

11. What are the four broad categories of data quality checks? Provide an implementation technique for each.

12. At which stage of the ETL should data be profiled?

13. What are the essential deliverables of the data quality portion of ETL?

14. How can data quality be quantified in the data warehouse?

Building mappings

15. What are surrogate keys? Explain how the surrogate key pipeline works.

16. Why do dates require special treatment during the ETL process?

17. Explain the three basic delivery steps for conformed dimensions.

18. Name the three fundamental fact grains and describe an ETL approach for each.

19. How are bridge tables delivered to classify groups of dimension records associated with a single fact?

20. How does late arriving data affect dimensions and facts? Share techniques for handling each.
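
To make question 15 concrete, the following is a rough sketch of a surrogate key pipeline: each incoming fact row has its natural keys replaced by surrogate keys looked up from the dimensions. The column names, the `UNKNOWN_KEY` placeholder, and the plain dicts standing in for dimension lookup tables are assumptions for illustration, not a prescribed design.

```python
UNKNOWN_KEY = -1  # assumed placeholder for a natural key with no dimension match


def surrogate_key_pipeline(fact_rows, lookups):
    """Swap natural keys for surrogate keys, one dimension at a time.

    lookups maps natural_column -> (surrogate_column, {natural_value: surrogate_key}).
    Returns (loadable_rows, rejected_rows).
    """
    loadable, rejected = [], []
    for row in fact_rows:
        out = dict(row)  # copy so the incoming row is left untouched
        resolved = True
        for natural_col, (surrogate_col, mapping) in lookups.items():
            natural_value = out.pop(natural_col)
            surrogate = mapping.get(natural_value, UNKNOWN_KEY)
            out[surrogate_col] = surrogate
            if surrogate == UNKNOWN_KEY:
                resolved = False
        if resolved:
            loadable.append(out)
        else:
            # Keep the original row so it can be inspected or reprocessed later.
            rejected.append(row)
    return loadable, rejected


lookups = {
    "customer_id": ("customer_key", {"C1": 10}),
    "product_id": ("product_key", {"P1": 20}),
}
facts = [
    {"customer_id": "C1", "product_id": "P1", "amount": 5},
    {"customer_id": "C9", "product_id": "P1", "amount": 7},
]
loadable, rejected = surrogate_key_pipeline(facts, lookups)
```

Routing unmatched rows to a reject list is only one option; question 20 on late-arriving data asks about alternatives such as creating a placeholder dimension row.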

Metadata

21. Describe the different types of ETL metadata and provide examples of each.

22. Share acceptable mechanisms for capturing operational metadata.

23. Offer techniques for sharing business and technical metadata.

Optimization/Operations

24. State the primary types of tables found in a data warehouse and the order in which they must be loaded to enforce referential integrity.

25. What are the characteristics of the four levels of the ETL support model?

26. What steps do you take to determine the bottleneck of a slow-running ETL process?

27. Describe how to estimate the load time of a large ETL job.
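
For question 27, one simple approach is to time each phase on a sample and extrapolate. The sketch below assumes run time scales linearly with row count, which is a strong simplification (indexes, sorts, and contention often break it), and the function and phase names are made up for illustration.

```python
def estimate_load_seconds(total_rows, sample_rows, sample_seconds_by_phase):
    """Scale per-phase timings measured on a sample up to the full row count.

    Assumes linear scaling; treat the result as a first approximation only.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    scale = total_rows / sample_rows
    per_phase = {
        phase: seconds * scale
        for phase, seconds in sample_seconds_by_phase.items()
    }
    return per_phase, sum(per_phase.values())


# Hypothetical timings from a 10,000-row sample of a 1,000,000-row job.
per_phase, total = estimate_load_seconds(
    total_rows=1_000_000,
    sample_rows=10_000,
    sample_seconds_by_phase={"extract": 2.0, "transform": 5.0, "load": 3.0},
)
```

Timing the phases separately also shows which one dominates, which ties back to the bottleneck analysis in question 26.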

Real Time ETL

28. Describe the architecture options for implementing real-time ETL.

29. Explain the different real-time approaches and how they can be applied in different business scenarios.

30. Outline some challenges faced by real-time ETL and describe how to overcome them.

Reposted from Kimball's ETL Toolkit book.
