image_1.png

Copyright © 2024 Yonyou Group. All Rights Reserved.

Without the written permission of Yonyou Group, no part of this user manual may be copied, reproduced, translated, or abridged for any purpose. The content of this user manual may change without notice; please check for updates.

Please note: The content of this user manual does not represent a commitment made by Yonyou Network.

Overview

Application Overview

The Data Factory is a cloud-native big data processing platform based on an innovative middle-platform architecture. It integrates data movement, data modeling, indicator management, data development, and scheduling management. The product consolidates several underlying product capabilities and provides development tools for IT developers and business analysts, based on the needs of data warehouse construction and the way business data is actually used. The result is a one-stop service that covers the entire process, from data collection through data fusion and processing to the final output of result datasets.

Application Value

Data movement is diverse, breaking down data silos within enterprises. It supports four types of data movement: batch synchronization, streaming synchronization, file synchronization, and custom synchronization, making it easy to build data integration tasks for different data sources and target datasets. It supports the creation of various data sources such as MySQL, Oracle, SQLServer, Greenplum, Hive3, DM (DaMeng), PostgreSQL, DB2, and more. When the existing data source types do not meet requirements, users can create customized synchronization tasks through custom synchronization, which broadens the range of data that can be synchronized.

Standardized data warehouse modeling enhances the performance of data analysis and presentation. The product can build both relational models and multidimensional models, providing systematic and visual online modeling and development capabilities.
By standardizing the modeling process for dimensions and facts, it reduces the number of JOIN operations needed during data analysis, lowers the complexity of SQL statements, and thereby relieves pressure on the database engine during data analysis and presentation.

Built-in algorithm models support the intelligent reconstruction of end-to-end business processes in marketing, finance, supply chain, procurement, and human resources. The product supports both SQL and Python data development: indicator data produced in the data warehouse can be processed further with SQL statements, or with Python tools and built-in functions, which provides data modeling capabilities. It supports mainstream data mining and machine learning algorithms, with over 50 algorithm models including K-means clustering, neural networks, random forests, and decision trees.

The technical threshold is low, which makes the product easier to implement and use. Both the guided creation of data movement, data warehouse modeling, and indicator management and the professional-level data development used for data exploration, data querying, data cleaning, and data de-identification are designed to be flexible and easy to use, reducing complex theory to simple drag-and-drop operations.

Operation and maintenance are convenient, facilitating daily management for maintenance personnel. The product supports the management of development tasks and logical table tasks, and tracks task execution progress and status. Basic information, code, and the corresponding generated instances can be viewed. Development tasks correspond to ordinary instances, while logical table tasks correspond to logical table instances. Instances are generated once a task's schedule starts running.
In addition to basic information and code, run logs can also be viewed, so that errors can be located and resolved quickly. Dependency relationships can be viewed for both tasks and instances.

Product security is high, ensuring the protection of enterprise data assets. The product supports tenant isolation, which allows multiple departments to collaborate while keeping data secure. It also supports project isolation, creating a separate physical space for each project, which safeguards data, keeps processes clear, and makes task management easier.

Application Scenarios

Scenario Display

Scenario 1: Building a Consumer Behavior Analysis Application Based on Multidimensional Modeling Theory

Business Description

A company wants to classify customers by their value and provide targeted product services and marketing models. It has an existing sales order table and needs to identify high-value customers and analyze customer transaction volume from the relevant historical data. Relevant departments can then act promptly on the analysis report to protect the company's subsequent profitability.

Business Process

image_2.png

Application List

- Data Source Management
- Task Management
- Business Segment
- Business Domain
- Business Process
- Time Limit
- Project List
- Dimension Management
- Fact Table Management
- Business Limitations
- Indicator List
- Offline Development - Python Script Method

Scenario 2: Building a Supply Chain Aging Analysis Application Based on Relational Modeling Theory

Business Description

Inventory age analysis means managing inventory strictly on a first-in, first-out basis and compiling age statistics on the principle that older goods ship first, so that products or materials do not expire. This can significantly improve inventory turnover, promote sales, and reduce the risk of capital being tied up in inventory.
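
The first-in, first-out aging rule described above can be illustrated with a minimal Python sketch. It is not taken from the Data Factory product: the names `Receipt`, `fifo_remaining`, and `age_buckets`, the sample data, and the 30/60/90-day bucket boundaries are all assumptions chosen for illustration. The sketch consumes inbound batches oldest first and then totals the stock still on hand by age.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Receipt:
    """One inbound batch (hypothetical structure): arrival date and units."""
    received_on: date
    quantity: int


def fifo_remaining(receipts, shipped_quantity):
    """Consume receipts oldest-first and return the batches still on hand."""
    remaining = []
    to_ship = shipped_quantity
    for r in sorted(receipts, key=lambda r: r.received_on):
        consumed = min(r.quantity, to_ship)
        to_ship -= consumed
        left = r.quantity - consumed
        if left > 0:
            remaining.append(Receipt(r.received_on, left))
    if to_ship > 0:
        raise ValueError("shipped more units than were ever received")
    return remaining


def age_buckets(remaining, as_of, bounds=(30, 60, 90)):
    """Sum on-hand quantity per age bucket, where age is days since receipt."""
    # Build labels such as "0-30", "31-60", "61-90", ">90".
    labels = []
    lower = 0
    for upper in bounds:
        labels.append(f"{lower}-{upper}")
        lower = upper + 1
    labels.append(f">{bounds[-1]}")

    totals = {label: 0 for label in labels}
    for r in remaining:
        age = (as_of - r.received_on).days
        for i, upper in enumerate(bounds):
            if age <= upper:
                totals[labels[i]] += r.quantity
                break
        else:
            # Older than the last boundary.
            totals[labels[-1]] += r.quantity
    return totals


# Illustrative data only: three inbound batches and 120 units shipped so far.
sample_receipts = [
    Receipt(date(2024, 1, 1), 100),
    Receipt(date(2024, 2, 15), 50),
    Receipt(date(2024, 3, 20), 80),
]
on_hand = fifo_remaining(sample_receipts, shipped_quantity=120)
print(age_buckets(on_hand, as_of=date(2024, 4, 1)))
```

In the product itself, the equivalent logic would presumably be expressed as offline development ETL tasks over the data mart tables; the sketch only shows the aging principle on in-memory data.
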
In this scenario: first, use the data movement function to synchronize data from the source business system to the data lake, unifying the management of data resources; then, use relational modeling to create logical models for each layer of the data warehouse and materialize them into the data lake; finally, use the offline development ETL function to process the raw data step by step into the target data mart layer tables, producing the data for inventory age analysis.

Business Process

image_3.png

Operation Guide

Overview

Overview

The overview is a dashboard page that displays statistical indicators for the current tenant, helping senior leaders or managers quickly view the tenant's information and understand the overall state of the product.

Overall Value

- Quick and convenient access to a tenant's indicators.
- A variety of chart types allows viewing from multiple dimensions.
- Helps users understand product capabilities.

Related Content

- Business Overview
- Overview of Dimensional Modeling

Business Overview

The business overview is a dashboard-like page that displays statistical indicators for the current tenant, helping senior leaders or managers quickly view the tenant's information and understand the overall state of the product.

Senior personnel review the indicators of each tenant.

| Column Name | Description |
| --- | --- |
| New Resource Volume Today | The amount of new data added each calendar day. The value may be negative, which indicates that the user has removed a data source from the Data Factory. |
| Total Number of Data Sources | The total number of data sources added within the current tenant. |
| Types of Data Sources | The total number of data source types supported by the Data Factory. |
| Total Number of Models | The total number of dimension tables, fact tables, and summary tables in a published state within the current tenant. |
| Total Number of Metrics | The total number of atomic metrics and composite metrics in a published state within the current tenant. |
| Total Number of Tasks | The total number of logical table tasks and development tasks that currently exist within the tenant. |
| Total Number of Projects | The total number of projects created within the current tenant. |
| Data Flow Direction | Point: a data source name. Line: data exchange between the two connected points. |

This scenario meets the user's need to view statistics along several dimensions, including data changes, data sources, models, projects, and data flow.

Go to Overview > Business Overview to view the statistics available within the current tenant.

image_4.png

Overview of Dimensional Modeling

The overview of dimensional modeling gives management or operations teams a top-level view of the statistical indicators for the models generated by dimensional modeling within the current tenant, and for the metrics built on those models. It also provides a graphical process guide; users can click a node to navigate to it and carry out the corresponding business processing.

- Business Segments: The total number of business segments within the current tenant.
- Data Domain: The total number of data domains within the current tenant.
- Business Limitation: The total number of business limitations within the current tenant.
- Time Limit: The total number of time limits within the current tenant.
- Business Processes: The total number of business processes within the current tenant.
- Dimensions & Dimension Logic Tables: The total number of dimensions and dimension logic tables within the current tenant.
- Fact Logic Table: The total number of fact logic tables within the current tenant.
- Summary Table: The total number of summary tables within the current tenant.
- Atomic Indicators: The total number of atomic indicators within the current tenant.
- Derived Indicators: The total number of derived indicators within the current tenant.
- Composite Indicators: The total number of composite indicators within the current tenant.

Go to Overview > Dimension Modeling Overview to view the current tenant's metrics in a visual format.

Click the modeling architecture in the list on the left to view all dimensions and fact tables by business segment > data domain > business process.

Switch tabs to view the indicator system by business segment > data domain > business process.

In the main workspace, view the dimensional modeling process guide and click a node to navigate to it for business operations.

image_5.png

Data Migration

Overview

Data movement is the tool users use to synchronize data. It supports batch data synchronization, streaming data synchronization, and file data synchronization. Batch data synchronization can run in full or incremental mode, and new fields and transformation and cleaning rules can be set during synchronization. Streaming data synchronization achieves real-time data synchronization based on CDC (Change Data Capture). File data synchronization loads common structured and semi-structured local offline files, such as Excel and CSV files, into the database.

Key Characteristics of Data Migration:

- Supports several synchronization methods, including full, incremental, and real-time, to meet the synchronization needs of different scenarios.
- Supports data synchronization at different granularities, such as database level and table level.
- Supports flexible scheduling strategies.
- Supports data cleaning and transformation.

Overall Value

- Data Migration - Batch Synchronization supports periodic full or incremental data synchronization.
- Data Migration - Stream Synchronization captures data changes based on the database CDC (Change Data Capture) mechanism for data synchronization.
It supports real-time ...