YonyouYonBIP V3.0R6_2407_1 FlagshipPrivateCloudUserManual-CloudPlatform-DataPlatform-DataDevelopment.docx

Format: docx
Size: 19.41 MB
About 147 pages
2026-04-18

[image_1.png]

Copyright © 2024 Yonyou Group. All Rights Reserved.

Without the written permission of Yonyou Group, no part of this user manual may be copied, reproduced, translated, or abridged for any purpose. The content of this user manual may change without notice; please check for updates.

Please note: The content of this user manual does not represent a commitment made by Yonyou Network.

Overview

Application Overview

Data Development is a cloud-native big data processing platform based on an innovative middle-platform architecture. It integrates data synchronization, offline development, real-time development, and task scheduling. The product consolidates several underlying product capabilities and provides data development tools for IT developers and business analysts, covering data governance, data warehouse construction, and day-to-day business data use. It offers a one-stop service for the whole process, from data collection and aggregation, through data fusion and extraction, to the output of high-quality data sets.

[image_2.png]

Application Value

Data synchronization takes various forms, breaking down data silos within the enterprise.
Five types of data synchronization are supported: batch, streaming, file, custom, and table structure synchronization. Synchronization tasks between heterogeneous data sources are built as a guided, step-by-step process. Supported data sources include MySQL, Oracle, SQLServer, Greenplum, Hive3, DM (DaMeng), PostgreSQL, DB2, and MongoDB. Data synchronization remains extensible: if the preset data sources do not meet project requirements, users can create customized tasks through custom synchronization, which widens the range of cases data synchronization can handle.

Multi-engine, visual development tools ensure the efficient and stable production of data.
Data Development integrates the Spark and Flink engines and lets users build tasks as workflows by drag and drop. It supports multi-source ad hoc queries as well as batch and real-time computation over massive data sets, which improves development efficiency, lowers the barrier to entry, and keeps data production tasks running efficiently and stably.

Built-in algorithm models support the intelligent reconstruction of entire business processes in marketing, finance, supply chain, procurement, human resources, and more.
The product supports multiple development methods for data mining, so that developers at different levels can process data to meet business needs. Python scripts and interactive development provide code-based data mining for professionals such as algorithm engineers. Self-service ETL presets more than 50 mainstream data mining and machine learning algorithm models, including K-means clustering, neural networks, random forests, and decision trees, so that business personnel can mine data value by drag and drop.

The technical threshold is low, which eases implementation and use of the product.
Throughout the product design, from guided creation of data synchronization tasks to professional-level SQL tasks, Python tasks, and other data development, operations such as data exploration, data querying, data cleaning, and data de-identification are flexible and easy to use, reducing complex concepts to simple drag-and-drop steps.

Operation and maintenance are convenient, which facilitates daily management for maintenance personnel.
Task scheduling provides centralized management of online tasks, including task execution, scheduling settings management, and task instance management.
Task details show basic information, code, and the task instances generated. The task instance list tracks running progress, status, and historical execution records. In addition to basic information and code, a task instance also exposes its run logs, which makes errors easier to locate and resolve. Dependency relationships can be viewed for both tasks and instances.

The product is highly secure, protecting enterprise data assets.
It supports tenant isolation, which lets multiple departments collaborate while keeping data secure. It also supports project isolation, which creates separate physical spaces for different projects, keeping data secure and processes clear and making tasks easier to manage.

Application Scenarios

Scenario Display

Scenario 1: Building a Consumer Behavior Analysis Application Based on Multidimensional Modeling Theory

Business Description

A company wants to classify customers by value and provide targeted product services and marketing models. An existing sales order table is used to identify high-value customers and analyze customer transaction volume from the relevant historical data.
This allows the relevant departments to act promptly on the analysis report and safeguard the company's subsequent profitability.

Business Process

[image_3.png]

Application List

Data Source Management
Task Management
Business Segment
Business Domain
Business Process
Time Limit
Project List
Dimension Management
Fact Table Management
Business Limitations
Indicator List
Offline Development - Python Script Method

Scenario 2: Building a Supply Chain Aging Analysis Application Based on Relational Modeling Theory

Business Description

Inventory age analysis means managing inventory strictly on a first-in, first-out basis and compiling age statistics on the principle that older goods ship first, so that products or materials do not expire. It can significantly improve inventory turnover, promote sales, and reduce the risk of capital being tied up in inventory. In this scenario: first, use the data synchronization function to sync data from the source business system to the data lake, unifying the management of data resources; then use the relational modeling function to create logical models for each layer of the data warehouse and materialize them into the data lake; finally, use the self-service ETL function of Data Development to process the raw data step by step into the target table of the data mart, producing the data for inventory age analysis.

Business Process

[image_4.png]

Operation Guide

Data Synchronization

Overview

Data synchronization is the tool users work with to synchronize data. It covers batch, streaming, and file data synchronization.
Batch data synchronization can run in full or incremental mode; during synchronization, new fields can be added and transformation and cleansing rules configured. Streaming data synchronization performs real-time synchronization based on CDC (Change Data Capture). File data synchronization loads common structured and semi-structured local offline files, such as Excel and CSV files, into the database.

Key Features of Data Synchronization:
Supports multiple synchronization methods, such as full, incremental, and real-time, to meet the needs of different scenarios;
Supports synchronization at different granularities, such as database and table;
Supports flexible scheduling strategies;
Supports data cleaning and transformation.

Overall Value

Data Synchronization - Batch Synchronization supports periodic full or incremental data synchronization.
Data Synchronization - Stream Synchronization captures data changes through the database CDC (Change Data Capture) mechanism and synchronizes them. It supports real-time synchronization of insert, update, and delete operations.
Setting up data comparison - For segmented comparison tasks, the table must have a primary key and a time field.
Data can be sliced by primary key and time field for fast, accurate comparison.

Synchronization Tasks

Users can manage the synchronization tasks under the same tenant through the task management feature, including creating tasks and viewing the number, status, and details of synchronization tasks.

When creating a batch synchronization task in incremental mode, make sure the table has a unique, comparable field; otherwise incremental data cannot be identified accurately and incremental synchronization fails.
Setting up data comparison - For segmented comparison tasks, the table must have a primary key and a time field. Data can be sliced by primary key and time field for fast, accurate comparison.
When creating file synchronization, upload a local file that meets the format requirements shown on the page.
When synchronizing incremental data, fields such as ts/pubts are used as the default incremental fields.

Column Name: Description
Sync Type: The sync type is the synchronization strategy for batch sync tasks: full, incremental, or table structure sync. Full sync means the task periodically syncs all data from the source table to the target table according to the schedule. Incremental sync means the task periodically syncs incremental data, identified by the specified incremental fields in the source table, to the target table. The difference is that a full sync task does not require a primary key in the source table at creation, while an incremental sync task requires a primary key or incremental field in the source table; otherwise there is no basis for telling new data from old.
Concurrency: The number of threads that read/write data simultaneously for this task.
Batch Size: The maximum number of records this task can read/write at one time.
Partition Type: Whether the task writes data into partitioned storage for subsequent queries. If partitioning is selected, the values of the partition fields in the target table must come from fields in the source table. Only certain data source types support partitioning.
Read Start Point: The position from which this task starts reading data. The options are "running task as the starting point" and "custom". With the running task as the starting point, reading starts from the time the task was created. Custom requires entering the corresponding binlog file name and binlog position value.
Historical Data Reading: When historical data reading is enabled, historical data from the source table can be synced to the target table.
Alarm Policy: The task notification strategy set according to certain rules. When a task meets the conditions of the alarm policy, the relevant personnel can be notified.
: Set filtering conditions on the source table to select the data to be synchronized.
: When a target table is created for a sync task, the source indexes can be synchronized to the target table. Enabling this improves query efficiency on the target table but reduces data synchronization efficiency.

Synchronization Task Management

Task management supports users in managing batch synchronization, streaming synchronization, file synchronization, custom synchronization, table structure synchronization, and other tasks in a directory...
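
The incremental sync behavior described under Sync Type can be illustrated with a minimal sketch. It is not the product's implementation: the function name incremental_sync, the orders table, and its columns are hypothetical, and two in-memory SQLite databases stand in for the source and target. The only detail taken from the manual is that a field such as ts serves as the incremental field and that rows need a primary key so new and old data can be told apart.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, ts TEXT)"
    )
    return conn


def incremental_sync(src, dst, last_ts):
    """Copy rows whose ts is newer than last_ts and return the new watermark."""
    rows = src.execute(
        "SELECT id, amount, ts FROM orders WHERE ts > ? ORDER BY ts",
        (last_ts,),
    ).fetchall()
    # Upsert by primary key so an updated source row replaces its old copy.
    dst.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, ts) VALUES (?, ?, ?)",
        rows,
    )
    dst.commit()
    # Without new rows the watermark stays where it was.
    return rows[-1][2] if rows else last_ts


src, dst = make_db(), make_db()
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-07-01 08:00:00"), (2, 20.0, "2024-07-01 09:00:00")],
)

# First scheduled run: everything is newer than the initial watermark.
watermark = incremental_sync(src, dst, "1970-01-01 00:00:00")

# Between runs the source gains one row and one existing row is updated.
src.execute("INSERT INTO orders VALUES (3, 30.0, '2024-07-02 10:00:00')")
src.execute(
    "UPDATE orders SET amount = 15.0, ts = '2024-07-02 11:00:00' WHERE id = 1"
)

# Second scheduled run: only the two changed rows are read and written.
watermark = incremental_sync(src, dst, watermark)
```

The sketch also shows why the manual asks for a unique, comparable field: without a comparable ts there is nothing to filter on, and without a primary key the updated row would be appended as a duplicate instead of replacing the old copy. Deleted source rows are not detected by this approach, which is one reason CDC-based streaming synchronization exists as a separate option.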
