INGEST
Self-service data ingest with data cleansing, validation, and automatic profiling.
Organizations can expend significant engineering effort moving data into Hadoop, yet still struggle to maintain governance and data quality. Nucleus dramatically simplifies data ingest by putting it in the hands of data owners through a simple guided UI.
Nucleus can connect to most sources and infer schemas from common data formats. Nucleus’s default ingest workflow moves data from source to Hive tables, with advanced configuration options for field-level validation, data protection, data profiling, security, and overall governance.
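To make the default workflow concrete, the sketch below mirrors its main steps (schema inference, field-level validation, and landing in Hive) in PySpark. The file path, table names, and validation rules are hypothetical; Nucleus configures these steps through its guided UI rather than hand-written code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("ingest-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Infer the schema directly from a delimited source file.
raw = (spark.read
       .option("header", True)
       .option("inferSchema", True)
       .csv("/landing/orders.csv"))

# Field-level validation: route rows into valid and invalid sets.
is_valid = (F.col("order_id").isNotNull()
            & F.col("amount").isNotNull()
            & (F.col("amount") >= 0))
valid = raw.filter(is_valid)
invalid = raw.filter(~is_valid)

# Land the results in Hive; rejected rows are kept for review.
valid.write.mode("append").saveAsTable("orders")
invalid.write.mode("append").saveAsTable("orders_invalid")
```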
Using Nucleus’s pipeline template mechanism, IT can extend Nucleus’s capabilities to connect to any source, handle any format, and load data into any target in either a batch or streaming pattern.
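For the streaming pattern, a template might wrap logic along these lines, shown here with Spark Structured Streaming. The Kafka broker, topic, and paths are assumptions, and the spark-sql-kafka connector must be on the classpath; this is a sketch of the pattern, not a Nucleus template.

```python
from pyspark.sql import SparkSession

# Requires the spark-sql-kafka connector on the Spark classpath.
spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

# Read a stream of records from a hypothetical Kafka topic.
stream = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "orders")
          .load())

# Continuously land the raw payloads in the data lake as Parquet.
query = (stream.selectExpr("CAST(value AS STRING) AS payload")
         .writeStream.format("parquet")
         .option("path", "/data/orders_stream")
         .option("checkpointLocation", "/checkpoints/orders_stream")
         .start())

query.awaitTermination()
```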
DESIGN
Design batch or streaming pipeline templates in Flow and register them with Nucleus to enable user self-service.
IT designers can extend Nucleus’s feed capabilities for ingest, transformation, and export by developing new pipeline templates in Flow. Flow provides a visual canvas with over 180 data connectors and transforms for batch and stream-based processing. Together, Nucleus and Flow act as an “intelligent edge” that can orchestrate tasks between your cluster and data center.
Designers develop and test new pipelines in Flow, then register them as templates with Nucleus, specifying which properties users may configure when creating feeds. This embodies the write-once, use-many principle: data owners, rather than engineers, create new feeds, while IT retains control over the underlying dataflow patterns.
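A registration step of this kind could look like the following sketch. The endpoint, payload shape, and property names are invented for illustration and are not Nucleus’s actual API; the point is that IT marks which properties feed creators may edit.

```python
import requests

# Hypothetical payload: the template name and per-property "userEditable"
# flags are illustrative, not Nucleus's real registration schema.
template = {
    "templateName": "database-ingest",
    "properties": [
        {"name": "source.jdbc.url",   "userEditable": True},
        {"name": "target.hive.table", "userEditable": True},
        {"name": "validation.rules",  "userEditable": False},  # locked by IT
    ],
}

# Hypothetical endpoint; substitute your Nucleus host.
resp = requests.post("https://nucleus.example.com/api/v1/templates",
                     json=template, timeout=30)
resp.raise_for_status()
```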
Nucleus also adds a suite of Flow processors for Spark, Sqoop, and Hive, along with special-purpose data lake primitives that provide additional capabilities.
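As a rough illustration of what a profiling primitive computes, the sketch below gathers simple per-field statistics with Spark. The table name is hypothetical, and in practice such processors are configured on the Flow canvas rather than hand-coded.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("profile-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Profile a hypothetical ingested Hive table.
df = spark.table("orders")
total = df.count()

# Per-field statistics of the kind an automatic profiler collects.
for field in df.columns:
    nulls = df.filter(F.col(field).isNull()).count()
    distinct = df.select(field).distinct().count()
    print(f"{field}: nulls={nulls}/{total}, distinct={distinct}")
```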