DataOps, as the name suggests, borrows from the DevOps concept: it bundles fully automated, integrated data collection and analysis into a single platform. A while back my company wanted to buy the Control Hub edition, so I contacted StreamSets, but the person in charge replied by email that there were no sales channels in China at the time. Now the online Beta has arrived, so follow along and see what this platform has to offer.
1. The upcoming 4.0 version
I had seen the StreamSets 4.0 documentation online quite a while ago, but no downloadable release included it, which made me curious: what big move was StreamSets holding back?
Having tried the Beta, I've discovered the secret: this release embraces cloud-native deployment and provides the following features:
- Job management
- Scheduled job management
- Load balancing and dynamic scaling
- Pipeline fragment (reusable snippet) support
- Cloud platform integration
- Distributed compute engines
- Solid monitoring and user management
2. How to try it
StreamSets has launched a third-quarter trial event, which is a rare opportunity. If you want to try it out, now is the time.
2.1 Registration
Sign up at the registration entrance. A VPN is required to log in from China. Once you're in, follow the guide and you can have everything set up within 5 minutes.
2.2 Build the deployment script
2.3 Copy the deployment script
It looks like this:
```shell
curl -s https://dev.hub.streamsets.com/streamsets-engine-install.sh | bash -s -- --deployment-id="1b72d612-b533-48f0-966b-927b488231a7:cd534f44-cf0f-11eb-a0cd-b3e334979695" --deployment-token="eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.eyJzIjoiMTBjNGFmMTdlNWIwYzUwOGM4MGZhZmY3MjI4NjAzZDZmZDIwNGY4MmMwYzliYWY2MjQ5MDZmZjdiZWM0NmMyNWI1YjA4N2Q0MGM1Mjc3Y2E4YmQ0NGQ2MThmNTI3MDI1ZGE3ZTFlMGI0NTg2OTZkNzU2M2U3MGJiZjQ5NGE0MzIiLCJ2IjoxLCJpc3MiOiJkZXYiLCJqdGkiOiI5YmFiMDk1MS1mM2JhLTQxYTYtYjk0NC00ZTE4NzVlZDEwZTciLCJvIjoiY2Q1MzRmNDQtY2YwZi0xMWViLWEwY2QtYjNlMzM0OTc5Njk1In0." --sch-url="https://dev.hub.streamsets.com"
```
If you run my script as-is, it will add a compute engine to my account; contact me and I can open an account for you to try. If you copy the script generated for your own account, you can experience it directly.
2.4 Add a compute engine
First we need a cloud host. Then install the Java JDK and run the deployment script above.
```shell
# 1. Install the Java 8 JDK
yum -y install java-1.8.0-openjdk*
# 2. Run the deployment script you copied
```
Note that the compute engine requires 1 GB+ of memory, so make sure your host has enough. Answer Y to each prompt, and StreamSets 4.0 is deployed and connected to your cloud platform.
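Before answering those prompts, a quick pre-flight check saves a failed install. This is my own hypothetical helper, not part of StreamSets: it reads total memory from `/proc/meminfo` and warns if the host is below the roughly 1 GB the engine needs.

```shell
# Hypothetical pre-flight check (not part of the StreamSets installer):
# read total memory in kB from /proc/meminfo and compare against 1 GB.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
if [ "$mem_kb" -ge 1048576 ]; then
  echo "memory OK: $((mem_kb / 1024)) MB"
else
  echo "memory too low: $((mem_kb / 1024)) MB"
fi
```

If it prints "memory too low", resize the cloud host before running the deployment script.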
2.5 Check the computing engine
Click Set Up - Engines on the Control Hub platform, and you should see the host you just added listed as a compute engine.
3 Experience Pipeline
Click Build Pipeline and open one, and you will see the screen below. The component icons have a fresh new look, and the color scheme is easy on the eyes.
3.1 Let's build an acquisition pipeline
Drag and drop the components into place, and a pipeline is built in minutes.
3.2 Version management
The cloud platform provides a Check In function, which solves the versioning problem nicely.
3.3 Running the preview
Click the little eye icon to preview. The data preview looks like this:
4 Experience Fragments (Functions)
The old SDC platform could not create functions, so we had no way to reuse logic. How do these fragments measure up?
4.1 Create a new fragment
We build a simple HTTP request fragment, as shown below. Note that fragments need no origin or destination: the origin and destination play the role of a function's parameters and return value.
4.2 Debug the fragment
Because there is no origin, debugging requires selecting a test origin.
4.3 Version Management
Regarding fragments, it also has version management.
4.4 Referencing fragments
Create a new pipeline and reference the fragment we just built. Very nice!
5 Jobs
The newly added Job is an upgrade over the old "just run the pipeline" model, with complete monitoring information.
5.1 Create a job
5.2 Create a scheduling job
With scheduled jobs, you no longer have to worry about starting pipelines on a timer.
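For reference, schedulers like this are typically driven by cron-style expressions. The lines below show standard 5-field cron syntax purely for illustration; verify the exact expression format Control Hub's scheduler dialog expects before using it.

```shell
# Illustrative cron-style schedules (standard 5-field syntax); the exact
# dialect Control Hub accepts should be checked in its scheduler dialog.
echo "0 2 * * *    -> every day at 02:00"
echo "*/15 * * * * -> every 15 minutes"
```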
6 Data and computing power monitoring
7 User management
Say goodbye to bare-bones user management: here you get the usual users, groups, auditing, API authentication keys, and more.
8 Summary
Impressed?
A powerful integrated platform is exactly what we have been picturing!
Day-to-day operation requires no VPN and runs super smoothly. It is currently in Beta and may become paid later; I hope it won't be too expensive.
If you liked this post, follow and favorite! Your clicks keep me going!