alibaba / Alink

Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of Alibaba's computing platform.
License: Apache License 2.0
In the LDA online corpus update step, the last sample may be less likely to be sampled.
There are Chinese comments in FeatureLabelUtil.
With very little data, some workers receive no data, and PCA training throws a NullPointerException.
May I ask whether this example, https://github.com/alibaba/Alink/blob/master/pyalink/ftrl_demo.ipynb, has been run through successfully?
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-3-af625ccf78c2> in <module>
13 feature_pipeline.fit(trainBatchData).save(FEATURE_PIPELINE_MODEL_FILE);
14
---> 15 BatchOperator.execute();
16
17 # load pipeline model
~/anaconda3/lib/python3.7/site-packages/pyalink-1.0.1_flink_1.9.0_scala_2.11-py3.7.egg/pyalink/alink/batch/base.pyc in execute(cls)
~/anaconda3/lib/python3.7/site-packages/py4j-0.10.8.1-py3.7.egg/py4j/java_gateway.py in __call__(self, *args)
1284 answer = self.gateway_client.send_command(command)
1285 return_value = get_return_value(
-> 1286 answer, self.gateway_client, self.target_id, self.name)
1287
1288 for temp_arg in temp_args:
~/anaconda3/lib/python3.7/site-packages/py4j-0.10.8.1-py3.7.egg/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
330 raise Py4JError(
Py4JJavaError: An error occurred while calling z:com.alibaba.alink.operator.batch.BatchOperator.execute.
: org.apache.flink.runtime.client.JobExecutionException: Could not retrieve JobResult.
at org.apache.flink.runtime.minicluster.MiniCluster.executeJobBlocking(MiniCluster.java:623)
at org.apache.flink.client.LocalExecutor.executePlan(LocalExecutor.java:222)
at org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:91)
at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:820)
at com.alibaba.alink.operator.batch.BatchOperator.execute(BatchOperator.java:248)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$2(Dispatcher.java:333)
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
... 6 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152)
at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375)
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 more
Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: java.net.SocketTimeoutException: connect timed out
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:270)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:907)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:230)
at org.apache.flink.runtime.executiongraph.ExecutionGraphBuilder.buildGraph(ExecutionGraphBuilder.java:106)
at org.apache.flink.runtime.scheduler.LegacyScheduler.createExecutionGraph(LegacyScheduler.java:207)
at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:184)
at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176)
at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70)
at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275)
at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265)
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)
at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146)
... 10 more
Caused by: java.lang.RuntimeException: java.net.SocketTimeoutException: connect timed out
at com.alibaba.alink.operator.common.io.reader.HttpFileSplitReader.getFileLength(HttpFileSplitReader.java:79)
at com.alibaba.alink.operator.common.io.csv.GenericCsvInputFormat.createInputSplits(GenericCsvInputFormat.java:401)
at com.alibaba.alink.operator.common.io.csv.GenericCsvInputFormat.createInputSplits(GenericCsvInputFormat.java:39)
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:256)
... 22 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
at com.alibaba.alink.operator.common.io.reader.HttpFileSplitReader.getFileLength(HttpFileSplitReader.java:61)
... 25 more
The last test (testReadEasModel) in CsvSourceSinkTest.java fails. It is likely that the path cannot be reached from the US.
String path = "http://test-region.oss-cn-hangzhou-zmf.aliyuncs.com/alink_model_export/Alink-Expr-pre-6339/226607/model_data.csv"
Currently, Alink creates the execution environment by itself, which makes it difficult to integrate with other tools and frameworks. It would be nice to allow Alink to use an environment created elsewhere.
As titled: then could we use Alink to process streaming data inside a Flink program?
val env = ExecutionEnvironment.createLocalEnvironment()
val tEnv=BatchTableEnvironment.create(env)
val mle = new MLEnvironment(env, tEnv)
MLEnvironmentFactory.setDefault(mle)
After adding a local environment, running it reports:
Error:(18, 42) Static methods in interface require -target:jvm-1.8
val tEnv=BatchTableEnvironment.create(env)
How can these algorithms be exported as PMML models?
When integrating with Alink, I could not find in the documentation how to save a model trained by an Alink trainer. In my application, I use PipelineModel
to save the trained model as follows, but there is no documentation describing this approach. I also don't know how to save an online learning model.
Pipeline pipeline = new Pipeline().add(vectorAssembler).add(kMeans);
PipelineModel pipelineModel = pipeline.fit(data);
pipelineModel.save("/tmp/feature_pipe_model.csv");
After browsing the examples and documentation, I see that online learning is applied only to shallow models. Without support for deep learning, the coverage is inevitably incomplete.
The official documentation only covers PyAlink. Can other languages be used, such as Java or Scala?
A quick question: is Alink based on the DataStream API or the DataSet API?
The method getFieldTypes of the Flink class TableSchema is annotated as Deprecated (tracked in FLINK-12254, "Expose the new type system through the API"). The method getFieldDataTypes should be used instead of getFieldTypes to adopt the new type system.
Online learning currently supports FTRL for training; will other algorithms be supported later? And a beginner's question: why does online learning only support linear models, and can that limitation be lifted?
TypeError Traceback (most recent call last)
in
----> 1 useLocalEnv(2, flinkHome=None, config=None)
D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\env.pyc in useLocalEnv(parallelism, flinkHome, config)
D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\env.pyc in make_configuration(config)
D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\py4j_util.pyc in get_java_gateway()
D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\py4j_util.pyc in inst(cls)
D:\python.venv\lib\site-packages\pyalink-1.0_flink_1.9.0_scala_2.11-py3.7.egg\pyalink\alink\py4j_util.pyc in init(self)
TypeError: 'JavaPackage' object is not callable
In the HasVectorSize class, the ParamInfo of VECTOR_SIZE uses vectorSize as its name and both vectorSize and inputDim as aliases. This causes an exception when the flink-ml-api dependency is used, because the alias vectorSize duplicates the name. vectorSize should be removed from the aliases of VECTOR_SIZE.
Is there a Docker image or Dockerfile to run Alink?
This is the first step toward supporting multi-version data source connectors.
After packaging the official website example and uploading it to Flink, running ALSExample produced the following error:
2019-12-27 14:56:29
java.lang.Exception: The user defined 'open()' method caused an exception: null
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:499)
at org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:157)
at org.apache.flink.runtime.iterative.task.IterationIntermediateTask.run(IterationIntermediateTask.java:107)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:369)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NegativeArraySizeException
at com.alibaba.alink.operator.common.recommendation.AlsTrain$9.open(AlsTrain.java:308)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.api.java.operators.translation.WrappingFunction.open(WrappingFunction.java:45)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:495)
... 6 more
I am using the cluster installation method. Why does the Maven build keep failing with the following error?
[ERROR] Failed to execute goal on project alink_core: Could not resolve dependencies for project com.alibaba.alink:alink_core:jar:0.1-SNAPSHOT: Could not transfer artifact net.sf.json-lib:json-lib:jar:2.2.3 from/to alimaven (http://maven.aliyun.com/nexus/content/groups/public): Transfer failed for http://maven.aliyun.com/nexus/content/groups/public/net/sf/json-lib/json-lib/2.2.3/json-lib-2.2.3.jar: Unknown host maven.aliyun.com: Temporary failure in name resolution -> [Help 1]
The dependency package cannot be found, and importing it into the pom file still gives the same error. If anyone knows what to do, please tell me; it's fairly urgent.
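For reference, one common workaround when an Aliyun mirror cannot be reached is to declare Maven Central explicitly as a fallback repository in pom.xml. This is only a sketch: the underlying error here is a DNS failure ("Unknown host maven.aliyun.com"), so fixing name resolution may be needed as well, and json-lib on Central may additionally require a classifier such as jdk15.

```xml
<!-- Hypothetical fallback repository, assuming maven.aliyun.com stays unreachable:
     let Maven try Central directly for artifacts such as net.sf.json-lib:json-lib. -->
<repositories>
  <repository>
    <id>central</id>
    <url>https://repo.maven.apache.org/maven2</url>
  </repository>
</repositories>
```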
This is caused by integer overflow.
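As a minimal illustration (generic code, not Alink's actual ALS implementation): multiplying two int counts before using the product as an array length wraps around silently, and the negative result surfaces as the NegativeArraySizeException seen in the trace above. Widening to long first avoids the wrap.

```java
public class OverflowDemo {
    // 50_000 * 50_000 = 2_500_000_000 does not fit in a 32-bit int,
    // so the product wraps around to a negative value.
    static int unsafeSize(int rows, int cols) {
        return rows * cols; // silent overflow for large inputs
    }

    // Widening one operand to long before multiplying keeps the true value.
    static long safeSize(int rows, int cols) {
        return (long) rows * cols;
    }

    public static void main(String[] args) {
        System.out.println(unsafeSize(50_000, 50_000)); // prints -1794967296
        System.out.println(safeSize(50_000, 50_000));   // prints 2500000000
        try {
            int[] buffer = new int[unsafeSize(50_000, 50_000)];
        } catch (NegativeArraySizeException e) {
            // Same failure mode as the ALS stack trace above.
            System.out.println("NegativeArraySizeException: " + e.getMessage());
        }
    }
}
```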
When trying to interoperate with Flink, I get a table environment mismatch error:
Exception in thread "main" org.apache.flink.table.api.ValidationException: Only tables from the same TableEnvironment can be joined.
```java
package com.alibaba.alink;

import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.dataproc.SplitBatchOp;
import com.alibaba.alink.operator.batch.recommendation.AlsPredictBatchOp;
import com.alibaba.alink.operator.batch.recommendation.AlsTrainBatchOp;
import com.alibaba.alink.operator.batch.source.CsvSourceBatchOp;
import com.alibaba.alink.operator.batch.source.TableSourceBatchOp;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;

public class WrongDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
        String url = "https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/movielens_ratings.csv";
        String schema = "userCol bigint, itemCol bigint, rateCol double, timestamp string";
        Tuple3[] fake = new Tuple3[]{
            new Tuple3<Long, Long, Double>(1L, 1L, 0.6),
            new Tuple3<Long, Long, Double>(2L, 2L, 0.8),
            new Tuple3<Long, Long, Double>(2L, 3L, 0.6),
            new Tuple3<Long, Long, Double>(4L, 1L, 0.6),
            new Tuple3<Long, Long, Double>(4L, 2L, 0.3),
        };
        DataSet ds = env.fromElements(fake);
        Table table = tEnv.fromDataSet(ds);
        BatchOperator data = BatchOperator.fromTable(table);
        BatchOperator data3 = new TableSourceBatchOp(table);
        SplitBatchOp spliter = new SplitBatchOp().setFraction(0.8);
        BatchOperator data2 = new CsvSourceBatchOp()
            .setFilePath(url).setSchemaStr(schema);
        System.out.println(data.getDataSet().getExecutionEnvironment());
        System.out.println(BatchOperator.getExecutionEnvironmentFromDataSets(ds));
        System.out.println(BatchOperator.getExecutionEnvironmentFromOps(data2));
        System.out.println(BatchOperator.getExecutionEnvironmentFromOps(data3));
        System.out.println(BatchOperator.getExecutionEnvironmentFromOps(spliter.linkFrom(data2)));
        spliter.linkFrom(data);
        BatchOperator trainData = spliter;
        BatchOperator testData = spliter.getSideOutput(0);
        AlsTrainBatchOp als = new AlsTrainBatchOp()
            .setUserCol("userid").setItemCol("movieid").setRateCol("rating")
            .setNumIter(10).setRank(10).setLambda(0.1);
        BatchOperator model = als.linkFrom(trainData);
        AlsPredictBatchOp predictor = new AlsPredictBatchOp()
            .setUserCol("userid").setItemCol("movieid").setPredictionCol("prediction_result");
        BatchOperator preditionResult;
        preditionResult = predictor.linkFrom(model, testData).select("rating, prediction_result");
        preditionResult.print();
    }
}
```
Hi,
I found the current implementation of UDF/UDTF operators has the following limitations:
Moreover, in the current code, if the same column name is used in selectedCol(s), reservedCols, and outputCols, the resulting behavior is not very clear.
Maybe we can rephrase this as three rules, and the implementation should be aligned with them:
I have never learned Python; I'm a beginner here. I set up a Python 3.5 environment and downloaded the package, but running easy_install as instructed fails. The official instruction is easy_install [path]/pyalink-0.0.1-py3.*.egg; mine is easy_install /usr/apps/pyalink-1.0_flink_1.9.0_scala_2.11-py3.5.egg. Why has it been stuck for so long at: Couldn't find index page for 'pyalink' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/
It hung for more than half an hour and finally failed, saying pyalink-1.0_flink_1.9.0_scala_2.11-py3.5 could not be found. Has anyone installed it successfully? I have tried many times without luck. Does anyone understand this? Waiting online; it's fairly urgent.
Modify the naive Bayes algorithm to work as a text classifier.
Not using the cluster version.
Modify the cutsArray param in Bucketizer.
Modify the handleInvalid param in VectorSizeHint.
Need to change the OSS links to accessible ones.
Neither mvn.aliyun.com nor Maven Central hosts it.
I tried to save the model using this code:
featurePipeline.save(FEATURE_PIPELINE_MODEL_FILE)
The pipeline model is saved to a CSV file.
Is there a way to save the model to MySQL and load it back?
Hello everyone, I have a question. After reading a CSV with CsvSourceStreamOp, when the CSV content changes, the printed content does not update. My code is below; please help me out, thank you!
```python
#!/usr/bin/python
from pyalink.alink import *
import numpy as np
import pandas as pd

useLocalEnv(1)

schemaStr = "A double, B double, C double, D double, label string"
path = "D:/program/Alink_script/data/iris2_1_stream.csv"
csvSource = CsvSourceStreamOp().setFilePath(path).setSchemaStr(schemaStr).setFieldDelimiter(",")
csvSource.print()
sample = SampleStreamOp().setRatio(0.01).linkFrom(csvSource)
sample.print(key="csvSource", refreshInterval=1)
StreamOperator.execute()
```
Most Python packages (currently more than 200 thousand) can be easily installed with pip, making it easier to integrate new packages like yours into large existing Python development environments.
This, however, would require uploading your package to PyPI, which I think will be straightforward for this package, given that it is already bundled in the .egg format.
Modify the Chinese and English documents of QuantileDiscretizer, OnehotEncoder, Bucketizer.
When I tested Kafka as the sink, something went wrong.
The code is as follows:
```java
public class writeKafka {
    private static void write() throws Exception {
        String URL = "data/iris.csv";
        String SCHEMA_STR
            = "sepal_length double, sepal_width double, petal_length double, petal_width double, category string";
        CsvSourceStreamOp data = new CsvSourceStreamOp().setFilePath(URL).setSchemaStr(SCHEMA_STR);
        Kafka011SinkStreamOp sink = new Kafka011SinkStreamOp()
            .setBootstrapServers("127.0.0.1:9092")
            .setDataFormat("json")
            .setTopic("iris");
        data.link(sink);
        StreamOperator.execute();
    }

    public static void main(String[] args) throws Exception {
        write();
    }
}
```
It fails like this:
09:54:15,518 WARN org.apache.kafka.common.utils.AppInfoParser - Error registering AppInfo mbean
javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=kafka_011_-6885371916521496791
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:58)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:335)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:191)
at com.alibaba.alink.operator.common.io.kafka.KafkaBaseOutputFormat.getKafkaProducer(KafkaBaseOutputFormat.java:160)
at com.alibaba.alink.operator.common.io.kafka.KafkaBaseOutputFormat.open(KafkaBaseOutputFormat.java:166)
at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:65)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
I don't know what caused it. That said, I found that the messages were actually produced normally.
Double-checked locking is widely cited and used as an efficient method for implementing lazy initialization in a multithreaded environment.
Unfortunately, it will not work reliably in a platform-independent way when implemented in Java without additional synchronization.
Declaring the singleton field volatile offers a much more elegant solution.
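A generic sketch of the fix being suggested (illustrative code, not a specific Alink class): the singleton field is declared volatile so the double-checked fast path cannot observe a partially constructed instance.

```java
public class LazySingleton {
    // volatile is what makes double-checked locking safe under the Java Memory
    // Model (Java 5+): it forbids the reordering that could publish a partially
    // constructed instance to another thread.
    private static volatile LazySingleton instance;

    private LazySingleton() {}

    public static LazySingleton getInstance() {
        LazySingleton local = instance;          // one volatile read on the fast path
        if (local == null) {
            synchronized (LazySingleton.class) { // slow path taken at most once
                local = instance;
                if (local == null) {
                    instance = local = new LazySingleton();
                }
            }
        }
        return local;
    }
}
```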
First, I can't get the alink_core jar.
So I imported alink_open-0.1...jar and alink_python-0.1..jar from the pyalink lib folder into my project.
When I test the GBDTExample in my project, I get the error described in the title.
The BaseSourceStreamOp class does not support Kafka sources for versions 0.8, 0.9, or 1.0; only the 0.11 line is supported, via Kafka011SourceStreamOp. The other Kafka source versions should be supported as well.
Hello everyone, I'd like to ask a question. I read data from Kafka via Kafka011SourceStreamOp, but the result is an Operator with a single message column. I want to parse the message (for example, split on commas) into a new Operator with multiple columns. How can I do that? Part of my code is below; thanks for any help!
```python
#!/usr/bin/python
from pyalink.alink import *
import numpy as np
import pandas as pd

useLocalEnv(1)

bootstrapServer = "10.124.165.91:9092"
topic = "test_alink1"
data = Kafka011SourceStreamOp().setBootstrapServers(bootstrapServer).setTopic(topic).setStartupMode("EARLIEST").setGroupId("")
# Here I can get the message column from Kafka, but I want to parse it into a new
# Operator without converting to a DataFrame and back, since the README warns that
# frequently switching between DataFrame and Operator hurts efficiency.
data.select("message").print()
StreamOperator.execute()
```