Support Vector Machine on Apache Spark

Hello, when I try to run the Support Vector Machine example on Apache Spark with ./run-example org.apache.spark.mllib.classification.SVM local <path-to-dir>/sample_svm_data.txt 2 2.0 2 in the terminal, I get the following error message.

Exception in thread "main" java.lang.NumberFormatException: For input string: "1 0 2.52078447201548 0 0 0 2.004684436494304 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0"
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
  at java.lang.Double.parseDouble(Double.java:540)
  at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:234)
  at scala.collection.immutable.StringOps.toDouble(StringOps.scala:31)
  at org.apache.spark.mllib.util.MLUtils$$anonfun$loadLabeledData$1.apply(MLUtils.scala:45)
  at org.apache.spark.mllib.util.MLUtils$$anonfun$loadLabeledData$1.apply(MLUtils.scala:43)
  at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
  at scala.collection.Iterator$$anon$18.next(Iterator.scala:385)
  at scala.collection.Iterator$class.foreach(Iterator.scala:772)
  at scala.collection.Iterator$$anon$18.foreach(Iterator.scala:379)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:102)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
  at scala.collection.Iterator$$anon$18.toBuffer(Iterator.scala:379)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
  at scala.collection.Iterator$$anon$18.toArray(Iterator.scala:379)
  at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
  at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
  at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
  at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
  at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
  at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)

The complete dump is included below for further diagnosis.

13/12/13 12:26:54 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/12/13 12:26:54 INFO spark.SparkEnv: Registering BlockManagerMaster
13/12/13 12:26:54 INFO storage.MemoryStore: MemoryStore started with capacity 9.2 GB.
13/12/13 12:26:54 INFO storage.DiskStore: Created local directory at /tmp/spark-local-20131213122654-abb2
13/12/13 12:26:54 INFO network.ConnectionManager: Bound socket to port 36563 with id = ConnectionManagerId(<master>,36563)
13/12/13 12:26:54 INFO storage.BlockManagerMaster: Trying to register BlockManager
13/12/13 12:26:54 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager <master>:36563 with 9.2 GB RAM
13/12/13 12:26:54 INFO storage.BlockManagerMaster: Registered BlockManager
13/12/13 12:26:54 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/13 12:26:54 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:56637
13/12/13 12:26:54 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.232.5.169:56637
13/12/13 12:26:54 INFO spark.SparkEnv: Registering MapOutputTracker
13/12/13 12:26:54 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-986ccc2b-5a40-48ae-8801-566b0f32895b
13/12/13 12:26:54 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/13 12:26:54 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59613
13/12/13 12:26:54 INFO server.Server: jetty-7.x.y-SNAPSHOT
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
13/12/13 12:26:54 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
13/12/13 12:26:54 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
13/12/13 12:26:54 INFO ui.SparkUI: Started Spark Web UI at http://<master>:4040
13/12/13 12:26:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/12/13 12:26:55 INFO storage.MemoryStore: ensureFreeSpace(121635) called with curMem=0, maxMem=9907879280
13/12/13 12:26:55 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 118.8 KB, free 9.2 GB)
13/12/13 12:26:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/12/13 12:26:55 INFO spark.SparkContext: Starting job: first at GeneralizedLinearAlgorithm.scala:121
13/12/13 12:26:55 INFO scheduler.DAGScheduler: Got job 0 (first at GeneralizedLinearAlgorithm.scala:121) with 1 output partitions (allowLocal=true)
13/12/13 12:26:55 INFO scheduler.DAGScheduler: Final stage: Stage 0 (first at GeneralizedLinearAlgorithm.scala:121)
13/12/13 12:26:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/12/13 12:26:55 INFO scheduler.DAGScheduler: Missing parents: List()
13/12/13 12:26:55 INFO scheduler.DAGScheduler: Computing the requested partition locally
13/12/13 12:26:55 INFO rdd.HadoopRDD: Input split: file:/data/tanmay/tmp/sample_svm_data.txt:0+39474
13/12/13 12:26:55 INFO scheduler.DAGScheduler: Failed to run first at GeneralizedLinearAlgorithm.scala:121
Exception in thread "main" java.lang.NumberFormatException: For input string: "1 0 2.52078447201548 0 0 0 2.004684436494304 2.000347299268466 0 2.228387042742021 2.228387042742023 0 0 0 0 0 0"
  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1241)
  at java.lang.Double.parseDouble(Double.java:540)
  at scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:234)
  at scala.collection.immutable.StringOps.toDouble(StringOps.scala:31)
  at org.apache.spark.mllib.util.MLUtils$$anonfun$loadLabeledData$1.apply(MLUtils.scala:45)
  at org.apache.spark.mllib.util.MLUtils$$anonfun$loadLabeledData$1.apply(MLUtils.scala:43)
  at scala.collection.Iterator$$anon$19.next(Iterator.scala:401)
  at scala.collection.Iterator$$anon$18.next(Iterator.scala:385)
  at scala.collection.Iterator$class.foreach(Iterator.scala:772)
  at scala.collection.Iterator$$anon$18.foreach(Iterator.scala:379)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:102)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
  at scala.collection.Iterator$$anon$18.toBuffer(Iterator.scala:379)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
  at scala.collection.Iterator$$anon$18.toArray(Iterator.scala:379)
  at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
  at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
  at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
  at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
  at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
  at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)

Can someone please help me figure out what is wrong with this data (or with the input parameters), given that Apache Spark ships sample_svm_data.txt along with its machine learning library package [which implies the data should not be the issue]?

Asked 13/12/2013 at 05:33


1 answer

The issue was that the data uses spaces as the delimiter, and Spark MLlib's SVM example (note: MLlib, not Mahout; the failing call in the stack trace is MLUtils.loadLabeledData) cannot parse that by default. I just replaced the spaces with commas and voilà, it worked!!
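
For reference, here is a minimal sketch of how the space-delimited file could be parsed by hand instead of editing the data. It assumes the Spark 0.8-era MLlib API (LabeledPoint taking an Array[Double] of features, and SVMWithSGD.train(data, numIterations)); the object name SpaceDelimitedSVM and the iteration count are arbitrary choices for illustration:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.regression.LabeledPoint

    // A minimal sketch, assuming the Spark 0.8-era MLlib API.
    object SpaceDelimitedSVM {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "SpaceDelimitedSVM")

        // sample_svm_data.txt is entirely space-delimited: the first
        // token on each line is the label, the rest are the features.
        val data = sc.textFile(args(0)).map { line =>
          val tokens = line.trim.split(' ').map(_.toDouble)
          LabeledPoint(tokens.head, tokens.tail)
        }

        // Train with an arbitrary iteration count.
        val model = SVMWithSGD.train(data, 20)
        println("weights: " + model.weights.mkString(" "))

        sc.stop()
      }
    }

The comma fix works because loadLabeledData (visible in the stack trace at MLUtils.scala:45) appears to split each line on a comma first and call toDouble on everything before it, so a line with no comma fails exactly as shown above. Note that only the first space on each line needs to become a comma (for example, sed 's/ /,/' sample_svm_data.txt), giving the expected "label,f1 f2 f3 ..." layout.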

Answered 22/04/2014 at 07:34
