@bobjiang82

  • Reuse GradientBoostedTreeDataGenerator to generate dataset
  • Read dataset and convert to ml.LabeledPoint and to ml.DataFrame
  • Call XGBoost with the passed-in params for training
  • Call XGBoost prediction and print test error
  • Add XGBoost libs configuration doc
  • Use pipeline for training
  • Verified with Scala 2.12, Apache Spark 2.4, and XGBoost v1.1.

Note: based on Xiaochang's PR #628.

@xwu-intel

@bobjiang82 #628 is merged. Could you rebase the code to resolve the conflict?

@bobjiang82
Author

@xwu99 Done.



### 8. Run xgboost workload ###

Could you change xgboost to XGBoost here and do the same in the rest of the doc?

```

#### 8.a latest xgboost release (default) ####

No need to use 8.a, 8.b. Also, please use correct capitalization for titles.

@xwu-intel, Aug 5, 2020

I think you don't need to write this, since it's already covered above in section 4, "Run a workload".
I suggest you separate the doc out, merge only the code, and make sure it's runnable with the default HiBench process.

If you only have the xgboost jar files, just copy them to $SPARK_HOME/jars/ and update the relevant versions for xgboost4j and xgboost4j-spark in sparkbench/ml/pom.xml to get aligned.<br>
For example, if xgboost is built from source on a Linux platform, the jars will be generated and installed to ```~/.m2/repository/ml/dmlc/xgboost4j_<scala version>/<xgboost version>-SNAPSHOT/``` and ```~/.m2/repository/ml/dmlc/xgboost4j-spark_<scala version>/<xgboost version>-SNAPSHOT/``` respectively. To use them, copy the 2 jars to $SPARK_HOME/jars/ and update the relevant versions for xgboost4j and xgboost4j-spark in the pom.xml files.<br>
After that, build hibench, prepare data and run xgboost benchmark.

Generally, the doc style is not consistent with the original doc, and it is too complicated to follow.
I suggest rewriting or removing it. We can merge the code first. It should be runnable with the default settings.

```
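
As a rough illustration of the jar-copy step described in the doc text under review, the following shell sketch copies locally built xgboost4j and xgboost4j-spark jars from a Maven repository layout into a Spark jars directory. The version strings, the `DEMO_ROOT` sandbox, and the placeholder jar files are assumptions made so the sketch runs without a real Spark or XGBoost install; they are not taken from the PR.

```shell
#!/bin/sh
# Sketch only: versions and sandbox paths below are assumptions, not PR content.
set -eu

SCALA_VERSION="2.12"      # assumed; the PR mentions Scala 2.12
XGBOOST_VERSION="1.1.0"   # hypothetical; use the version you actually built

# Sandbox standing in for ~/.m2 and a real $SPARK_HOME.
DEMO_ROOT="$(mktemp -d)"
M2_DMLC="$DEMO_ROOT/.m2/repository/ml/dmlc"
SPARK_HOME="$DEMO_ROOT/spark"
mkdir -p "$SPARK_HOME/jars"

# Create empty placeholder jars where a source build would install them.
for artifact in xgboost4j xgboost4j-spark; do
  dir="$M2_DMLC/${artifact}_${SCALA_VERSION}/${XGBOOST_VERSION}-SNAPSHOT"
  mkdir -p "$dir"
  : > "$dir/${artifact}_${SCALA_VERSION}-${XGBOOST_VERSION}-SNAPSHOT.jar"
done

# The step itself: copy both jars into $SPARK_HOME/jars/.
copied=0
for artifact in xgboost4j xgboost4j-spark; do
  jar="$M2_DMLC/${artifact}_${SCALA_VERSION}/${XGBOOST_VERSION}-SNAPSHOT/${artifact}_${SCALA_VERSION}-${XGBOOST_VERSION}-SNAPSHOT.jar"
  cp "$jar" "$SPARK_HOME/jars/"
  copied=$((copied + 1))
done

echo "copied $copied jars into $SPARK_HOME/jars"
```

In a real setup, `M2_DMLC` would point at `~/.m2/repository/ml/dmlc`, the placeholder loop would be dropped, and the xgboost4j and xgboost4j-spark versions in sparkbench/ml/pom.xml would still need to be updated to match.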

commit code first and continue to refine doc.
@bobjiang82
Author

Updated to merge the code first and continue to refine the doc.

@xwu-intel

Updated to merge the code first and continue to refine the doc.

Thanks! Could you add this to CI?

Updated to merge the code first and continue to refine the doc.

Thanks. Could you add this to the benchmark list (conf/benchmarks.lst) and to CI (travis/benchmarks_ml.lst)?

@bobjiang82
Author

Added xgboost to conf/benchmarks.lst and travis/benchmarks_ml.lst
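
For readers who want to reproduce this step, here is a minimal shell sketch of registering a workload in both list files without creating duplicate lines. The entry name `ml.xgboost`, the pre-existing `ml.kmeans` line, the `add_entry` helper, and the `DEMO_ROOT` sandbox are all assumptions for illustration; check the actual list files for the naming they use.

```shell
#!/bin/sh
# Sketch only: entry name and sandbox layout are assumptions, not PR content.
set -eu

# Sandbox standing in for a HiBench checkout.
DEMO_ROOT="$(mktemp -d)"
mkdir -p "$DEMO_ROOT/conf" "$DEMO_ROOT/travis"
printf 'ml.kmeans\n' > "$DEMO_ROOT/conf/benchmarks.lst"       # hypothetical existing content
printf 'ml.kmeans\n' > "$DEMO_ROOT/travis/benchmarks_ml.lst"  # hypothetical existing content

ENTRY="ml.xgboost"  # assumed entry name

# Hypothetical helper: append the entry only if no identical line exists yet.
add_entry() {
  file="$1"
  entry="$2"
  grep -qxF "$entry" "$file" || printf '%s\n' "$entry" >> "$file"
}

for list in "$DEMO_ROOT/conf/benchmarks.lst" "$DEMO_ROOT/travis/benchmarks_ml.lst"; do
  add_entry "$list" "$ENTRY"
  add_entry "$list" "$ENTRY"  # second call is a no-op, so reruns are safe
done

cat "$DEMO_ROOT/conf/benchmarks.lst"
```

The whole-line match (`grep -qxF`) keeps the helper from treating a longer name that merely contains `ml.xgboost` as already present.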

@xwu-intel

@bobjiang82 Could you modify bin/run_all.sh to mask out Hadoop, since this workload is for Spark only?

sync the forked repo with HiBench base