{"id":21695,"date":"2018-06-27T09:00:04","date_gmt":"2018-06-27T00:00:04","guid":{"rendered":"https:\/\/www.creationline.com\/?p=21695"},"modified":"2018-06-27T12:08:08","modified_gmt":"2018-06-27T03:08:08","slug":"apache-spark%e7%b8%9b%e3%82%8a%e3%81%a7kaggle%e3%81%ae%e3%82%b3%e3%83%b3%e3%83%9a%e3%83%86%e3%82%a3%e3%82%b7%e3%83%a7%e3%83%b3%e3%82%84%e3%81%a3%e3%81%a6%e3%81%bf%e3%81%9f-spark","status":"publish","type":"post","link":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695","title":{"rendered":"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" src=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg\" alt=\"\" width=\"942\" height=\"261\" class=\"alignnone size-full wp-image-12352\" srcset=\"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg 942w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle-360x100.jpg 360w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle-768x213.jpg 768w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle-225x62.jpg 225w\" sizes=\"auto, (max-width: 942px) 100vw, 942px\" \/><\/p>\n<p>\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002<\/p>\n<p>\u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache Spark \u3067\u6311\u6226\u3057\u3066\u307f\u305f\u3044\u3068\u601d\u3044\u307e\u3059\u3002<\/p>\n<p>\u4f7f\u3063\u3066\u3044\u308b\u65b9\u306f\u77e5\u3063\u3066\u306f\u3044\u308b\u306e\u3067\u3059\u304c\u3001\u5b9f\u306f kaggle \u3067\u306f Apache Spark 
\u3092\u4f7f\u7528\u3057\u3066\u3044\u308b\u4eba\u306f\u3042\u307e\u308a\u591a\u304f\u3042\u308a\u307e\u305b\u3093\u3002\u65e5\u672c\u3067\u3082 kaggle \u306e\u4f8b\u3092\u898b\u3066\u307f\u308b\u3068\u3001Python+numpy+pandas+scikit-learn(+TensorFlow)\u3068\u3044\u3046\u7d44\u307f\u5408\u308f\u305b\u3067\u6311\u6226\u3057\u3066\u3044\u308b\u65b9\u304c\u591a\u6570\u3067\u3059\u3002<\/p>\n<p>\u4eca\u56de\u306e\u8a18\u4e8b\u306f\u3042\u3048\u3066Apache Spark\u7e1b\u308a\u3067 kaggle \u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u306b\u53c2\u52a0\u3057\u3066\u307f\u3066\u3001\u5b9f\u969b Pandas\/numpy\/scikit-learn\u3067\u3084\u3063\u3066\u3044\u308b\u3053\u3068\u3092Apache Spark\u306b\u7f6e\u304d\u63db\u3048\u308b\u3053\u3068\u304c\u3067\u304d\u308b\u306e\u304b\u3001\u7f6e\u304d\u63db\u3048\u308b\u3068\u3057\u305f\u3089\u3069\u3046\u3059\u308b\u306e\u304b\u3001\u3068\u3044\u3046\u3068\u3053\u308d\u306b\u7740\u76ee\u3057\u3001\u5b9f\u969b\u306b\u7d50\u679c\u3092\u6295\u7a3f\u3059\u308b\u3068\u3053\u308d\u307e\u3067\u3084\u3063\u3066\u307f\u305f\u3044\u3068\u601d\u3044\u307e\u3059\u3002<\/p>\n<p>Apache Spark\u3092\u5229\u7528\u3059\u308b\u30e1\u30ea\u30c3\u30c8\u3068\u3057\u3066\u306f\u3001\u51e6\u7406\u3059\u308bPC\u304c\u6bd4\u8f03\u7684\u5927\u304d\u304f\u306a\u304f\u3066\u3082\u3001\u5927\u304d\u306a\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3092\u6271\u3046\u3053\u3068\u304c\u3067\u304d\u308b\u70b9\u3067\u3059\u3002\u3064\u307e\u308a\u6642\u9593\u3092\u304b\u3051\u3066\u3082\u69cb\u308f\u306a\u3044\u306e\u3067\u3042\u308c\u3070\u3001\u30af\u30e9\u30b9\u30bf\u69cb\u6210\u3067\u3057\u305f\u51e6\u7406\u3067\u304d\u306a\u3044\u3088\u3046\u306a\u5927\u898f\u6a21\u30c7\u30fc\u30bf\u3092\u624b\u5143\u306e\u30d1\u30bd\u30b3\u30f3\u3067\u51e6\u7406\u3059\u308b\u3053\u3068\u304c\u3067\u304d\u308b\u3088\u3046\u306b\u306a\u308a\u307e\u3059\u3002\u3082\u3061\u308d\u3093\u30af\u30e9\u30b9\u30bf\u69cb\u6210\u3067 Apache Spark 
\u3092\u4f7f\u7528\u3059\u308c\u3070\u51e6\u7406\u30ce\u30fc\u30c9\u6570\u306b\u5fdc\u3058\u305f\u6027\u80fd\u306e\u5411\u4e0a\u3092\u671f\u5f85\u3059\u308b\u3053\u3068\u304c\u3067\u304d\u307e\u3059\u3002<\/p>\n<p>\u3082\u3061\u308d\u3093 kaggle \u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u306f\u51e6\u7406\u901f\u5ea6\u3060\u3051\u3067\u306f\u306a\u304f\u6a5f\u68b0\u5b66\u7fd2\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u305d\u306e\u3082\u306e\u306e\u6027\u80fd\u3084\u3001\u4ee5\u4e0b\u306b\u30c7\u30fc\u30bf\u304b\u3089\u7279\u5fb4\u3092\u5f15\u304d\u51fa\u305b\u308b\u304b\u306b\u3088\u3063\u3066\u9806\u4f4d\u306b\u5927\u304d\u306a\u5909\u52d5\u304c\u51fa\u308b\u305f\u3081\u3001Apache Spark \u3067\u5927\u304d\u306a\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u3092\u6271\u3046\u3053\u3068\u3084\u51e6\u7406\u6027\u80fd\u3092\u5411\u4e0a\u3055\u305b\u308b\u3053\u3068\u3068 Kaggle \u3067 competitve \u304b\u3069\u3046\u304b\u306f\u5225\u306e\u8a71\u3067\u3059\u3002\u3042\u3057\u304b\u3089\u305a\u3002<\/p>\n<p>\u3067\u306f\u306f\u3058\u3081\u307e\u3057\u3087\u3046\u3002<\/p>\n<h2>\u53c2\u8003\u306b\u3057\u305f\u30b5\u30a4\u30c8<\/h2>\n<p>\u3053\u3053\u3067\u306f Qiita \u306e \"Kaggle\u306e\u7df4\u7fd2\u554f\u984c\uff08Regression\uff09\u3092\u89e3\u3044\u3066Kaggler\u306b\u306a\u308b\" : <a href=\"https:\/\/qiita.com\/katsu1110\/items\/a1c3185fec39e5629bcb\" rel=\"noopener\" target=\"_blank\">https:\/\/qiita.com\/katsu1110\/items\/a1c3185fec39e5629bcb<\/a> \u3092\u53c2\u8003\u306b\u3001\u3067\u304d\u308b\u3060\u3051 Apache Spark \u306e\u6a5f\u80fd\u3092\u4f7f\u3063\u3066\u540c\u3058\u3053\u3068\u3092\u3059\u308b\u3053\u3068\u306b\u3057\u307e\u3059\u3002<\/p>\n<p>\u5f93\u3063\u3066\u884c\u3046\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u306f\u7df4\u7fd2\u7528\u306e \"<a href=\"https:\/\/www.kaggle.com\/c\/house-prices-advanced-regression-techniques\" rel=\"noopener\" target=\"_blank\">House Prices: Advanced Regression Techniques<\/a>\" 
\u3068\u3057\u307e\u3059\u3002\u51e6\u7406\u7528\u306e\u30c7\u30fc\u30bf\u30bb\u30c3\u30c8\u306f\u4e88\u3081\u624b\u5143\u306b\u30c0\u30a6\u30f3\u30ed\u30fc\u30c9\u3057\u3066\u304a\u304d\u307e\u3059\u3002<\/p>\n<h2>\u30c7\u30fc\u30bf\u306e\u30ed\u30fc\u30c9\u3001\u8868\u793a<\/h2>\n<p>Pandas \u3067\u306f CSV \u30c7\u30fc\u30bf\u306e\u30ed\u30fc\u30c9\u306f\u975e\u5e38\u306b\u304b\u3093\u305f\u3093\u3067\u3001 \"read_csv\" \u3092\u547c\u3076\u3060\u3051\u3067\u3059\u3002\u8aad\u307f\u8fbc\u3093\u3060 CSV \u306f Pandas \u306e Dataframe \u306b\u306a\u308a\u307e\u3059\u3002<\/p>\n<p><pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">python\nimport pandas as pd\ndf = pd.read_csv(&quot;train.csv.gz&quot;)<\/pre>\n<\/p>\n<p>Apache Spark \u306f\u5c11\u3057\u524d\u307e\u3067\u306f CSV \u306e\u8aad\u307f\u8fbc\u307f\u3092\u6a19\u6e96\u3067\u30b5\u30dd\u30fc\u30c8\u3057\u3066\u3044\u306a\u304b\u3063\u305f\u306e\u3067\u3059\u304c\u3001\u6700\u8fd1\u3067\u306f\u305d\u306e\u307e\u307e\u8aad\u307f\u8fbc\u3080\u3053\u3068\u304c\u3067\u304d\u307e\u3059\u3002\u4ee5\u4e0b\u306e\u3088\u3046\u306b\u3059\u308c\u3070 CSV \u30d5\u30a1\u30a4\u30eb\u3092\u8aad\u307f\u8fbc\u307f\u3001Apache Spark \u306e Dataframe \u306b\u306a\u308a\u307e\u3059\u3002<\/p>\n<pre>python\nfrom pyspark.sql import SparkSession\nspark = SparkSession.builder.master(\"local\").getOrCreate()\n# CSV\u306e\u30c7\u30fc\u30bf\u30d5\u30a1\u30a4\u30eb\u3092\u8aad\u307f\u8fbc\u307f\ndf = spark.read.format(\"csv\") \\\n          .options(header=\"true\", \\\n                   nanValue=\"NA\", \\\n                   nullValue=\"NA\", \\\n                   inferSchema=True) \\\n          .load(\"train.csv.gz\")<\/pre>\n<p>Jupyter Notebook(\u3042\u308b\u3044\u306fJupyterLab) \u3092\u4f7f\u7528\u3057\u3066\u3044\u308b\u5834\u5408\u3001Pandas 
Dataframe\u306e\u8868\u793a\u306f\u975e\u5e38\u306b\u304b\u3093\u305f\u3093\u3067\u3059\u3002\u304d\u308c\u3044\u306a\u8868\u3067\u8868\u793a\u3055\u308c\u3001\u5217\u306e\u6570\u304c\u591a\u3044\u5834\u5408\u3067\u3082\u9069\u5b9c\u7701\u7565\u3057\u305f\u308a\u3001\u30b9\u30af\u30ed\u30fc\u30eb\u3057\u3066\u5185\u5bb9\u3092\u304b\u3093\u305f\u3093\u306b\u628a\u63e1\u3059\u308b\u3053\u3068\u304c\u3067\u304d\u307e\u3059\u3002<\/p>\n<p><pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">python\nfrom IPython.core.display import display\ndisplay(df.head(5))<\/pre>\n<\/p>\n<p><a href=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_01.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_01.jpg\" alt=\"\" width=\"1005\" height=\"281\" class=\"aligncenter size-full wp-image-21698\" srcset=\"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_01.jpg 1005w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_01-360x101.jpg 360w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_01-768x215.jpg 768w\" sizes=\"auto, (max-width: 1005px) 100vw, 1005px\" \/><\/a><\/p>\n<p>Apache Spark \u3067\u306f Dataframe \u306e \"show()\" \u30e1\u30bd\u30c3\u30c9\u3067\u8868\u793a\u3059\u308b\u3053\u3068\u304c\u3067\u304d\u308b\u306e\u3067\u3059\u304c\u3001\u5168\u3066\u30c6\u30ad\u30b9\u30c8\u3067\u8868\u793a\u3055\u308c\u3066\u3057\u307e\u3046\u305f\u3081\u3001\u5217\u306e\u6570\u304c\u591a\u304f\u306a\u308b\u3068\u5185\u5bb9\u306e\u78ba\u8a8d\u304c\u3084\u308a\u3065\u3089\u304f\u306a\u3063\u3066\u3057\u307e\u3044\u307e\u3059\u3002\u3053\u308c\u3067\u306f\u30c7\u30fc\u30bf\u306e\u78ba\u8a8d\u3060\u3051\u3067\u3052\u3093\u306a\u308a\u3057\u3066\u3057\u307e\u3044\u305d\u3046\u3067\u3059\u3002<\/p>\n<p><pre class=\"EnlighterJSRAW\" 
data-enlighter-language=\"python\">python\ndf.show(5)<\/pre>\n<\/p>\n<p><a href=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_02.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_02.jpg\" alt=\"\" width=\"987\" height=\"621\" class=\"aligncenter size-full wp-image-21700\" srcset=\"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_02.jpg 987w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_02-360x227.jpg 360w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_02-768x483.jpg 768w\" sizes=\"auto, (max-width: 987px) 100vw, 987px\" \/><\/a><\/p>\n<p>\u5206\u6790\u306b\u304a\u3044\u3066\u306f\u30c7\u30fc\u30bf\u306e\u72b6\u6cc1\u3092\u3056\u3063\u3068\u898b\u308b\u3053\u3068\u304c\u3067\u304d\u308b\u3053\u3068\u3082\u91cd\u8981\u306a\u70b9\u3067\u3059\u3002\u3057\u304b\u3057\u3053\u308c\u3067\u306f\u3056\u3063\u3068\u5185\u5bb9\u3092\u78ba\u8a8d\u3059\u308b\u3053\u3068\u304c\u96e3\u3057\u304f\u3001\u5b9f\u7528\u306b\u306f\u9069\u3057\u307e\u305b\u3093\u3002\u305d\u3053\u3067 Spark Dataframe \u306e\u5185\u5bb9\u3092 Pandas Dataframe \u306b\u5909\u63db\u3057\u3066\u8868\u793a\u3059\u308b\u3088\u3046\u306b\u3001\u304a\u52a9\u3051\u95a2\u6570\u3092\u4f5c\u6210\u3057\u307e\u3059\u3002<\/p>\n<p><pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\">python\ndef printDf(sprkDF): \n    # format Spark Dataframe like pandas dataframe\n    # spark\u306edataframe\u3092pandas\u306edataframe\u306e\u3088\u3046\u306b\u6574\u5f62\u3057\u3066\u51fa\u529b\u3059\u308b\n    newdf = sprkDF.toPandas()\n    from IPython.core.display import display, HTML\n    return HTML(newdf.to_html())\ndisplay(printDf(df.limit(5)))<\/pre>\n<\/p>\n<p><a href=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_03.jpg\"><img loading=\"lazy\" 
decoding=\"async\" src=\"\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_03.jpg\" alt=\"\" width=\"987\" height=\"337\" class=\"aligncenter size-full wp-image-21701\" srcset=\"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_03.jpg 987w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_03-360x123.jpg 360w, https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2018\/06\/houseprice_03-768x262.jpg 768w\" sizes=\"auto, (max-width: 987px) 100vw, 987px\" \/><\/a><\/p>\n<p>\u3053\u308c\u3067\u3060\u3044\u3076\u898b\u3084\u3059\u304f\u306a\u308a\u307e\u3057\u305f\uff01\u30c7\u30fc\u30bf\u3082\u898b\u3084\u3059\u304f\u306a\u308a\u3001\u3084\u308b\u6c17\u304c\u51fa\u3066\u304d\u307e\u3059\u306d\uff01<\/p>\n<h2>\u30e9\u30d9\u30eb\u30a8\u30f3\u30b3\u30fc\u30c7\u30a3\u30f3\u30b0<\/h2>\n<p>\u6b21\u306f\u6587\u5b57\u5217\u3067\u69cb\u6210\u3055\u308c\u3066\u3044\u308b\u5217\u3092\u6a5f\u68b0\u5b66\u7fd2\u30a2\u30eb\u30b4\u30ea\u30ba\u30e0\u306b\u4e0e\u3048\u3084\u3059\u3044\u3088\u3046\u6570\u5b57\u306b\u7f6e\u304d\u63db\u3048\u307e\u3059\u3002<\/p>\n<p>scikit-learn \u306e \"LabelEncoder\" \u30e9\u30a4\u30d6\u30e9\u30ea\u306f\u6587\u5b57\u5217\u3067\u69cb\u6210\u3055\u308c\u305f\u30ab\u30e9\u30e0\u3092\u8aad\u307f\u8fbc\u307f\u3001\u6570\u5024\u306b\u7f6e\u304d\u63db\u3048\u3066\u304f\u308c\u307e\u3059\u3002\u3053\u308c\u306f\u975e\u5e38\u306b\u4fbf\u5229\u306a\u30e9\u30a4\u30d6\u30e9\u30ea\u3067\u3059\u3002<\/p>\n<p>[cc lang=\"python\"]python<br \/>\nfrom sklearn.preprocessing import LabelEncoder<br \/>\n\u3000for i in range(train.shape[1]):<br \/>\n    if train.iloc<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 
\u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache Spark \u3067\u6311\u6226\u3057\u3066\u307f\u305f\u3044\u3068\u601d\u3044\u307e\u3059\u3002 \u4f7f\u3063\u3066\u3044\u308b\u65b9\u306f\u77e5\u3063\u3066\u306f\u3044\u308b\u306e\u3067\u3059\u304c\u3001\u5b9f\u306f kaggle  [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":12352,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[83,51],"tags":[84,85,50],"class_list":["post-21695","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dataanalytics","category-spark","tag-dataanalytics","tag-kaggle","tag-spark"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.7 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark - Tech Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3<\/title>\n<meta name=\"description\" content=\"DataAnalytics, Spark |\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 \u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache Spark\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695\" \/>\n<meta property=\"og:locale\" content=\"ja_JP\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" 
content=\"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark - Tech Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3\" \/>\n<meta property=\"og:description\" content=\"DataAnalytics, Spark |\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 \u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache Spark\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695\" \/>\n<meta property=\"og:site_name\" content=\"Tech Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/creationline\" \/>\n<meta property=\"article:published_time\" content=\"2018-06-27T00:00:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-06-27T03:08:08+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"942\" \/>\n\t<meta property=\"og:image:height\" content=\"261\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@creationline\" \/>\n<meta name=\"twitter:site\" content=\"@creationline\" \/>\n<meta name=\"twitter:label1\" content=\"\u57f7\u7b46\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593\" \/>\n\t<meta name=\"twitter:data2\" content=\"1\u5206\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/#\\\/schema\\\/person\\\/7d923d1c017568a1a5e66d7bb1c8764a\"},\"headline\":\"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark\",\"datePublished\":\"2018-06-27T00:00:04+00:00\",\"dateModified\":\"2018-06-27T03:08:08+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695\"},\"wordCount\":176,\"image\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2015\\\/12\\\/kaggle.jpg\",\"keywords\":[\"DataAnalytics\",\"Kaggle\",\"Spark\"],\"articleSection\":[\"DataAnalytics\",\"Spark\"],\"inLanguage\":\"ja\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695\",\"url\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695\",\"name\":\"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark - Tech 
Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2015\\\/12\\\/kaggle.jpg\",\"datePublished\":\"2018-06-27T00:00:04+00:00\",\"dateModified\":\"2018-06-27T03:08:08+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/#\\\/schema\\\/person\\\/7d923d1c017568a1a5e66d7bb1c8764a\"},\"description\":\"DataAnalytics, Spark |\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 \u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache 
Spark\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#breadcrumb\"},\"inLanguage\":\"ja\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#primaryimage\",\"url\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2015\\\/12\\\/kaggle.jpg\",\"contentUrl\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2015\\\/12\\\/kaggle.jpg\",\"width\":942,\"height\":261},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\\\/21695#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"HOME\",\"item\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"\u30c7\u30fc\u30bf\u200b\u200b\u30de\u30cd\u30b8\u30e1\u30f3\u30c8\",\"item\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"DataAnalytics\",\"item\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/data-management\\\/dataanalytics\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/#website\",\"url\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/\",\"name\":\"Tech 
Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3\",\"description\":\"\u30a2\u30b8\u30e3\u30a4\u30eb\uff06DevOps\u3001\u30af\u30e9\u30a6\u30c9\u30cd\u30a4\u30c6\u30a3\u30d6\u3001AI\uff06LLM\u306e\u5148\u7aef\u6280\u8853\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"ja\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/#\\\/schema\\\/person\\\/7d923d1c017568a1a5e66d7bb1c8764a\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"ja\",\"@id\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2021\\\/12\\\/avatar.png\",\"url\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2021\\\/12\\\/avatar.png\",\"contentUrl\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/cms_x3GWkuX\\\/wp-content\\\/uploads\\\/2021\\\/12\\\/avatar.png\",\"caption\":\"admin\"},\"url\":\"https:\\\/\\\/www.creationline.com\\\/tech-blog\\\/author\\\/admin\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark - Tech Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3","description":"DataAnalytics, Spark |\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 \u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache Spark","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695","og_locale":"ja_JP","og_type":"article","og_title":"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark - Tech Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3","og_description":"DataAnalytics, Spark |\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 \u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache Spark","og_url":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695","og_site_name":"Tech 
Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3","article_publisher":"https:\/\/www.facebook.com\/creationline","article_published_time":"2018-06-27T00:00:04+00:00","article_modified_time":"2018-06-27T03:08:08+00:00","og_image":[{"width":942,"height":261,"url":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg","type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_creator":"@creationline","twitter_site":"@creationline","twitter_misc":{"\u57f7\u7b46\u8005":"admin","\u63a8\u5b9a\u8aad\u307f\u53d6\u308a\u6642\u9593":"1\u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#article","isPartOf":{"@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695"},"author":{"name":"admin","@id":"https:\/\/www.creationline.com\/tech-blog\/#\/schema\/person\/7d923d1c017568a1a5e66d7bb1c8764a"},"headline":"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark","datePublished":"2018-06-27T00:00:04+00:00","dateModified":"2018-06-27T03:08:08+00:00","mainEntityOfPage":{"@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695"},"wordCount":176,"image":{"@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#primaryimage"},"thumbnailUrl":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg","keywords":["DataAnalytics","Kaggle","Spark"],"articleSection":["DataAnalytics","Spark"],"inLanguage":"ja"},{"@type":"WebPage","@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695","url":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695","name":"Apache 
Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark - Tech Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3","isPartOf":{"@id":"https:\/\/www.creationline.com\/tech-blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#primaryimage"},"image":{"@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#primaryimage"},"thumbnailUrl":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg","datePublished":"2018-06-27T00:00:04+00:00","dateModified":"2018-06-27T03:08:08+00:00","author":{"@id":"https:\/\/www.creationline.com\/tech-blog\/#\/schema\/person\/7d923d1c017568a1a5e66d7bb1c8764a"},"description":"DataAnalytics, Spark |\u3053\u3093\u306b\u3061\u306f\u3002\u6728\u5185\u3067\u3059\u3002 \u4eca\u56de\u306f\u30c7\u30fc\u30bf\u30b5\u30a4\u30a8\u30f3\u30c6\u30a3\u30b9\u30c8\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u30b5\u30a4\u30c8\u3068\u3057\u3066\u6709\u540d\u306a kaggle \u306b Apache 
Spark","breadcrumb":{"@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#breadcrumb"},"inLanguage":"ja","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695"]}]},{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#primaryimage","url":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg","contentUrl":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2015\/12\/kaggle.jpg","width":942,"height":261},{"@type":"BreadcrumbList","@id":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics\/21695#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"HOME","item":"https:\/\/www.creationline.com\/tech-blog"},{"@type":"ListItem","position":2,"name":"\u30c7\u30fc\u30bf\u200b\u200b\u30de\u30cd\u30b8\u30e1\u30f3\u30c8","item":"https:\/\/www.creationline.com\/tech-blog\/data-management"},{"@type":"ListItem","position":3,"name":"DataAnalytics","item":"https:\/\/www.creationline.com\/tech-blog\/data-management\/dataanalytics"},{"@type":"ListItem","position":4,"name":"Apache Spark\u7e1b\u308a\u3067Kaggle\u306e\u30b3\u30f3\u30da\u30c6\u30a3\u30b7\u30e7\u30f3\u3084\u3063\u3066\u307f\u305f #Spark"}]},{"@type":"WebSite","@id":"https:\/\/www.creationline.com\/tech-blog\/#website","url":"https:\/\/www.creationline.com\/tech-blog\/","name":"Tech 
Blog\uff5c\u30af\u30ea\u30a8\u30fc\u30b7\u30e7\u30f3\u30e9\u30a4\u30f3","description":"\u30a2\u30b8\u30e3\u30a4\u30eb\uff06DevOps\u3001\u30af\u30e9\u30a6\u30c9\u30cd\u30a4\u30c6\u30a3\u30d6\u3001AI\uff06LLM\u306e\u5148\u7aef\u6280\u8853","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.creationline.com\/tech-blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"ja"},{"@type":"Person","@id":"https:\/\/www.creationline.com\/tech-blog\/#\/schema\/person\/7d923d1c017568a1a5e66d7bb1c8764a","name":"admin","image":{"@type":"ImageObject","inLanguage":"ja","@id":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2021\/12\/avatar.png","url":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2021\/12\/avatar.png","contentUrl":"https:\/\/www.creationline.com\/tech-blog\/cms_x3GWkuX\/wp-content\/uploads\/2021\/12\/avatar.png","caption":"admin"},"url":"https:\/\/www.creationline.com\/tech-blog\/author\/admin"}]}},"_links":{"self":[{"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/posts\/21695","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/comments?post=21695"}],"version-history":[{"count":37,"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/posts\/21695\/revisions"}],"predecessor-version":[{"id":21712,"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/posts\/21695\/revisions\/21712"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.crea
tionline.com\/tech-blog\/wp-json\/wp\/v2\/media\/12352"}],"wp:attachment":[{"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/media?parent=21695"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/categories?post=21695"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.creationline.com\/tech-blog\/wp-json\/wp\/v2\/tags?post=21695"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
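<p>In Spark ML, the transformer that plays roughly the role of LabelEncoder is <code>pyspark.ml.feature.StringIndexer</code>. When swapping one for the other, keep in mind that they assign codes in a different order. LabelEncoder numbers the sorted distinct labels. StringIndexer's default <code>stringOrderType="frequencyDesc"</code> gives index 0 to the most frequent label and emits the index as a double. The plain-Python sketch below needs no Spark and imitates both orderings so the difference is easy to see. Some parts are assumptions, not library code: the helper names are hypothetical, and the sample values are made up in the style of the House Prices <code>MSZoning</code> column. Breaking frequency ties alphabetically follows recent Spark documentation but may not hold for every Spark version.</p>

```python
from collections import Counter

# Hypothetical helpers: a plain-Python imitation of how codes are assigned,
# not the scikit-learn or Spark implementations themselves.

def sorted_label_codes(values):
    """LabelEncoder-style: codes follow the sorted order of the distinct labels."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def frequency_desc_codes(values):
    """StringIndexer-style (frequencyDesc): the most frequent label gets 0.0.

    Ties are broken alphabetically (an assumption, see the lead-in);
    indices are floats because StringIndexer emits a double column.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(code) for code, label in enumerate(ordered)}
    return [mapping[v] for v in values], mapping

# Made-up sample in the style of the MSZoning column of train.csv
zoning = ["RL", "RM", "RL", "FV", "RL", "RM"]

le_codes, le_map = sorted_label_codes(zoning)
si_codes, si_map = frequency_desc_codes(zoning)
print(le_map)  # {'FV': 0, 'RL': 1, 'RM': 2}
print(si_map)  # {'RL': 0.0, 'RM': 1.0, 'FV': 2.0}
```

<p>The same column therefore ends up with different numbers depending on the encoder. This matters if you compare results across the two libraries. When you move to StringIndexer itself, its <code>handleInvalid</code> parameter ("error", "skip" or "keep") decides what happens to labels in the test data that were not seen when the indexer was fitted.</p>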