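This topic describes the features of Mars, how it differs from PyODPS DataFrame, and when to use each.

Note

MaxCompute MaxFrame has been released and will gradually replace the PyODPS DataFrame and Mars interfaces. MaxFrame also offers noticeably better operator compatibility and distributed capabilities. New users and new jobs are advised to use MaxFrame directly for Python development.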

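Scenarios

The scenarios for Mars and PyODPS DataFrame are as follows:

Mars
- You frequently call the to_pandas() method to convert a PyODPS DataFrame into a Pandas DataFrame.
- You are familiar with the Pandas interface and do not want to learn the PyODPS DataFrame interface.
- You need to use indexes.
- You need the data order to be preserved after a DataFrame is created.
  Mars DataFrame can retrieve data at a given offset through methods such as iloc. For example, df.iloc[10] returns the row at position 10 (positions are zero-based, so this is the eleventh row). Mars DataFrame also supports df.shift() and df.ffill(), which are only meaningful when the data order is guaranteed.
- You need to parallelize and distribute Numpy or Scikit-learn workloads, or run TensorFlow, PyTorch, or XGBoost in a distributed manner.
- Your data volume is below the TB level.

PyODPS DataFrame
- You use MaxCompute to schedule jobs. PyODPS DataFrame compiles DataFrame operations into MaxCompute SQL, so it is the recommended choice when jobs must be scheduled through MaxCompute.
- Your jobs have high stability requirements. PyODPS DataFrame compiles jobs to run on MaxCompute, which is very stable, whereas Mars is relatively new. If stability is critical, use PyODPS DataFrame.
- Your data volume is above the TB level.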

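Differences from PyODPS DataFrame

API
- Mars
  Mars DataFrame is fully compatible with Pandas, Mars Tensor is compatible with Numpy, and Mars Learn is compatible with Scikit-learn.
- PyODPS
  PyODPS provides only the DataFrame interface, which differs considerably from the Pandas interface.

Indexes

Mars

Mars DataFrame supports index operations on both row and column indexes. Sample code: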

In [1]: import mars.dataframe as md
In [5]: import mars.tensor as mt
In [7]: df = md.DataFrame(mt.random.rand(10, 3), index=md.date_range('2020-5-1', periods=10))
In [9]: df.loc['2020-5'].execute()
Out[9]:
                   0         1         2
2020-05-01  0.061912  0.507101  0.372242
2020-05-02  0.833663  0.818519  0.943887
2020-05-03  0.579214  0.573056  0.319786
2020-05-04  0.476143  0.245831  0.434038
2020-05-05  0.444866  0.465851  0.445263
2020-05-06  0.654311  0.972639  0.443985
2020-05-07  0.276574  0.096421  0.264799
2020-05-08  0.106188  0.921479  0.202131
2020-05-09  0.281736  0.465473  0.003585
2020-05-10  0.400000  0.451150  0.956905

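PyODPS

PyODPS does not support index operations.

Data order
- Mars
  After a Mars DataFrame is created, the data order is guaranteed. Time-series operations such as shift are supported, as are forward filling (ffill) and backward filling (bfill) of missing values.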
```
In [3]: df = md.DataFrame([[1, None], [None, 1]])
In [4]: df.execute()
Out[4]:
     0    1
0  1.0  NaN
1  NaN  1.0

In [5]: df.ffill().execute()  # Fill each missing value with the value from the previous row.
Out[5]:
     0    1
0  1.0  NaN
1  1.0  1.0
```
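- PyODPS
  PyODPS uses MaxCompute to compute and store data. MaxCompute does not guarantee data order, so PyODPS does not guarantee it either and does not support time-series operations.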
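
The order-dependent operations named above can be tried locally. The following is a minimal sketch that uses plain pandas rather than Mars, on the assumption (based on this topic's statement that Mars DataFrame is compatible with Pandas) that the Mars versions give the same results once .execute() is called. The column name `v` and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one column with gaps, in a fixed row order.
df = pd.DataFrame({"v": [1.0, np.nan, 3.0, np.nan]})

# iloc is positional and zero-based: position 2 is the third row.
third_row_value = df.iloc[2]["v"]

# shift(1) moves every value down one row, so row i sees the value of row i-1.
shifted = df["v"].shift(1)

# ffill() copies the last seen value forward; bfill() copies the next value backward.
forward_filled = df["v"].ffill()
backward_filled = df["v"].bfill()

print(forward_filled.tolist())
print(backward_filled.tolist())
```

Each of these results changes if the rows are reordered, which is why such operations need an engine that preserves row order.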
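
Execution layer
- Mars
  Mars consists of a client and a distributed execution layer. You can call o.create_mars_cluster to create a Mars cluster inside MaxCompute and submit computing jobs directly to that cluster, which keeps scheduling overhead very low. Mars has the advantage when the data volume is small.
- PyODPS
  PyODPS is a client only and has no server-side component. When a PyODPS DataFrame is executed, the computing job is compiled into MaxCompute SQL, so the operations that PyODPS DataFrame supports depend on MaxCompute SQL. In addition, each call to the execute method submits one MaxCompute job, which must then be scheduled in the cluster.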
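
The scheduling cost described above depends on how often work is handed to a scheduler. The toy model below is not PyODPS or Mars code: `FakeScheduler`, `LazyFrame`, and their methods are hypothetical names invented for this sketch. It only illustrates the general idea that a deferred expression does nothing until execute() is called, and that each execute() call counts as one submitted job.

```python
class FakeScheduler:
    """Hypothetical stand-in for a job scheduler; it only counts submissions."""

    def __init__(self):
        self.submitted_jobs = 0

    def submit(self, func, data):
        self.submitted_jobs += 1
        return func(data)


class LazyFrame:
    """Hypothetical deferred computation: records steps, runs them on execute()."""

    def __init__(self, data, scheduler, steps=()):
        self._data = data
        self._scheduler = scheduler
        self._steps = tuple(steps)

    def map(self, func):
        # Recording a step does not submit anything.
        return LazyFrame(self._data, self._scheduler, self._steps + (func,))

    def execute(self):
        def run(data):
            for step in self._steps:
                data = [step(x) for x in data]
            return data

        # One execute() call means exactly one submitted job.
        return self._scheduler.submit(run, self._data)


scheduler = FakeScheduler()
frame = LazyFrame([1, 2, 3], scheduler).map(lambda x: x * 2).map(lambda x: x + 1)
jobs_before = scheduler.submitted_jobs  # nothing has run yet
result = frame.execute()
jobs_after = scheduler.submitted_jobs   # a single job covers the whole chain
```

In this model, chaining operations before a single execute() call keeps the number of submitted jobs low, while calling execute() after every step would submit one job per step.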

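Usage notes

Mars is a unified, tensor-based distributed computing framework. It uses parallel and distributed techniques to accelerate the Python data science stack, including Numpy, Pandas, and Scikit-learn.

Commonly used Mars interfaces are as follows:

Mars Tensor interface

Consistent with Numpy, with support for large-scale, high-dimensional arrays. Sample code: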

import mars.tensor as mt
a = mt.random.rand(10000, 50)
b = mt.random.rand(50, 5000)
a.dot(b).execute()
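
For comparison, here is the same matrix product written with plain NumPy at a smaller size. This is a sketch for local experimentation, not part of the Mars API; the shapes are scaled down arbitrarily and the seed is there only to make the run repeatable. NumPy evaluates eagerly, so there is no .execute() call, whereas the Mars version defers the computation until .execute() is called.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
a = rng.random((100, 50))
b = rng.random((50, 20))

# NumPy computes the product immediately and returns an ndarray.
product = a.dot(b)
print(product.shape)
```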

Mars DataFrame interface

Consistent with Pandas, with support for large-scale data processing and analysis. Sample code:

import mars.dataframe as md
ratings = md.read_csv('Downloads/ml-20m/ratings.csv')
movies = md.read_csv('Downloads/ml-20m/movies.csv')
movie_rating = ratings.groupby('movieId', as_index=False).agg({'rating': 'mean'})
result = movie_rating.merge(movies[['movieId', 'title']], on='movieId')
result.sort_values(by='rating', ascending=False).execute()
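
The same pipeline can be checked on a small in-memory data set with plain pandas. This sketch assumes only that the CSV files contain the columns used above (`movieId`, `rating`, `title`); the rows and the extra `genres` column are invented for illustration.

```python
import pandas as pd

# Invented rows standing in for ratings.csv and movies.csv.
ratings = pd.DataFrame({
    "movieId": [1, 1, 2, 2, 3],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.0],
})
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Drama", "Comedy", "Action"],
})

# Average rating per movie, keeping movieId as a regular column.
movie_rating = ratings.groupby("movieId", as_index=False).agg({"rating": "mean"})

# Attach titles, then rank from highest to lowest average rating.
result = movie_rating.merge(movies[["movieId", "title"]], on="movieId")
result = result.sort_values(by="rating", ascending=False)
print(result)
```

Swapping the `pandas` import for `mars.dataframe` and adding .execute() at the end should give the Mars form shown above, provided the operations used are supported by the installed Mars version.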

Mars Learn interface

Consistent with Scikit-learn. Mars Learn can also integrate with TensorFlow, PyTorch, and XGBoost. Sample code:

import mars.dataframe as md
from mars.learn.neighbors import NearestNeighbors
df = md.read_csv('data.csv')
nn = NearestNeighbors(n_neighbors=10)
nn.fit(df)
neighbors = nn.kneighbors(df).fetch()

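Technical support

If you encounter problems while using Mars, click the application link to join the DingTalk technical support group for assistance.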
