
jqfactor_analyzer's Introduction

jqfactor_analyzer

Open-source edition of JoinQuant's single-factor analysis tool


The open-source edition of JoinQuant's single-factor analysis tool lets users analyze factors, providing detailed metrics such as factor IC values, factor returns, and factor turnover, so that users can inspect factor details according to their own needs.

Installation

pip install jqfactor_analyzer

Upgrade

pip install -U jqfactor_analyzer

Usage

analyze_factor: the factor analysis function

Example

  • Example: analyzing the 5-day average turnover factor

# Load libraries
import pandas as pd
import jqfactor_analyzer as ja

# Authenticate with jqdatasdk using your username and password.
# Sign-up: http://t.cn/EINDOxE ; usage guide (JoinQuant site): http://t.cn/EINcS4j
import jqdatasdk
jqdatasdk.auth('username', 'password')

# Get 5-day average turnover factor data for 2018-01-01 to 2018-12-31
# (this example loads the bundled sample data directly; see below for
# how to fetch data from the JoinQuant factor library)
from jqfactor_analyzer.sample import VOL5
factor_data = VOL5

# Analyze the factor
far = ja.analyze_factor(
    factor_data,  # factor_data is a pandas.DataFrame of factor values
    quantiles=10,
    periods=(1, 10),
    industry='jq_l1',
    weight_method='avg',
    max_loss=0.1
)

# Get the IC values of the cleaned factor data
far.ic

Sample output: (screenshot omitted)
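The values shown by far.ic are information coefficients per holding period. As a rough single-date illustration of the idea (not the library's exact implementation), a rank IC is the cross-sectional Spearman correlation between factor values and the next period's returns; all numbers below are made up:

```python
import pandas as pd

# Hypothetical single-date illustration: rank IC is the cross-sectional
# Spearman correlation between factor values and next-period returns.
factor = pd.Series(
    [0.84, 0.43, 2.33, 0.86, 0.96],
    index=['000001.XSHE', '000002.XSHE', '000063.XSHE',
           '000069.XSHE', '000100.XSHE'],
)
# Made-up next-period returns for the same stocks
next_period_returns = pd.Series(
    [0.011, -0.004, 0.032, 0.008, 0.015], index=factor.index
)

# Spearman correlation = Pearson correlation of the ranks
ic = factor.corr(next_period_returns, method='spearman')
print(round(ic, 6))  # 0.9
```

A value near 1 means the factor ranking lined up closely with the subsequent return ranking on that date.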

# Generate the full tear sheet of summary plots
far.create_full_tear_sheet(
    demeaned=False, group_adjust=False, by_group=False,
    turnover_periods=None, avgretplot=(5, 15), std_bar=False
)

Sample output: (screenshot omitted)

How to fetch data from the JoinQuant factor library

  1. The JoinQuant factor library contains hundreds of factors across categories such as quality, sentiment, and risk

  2. Connect via jqdatasdk to fetch the data; financial data is retrieved through the JoinQuant jqdatasdk API (trial registration link)

    # Fetch factor data, using the 5-day average turnover as an example;
    # the result can be fed directly into the factor analysis.
    # See the jqdatasdk API documentation for details.
    import jqdatasdk
    jqdatasdk.auth('username', 'password')
    # Fetch VOL5 data from the JoinQuant factor library
    factor_data = jqdatasdk.get_factor_values(
        securities=jqdatasdk.get_index_stocks('000300.XSHG'),
        factors=['VOL5'],
        start_date='2018-01-01',
        end_date='2018-12-31')['VOL5']

Converting your own factor values into the required DataFrame format

  • The index must be dates, as a standard pandas DatetimeIndex

  • The columns must be stock codes following JoinQuant's code conventions (e.g. Ping An Bank is 000001.XSHE)

    • stocks listed on the Shenzhen Stock Exchange take the suffix .XSHE
    • stocks listed on the Shanghai Stock Exchange take the suffix .XSHG
  • Converting a pandas.DataFrame into the required format

    First, make sure the index is a DatetimeIndex.

    This is usually done with pandas.to_datetime; before converting, make sure every index value is a parseable date string such as '2018-01-01' or '20180101', then call pandas.to_datetime.

    Also make sure the index dates are sorted in ascending order; sort_index can do this.

    Finally, check that every stock code in columns follows JoinQuant's code conventions.

    import pandas as pd
    
    sample_data = pd.DataFrame(
        [[0.84, 0.43, 2.33, 0.86, 0.96],
         [1.06, 0.51, 2.60, 0.90, 1.09],
         [1.12, 0.54, 2.68, 0.94, 1.12],
         [1.07, 0.64, 2.65, 1.33, 1.15],
         [1.21, 0.73, 2.97, 1.65, 1.19]],
        index=['2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05', '2018-01-08'],
        columns=['000001.XSHE', '000002.XSHE', '000063.XSHE', '000069.XSHE', '000100.XSHE']
    )
    
    print(sample_data)
    
    factor_data = sample_data.copy()
    # Convert the index to a DatetimeIndex
    factor_data.index = pd.to_datetime(factor_data.index)
    # Sort the DataFrame by date
    factor_data = factor_data.sort_index()
    # Check that every column follows JoinQuant's stock-code format
    if not factor_data.columns.astype(str).str.match(r'\d{6}\.XSH[EG]').all():
        print("Some stock codes do not follow JoinQuant's format:")
        print(factor_data.columns[~factor_data.columns.astype(str).str.match(r'\d{6}\.XSH[EG]')])
    
    print(factor_data)
  • Converting a dict that maps dates to per-stock factor-value Series into a pandas.DataFrame

    This can be done directly with pandas.DataFrame

    import pandas as pd
    
    sample_data = \
    {'2018-01-02': pd.Series([0.84, 0.43, 2.33, 0.86, 0.96],
                             index=['000001.XSHE', '000002.XSHE', '000063.XSHE', '000069.XSHE', '000100.XSHE']),
     '2018-01-03': pd.Series([1.06, 0.51, 2.60, 0.90, 1.09],
                             index=['000001.XSHE', '000002.XSHE', '000063.XSHE', '000069.XSHE', '000100.XSHE']),
     '2018-01-04': pd.Series([1.12, 0.54, 2.68, 0.94, 1.12],
                             index=['000001.XSHE', '000002.XSHE', '000063.XSHE', '000069.XSHE', '000100.XSHE']),
     '2018-01-05': pd.Series([1.07, 0.64, 2.65, 1.33, 1.15],
                             index=['000001.XSHE', '000002.XSHE', '000063.XSHE', '000069.XSHE', '000100.XSHE']),
     '2018-01-08': pd.Series([1.21, 0.73, 2.97, 1.65, 1.19],
                             index=['000001.XSHE', '000002.XSHE', '000063.XSHE', '000069.XSHE', '000100.XSHE'])}
    
    # Call pd.DataFrame directly to convert the dict into a DataFrame
    factor_data = pd.DataFrame(sample_data).T
    
    print(factor_data)
    
    # Then apply the DataFrame conversion steps above to satisfy the format requirements
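The format requirements above (DatetimeIndex, ascending dates, JoinQuant-style column codes) can be bundled into a single helper. This is a hypothetical sketch, not part of jqfactor_analyzer; the function name normalize_factor_data and the sample values are invented:

```python
import re

import pandas as pd

# JoinQuant-style stock code: six digits plus .XSHE (Shenzhen) or .XSHG (Shanghai)
CODE_PATTERN = re.compile(r'\d{6}\.XSH[EG]')


def normalize_factor_data(df):
    """Hypothetical helper: coerce a factor DataFrame into the layout
    described above (DatetimeIndex, ascending dates, valid codes)."""
    out = df.copy()
    out.index = pd.to_datetime(out.index)  # dates -> DatetimeIndex
    out = out.sort_index()                 # ascending date order
    bad = [c for c in out.columns if not CODE_PATTERN.fullmatch(str(c))]
    if bad:
        raise ValueError('stock codes not in JoinQuant format: %s' % bad)
    return out


# Deliberately unsorted dates in compact string form
sample = pd.DataFrame(
    [[1.06], [0.84]],
    index=['20180103', '20180102'],
    columns=['000001.XSHE'],
)
clean = normalize_factor_data(sample)
print(clean.index.is_monotonic_increasing)  # True
```

Raising on invalid codes (rather than silently dropping columns) makes format mistakes visible before the analysis runs.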

jqfactor_analyzer's People

Contributors

heyu91, joinquanter, wony-zheng


jqfactor_analyzer's Issues

The official demo fails with AssertionError: Length of new_levels (3) must be <= self.nlevels (2)

Using the official demo:

# Load libraries
import pandas as pd
import jqfactor_analyzer as ja

# Authenticate with jqdatasdk using your username and password.
# Sign-up: http://t.cn/EINDOxE ; usage guide (JoinQuant site): http://t.cn/EINcS4j
import jqdatasdk
jqdatasdk.auth('user', 'passwd')

# Get 5-day average turnover factor data for 2018-01-01 to 2018-12-31
# (this example loads the bundled sample data directly)
from jqfactor_analyzer.sample import VOL5
factor_data = VOL5

# Analyze the factor
far = ja.analyze_factor(
    factor_data,  # factor_data is a pandas.DataFrame of factor values
    quantiles=10,
    periods=(1, 10),
    industry='jq_l1',
    weight_method='avg',
    max_loss=0.1
)

The error output is as follows:
In fact, other factors raise the same error.

(l2ck) [sunshe35@arch Alphalen-master]$  cd /home/sunshe35/Data/Projects/Alphalen-master ; /usr/bin/env /home/sunshe35/.conda/envs/l2ck/bin/python /home/sunshe35/.vscode/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 42437 -- /home/sunshe35/Data/Projects/Alphalen-master/test3.py 
auth success  ( 如需更多使用说明请查看API文档:https://www.joinquant.com/help/api/doc?name=JQDatadoc )
/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/jqfactor_analyzer/analyze.py:258: FutureWarning: The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning.
  factor_data = factor_data.stack(dropna=False)
Traceback (most recent call last):
  File "/home/sunshe35/Data/Projects/Alphalen-master/test3.py", line 16, in <module>
    far = ja.analyze_factor(
          ^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/jqfactor_analyzer/__init__.py", line 37, in analyze_factor
    return FactorAnalyzer(factor,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/jqfactor_analyzer/analyze.py", line 251, in __init__
    self.__gen_clean_factor_and_forward_returns()
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/jqfactor_analyzer/analyze.py", line 290, in __gen_clean_factor_and_forward_returns
    self._clean_factor_data = get_clean_factor_and_forward_returns(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/jqfactor_analyzer/prepare.py", line 387, in get_clean_factor_and_forward_returns
    factor_data = get_clean_factor(factor, forward_returns, groupby=groupby,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/jqfactor_analyzer/prepare.py", line 296, in get_clean_factor
    merged_data['factor_quantile'] = quantile_data
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._set_item(key, value)
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/frame.py", line 4512, in _set_item
    value, refs = self._sanitize_column(value)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/frame.py", line 5250, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/frame.py", line 12674, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
                      ^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/series.py", line 5144, in reindex
    return super().reindex(
           ^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/generic.py", line 5607, in reindex
    return self._reindex_axes(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/generic.py", line 5630, in _reindex_axes
    new_index, indexer = ax.reindex(
                         ^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 4422, in reindex
    indexer = self.get_indexer(
              ^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3953, in get_indexer
    return self._get_indexer(target, method, limit, tolerance)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3974, in _get_indexer
    tgt_values = engine._extract_level_codes(  # type: ignore[union-attr]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "index.pyx", line 734, in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes
  File "/home/sunshe35/.conda/envs/l2ck/lib/python3.11/site-packages/pandas/core/indexes/multi.py", line 2579, in _recode_for_new_levels
    raise AssertionError(
AssertionError: Length of new_levels (3) must be <= self.nlevels (2)

far.create_full_tear_sheet raises an error

TypeError Traceback (most recent call last)
in
11 far.create_full_tear_sheet(
12 demeaned=False, group_adjust=False, by_group=False,
---> 13 turnover_periods=None, avgretplot=(5, 2), std_bar=False
14 )

D:\Anoconda\app\Lib\site-packages\jqfactor_analyzer\analyze.py in create_full_tear_sheet(self, demeaned, group_adjust, by_group, turnover_periods, avgretplot, std_bar)
1461 self.plot_quantile_returns_bar(by_group=False,
1462 demeaned=demeaned,
-> 1463 group_adjust=group_adjust)
1464 pl.plt.show()
1465 self.plot_cumulative_returns(period=None, demeaned=demeaned, group_adjust=group_adjust)

D:\Anoconda\app\Lib\site-packages\jqfactor_analyzer\analyze.py in plot_quantile_returns_bar(self, by_group, demeaned, group_adjust)
1038
1039 pl.plot_quantile_returns_bar(
-> 1040 mean_return_by_quantile, by_group=by_group, ylim_percentiles=None
1041 )
1042

D:\Anoconda\app\Lib\site-packages\jqfactor_analyzer\plot_utils.py in call_w_context(*args, **kwargs)
23 with plotting_context(), axes_style():
24 sns.despine(left=True)
---> 25 return func(*args, **kwargs)
26 else:
27 return func(*args, **kwargs)

D:\Anoconda\app\Lib\site-packages\jqfactor_analyzer\plotting.py in plot_quantile_returns_bar(mean_ret_by_q, by_group, ylim_percentiles, ax)
286
287 mean_ret_by_q.multiply(DECIMAL_TO_BPS).plot(
--> 288 kind='bar', title=QRETURNBAR.get("TITLE"), ax=ax
289 )
290 ax.set(xlabel="", ylabel=QRETURNBAR.get("YLABEL"), ylim=(ymin, ymax))

D:\Anoconda\app\Lib\site-packages\pandas\plotting\_core.py in __call__(self, *args, **kwargs)
792 data.columns = label_name
793
--> 794 return plot_backend.plot(data, kind=kind, **kwargs)
795
796 def line(self, x=None, y=None, **kwargs):

D:\Anoconda\app\Lib\site-packages\pandas\plotting\_matplotlib\__init__.py in plot(data, kind, **kwargs)
60 kwargs["ax"] = getattr(ax, "left_ax", ax)
61 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 62 plot_obj.generate()
63 plot_obj.draw()
64 return plot_obj.result

D:\Anoconda\app\Lib\site-packages\pandas\plotting\_matplotlib\core.py in generate(self)
277 def generate(self):
278 self._args_adjust()
--> 279 self._compute_plot_data()
280 self._setup_subplots()
281 self._make_plot()

D:\Anoconda\app\Lib\site-packages\pandas\plotting\_matplotlib\core.py in _compute_plot_data(self)
402 data = data._convert(datetime=True, timedelta=True)
403 numeric_data = data.select_dtypes(
--> 404 include=[np.number, "datetime", "datetimetz", "timedelta"]
405 )
406

D:\Anoconda\app\Lib\site-packages\pandas\core\frame.py in select_dtypes(self, include, exclude)
3440 # the "union" of the logic of case 1 and case 2:
3441 # we get the included and excluded, and return their logical and
-> 3442 include_these = Series(not bool(include), index=self.columns)
3443 exclude_these = Series(not bool(exclude), index=self.columns)
3444

D:\Anoconda\app\Lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
312 data = data.copy()
313 else:
--> 314 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
315
316 data = SingleBlockManager(data, index, fastpath=True)

D:\Anoconda\app\Lib\site-packages\pandas\core\internals\construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
710 value = maybe_cast_to_datetime(value, dtype)
711
--> 712 subarr = construct_1d_arraylike_from_scalar(value, len(index), dtype)
713
714 else:

D:\Anoconda\app\Lib\site-packages\pandas\core\dtypes\cast.py in construct_1d_arraylike_from_scalar(value, length, dtype)
1231 value = ensure_str(value)
1232
-> 1233 subarr = np.empty(length, dtype=dtype)
1234 subarr.fill(value)
1235

TypeError: Cannot interpret '<attribute 'dtype' of 'numpy.generic' objects>' as a data type

Performance: wrong variable when computing the quantile spread

In the function compute_mean_returns_spread in performance.py, the lower-quantile variable is wrong when computing the standard-error spread between quantiles:

    if isinstance(std_err.index, pd.MultiIndex):
        std1 = std_err.xs(upper_quant, level='factor_quantile')
        std2 = std_err.xs(lower_quant, level='factor_quantile')
    else:
        std1 = std_err.loc[upper_quant]
        std2 = std_err.loc[upper_quant]
    joint_std_err = np.sqrt(std1**2 + std2**2)
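As the report points out, the else branch reads the upper quantile twice. Below is a sketch of the corrected logic; joint_spread_std is a hypothetical standalone wrapper around the same branch, and the sample std_err values are made up:

```python
import numpy as np
import pandas as pd


def joint_spread_std(std_err, upper_quant, lower_quant):
    """Hypothetical standalone version of the branch quoted above,
    with the reported bug fixed."""
    if isinstance(std_err.index, pd.MultiIndex):
        std1 = std_err.xs(upper_quant, level='factor_quantile')
        std2 = std_err.xs(lower_quant, level='factor_quantile')
    else:
        std1 = std_err.loc[upper_quant]
        std2 = std_err.loc[lower_quant]  # fixed: the original used upper_quant
    return np.sqrt(std1 ** 2 + std2 ** 2)


# Made-up standard errors for the bottom (1) and top (10) quantiles
std_err = pd.Series([0.03, 0.04], index=[1, 10])
print(joint_spread_std(std_err, upper_quant=10, lower_quant=1))  # ≈ 0.05
```

With the bug, the spread's standard error uses the upper quantile's error twice, so the reported error bars for the top-minus-bottom spread are wrong whenever the two quantiles' errors differ.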

calc_top_down_cumulative_returns is wrong

def calc_top_down_cumulative_returns(self, period=None,
                                     demeaned=False, group_adjust=False):     
    if period is None:
        period = self._periods[0]
    period_col = convert_to_forward_returns_columns(period)
    mean_returns, _ = self.calc_mean_return_by_quantile(
        by_date=True, by_group=False,
        demeaned=demeaned, group_adjust=group_adjust,
    )
    mean_returns = mean_returns.apply(rate_of_return, axis=0)
    #  When period > 1 the conversion is duplicated here: rate_of_return
    #  takes the (1/N)-th power once, and then performance.cumulative_returns
    #  takes the (1/N)-th power again.
    #  The factor-weighted and per-quantile cumulative returns do not have
    #  this problem; only the top-down spread is converted twice.
    #  See the screenshots: the y-axis scales differ drastically.

    upper_quant = mean_returns[period_col].xs(self._factor_quantile,
                                              level='factor_quantile')
    lower_quant = mean_returns[period_col].xs(1,
                                              level='factor_quantile')
    return pef.cumulative_returns(upper_quant - lower_quant, period=period)






def rate_of_return(period_ret):

    period = int(period_ret.name.replace('period_', ''))
    return period_ret.add(1).pow(1. / period).sub(1)



def cumulative_returns(returns, period):

    returns = returns.fillna(0)

    if period == 1:
        return returns.add(1).cumprod()
    #
    # Build N interleaved sub-portfolios
    #

    def split_portfolio(ret, period):
        return pd.DataFrame(np.diag(ret))

    sub_portfolios = returns.groupby(
        np.arange(len(returns.index)) // period, axis=0
    ).apply(split_portfolio, period)
    sub_portfolios.index = returns.index

    #
    # Convert N-period returns into 1-period returns so the
    # cumulative returns are easy to compound
    #

    def rate_of_returns(ret, period):
        return ((np.nansum(ret) + 1)**(1. / period)) - 1

    sub_portfolios = rolling_apply(
        sub_portfolios,
        window=period,
        func=rate_of_returns,
        min_periods=1,
        args=(period,)
    )
    sub_portfolios = sub_portfolios.add(1).cumprod()

    #
    # Average the cumulative returns of the N sub-portfolios
    #
    return sub_portfolios.mean(axis=1)
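The double conversion the report describes is easy to check numerically (the return figures below are made up): converting an N-period return to a per-period rate with (1 + r) ** (1 / period) - 1 once gives the intended rate, while applying the same conversion a second time shrinks the result by roughly another factor of N:

```python
# Numeric illustration of the reported double conversion:
# a made-up 21% return over a 10-day holding period.
period = 10
ten_period_return = 0.21

once = (1 + ten_period_return) ** (1.0 / period) - 1   # intended per-period rate
twice = (1 + once) ** (1.0 / period) - 1               # erroneous second conversion

print(round(once, 6), round(twice, 6))
```

The second conversion turns a roughly 1.9% per-period rate into roughly 0.19%, which matches the drastically different y-axis scales in the screenshots.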

(issue screenshots Selection_001 and Selection_002 omitted)
