limepy's Introduction

LimeSurvey is open-source survey software. Using pandas, the limepy package simplifies a number of tasks when working with LimeSurvey data:

  • Downloading survey data. This requires that LimeSurvey’s RemoteControl 2 API is enabled, as explained here.
  • Creating a list of all the questions in the survey, with metadata.
  • Summarising data, e.g. creating value counts for multi-column items such as multiple-choice questions; calculating averages for number arrays; or creating scores for a ranking question.
  • Printing answers to open-ended questions.
  • Printing the answers of an individual respondent.

Note that limepy uses f-strings and therefore requires Python 3.6 or higher.

Use at your own risk and please make sure to check the results.

Installation

$ pip install limepy

How is it different

There are various Python packages for managing the LimeSurvey RemoteControl 2 API. While limepy can help you download survey data, its emphasis is on processing and summarising the data.

Examples

Download survey data

You can download survey data with the RemoteControl 2 API (provided the API is enabled in your LimeSurvey installation).

For a one-off download, you can of course do this manually. The API is more useful if you want to write a preliminary report based on the first responses and then update it automatically as new responses come in.

from pathlib import Path
from limepy import download

csv = download.get_responses(base_url, user_name, password, user_id, sid)
path = Path('../data/responses.csv')
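# Write as UTF-8 explicitly so non-ASCII answers do not depend on the platform default
path.write_text(csv, encoding='utf-8')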

Create Survey object

A Survey object contains the data and metadata of a survey. To create a Survey object, you need:

  • A csv containing the survey results. You can download it manually or use the API as described above. Make sure to set heading type to 'Question code' and response type to 'Answer codes'. If you use the API to download, the file will be delimited with ; rather than ,.
  • An .lss file containing the survey structure. You can download this manually.

from limepy.wrangle import Survey, Question
import pandas as pd

df = pd.read_csv('../data/responses.csv', sep=';')
with open('../data/structure.lss', encoding="utf8") as f:
    my_structure = f.read()

my_survey = Survey(df, my_structure)

If you wish to remove html tags from the questions, set strip_tags=True.

If you have a multilingual questionnaire, then you can select the language the group names, questions, answers and help texts should be presented in, e.g. language='nl' for Dutch.

Note: if you use a merged dataframe (for example, data from various versions of the same questionnaire), you should reset the index before creating a Survey object.
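
The note above can be illustrated with plain pandas. This is a minimal sketch, assuming two hypothetical response frames (`df_v1`, `df_v2`) with made-up columns; it only shows why the index needs resetting after a merge, not anything limepy does internally.

```python
import pandas as pd

# Hypothetical response frames from two versions of the same questionnaire
df_v1 = pd.DataFrame({'id': [1, 2], 'G01Q01': ['A1', 'A2']})
df_v2 = pd.DataFrame({'id': [1, 2, 3], 'G01Q01': ['A2', 'A2', 'A1']})

# After concatenation the index contains duplicates: 0, 1, 0, 1, 2
merged = pd.concat([df_v1, df_v2])

# reset_index(drop=True) gives every respondent a unique row index
merged = merged.reset_index(drop=True)
```

The resulting `merged` frame has a unique index, so a row index identifies exactly one respondent.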

Get list of questions with metadata

my_survey.question_list

Print results for individual respondent

The respondent method will return a string listing the answers of an individual respondent. You need the respondent’s row index.

my_survey.respondent(26)

Create a readable dataframe

Create a dataframe with full questions as column names and ‘long’ responses as values.

my_survey.readable_df

Create a Question object

A Question object can be used to summarise data. To create a Question object, you need a Survey object and the question id (find it in the index of the question list).

my_question = Question(my_survey, 3154)

If you want to use a subset of the respondents for your analysis (e.g., exclude respondents that do not meet certain criteria, or drop duplicates), the most practical approach is probably to create a subset first and use that to create your Survey object. However, you can also use a mask if you want to create a Question object for a subset of the respondents.

my_question = Question(my_survey, 3154, mask=pd.notnull(df.iloc[:, 8]))
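
A mask is a boolean sequence with one entry per respondent. The sketch below builds one with pandas that keeps only completed responses and drops duplicates. The `token` column and the sample values are assumptions made for illustration; `submitdate` is a standard column in LimeSurvey exports.

```python
import pandas as pd

# Hypothetical responses; 'token' is an assumed column identifying a respondent
df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'submitdate': ['2022-04-01', None, '2022-04-02', '2022-04-02'],
    'token': ['a', 'b', 'c', 'c'],
})

# Keep respondents who submitted the survey ...
completed = df['submitdate'].notnull()
# ... and only the first response per token
first_per_token = ~df.duplicated(subset='token', keep='first')

mask = completed & first_per_token
```

The resulting `mask` (or `list(mask)`) could then be passed as the mask argument when creating a Question object.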

Summarise answers to a question

For many question types, limepy can summarise the results.

  • In many cases, this will return a dataframe containing value counts (as well as Percent and Valid Percent).
  • In case of a Numerical input question, the output will be a dataframe containing the results of the pandas DataFrame describe method.
  • In case of a Numbers array question, the average will be calculated for each option (but you must specify the method, i.e. 'mean' or 'median').
  • In case of a Ranking question, the result will be a dataframe with scores calculated for each item.
  • If no method has been implemented for a question type, a dataframe will be returned which contains the columns associated with the question.

my_question.summary
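
To clarify the difference between Percent and Valid Percent, here is a minimal pandas sketch on made-up answers. It assumes that Percent is relative to all respondents and Valid Percent to respondents who answered the question; it is not limepy's own implementation.

```python
import pandas as pd

# Hypothetical answers to a yes/no question; None means no answer
answers = pd.Series(['Yes', 'No', 'Yes', None, 'Yes', None, None, 'No'])

counts = answers.value_counts()  # missing values are excluded by default
summary = pd.DataFrame({
    'Count': counts,
    # share of all respondents, including those who did not answer
    'Percent': (100 * counts / len(answers)).round(1),
    # share of respondents who actually answered
    'Valid Percent': (100 * counts / answers.notnull().sum()).round(1),
})
```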

To show the metadata associated with a question:

my_question.metadata

Compare groups

limepy currently has no method to compare groups, but you can write a function to do so (the example below may not work with all question types).

def compare(qid, category_variable, how='Valid Percent'):
    """Compare answers for groups based on category variable"""
    summaries = []
    for group in set(df[category_variable]):
        if pd.isnull(group):
            continue
        mask = list(df[category_variable] == group)
        q = Question(my_survey, qid, mask=mask)
        summary = q.summary
        if how in list(summary.columns):
            summary = summary[[how]]
        summary.columns = [group]
        summaries.append(summary)
    return pd.concat(summaries, axis=1)

Write answers to an open-ended question

The write_open_ended method creates a string listing all the answers to the question. Optionally, you can specify a list of indices of columns that contain background information you want included in the output.

my_question.write_open_ended(background_column_indices=[9])

You can also create a folder and store text files containing the answers to all open-ended questions in the survey.

from pathlib import Path

remove = ' _?:/()'

def include(row):
    for string in ['free text', 'comment']:
        if string in row.question_type:
            return True
    if row.other == 'Y':
        return True
    return False

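out_dir = Path('../data/open_ended')
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed

for qid, row in my_survey.question_list.iterrows():
    if include(row):
        question = row.question
        # Replace characters that are awkward in file names
        for char in remove:
            question = question.replace(char, ' ')
        question = question[:25]
        path = out_dir / f'{qid} {question}.md'
        text = Question(my_survey, qid).write_open_ended(background_column_indices=[9])
        path.write_text(text, encoding='utf-8')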

Create report as html

def add_table(question, question_text=None):
    """Add table summarising question"""

    if not question_text:
        question_text = question.question
    html = f"<div class='tableHeader'>{question_text}</div>\n"
    html += question.summary.to_html() + '\n'
    help_txt = question.metadata['help']
    if help_txt:
        html += f"<div class='tableCaption'>{help_txt}</div>"
    return html


html = """<head>
<title>Title</title>
<link rel="stylesheet" href="styles.css">
<meta charset="utf-8">
</head>
<body>
"""

my_question = Question(my_survey, 44)
html += add_table(my_question)

html += "</body>"

Inspect original data

If you want to inspect the original data for a specific question, for example because you want to process answers to an ‘other’ option, you can select the relevant columns using the question title (you can look up the title using my_survey.question_list).

title = 'G01Q07'
colnames = [c for c in df.columns if title in c]
df[colnames]
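
As a follow-up, the sketch below narrows the selection down to the ‘other’ column and keeps only respondents who filled it in. The column names (`G01Q07[SQ001]`, `G01Q07[other]`) and the data are assumptions; check the column names in your own export, since they depend on the export settings.

```python
import pandas as pd

# Hypothetical export; column naming is an assumption
df = pd.DataFrame({
    'G01Q07[SQ001]': ['Y', None, 'Y'],
    'G01Q07[other]': [None, 'Something else', None],
    'G01Q08': ['A1', 'A2', 'A1'],
})

title = 'G01Q07'
colnames = [c for c in df.columns if title in c]

# Keep only the free-text 'other' column(s) and drop rows without an entry
other_cols = [c for c in colnames if 'other' in c]
other_answers = df[other_cols].dropna(how='all')
```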

limepy's People

Contributors

dirkmjk, henrytdsimmons, jeanbaptisteb

limepy's Issues

`TypeError: string indices must be integers` in wrangle.py

While trying to build a simple pipeline, I encountered an error that I can't seem to fix on my end.
The error being raised:
/python3.10/site-packages/limepy/wrangle.py", line 134, in <dictcomp>
    if item['language'] == language
TypeError: string indices must be integers

The section of code mentioned:

            if not isinstance(items, list):
                items = [items]
            question_l10ns = {
                item['qid']:item['question']
                for item
                in items
                if item['language'] == language

My code can be found here.

Does not work without Apache (default nginx or IIS)

The URL that limepy sets only works if you use Apache.

It does not work with an IIS or nginx server (there the path is /index.php?r=admin/remotecontrol).

Workaround: set the URL to https://example.org/limesurvey/index.php?r=admin/remotecontrol&

Could an option be added to avoid the workaround?

Changes due to manual editing of LimeSurvey responses

Currently, limepy assumes multiple choice options are checked if they don’t have a missing value. This does not take into account changes caused by manual data editing of LimeSurvey responses. Specifically, a checkbox array option normally has a value 1 when checked and a missing value when not checked, but after manual editing, options that are not checked may get a 0.
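
Until this is handled in limepy, one possible workaround is to turn the zeros back into missing values before creating the Survey object. This is a sketch under assumptions: the column names are hypothetical, and it presumes the values were read as numbers (if they were read as strings, '0' would have to be replaced instead).

```python
import numpy as np
import pandas as pd

# Hypothetical checkbox columns: 1 = checked, missing = not checked,
# 0 = unchecked after manual editing
df = pd.DataFrame({
    'Q1[SQ001]': [1, 0, np.nan],
    'Q1[SQ002]': [np.nan, 1, 0],
})

checkbox_cols = ['Q1[SQ001]', 'Q1[SQ002]']

# Treat 0 the same as a missing value, so only checked options remain
df[checkbox_cols] = df[checkbox_cols].replace(0, np.nan)
```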

Problem with single-language surveys

With single-language surveys languages will be a string, but it is expected to be a list. This will cause problems when creating a Survey object without specifying the language.
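
When XML is converted to nested dictionaries, an element with a single child is often returned as a scalar rather than a one-item list. A small helper can normalise both cases. The name `as_list` is hypothetical and not part of limepy; this is a sketch of the general pattern only.

```python
def as_list(value):
    """Return value unchanged if it is a list, otherwise wrap it in a list."""
    if isinstance(value, list):
        return value
    return [value]


# A single-language survey yields a string, a multilingual one a list
single = as_list('en')
multiple = as_list(['de', 'en'])
```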

License

Hi -- I noticed that LimePy does not have a license in this repo, but in PyPI it is stated that it is MIT license. It might be useful to provide a license file in the github repository as well, just to avoid issues.

user_id unclear in documentation

Sorry but I am not sure what to enter for user_id?

I can use the login on my base url and visit index.php/admin/remotecontrol.

I tried to enter my username twice:

from limepy import download

download.get_session_key(
    base_url,
    "my_username",
    "my_pw",
    "my_username",
)

But I just get:

{'id': 'my_username', 'result': {'status': 'Invalid user name or password'}, 'error': None}

I'm sure that I entered the username and pw correctly as they are the same as for the login via base url + /admin, right?

I wasn't able to find anything related to a user_id on https://api.limesurvey.org/classes/remotecontrol_handle.html or on https://manual.limesurvey.org/RemoteControl_2_API#Python_example_and_glue

Maybe a small note on the README page could be helpful for new users?

download.get_responses(base_url, user_name, password, user_id, sid) error


JSONDecodeError Traceback (most recent call last)
in
8 sid = '156133'
9
---> 10 csv = download.get_responses(base_url, user_name, password, user_id, sid)
11 path = Path('../data/responses.csv')
12 path.write_text(csv)

5 frames
/usr/lib/python3.8/json/decoder.py in raw_decode(self, s, idx)
353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Is it possible to read a LSS file without having a dataframe of answers?

Hi,

Nice library.

Here's my problem: I have a survey structure file (LSS format), without any answer yet.

I'd like to use this library to explore the survey structure. But from the example given on the main page and from the source code, my understanding is that it's necessary to have some dataframe containing the answers, in order to read the structure file.

Am I correct, or is there a way to read the LSS file on its own?

If it's not currently possible, it would be a pretty useful feature!

Thanks.

Getting a KeyError: 'help' when creating survey object

Hi,
I manually downloaded my CSV using Question code and Answer codes as settings and used your sample code to build my survey object:

df = pd.read_csv('code-results-survey543837.csv', sep=';')
with open('limesurvey_survey_872183.lss') as f:
    my_structure = f.read()
my_survey = Survey(df, my_structure)

But the my_survey object is failing. Here's the traceback:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_824/587664489.py in <module>
      3     my_structure = f.read()
      4 
----> 5 my_survey = Survey(df, my_structure)
~/.local/lib/python3.8/site-packages/limepy/wrangle.py in __init__(self, dataframe, structure, language, strip_tags)
     52         self.language = language
     53         self.strip_tags = strip_tags
---> 54         self.questions, self.groups = self.parse_structure(structure)
     55         self.question_list = self.create_question_list()
     56         self.readable_df = self.create_readable_df()
~/.local/lib/python3.8/site-packages/limepy/wrangle.py in parse_structure(self, structure)
    108                 if item['language'] == language
    109             }
--> 110             question_l10ns_help = {
    111                 item['qid']:item['help']
    112                 for item
~/.local/lib/python3.8/site-packages/limepy/wrangle.py in <dictcomp>(.0)
    109             }
    110             question_l10ns_help = {
--> 111                 item['qid']:item['help']
    112                 for item
    113                 in items
KeyError: 'help'

Any idea what I'm doing wrong?
Thanks!

NoneType object causing crash at survey creation

Hi,

I am not able to create a survey object from my data (manually exported), even though I carefully set heading type to 'code' and response type to 'short'.

My code is:

from limepy.wrangle import Survey, Question
import pandas as pd

df = pd.read_csv('results-survey618718_code.csv', sep=',')
with open('limesurvey_survey_618718.lss') as f:
    my_structure = f.read()

my_survey = Survey(df, my_structure)

And I get:

  File "/home/xxxxxxx/anaconda3/lib/python3.9/site-packages/limepy/wrangle.py", line 262, in create_readable_df
    colname = colname.replace('\n', ' ')

AttributeError: 'NoneType' object has no attribute 'replace'

Is it a bug? Or am I misusing the tool? Thanks for your help.

Here is the CSV file I use:


"id","submitdate","lastpage","startlanguage","startdate","datestamp","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","matiere_SQ001","matiere_SQ002","matiere_SQ003","matiere_SQ004","matiere_SQ005","matiere_SQ006","suggestion","general_SQ001","general_SQ002","general_SQ003","general_SQ004","general_SQ005","ptspositifs_SQ001_SQ001","ptspositifs_SQ002_SQ001","ptspositifs_SQ003_SQ001","ptsnegatifs_SQ001_SQ001","ptsnegatifs_SQ002_SQ001","ptsnegatifs_SQ003_SQ001","commentaire","poisson"
"10","2022-04-01 07:55:43","17","fr","2022-04-01 07:38:41","2022-04-01 07:55:43","A2","A3","A1","A2","A2","A2","+ d'applications en cinématiques ","A2","A1","A1","A1","A1","A1","DS parfois trop long","A2","A2","A2","A2","A2","A2","","A2","A1","A1","A1","A1","A1","","A2","A1","A1","A1","A1","A1","","A2","A2","A1","A1","A1","A1","","A1","A1","A1","A1","A1","A1","","A2","A1","A2","A2","A2","A2","","A1","A1","A1","A1","A1","A1","","A1","A1","A2","A1","A1","A2","","A2","A2","A1","A1","A1","A1","","A2","A2","A1","A1","A2","A1","","A1","A2","A1","A1","A1","A2","","A2","A1","A1","A1","A1","A1","","A2","A3","A4","A4","A2","A2","Rendre les cours plus concrets","A1","A2","A2","A1","A1","Professionalisation ","Bonne ambiance ","Échelle humaine","","","","","Bonne idée ! "
"11","2022-04-01 07:49:16","17","fr","2022-04-01 07:38:42","2022-04-01 07:49:16","A2","A2","A4","A4","A4","A4","Problème de compréhension  avec le prof ","A2","A2","A2","A2","A2","A2","","A1","A1","A1","A1","A1","A1","","A2","A1","A2","A2","A2","A2","","A2","A2","A2","A2","A2","A2","","A1","A1","A1","A1","A1","A1","","A2","A2","A2","A2","A2","A2","","A2","A2","A2","A2","A2","A2","","A2","A2","A2","A2","A3","A2","","A2","A2","A3","A3","A2","A2","","A2","A2","A2","A2","A2","A2","","A1","A1","A1","A1","A1","A1","","A2","A2","A3","A2","A2","A2","","A3","A3","A3","A2","A2","A2","","A2","A2","A2","A2","A2","A2","","A1","A1","A1","A1","A3","Le niveau ","L'ambiance ","Soirée ","Le contenu","Certains profs ","Horaire ","","C'est une bonne idée "

Survey in multiple languages

I'm using LimeSurvey version 2.62 with two languages. If I create a Question object my_question and then print my_question.question and my_question.summary, I get the question title in only one language, but the answers for each language. Is there a way to access the languages explicitly?

Mussten Sie Informationen angeben, um Zugriff auf das Korpus zu erhalten?

      Count  Percent  Valid Percent
Yes    11.0     30.6           73.3
No      4.0     11.1           26.7
Ja     11.0     30.6           73.3
Nein    4.0     11.1           26.7

The beginning of my survey.lss file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<document>
 <LimeSurveyDocType>Survey</LimeSurveyDocType>
 <DBVersion>261</DBVersion>
 <languages>
  <language>de</language>
  <language>en</language>
 </languages>
 <answers>
  <fields>
   <fieldname>qid</fieldname>
   <fieldname>code</fieldname>
   <fieldname>answer</fieldname>
   <fieldname>assessment_value</fieldname>
   <fieldname>sortorder</fieldname>
   <fieldname>language</fieldname>
   <fieldname>scale_id</fieldname>
  </fields>
  <rows>
   <row>
    <qid><![CDATA[501861]]></qid>
    <code><![CDATA[A1]]></code>
    <answer><![CDATA[Yes]]></answer>
    <assessment_value><![CDATA[0]]></assessment_value>
    <sortorder><![CDATA[1]]></sortorder>
    <language><![CDATA[en]]></language>
    <scale_id><![CDATA[0]]></scale_id>
   </row>
   <row>
    <qid><![CDATA[501861]]></qid>
    <code><![CDATA[A1]]></code>
    <answer><![CDATA[Ja]]></answer>
    <assessment_value><![CDATA[0]]></assessment_value>
    <sortorder><![CDATA[1]]></sortorder>
    <language><![CDATA[de]]></language>
    <scale_id><![CDATA[0]]></scale_id>
   </row>
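
A structure file with this layout can also be read with the Python standard library, independently of limepy. The sketch below uses a shortened, closed version of the fragment above (the `fields` block and some columns are left out) and selects the answer texts for one language. It is an illustration of the file layout, not of how limepy parses it.

```python
import xml.etree.ElementTree as ET

# Shortened version of the fragment above, closed so that it is valid XML
lss = """<?xml version="1.0" encoding="UTF-8"?>
<document>
 <languages>
  <language>de</language>
  <language>en</language>
 </languages>
 <answers>
  <rows>
   <row>
    <qid><![CDATA[501861]]></qid>
    <code><![CDATA[A1]]></code>
    <answer><![CDATA[Yes]]></answer>
    <language><![CDATA[en]]></language>
   </row>
   <row>
    <qid><![CDATA[501861]]></qid>
    <code><![CDATA[A1]]></code>
    <answer><![CDATA[Ja]]></answer>
    <language><![CDATA[de]]></language>
   </row>
  </rows>
 </answers>
</document>"""

root = ET.fromstring(lss.encode('utf-8'))

# Languages defined in the survey
languages = [el.text for el in root.findall('./languages/language')]

# One dict per answer row, keyed by the child element names
rows = [
    {child.tag: child.text for child in row}
    for row in root.findall('./answers/rows/row')
]

# Answer texts for a single language, keyed by (qid, code)
answers_de = {
    (r['qid'], r['code']): r['answer']
    for r in rows
    if r['language'] == 'de'
}
```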


No way to get question code from Question object?

I'm using limepy to create a PDF report for a survey.

Iterating through the questions, I have a Question object for every question, which is usually enough to build the report. However, it looks like some details are missing, e.g. for multiple choice questions with an Other option, I'm unable to find the freeform entries users made. I'd be happy to look them up manually in the survey's dataframe, but I don't see a way to get the question code from a Question object.

Maybe I'm overlooking something obvious – but if not: Could you add a code property to the Question class?

some open ended questions are just empty

Hi,
first of all, thanks for your amazing library, it helps me a lot with my bachelor's thesis.
But I noticed one problem: some questions don't work with the write_open_ended function.
The question itself gets written, but without answers.
As of right now I cannot figure out why that happens, but I'll take a look.
If I print the corresponding df, everything looks fine and no different from questions that get written properly.

Any ideas?

KeyError while loading question

I'm a little bit lost and may be missing something: while initializing some questions I receive a "KeyError". I'm still new to Python, so maybe it is something obvious I have overlooked.

Encoding issues

Hi! I'm using download.get_responses and the connection to the RemoteControl API is working perfectly, but for some sids I get the error 'charmap' codec can't encode character '\U0001f449' in position 1064935: character maps to <undefined>
I tried to redefine the encoding using sys.getdefaultencoding() and 'replace', but I'm not sure how to apply this before calling path.write_text(csv).
This is happening with some of the surveys I had on my website, not all of them.
If you can help me with that or give me some ideas, I'll appreciate it a lot. Thank you.
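
A 'charmap' error of this kind usually means the file is being written with a platform default encoding that cannot represent the character. One possible fix is to pass an explicit encoding to write_text. The sketch below uses a made-up csv string and a temporary folder instead of real survey data.

```python
from pathlib import Path
import tempfile

# Hypothetical download result containing a character outside legacy code pages
csv = 'id;comment\n1;Look here \U0001f449\n'

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'responses.csv'
    # An explicit encoding avoids relying on the platform default
    path.write_text(csv, encoding='utf-8')
    round_trip = path.read_text(encoding='utf-8')
```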

AttributeError: 'dict' object has no attribute 'encode'

I'm getting the following error while trying to execute download.get_responses

~/.virtualenvs/sandbox/lib/python3.6/site-packages/limepy/download.py in get_responses(base_url, user_name, password, user_id, sid, lang, document_type, completion_status, heading_type, response_type, from_response_id, to_response_id, fields)
84 document_type, completion_status, heading_type,
85 response_type, from_response_id, to_response_id,
---> 86 fields)
87 release_session_key(base_url, session_key, user_id)
88 return csv

~/.virtualenvs/sandbox/lib/python3.6/site-packages/limepy/download.py in export_responses(user_id, base_url, session_key, sid, lang, document_type, completion_status, heading_type, response_type, from_response_id, to_response_id, fields)
39 req = requests.post(api_url, json=payload)
40 result = req.json()['result']
---> 41 csv = base64.b64decode(result.encode()).decode('utf-8-sig')
42 return csv
43

AttributeError: 'dict' object has no attribute 'encode'
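
The traceback suggests that the API returned a dict (for example a status message) where a base64 string was expected. A defensive check before decoding makes the underlying problem visible. The function `decode_export_result` is hypothetical and not part of limepy; the status text is taken from the response shown in another issue on this page.

```python
import base64


def decode_export_result(result):
    """Decode a base64 export, or raise a readable error for a status dict."""
    if isinstance(result, dict):
        # The API reported a problem instead of returning data
        raise ValueError(f"Export failed: {result.get('status', result)}")
    return base64.b64decode(result.encode()).decode('utf-8-sig')


# Successful case: a base64-encoded csv string
encoded = base64.b64encode('id;answer\n1;A1\n'.encode('utf-8-sig')).decode()
decoded = decode_export_result(encoded)

# Failing case: a status dict leads to a clear error message
try:
    decode_export_result({'status': 'Invalid user name or password'})
    message = None
except ValueError as err:
    message = str(err)
```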

Changes in data structure in LimeSurvey 4

In order to prevent duplicated question data when a survey uses more than one language, the data structure has been changed in LimeSurvey 4. As a result of the changes, creating a Survey object with limepy currently fails.
