jbgruber / rwhatsapp

An R package for working with WhatsApp data 💬

R 100.00%

rwhatsapp's Introduction

Hi there 👋

I am a Post-Doc Researcher at Vrije Universiteit (VU) Amsterdam, working on the NEWSFLOWS project (based at the University of Amsterdam until February 2024). Until September 2023, I worked at the Department of Communication Science at Vrije Universiteit Amsterdam on the OPTED project and AmCAT (Amsterdam Content Analysis Toolkit). Since April, I have also been working on the NEWSFLOWS project at the Department of Communication Science at the University of Amsterdam. Previously, I worked as a Post-Doc Researcher at the Chair for Digital Democracy at the European New School of Digital Studies (ENS), European University Viadrina Foundation Frankfurt (Oder). In 2021, I completed my PhD in Politics at the University of Glasgow.

🔭 I’m currently working on several R packages for research and doing research on hybrid media systems and computational methods.
💬 Ask me about R, text analysis and political communication

I'm present on a couple of different platforms, in case you want to reach or follow me:

🦋 @jbgruber.bsky.social
🐘 @[email protected]
~~🐦 @JohannesBGruber~~ (legacy account)
❓ https://stackoverflow.com/users/5028841/jbgruber
📫 [email protected]

rwhatsapp's Issues

Reading txt files containing Chinese characters

Thank you so much for this package. It works well with files in English. However, when I try to import files in Chinese, RStudio cannot display the Chinese characters properly.

Meanwhile, the Chinese characters are displayed in the Environment pane.

So I am wondering how to import the Chinese text properly for future analysis.
Thank you so much for your help.

Importing Multiple Files

I am not sure if this is an issue, but I am trying to import multiple files using this package but I can't seem to get it right? Is there a way I can do that, even if it is using a for loop or lapply?
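
One way to approach this is to loop over the export files with `lapply()` and stack the results. The sketch below is a minimal base R illustration, not a feature of the package: `read_many()` and `toy_reader()` are hypothetical helpers, and `toy_reader()` only stands in for `rwa_read()` so that the example runs without any chat exports.

```r
# Hypothetical helper: apply a reader function to several files and stack the results
read_many <- function(paths, reader) {
  parts <- lapply(paths, function(p) {
    out <- reader(p)
    out$file <- basename(p)  # remember which export each row came from
    out
  })
  do.call(rbind, parts)
}

# Stand-in reader so the sketch runs without any packages;
# with real exports, rwhatsapp::rwa_read would be passed instead
toy_reader <- function(p) {
  data.frame(text = readLines(p), stringsAsFactors = FALSE)
}

# Two small example files in a temporary folder
chat_dir <- tempfile("chats")
dir.create(chat_dir)
writeLines(c("a", "b"), file.path(chat_dir, "chat1.txt"))
writeLines("c", file.path(chat_dir, "chat2.txt"))

files <- list.files(chat_dir, pattern = "\\.txt$", full.names = TRUE)
all_chats <- read_many(files, toy_reader)
print(all_chats)
```

With real data, passing `rwhatsapp::rwa_read` as `reader` should work the same way, provided every file parses into the same columns; `dplyr::bind_rows()` is a possible alternative to `rbind()` for the final step.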

rwa_read issue

Hi, thanks for the functions, very helpful.

However, rwa_read is not able to read msgs consisting of more than 2 lines. The function just picks up the top 2 lines of the msg. Are there any parameter settings to correct this?

Thanks!
Dobrin

Problem with emoji

Hello @JBGruber

I have a problem when I try to read my whatsapp data.
The problem has already been asked in issue #29 but without solutions.

x <- "D:/Dev/newwhat/data/Chat - Stat-Inf_ Job&Scholarship.txt"
rwa_read(x)

Error in split.default(lookup$emoji, lookup$id) :
first argument must be a vector
In addition: Warning messages:
1: Unknown or uninitialised column: emoji.
2: Unknown or uninitialised column: emoji.

This is the data : Chat - Stat-Inf_ Job&Scholarship.txt

Thanks

Firstly, thanks for this package - it has literally saved me and my future Phd

When using my own txt file, the number-of-messages plot always shows NA. How did you get around this, given that it doesn't seem to happen with your data?

Also, I think a really good function would be the average number of words each person uses per message. After all, it might seem as though I have sent many more messages, but they could be one word each.

Just an idea.
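
The suggested statistic can already be computed from a parsed chat with a few lines of base R. The sketch below assumes a data frame with `author` and `text` columns, as used elsewhere in these reports; the three example messages are made up for illustration.

```r
# Made-up example data with the two columns the calculation needs
chat <- data.frame(
  author = c("A", "A", "B"),
  text = c("ok", "see you later", "one two"),
  stringsAsFactors = FALSE
)

# Count whitespace-separated words in each message
chat$n_words <- lengths(strsplit(trimws(chat$text), "\\s+"))

# Average words per message for each author
avg_words <- tapply(chat$n_words, chat$author, mean)
print(avg_words)
```

Splitting on whitespace is a deliberately simple definition of a word; emojis and URLs each count as one word under it.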

Uploading WhatsApp data to Shiny

Hi, I am creating an App where a User can upload data to Shiny. I have tried researching on how I can do that but I always get an error.

Warning in rwhatsapp::rwa_read(input$file$datapath) :
Time conversion did not work correctly. Provide a custom format or add an issue at www.github.com/JBGruber/rwhatsapp.
Warning: Error in is.finite: default method not implemented for type 'list'
108: formatC
106: print.xtable
97: transform
96: func
94: f
93: Reduce
84: do
83: hybrid_chain
82: origRenderFunc
81: output$head
1: runApp

What could be the issue, and how can I go about fixing it?
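
The traceback ends in `print.xtable` with `is.finite` failing on a list, which suggests that the table being rendered contains a list column. Assuming the `emoji` and `emoji_name` columns returned by `rwa_read()` are list columns (an assumption based on how other reports here treat them), one possible workaround is to drop list columns before handing the data to the table renderer. `drop_list_cols()` below is a hypothetical helper, shown on made-up data.

```r
# Hypothetical helper: keep only the columns that are not lists
drop_list_cols <- function(df) {
  is_list_col <- vapply(df, is.list, logical(1))
  df[, !is_list_col, drop = FALSE]
}

# Made-up chat with one list column, mimicking the assumed emoji column
chat <- data.frame(author = c("A", "B"), text = c("hi", "yo"),
                   stringsAsFactors = FALSE)
chat$emoji <- list(character(0), "\U0001F44D")

flat <- drop_list_cols(chat)
print(names(flat))
```

In the app this would be applied inside the render call, for example `renderTable(head(drop_list_cols(chat)))`. The time conversion warning is a separate matter and would still need a suitable `format`.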

Error in cbind_all

Hi, I'm sorry if this is a very basic issue. I've tried to read a couple of files and I get this error for some of them:
Error in cbind_all(x) : Argument 2 must be length 3471, not 3470

What should I be looking to fix here if the problem is in my files?

Author fails to extract if text contains `:` and two or more linebreaks

The author of a message seems to be incorrectly reported as NA if the message text contains both a : and two or more linebreaks.

The following should be a minimum reproducible example:

example.zip

`chat0.txt`

08.02.20, 17:35 - First Last: The time is 17:35.
2nd line.
3rd line.

`chat1.txt`

08.02.20, 17:35 - First Last: The time is 17:35.
2nd line.

`test.Rmd`

---
output: html_notebook
---

```{r}
library("rwhatsapp")
chat0 <- rwa_read("chat0.txt")
chat0
chat1 <- rwa_read("chat1.txt")
chat1
```

It reports NA as author of the message in chat0.txt and First Last as author of the message in chat1.txt.

I don't know if this is related to #14, as I didn't quite understand what that issue is about. Excuse me if it is a duplicate.

Cuts off chat from 2020 for a specific group

Hey,
First of all - what an amazing package! One simple function that does a fantastic job reading a chat file.

I have a text file with chat messages from 2016 to July 2020. The function reads and parses all the chat messages up to the last one from 2019 and then stops (it doesn't throw an error, it just stops at the end of 2019).

When I tried splitting it up such as taking all the unread 2020 chat to a different file and read them separately it still didn't work and I get the following error:

> chat_2 <- rwa_read("chat2.txt")
Error: Input must be a vector, not NULL.
Run rlang::last_error() to see where the error occurred.
In addition: Warning message:
Unknown or uninitialised column: `emoji`. 
> rlang::last_error()
<error/vctrs_error_scalar_type>
Input must be a vector, not NULL.
Backtrace:
 1. rwhatsapp::rwa_read("chat2.txt")
 8. vctrs:::stop_scalar_type(.Primitive("quote")(NULL), "")
 9. vctrs:::stop_vctrs(msg, "vctrs_error_scalar_type", actual = x)

I also tried deleting the first few rows of 2020 thinking it was something there but it's plain chat. I will say that it works perfectly for a different group chat I have spanning before and after 2020. It also takes it longer to parse almost the same amount of messages compared to the second group that works for 2020.

Any suggestions?

Thanks!

Problems with rwa_read

Hi!
rwa_read gives this error (and two warnings). Tried it with several chats, even chats without emojis, but it is the same. I cannot figure it out unfortunately. Any idea?

chat <- rwa_read("wz.txt", verbose=FALSE)
Warning: Unknown or uninitialised column: emoji.
Warning: Unknown or uninitialised column: emoji.
Error in split.default(lookup$emoji, lookup$id) :
first argument must be a vector

Session Info:

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rwhatsapp_0.2.4 tableone_0.13.0 randomForestSRC_3.0.0 ggfortify_0.4.14
[5] survival_3.2-13 survminer_0.4.9 ggpubr_0.4.0 mgcv_1.8-38
[9] nlme_3.1-152 scales_1.1.1 ggplot2_3.3.5 tidyr_1.1.4
[13] dplyr_1.0.7 haven_2.4.3 readr_2.1.1

loaded via a namespace (and not attached):
[1] colorspace_2.0-2 ggsignif_0.6.3 ellipsis_0.3.2 class_7.3-19 rprojroot_2.0.2
[6] markdown_1.1 fs_1.5.2 gridtext_0.1.4 ggtext_0.1.1 rstudioapi_0.13
[11] proxy_0.4-26 farver_2.1.0 remotes_2.4.2 bit64_4.0.5 fansi_1.0.3
[16] xml2_1.3.3 splines_4.1.2 cachem_1.0.6 knitr_1.37 pkgload_1.2.4
[21] jsonlite_1.7.3 broom_0.7.11 km.ci_0.5-2 data.tree_1.0.0 DiagrammeR_1.0.8
[26] compiler_4.1.2 backports_1.4.1 assertthat_0.2.1 Matrix_1.3-4 fastmap_1.1.0
[31] survey_4.1-1 cli_3.2.0 visNetwork_2.1.0 htmltools_0.5.2 prettyunits_1.1.1
[36] tools_4.1.2 gtable_0.3.0 glue_1.6.2 Rcpp_1.0.8 carData_3.0-5
[41] vctrs_0.4.0 xfun_0.29 stringr_1.4.0 ps_1.6.0 brio_1.1.3
[46] testthat_3.1.2 lifecycle_1.0.1 devtools_2.4.3 rstatix_0.7.0 zoo_1.8-9
[51] vroom_1.5.7 hms_1.1.1 parallel_4.1.2 RColorBrewer_1.1-2 yaml_2.2.2
[56] curl_4.3.2 memoise_2.0.1 gridExtra_2.3 KMsurv_0.1-5 labelled_2.9.0
[61] stringi_1.7.6 desc_1.4.0 e1071_1.7-9 pkgbuild_1.3.1 rlang_1.0.2
[66] pkgconfig_2.0.3 evaluate_0.14 lattice_0.20-45 purrr_0.3.4 htmlwidgets_1.5.4
[71] labeling_0.4.2 bit_4.0.4 tidyselect_1.1.1 processx_3.5.2 magrittr_2.0.3
[76] R6_2.5.1 generics_0.1.1 DBI_1.1.2 pillar_1.7.0 withr_2.4.3
[81] abind_1.4-5 tibble_3.1.6 crayon_1.5.1 car_3.0-12 survMisc_0.5.5
[86] utf8_1.2.2 tzdb_0.2.0 rmarkdown_2.11 usethis_2.1.5 grid_4.1.2
[91] data.table_1.14.2 callr_3.7.0 forcats_0.5.1 digest_0.6.29 xtable_1.8-4
[96] munsell_0.5.0 mitools_2.4 sessioninfo_1.2.2

issue report

Hi,
I encountered an error while working. It was fixed when I used - instead of + in the emoji URL.
actual https://abs.twimg.com/emoji/v2/72x72/1f44d+1f3fb.png
new https://abs.twimg.com/emoji/v2/72x72/1f44d-1f3fb.png
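
The substitution described above can be sketched in one line of base R. This is only an illustration on the URL from the report; where the URL is built inside the plotting code is not shown in the issue.

```r
url <- "https://abs.twimg.com/emoji/v2/72x72/1f44d+1f3fb.png"

# Replace the literal "+" between the code points with "-"
fixed_url <- gsub("+", "-", url, fixed = TRUE)
print(fixed_url)
```

`fixed = TRUE` matters here because `+` is a quantifier in regular expressions and would otherwise need escaping.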

Consistent problems reading the TXT

An error keeps appearing from time to time when I run code to read a TXT file

chat_clean_Arena <- rwa_read("Arena Divulgacao - Imperium.txt")
Error in split.default(lookup$emoji, lookup$id) :
first argument must be a vector

It is related to some of the chat messages in the TXT, because when I deleted some messages it worked.
But I had a hard time finding out which message was triggering the error; I had to delete messages one by one until the code worked.
And I couldn't find a pattern in the messages that would explain why this is happening.

I hope you can help me, it is driving me crazy.

I will attach a file that shows this error
Arena Divulgacao - Imperium.txt
(for this one I couldn't find which message caused the error) and put here some of the chat messages that previously triggered the problem in another file. As you can see, one of the messages doesn't even contain an emoji:

[23/08/2023 20:28:08] Moara Souto Vieira: Oi lindezas tudo bem por aí? Hj é dia de relato por aqui..

Como contei para vcs semana passada eu estou bem reflexiva sobre carreira/trabalho.

Em 2020, sai de uma empresa onde trabalhava 10 anos como Gestora de RH.Onde tinha uma posição estável digamos assim, mas não fazia mais meus olhos brilharem, vivia não concordando com os valores da empresa e isso me machucava muito.

Quando saí, foi uma mistura de sentimentos,veio alívio, medo, insegurança e certeza que era o sinal que eu precisava….

Mas, claro que o que me faltava muito era clareza do que queria…naquele momento só tinha clareza do que não queria.. Já é um passo sabe.. Eu já sabia que ali não me energizava mais…

E fui caminhando aos poucos, estudando, conversando com muita gente para começar a trilhar minha mudança de carreira… na verdade temos muito medo do incerto, do novo e do que não conhecemos…Hoje tenho certeza que devemos ir com medo mesmo…. O medo faz parte da vida, ele sempre estará presente, ele quer dizer que estamos em caminhos novos e que a nossa mente ainda não se adaptou sabe…

Depois de 3 anos ainda continuo trilhando meu caminho, nada tá resolvido e seguro.. E é assim a vida..Sempre estaremos em movimento. Mas com certeza não estou no mesmo lugar, e isso já é lindo demais e tem que ser comemorado não é mesmo?

Vou deixar aqui, uma carta para o medo que eu sempre leio pra mim…

_Desejo que vocês curtam
Beijokas 😙

[25/08/2023 18:30:01] Moara Souto Vieira: Filhos, idealizações e felicidade

“Nunca idealize os outros. Eles nunca vão alcançar as suas expectativas.” ~ Leo Buscaglia

Oi meninas, sextou… e queria trazer um texto que li em um blog e achei muito interessante sobre o que muitas vezes temos que nos atentar para não idealizarmos a vida e carreira de nossos filhos.

https://www.lagartavirapupa.com.br/post/filhos-idealiza%C3%A7%C3%B5es-e-felicidade

E como a gente pode ajudar nossos filhos a encontrar suas aptidões e habilidades?

Uma delas é observar seus interesses e hobbies, incentivando-os a explorar diferentes atividades relacionadas a esses temas. Além disso, é importante que os pais ofereçam um ambiente seguro e acolhedor para que seus filhos possam se expressar livremente e compartilhar suas ideias e pensamentos.

Conta pra gente como que vocês estimulam as habilidades por aí nas crias?

[30/08/2023 17:08:58] Moara Souto Vieira: Fase 6- Finanças

“Não importa quem você é ou qual é sua idade: se quiser conquistar sucesso permanente e sustentável, sua motivação precisa vir de dentro. (…) Deve ser pessoal, ter raízes profundas e fazer parte de seus pensamentos mais íntimos. Paul J. Meyer –

Historicamente afastadas do mundo das finanças, mulheres devem aprender a lidar com dinheiro desde cedo para que possam assumir o protagonismo financeiro. O quanto você entende de finanças?

De acordo com um estudo do Banco de Investimentos Merrill Lynch, 61% das mulheres preferem falar sobre a própria morte do que sobre dinheiro. O resultado mostra que, para elas, o dinheiro supera um dos maiores tabus de nossa sociedade que é a morte. Muitas razões podem explicar esse medo de tratar o tema. Um deles é o fato de que até pouco tempo atrás as mulheres não podiam cuidar da sua própria vida financeira. No Brasil, as mulheres tiveram direito de ter sua própria conta bancária somente na década de 1960. Antes disso, mulheres casadas precisavam da autorização do marido para trabalhar fora.
Essa parte da roda da vida está ligada às suas finanças. Você está ganhando dinheiro suficiente para manter o padrão de vida que você deseja? Veja, não é sobre ganhar mais dinheiro, é sobre ter o suficiente para bancar o que você almeja.

🚩Cuidar do pilar finanças em nossa roda da vida envolve algumas práticas importantes:

📝Orçamento: Faça um planejamento financeiro, estabelecendo metas e acompanhando seus gastos e receitas. Isso ajudará a controlar seu dinheiro de forma mais eficiente.

🤑Poupança: Reserve uma parte de sua renda para economizar regularmente. Isso pode ser útil para lidar com imprevistos e alcançar objetivos financeiros de longo prazo.

💸Investimentos: Considere investir seu dinheiro de forma inteligente, levando em conta seu perfil de risco e objetivos financeiros. Consultar um profissional pode ser útil nesse processo.

🚨Dívidas: Gerencie suas dívidas de forma responsável, evitando acumular juros altos. Priorize o pagamento das dívidas com taxas mais elevadas.

👩‍🏫Educação financeira: Busque conhecimento sobre finanças pessoais e invista em sua educação financeira. Isso o ajudará a tomar decisões mais informadas e a alcançar uma maior estabilidade financeira.

Lembre-se de que cada pessoa tem circunstâncias financeiras únicas, portanto, é importante adaptar essas práticas às suas necessidades e objetivos individuais.

Bora fazer dinheiro 💰 perfeitas😍😍😍

The latest export data format has changed

I exported the latest chat zip file and found that the 'time' field is now wrapped in []. An example:
[17/05/21, 9:15:42 PM] Person Name: hello world!

Unable to parse emoji column

rlang::last_error()

message: `by` can't contain join column `emoji` which is missing from LHS
class: `rlang_error`
backtrace:
 1. rwhatsapp::rwa_read(history)
 5. rwhatsapp:::rwa_add_emoji(tbl)
 7. dplyr:::left_join.tbl_df(out, rwhatsapp::emojis, by = "emoji")
 9. dplyr:::common_by.character(by, x, y)
 10. dplyr:::common_by.list(by, x, y)
 11. dplyr:::bad_args(...)
 12. dplyr:::glubort(fmt_args(args), ..., .envir = .envir)
Call `rlang::last_trace()` to see the full backtrace

Oversensitive message regex

I seem to have found a few more situations in which the existing regular expression is overly eager in identifying messages at wrong places.

Given the following input, rwhatsapp currently identifies the authors "First Last", N/A, and "another usage of a hyphen.\nSentence with a colon".

06.02.20, 09:33 - First Last: Line 1.

Line 2 - usage of a hyphen as en dash to connect sentences,

Line 3 - another usage of a hyphen.

Sentence with a colon: other part of sentence.
06.02.20, 09:41 - First Last: Message 2.

While I don't know too much about the current implementation I think it might be possible to use positive lookaheads to make the implementation more strict.

The following regex doesn't fit all use cases (i.e. mainly the different date formats) but correctly parses the above message:

(?<datetime>[0-9]{2}\.[0-9]{2}\.[0-9]{2}, [0-9]{2}:[0-9]{2}) - (?:(?<sender>.+):\s+)?(?<text>[\s\S]+?)(?=(?:\n[0-9]{2}\.[0-9]{2}\.[0-9]{2}, [0-9]{2}:[0-9]{2} - )|\Z)
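
The lookahead idea can be tried out in base R without touching the package. The sketch below is not rwhatsapp's implementation; it assumes the `dd.MM.yy, HH:mm - ` header format from the example, and all variable names are hypothetical. It splits the text only at newlines that are followed by a full timestamp header, so hyphens and colons inside a message body no longer start a new message.

```r
txt <- paste(
  "06.02.20, 09:33 - First Last: Line 1.",
  "",
  "Line 2 - usage of a hyphen as en dash to connect sentences,",
  "",
  "Line 3 - another usage of a hyphen.",
  "",
  "Sentence with a colon: other part of sentence.",
  "06.02.20, 09:41 - First Last: Message 2.",
  sep = "\n"
)

# A message header: date, time, then " - " (format assumed from the example)
header <- "[0-9]{2}\\.[0-9]{2}\\.[0-9]{2}, [0-9]{2}:[0-9]{2} - "

# Split at a newline only when a header follows (positive lookahead)
msgs <- strsplit(txt, paste0("\\n(?=", header, ")"), perl = TRUE)[[1]]

# The author is whatever sits between the header and the first ": "
m <- regmatches(msgs, regexec(paste0("^", header, "([^:\n]+): "), msgs))
authors <- vapply(m, function(x) x[2], character(1))

print(length(msgs))
print(authors)
```

Messages without an author, such as system notices, yield `NA` in `authors` because the second pattern finds no match for them.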

Problem reading datetime with the rwa_read function

library(rwhatsapp)
library(dplyr)

miChat <- rwa_read("miChat.txt", tz = "America/Lima") %>%
  filter(!is.na(author)) %>% # remove messages without author
  filter(text != "") # keep only messages with text

dim(miChat)
[1] 25523 6

The datetime column loses the a. m. / p. m. marker.
Once the file is read, the time column holds 12-hour times without the marker (a. m. / p. m.), and the seconds are set to 15 in all rows.

x1 <- head(miChat$time,150)
x1
[1] "2020-05-09 08:04:15 -05" "2020-05-09 08:07:15 -05" "2020-05-09 08:08:15 -05"
[4] "2020-05-09 08:11:15 -05" "2020-05-09 08:13:15 -05" "2020-05-09 08:17:15 -05"
[7] "2020-05-09 09:44:15 -05" "2020-05-09 10:02:15 -05" "2020-05-09 10:13:15 -05"
[10] "2020-05-09 10:17:15 -05" "2020-05-10 01:04:15 -05" "2020-05-10 07:25:15 -05"
[13] "2020-05-10 07:38:15 -05" "2020-05-10 08:06:15 -05" "2020-05-10 08:12:15 -05"
[16] "2020-05-10 08:14:15 -05" "2020-05-10 09:11:15 -05" "2020-05-10 09:34:15 -05"
[19] "2020-05-10 09:41:15 -05" "2020-05-10 09:42:15 -05" "2020-05-10 09:45:15 -05"
[22] "2020-05-10 09:50:15 -05" "2020-05-10 09:50:15 -05" "2020-05-10 10:15:15 -05"
[25] "2020-05-10 10:41:15 -05" "2020-05-10 10:53:15 -05" "2020-05-10 11:15:15 -05"
[28] "2020-05-10 11:27:15 -05" "2020-05-10 12:02:15 -05" "2020-05-10 12:11:15 -05"
[31] "2020-05-10 02:10:15 -05" "2020-05-10 03:01:15 -05" "2020-05-10 03:52:15 -05"
[34] "2020-05-10 05:01:15 -05" "2020-05-10 05:03:15 -05" "2020-05-10 05:05:15 -05"
[37] "2020-05-10 05:41:15 -05" "2020-05-10 06:25:15 -05" "2020-05-10 06:33:15 -05"
[40] "2020-05-10 06:41:15 -05" "2020-05-10 06:47:15 -05" "2020-05-10 06:48:15 -05"
[43] "2020-05-10 06:52:15 -05" "2020-05-10 08:08:15 -05" "2020-05-10 08:10:15 -05"
[46] "2020-05-10 09:15:15 -05" "2020-05-10 09:17:15 -05" "2020-05-10 09:43:15 -05"
[49] "2020-05-10 09:44:15 -05" "2020-05-10 09:46:15 -05" "2020-05-10 09:50:15 -05"
[52] "2020-05-10 09:52:15 -05" "2020-05-10 09:52:15 -05" "2020-05-10 09:54:15 -05"
[55] "2020-05-10 09:54:15 -05" "2020-05-10 09:55:15 -05" "2020-05-10 09:56:15 -05"
[58] "2020-05-10 09:56:15 -05" "2020-05-10 09:58:15 -05" "2020-05-10 09:59:15 -05"
[61] "2020-05-10 10:01:15 -05" "2020-05-10 10:05:15 -05" "2020-05-10 10:06:15 -05"
[64] "2020-05-10 10:08:15 -05" "2020-05-10 10:09:15 -05" "2020-05-10 10:09:15 -05"
[67] "2020-05-10 10:11:15 -05" "2020-05-10 10:12:15 -05" "2020-05-10 10:13:15 -05"
[70] "2020-05-10 10:14:15 -05" "2020-05-10 10:16:15 -05" "2020-05-10 10:19:15 -05"
[73] "2020-05-10 10:19:15 -05" "2020-05-10 10:20:15 -05" "2020-05-10 10:20:15 -05"
[76] "2020-05-10 10:20:15 -05" "2020-05-10 10:23:15 -05" "2020-05-10 10:23:15 -05"
[79] "2020-05-10 10:25:15 -05" "2020-05-10 10:26:15 -05" "2020-05-10 10:27:15 -05"
[82] "2020-05-10 10:27:15 -05" "2020-05-10 10:28:15 -05" "2020-05-10 10:28:15 -05"
[85] "2020-05-10 10:30:15 -05" "2020-05-10 10:33:15 -05" "2020-05-10 10:36:15 -05"
[88] "2020-05-10 10:36:15 -05" "2020-05-10 10:37:15 -05" "2020-05-10 10:37:15 -05"
[91] "2020-05-10 10:38:15 -05" "2020-05-10 10:38:15 -05" "2020-05-10 10:39:15 -05"
[94] "2020-05-10 10:39:15 -05" "2020-05-10 10:40:15 -05" "2020-05-10 10:40:15 -05"
[97] "2020-05-10 10:40:15 -05" "2020-05-10 10:41:15 -05" "2020-05-10 10:41:15 -05"
[100] "2020-05-10 10:42:15 -05" "2020-05-10 10:44:15 -05" "2020-05-10 10:45:15 -05"
[103] "2020-05-10 10:46:15 -05" "2020-05-10 10:48:15 -05" "2020-05-10 10:49:15 -05"
[106] "2020-05-10 10:51:15 -05" "2020-05-10 10:52:15 -05" "2020-05-10 10:52:15 -05"
[109] "2020-05-10 10:52:15 -05" "2020-05-10 10:52:15 -05" "2020-05-10 10:53:15 -05"
[112] "2020-05-10 10:53:15 -05" "2020-05-10 10:53:15 -05" "2020-05-10 10:54:15 -05"
[115] "2020-05-10 10:54:15 -05" "2020-05-10 10:55:15 -05" "2020-05-10 10:55:15 -05"
[118] "2020-05-10 10:56:15 -05" "2020-05-10 10:57:15 -05" "2020-05-11 08:50:15 -05"
[121] "2020-05-11 08:56:15 -05" "2020-05-11 08:59:15 -05" "2020-05-11 10:23:15 -05"
[124] "2020-05-11 10:24:15 -05" "2020-05-11 10:25:15 -05" "2020-05-11 10:31:15 -05"
[127] "2020-05-11 10:32:15 -05" "2020-05-11 10:32:15 -05" "2020-05-11 10:45:15 -05"
[130] "2020-05-11 10:49:15 -05" "2020-05-11 10:55:15 -05" "2020-05-11 10:59:15 -05"
[133] "2020-05-11 11:00:15 -05" "2020-05-11 11:03:15 -05" "2020-05-11 11:07:15 -05"
[136] "2020-05-11 11:09:15 -05" "2020-05-11 11:10:15 -05" "2020-05-11 11:12:15 -05"
[139] "2020-05-11 11:16:15 -05" "2020-05-11 11:33:15 -05" "2020-05-11 12:47:15 -05"
[142] "2020-05-11 12:48:15 -05" "2020-05-11 12:59:15 -05" "2020-05-11 12:59:15 -05"
[145] "2020-05-11 01:13:15 -05" "2020-05-11 04:48:15 -05" "2020-05-11 04:58:15 -05"
[148] "2020-05-11 05:24:15 -05" "2020-05-11 07:47:15 -05" "2020-05-11 09:21:15 -05"

This is the original miChat.txt file opened in Notepad:

9/5/2020 8:04 p. m. - Eduardo Camargo: https://youtu.be/Y6XCrVOUXN4
Al final un Congreso Populista??? Again???
9/5/2020 8:07 p. m. - Jose Luis Olivas: Totalmente populista
9/5/2020 8:08 p. m. - Veronica Leon: oh no!🤦‍♀️
9/5/2020 8:11 p. m. - Eduardo Camargo: Hay que buscar a Richard Rubio Gariza para hacerlo reaccionar..!?
9/5/2020 8:13 p. m. - Carlos Sabogal Marmanillo: Richard ya es un reaccionario de los israelitas
9/5/2020 8:17 p. m. - Héctor Cortez F: Se eliminó este mensaje
9/5/2020 8:54 p. m. - Jose Luis Olivas:
9/5/2020 9:44 p. m. - Edita: Disculpen...pero quiénes son???🤦🏻‍♀🤦🏻‍♀
9/5/2020 9:51 p. m. - Antonio Magino:
9/5/2020 10:02 p. m. - Jose Luis Olivas: En la foto son El Niño Terrible de la Bombonera e icono de la U, Roberto Challe y Perico Leon icono aliancista que fallecio hoy
9/5/2020 10:13 p. m. - Edita: Ohh..q penita...QDP...gracias José Luis...😔
9/5/2020 10:17 p. m. - Jose Luis Olivas: 😉👍
9/5/2020 11:38 p. m. - Antonio Magino:
10/5/2020 1:04 a. m. - Carlos Sabogal Marmanillo: 👏🏽👏🏽👏🏽👏🏽buenazas 👍🏾👍🏾👍🏾👍🏾🙂
10/5/2020 7:04 a. m. - Lourdes Rodas Entel:
10/5/2020 7:24 a. m. - Veronica Leon:
10/5/2020 7:25 a. m. - Gino Garibotto Sandoval: Feliz día de la madre a todas las compañeras del grupo, que pásenlo bonito, dentro de lo posible. 💓🌷🌹💐🌻🌼🌸
10/5/2020 7:38 a. m. - Carlos Sabogal Marmanillo: Soy *Carlos Sabogal *

I don't know how to solve this problem....
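
Until this export style is handled, one possible workaround is to convert the timestamps yourself. The sketch below is plain base R and not part of rwhatsapp; `parse_ampm_time()` is a hypothetical helper, and it assumes the `d/M/yyyy h:mm a. m.` layout shown above. It sidesteps locale-dependent AM/PM parsing by reading the hour as written and then shifting it by 12 for `p. m.` stamps.

```r
parse_ampm_time <- function(x, tz = "UTC") {
  # The gap between "p." and "m." may be a normal or non-breaking space, or absent
  is_pm <- grepl("p\\..?m\\.", x)
  # strptime-style parsing ignores trailing characters, so the marker can stay
  t <- as.POSIXlt(x, format = "%d/%m/%Y %H:%M", tz = tz)
  # 12 a. m. becomes hour 0, 12 p. m. stays 12, 8 p. m. becomes 20
  t$hour <- t$hour %% 12 + ifelse(is_pm, 12, 0)
  as.POSIXct(t)
}

stamps <- c("9/5/2020 8:04 p. m.", "10/5/2020 1:04 a. m.", "10/5/2020 12:11 p. m.")
parsed <- parse_ampm_time(stamps)
print(format(parsed, "%Y-%m-%d %H:%M"))
```

The `tz` argument can be set to the chat's own time zone, for example `"America/Lima"` as in the call above.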

Timestamp Format Option

My timestamp is parsed incorrectly since the format is MM/dd/yy, hh:mm
I think you should add an option to parse with different timestamp format.

Time Conversion not working properly

When I read a text file using the rwa_read, I get a warning telling me that the Time Conversion did not work properly. I have looked at the text file in detail and it doesn't seem different from other text files that I am working with. When imported, the file has 39882 observations of 9 variables.
I'm not sure what I am doing incorrectly. A snip of the text file showing the dates is attached.
Thank you.

Add new time formats

Hi Johannes!

It's Pablo, we exchanged emails some time ago regarding an issue I was facing with numbers I didn't have saved on my phone! I've come across two time formats which are not included in the package and had to be specified with the format parameter; you may want to consider adding them.

format = "yyyy-MM-dd, HH:mm:ss" Example: "2016-11-25, 18:50:48"
format = "dd/MM/yy, HH:mm" Example: "21/07/18 5:27 p. m." I didn't manage to read the 'a. m.' and 'p. m.' properly for this one.

On the other hand, I've just updated to version 0.2.1 and I'm getting Error in fetch(key) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.6/Resources/library/rwhatsapp/help/rwhatsapp.rdb' is corrupt when running ?rwa_read

Congrats again on this great package! Good job!

Cbind all - argument lengths

I tried working with messages from two groups. The rwa_read() worked well with one group whereas with the second group I was getting this message.

Error in cbind_all(x) : Argument 2 must be length 4382, not 4381

The export process was exactly the same for both. Any thoughts?

Unidentified Emojis

Many of the emojis are not identified, for example, <U + 0001F913> <U + 0001F914> <U + 0001F644>
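
Output such as `<U+0001F913>` is often just how R prints characters that the current console or locale cannot display, in which case the data itself may be intact. If the text really does contain these escaped tokens, they can be turned back into characters. The following is a minimal base R sketch; `unescape_codepoints()` is a hypothetical helper and not part of rwhatsapp.

```r
unescape_codepoints <- function(x) {
  # Matches "<U+0001F913>" and the spaced variant "<U + 0001F913>"
  m <- gregexpr("<U ?\\+ ?[0-9A-Fa-f]{4,8}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # Strip the "<U" prefix, then everything that is not a hex digit
    hex <- gsub("[^0-9A-Fa-f]", "", sub("^<U", "", tokens))
    # Convert each hex code point to its UTF-8 character
    vapply(hex, function(h) intToUtf8(strtoi(h, 16L)), character(1),
           USE.NAMES = FALSE)
  })
  x
}

fixed <- unescape_codepoints("nerd <U + 0001F913> thinking <U+0001F914>")
print(fixed)
```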

Time Conversion error

"Warning message:
In rwa_read(history) :
Time conversion did not work correctly. Provide a custom format..."

The warning above appears when the script is run, even after cloning the GitHub repo that includes the updates for date-time issues.

South African data looks as follows:

[2019/11/21, 13:58:48] Zach Wolpe: thanks for the great info 👏🏻 really helpful 👌🏻 perhaps the downsizing of the research sector is an opportunity as they need to streamline their research budget
[2019/11/21, 13:59:15] Zach Wolpe: if you have some free time we'd love to meet up & get your perspective
[2019/11/21, 14:10:45] Sipho Gwabanda: Definitely, so guys keeping their jobs are having to take on a lot more coverage. Which is a fantastic opportunity for you guys.

Thanks for the assistance!

emoji on Mac

Emoji would not show up using RStudio 1.2.5001 with R 3.6.1 on macOS.

Thank you

Android chat

Hi, when I export a WhatsApp chat from Android, I get a format error

"18/12/19 15:57 - MARCELO gomez: Estoy de acuerdo!",
"18/12/19 15:58 - Débora Valeska: Yo te voto 🙋🏽",
"18/12/19 16:04 - Lopez W EsCristO: Yo pienso lo mismo")

Time conversion did not work correctly. Provide a custom format or add an issue at

chat <- rwa_read("chat00 .txt", format= "dd/MM/yyyy ' ' HH:mm ")

Another Timeformat issue

Hello,
when extracting the WhatsApp database and using WhatsApp Viewer, it seems to save the messages in the following format:

01.04.2015 - 16:00:27; ME: Und funktioniert der Toaster auch? :) Hattest du sonst heute noch Vorlesungen?
01.04.2015 - 16:06:34; he: Ne zum Glück nicht aber recht muss ich mir wirklich anhören😁

which gives the time conversion error when using format = "dd.MM.yyyy - HH:mm:ss". Is there an error on my part?

Great package and thanks for the help!

Time conversion

The rwa_read function did not work for the Iranian (Persian) time format.

Date time format for French conversations

I encountered an issue parsing WhatsApp data with a French locale.

Warning message:
In rwa_read(x = "Discussion WhatsApp avec XXX.txt") :
  Time conversion did not work correctly. Provide a custom format or add an issue at www.github.com/JBGruber/rwhatsapp.

This is what French data look like:

26/09/2019 à 22:39 - Les messages envoyés dans cette discussion et les appels sont désormais protégés avec le chiffrement de bout en bout. Appuyez pour plus d'informations.
26/09/2019 à 22:39 - XXX: Salut :)
26/09/2019 à 22:39 - AB: Salut XXX :)

I tried specifying format = "dd/MM/yyyy à hh:mm" (and others) in rwa_read(), but it did not work...
Do you have an idea of the correct format to use?
If found, maybe it could be added to the default formats in rwa_parse_time().
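
Two details may matter here, though neither is confirmed in this thread: `hh` denotes a 12-hour clock in ICU-style patterns (assuming that is the style `rwa_read()` expects), whereas the sample uses 24-hour times such as 22:39, and the literal `à` sits inside the timestamp. Independently of how `rwa_read()` handles its `format` argument, the timestamps can be checked in base R by removing the `à` and parsing the rest. `parse_fr_time()` below is a hypothetical helper for that check.

```r
parse_fr_time <- function(x, tz = "UTC") {
  # Drop the literal " à" (written as an escape to avoid encoding problems)
  stamp <- sub(" \u00e0", "", x, fixed = TRUE)
  # %H is the 24-hour clock; trailing message text is ignored when parsing
  as.POSIXct(stamp, format = "%d/%m/%Y %H:%M", tz = tz)
}

line <- "26/09/2019 \u00e0 22:39 - XXX: Salut :)"
parsed <- parse_fr_time(line)
print(format(parsed, "%Y-%m-%d %H:%M"))
```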

Multi-line chats (i.e., linebreaks) not properly accounted for

Sorry to bother you again, tried to figure it out myself to provide a pull request, but I'm not that skilled.

Problem: is.na(author) filters out rows where !is.na(text)

Sometimes, this persists for multiple rows, so my initial (crude, I apologize) solution doesn't work. I also tried to make it work for the emojis, and definitely haven't

fix_newline_messages <- function(parsed_chat) {
    # Row of the most recent message with an author; author-less rows that
    # follow are assumed to be continuation lines of that message
    last <- NA_integer_

    for (row in seq_len(nrow(parsed_chat))) {
        if (!is.na(parsed_chat$author[row])) {
            last <- row
            next
        }
        if (is.na(last)) next  # leading orphan row: nothing to attach it to

        # Fix text: join the continuation line onto its parent with "; "
        if (!is.na(parsed_chat$text[row])) {
            parsed_chat$text[last] <- paste0(parsed_chat$text[last], "; ", parsed_chat$text[row])
        }

        # Fix emoji and emoji name: both are assumed to be list columns, so
        # combine the vectors stored inside the list elements with [[ ]]
        for (col in c("emoji", "emoji_name")) {
            extra <- parsed_chat[[col]][[row]]
            if (length(extra) > 0) {
                parsed_chat[[col]][[last]] <- c(parsed_chat[[col]][[last]], extra)
            }
        }
    }
    return(parsed_chat)
}

Basically, the emojis thing doesn't work, and I tried with str_sub but that was obviously wrong as well.

Thanks so much for the help by the way.

Here's some cases to copy-paste, hopefully it saves like 2.2 seconds ;)

chat %>% filter(is.na(author) & !is.na(text))
chat %>% filter(is.na(author) & !emoji == "NULL")

Datetime doesn't parse

Hi there,

myfile='myfile.txt'
Timestamp.Format <- "%Y-%m-%d, %I:%M %p"
chat <- rwa_read(myfile, tz="Canada/Eastern", format=Timestamp.Format)

I'm feeding the raw whatsapp text file, of course, and ending up with the time column as NA. I've verified that my timestamp format is correct, based on other parsers succeeding for this part. They end up with different linecounts, so I can't just simply join them.

I'm pretty sure that out of all the parsers, yours gets the highest accuracy with respect to properly pulling message text, sender, and not failing due to messages with carriage returns.
Just need the time to work!

Thanks!
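
The format above uses strptime-style `%` codes, while other reports in this list pass ICU-style patterns such as `dd/MM/yy, HH:mm` to `format`. Assuming `rwa_read()` expects the ICU style (an assumption, not confirmed in this thread), the codes would need translating. The sketch below does this for a handful of common codes; `strptime_to_icu()` is a hypothetical helper.

```r
strptime_to_icu <- function(fmt) {
  # Only a few common codes are covered; extend the table as needed
  map <- c("%Y" = "yyyy", "%y" = "yy", "%m" = "MM", "%d" = "dd",
           "%H" = "HH", "%I" = "hh", "%M" = "mm", "%S" = "ss", "%p" = "a")
  for (code in names(map)) {
    fmt <- gsub(code, map[[code]], fmt, fixed = TRUE)
  }
  fmt
}

icu_format <- strptime_to_icu("%Y-%m-%d, %I:%M %p")
print(icu_format)
```

If the stringi package is installed, `stringi::stri_datetime_fstr()` should offer a more complete conversion of the same kind.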
