Chat with your data in your own language.


Start by watching a 10-min tutorial on YouTube!

  • Nov. 9: Switching to the new GPT-4 Turbo model!
  • Nov. 1 (v0.98.2): Just upload your data, and RTutor can generate a comprehensive EDA report.
  • Oct. 28 (v0.98): Ask questions about the code, result, error, or statistics in general! You can also upload a second file.
  • Oct. 23 (v0.97): GPT-4 becomes the default. Using ggplot2 is now preferred. Consecutive data manipulation is enabled and tracked.


Also try Chatlize.ai, a general platform for analyzing data through chats. It supports multiple files in different formats, as well as Python.


Quick start:

  • Explore the data at the EDA tab first. Then start analyzing the data using simple requests such as distributions, basic plots, or simple models. Gradually add complexity by further refinement or adding variables.
  • The default model is now GPT-4, which is slower and more expensive, but more accurate. With GPT-4, previous questions and code chunks become the context for your new request. For example, you can simply say "Change background color to white." to refine the plot generated by the previous chunk. You can also process your data step by step across code chunks. Go back to any previous chunk and continue from there.
  • Prepare and clean your data in Excel first! Name columns properly. ChatGPT tries to guess the meaning of column names, even abbreviated ones. RTutor can only analyze traditional statistical data, where rows are observations and columns are variables. For complex data, try https://chatlize.ai.
  • Once uploaded, your data is automatically loaded into RTutor as a data frame called df. Check if the data types of the columns are correct. Change if needed, especially when numbers are used to code for categories. Data types make a big difference in analyses and plots!
  • A second file can be uploaded as df2. To use this file, you have to specify it in your prompts. You can merge it with the first file.
  • Use the chat box to ask questions about the code, result, error, or statistics in general.
  • Before sending your request to OpenAI, we add "Generate R code" before it, and append something like "Use the df data frame. Note that highway is numeric, ..." afterward. If you are not using any data (for example, when plotting a function or running simulations), choose "No data" from the Data dropdown.
  • Your data is not sent to OpenAI. Nor is it stored on our web server after the session. Before asking general questions, explain the background and the columns, like emailing a clueless statistician.
  • Be skeptical. The generated code can be logically wrong even if it produces results without error.
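
The data-type check from the quick start can be sketched in R as follows. This is a minimal illustration, not RTutor's own code: the data frame and its columns (`cyl`, `highway`) are made up for the example, with `cyl` standing in for a column where numbers code for categories.

```r
# Hypothetical data standing in for an uploaded file; RTutor names it df.
df <- data.frame(
  cyl     = c(4, 6, 8, 4, 6),        # numbers used to code categories
  highway = c(31, 26, 20, 33, 25)    # a genuinely numeric variable
)

# Inspect the type of every column.
types_before <- sapply(df, class)
print(types_before)

# Convert the coded column to a factor so that models and plots
# treat it as categorical rather than continuous.
df$cyl <- as.factor(df$cyl)

types_after <- sapply(df, class)
print(types_after)
```

Leaving `cyl` numeric would, for instance, make a regression fit a single slope for it instead of one effect per category, which is the kind of logical error that runs without any warning.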

              
Results:

                



Default dataset: df
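
To make the request wrapping described in the quick start concrete, here is a rough sketch of how a prompt could be assembled around a request. The function `build_prompt`, its arguments, and the exact wording and punctuation are assumptions for illustration; RTutor's actual implementation may differ.

```r
# Hypothetical helper; not RTutor's actual code.
build_prompt <- function(request, data, data_name = "df") {
  # Describe each column as numeric or categorical.
  kinds <- vapply(
    data,
    function(col) if (is.numeric(col)) "numeric" else "categorical",
    character(1)
  )
  notes <- paste0(names(kinds), " is ", kinds, collapse = ", ")

  # Prefix the instruction, then append the data description.
  paste0(
    "Generate R code. ", request,
    " Use the ", data_name, " data frame. Note that ", notes, "."
  )
}

# Made-up example data and request.
df <- data.frame(highway = c(31, 26, 20), class = c("compact", "suv", "suv"))
prompt <- build_prompt("Plot highway by class.", df)
cat(prompt, "\n")
```

Only column names and types appear in the assembled text in this sketch, which is consistent with the data itself not being sent.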


2nd dataset: df2 (Must be specified in the prompt, e.g. 'Create a pie chart of X in df2.')
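
As a sketch of what merging the second file with the first can look like in R: the column names (`id`, `score`, `group`) and the choice of an inner join are assumptions for illustration. A prompt such as 'Merge df2 with df by id.' would be expected to yield code along these lines, though the generated code can differ.

```r
# Hypothetical data; in RTutor these come from the two uploaded files.
df  <- data.frame(id = c(1, 2, 3), score = c(88, 92, 75))
df2 <- data.frame(id = c(2, 3, 4), group = c("a", "b", "a"))

# Inner join on the shared key column: only ids present in both remain.
merged <- merge(df, df2, by = "id")
print(merged)
```

Passing `all.x = TRUE` to `merge()` would instead keep every row of `df` and fill unmatched `group` values with `NA`, so it is worth stating in the prompt which rows should survive the merge.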