Start by watching an 8-min YouTube video!

  • May 14, 2024: GPT-4o becomes the default.
  • Nov. 1, 2023 (v0.98.2): Generate a comprehensive EDA report.
  • Oct. 28, 2023 (v0.98): Ask questions about the code, result, error, or statistics! Upload a second file.
  • Oct. 23, 2023 (v0.97): GPT-4 becomes the default. Using ggplot2 is now preferred. Consecutive data manipulation is enabled.
See GitHub for source code, bug reports, and instructions to install RTutor as an R package. As a small startup, we are open to partnerships with both academia and industry. We can do demos and seminars via Zoom if time permits.
Also try, a more general platform for analyzing data through chat. It handles multiple files in different formats and supports Python.

Quick start:

  • Explore the data at the EDA tab first. Then start with simple requests, such as distributions and basic plots, and gradually add complexity.
  • The default model is now GPT-4 Turbo, which is slower and more expensive, but more accurate. Within a session, previous questions and code chunks become the context for your new request. For example, you can simply say "Change background color to white" to refine the plot generated by the previous chunk. You can also clean your data step by step.
  • To analyze a new dataset, or to start over, click the Reset button first.
  • Prepare and clean your data in Excel first. Name columns properly. ChatGPT tries to guess the meaning of column names, even if they are abbreviated.
  • RTutor can only analyze traditional statistical data, where rows are observations and columns are variables. For complex data, try
  • Once uploaded, your data is automatically loaded into R as a data frame called df. You do NOT need to ask RTutor to load data. Check whether the data types of the columns are correct, and change them if needed, especially when numbers are used to code for categories.
  • An additional file can be uploaded as df2 to be analyzed together with df. To use it, you must specify 'df2' in your prompts.
  • Use the Q&A box to ask questions about the code, result, or error messages. You can ask for methods to use or develop a plan.
  • Before sending your request to OpenAI, we do prompt engineering based on the uploaded data. We add "Generate R code" to the beginning, and append something like "Use the df data frame. Note that highway is numeric, ..." afterward. If you are not using any data (for example, to plot a function or run simulations), choose "No data" from the Data dropdown.
  • Your data is not sent to OpenAI, nor is it stored on our web server after the session. If you explain the background of the data and the meaning of the columns, you can also ask general questions, as you would when consulting a statistician who knows nothing about your data.
  • Be skeptical. The generated code can be logically wrong even if it produces results without error.
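
The prompt engineering described in the list above can be sketched roughly as follows. This is a minimal illustration, not RTutor's actual implementation: the function name build_prompt, the way column types are summarized, and the example data are all assumptions. Only the "Generate R code" prefix and the "Use the df data frame. Note that ..." suffix come from the description above.

```r
# Hypothetical helper (not part of RTutor): wraps a plain-language request
# in the prefix and suffix described in the quick-start list.
build_prompt <- function(request, data = NULL) {
  prompt <- paste("Generate R code.", request)
  if (is.null(data)) {
    return(prompt)  # corresponds to choosing "No data"
  }
  # Summarize each column by name and a coarse type. Only this metadata
  # goes into the prompt; the data values themselves are not included.
  col_types <- vapply(
    data,
    function(x) if (is.numeric(x)) "numeric" else "categorical",
    character(1)
  )
  notes <- paste0(names(col_types), " is ", col_types, collapse = ", ")
  paste0(prompt, " Use the df data frame. Note that ", notes, ".")
}

# Made-up example data standing in for an uploaded file.
df <- data.frame(
  highway = c(29, 31, 26),
  class = c("compact", "compact", "suv")
)

example_prompt <- build_prompt("Plot the distribution of highway.", df)
print(example_prompt)
```

Calling build_prompt("Plot sin(x) from 0 to 10.") without a data frame returns only the prefixed request, which mirrors the "No data" option.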



Default dataset: df

2nd dataset: df2 (Must be specified, e.g. 'create a pie chart of X in df2.')
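
As a rough illustration of what checking column types and using the second dataset might look like in R, the sketch below uses small made-up data frames in place of uploaded files. The column names (id, group, score, age) and all values are hypothetical. In RTutor you would normally request these steps in plain language rather than type the code yourself.

```r
# Made-up stand-ins for the two uploaded files.
df <- data.frame(
  id = 1:4,
  group = c(1, 2, 1, 2),
  score = c(3.2, 4.1, 2.8, 3.9)
)
df2 <- data.frame(
  id = 1:4,
  age = c(34, 51, 29, 42)
)

# Inspect column types: 'group' is stored as numeric although it codes categories.
str(df)

# Convert the numeric codes to a factor so they are treated as categories.
df$group <- as.factor(df$group)

# Analyze both datasets together by joining them on the shared 'id' column.
combined <- merge(df, df2, by = "id")
mean_age_by_group <- aggregate(age ~ group, data = combined, FUN = mean)
print(mean_age_by_group)
```

A request along the lines of "Convert group to a factor, merge df with df2 by id, and show the mean age per group" would be one way to ask for the same result.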