Cogram is a Codex-powered tool that saves you from constantly looking up data science code (the main reason I lose patience with data science myself!), letting you "focus on the problems that matter".
The demo looks quite compelling, and I like the low-key UX approach: type a prompt as a plain comment. It will be interesting to see how they can support and automate the more routine tasks, freeing us to think harder about the questions and the data.
Cogram is a coding assistant for data scientists that makes coding faster and easier. It can write code from plain-language instructions or complete entire blocks of code, building on the Codex language model created by OpenAI.
How does Cogram work?
Cogram processes the context the user provides in the form of code or instructions in plain language, and then uses the OpenAI Codex language model to generate matching source code. The source code is returned to the user in their Notebook or the Cogram SQL app.
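The comment-as-prompt flow can be illustrated with a notebook cell like the one below. Both the prompt and the completion are illustrative assumptions: Cogram's real suggestions depend on the model and the surrounding context.

```python
# In a Jupyter cell, the user writes a plain-language instruction as a comment;
# Cogram sends the notebook content above the cursor to Codex and inserts code.

# compute the mean revenue per region          <- the plain-comment prompt
# (the function below is the kind of completion a tool like Cogram might return)
from collections import defaultdict

def mean_revenue_per_region(rows):
    """Average the 'revenue' field of each row, grouped by 'region'."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["region"]] += float(row["revenue"])
        counts[row["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}

rows = [
    {"region": "north", "revenue": "100"},
    {"region": "north", "revenue": "300"},
    {"region": "south", "revenue": "50"},
]
print(mean_revenue_per_region(rows))  # {'north': 200.0, 'south': 50.0}
```

The generated code appears directly in the notebook, where the user can run, edit, or discard it like any other cell content.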
Cogram integrates with Jupyter Notebook, supporting the Python programming language. Cogram also supports SQL generation from plain language in the standalone Cogram SQL app. If you have integrations in mind, we’d love to hear from you at email@example.com.
The Codex language model used by Cogram was evaluated on a test-set of coding problems specified in plain language. The model generated the correct code for 30% of problems given a single attempt and for up to 70% of problems with a sampling strategy [Chen et al, 2021, Evaluating Large Language Models Trained on Code, arXiv:2107.03374].
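The "sampling strategy" here is the pass@k metric from Chen et al., 2021: generate n candidate solutions, count how many (c) are correct, and estimate the probability that at least one of k drawn samples passes. The paper's unbiased estimator can be sketched as:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from Chen et al. (2021):
    probability that at least one of k samples, drawn from n
    generated solutions of which c are correct, passes the tests."""
    if n - c < k:
        # Fewer incorrect solutions than draws: a correct one is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 100 samples of which 30 pass, a single attempt succeeds ~30% of the time:
print(round(pass_at_k(100, 30, 1), 3))   # 0.3
# Drawing 10 samples makes at least one correct solution far more likely:
print(round(pass_at_k(100, 30, 10), 3))
```

This is why sampling several completions and keeping the best one raises the headline success rate well above the single-attempt figure.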
How best to use Cogram
Cogram works especially well when you give it context: in an empty Jupyter Notebook, start out by writing a few lines of code manually before beginning to use Cogram. It also helps to split your workflow into smaller subproblems and to use meaningful names for functions, classes, and variables.
In Jupyter Notebook, Cogram uses the content of your notebook above the current cursor position as context. In the Cogram SQL app, you can provide the database schema and an instruction of what you’d like your query to achieve as context.
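A hypothetical sketch of how context could be assembled: everything above the cursor becomes the prompt prefix, so earlier cells (imports, column names, variable names) steer the completion. The function name and prompt format here are illustrative assumptions, not Cogram's actual internals.

```python
def build_prompt(cells_above_cursor: list[str], instruction: str) -> str:
    """Concatenate prior notebook cells, then append the plain-language
    instruction as a comment for the language model to complete."""
    context = "\n\n".join(cells_above_cursor)
    return f"{context}\n\n# {instruction}\n"

# Earlier cells give the model the names it should reuse:
cells = [
    "import pandas as pd",
    'sales = pd.read_csv("sales.csv")  # columns: region, month, revenue',
]
prompt = build_prompt(cells, "plot monthly revenue as a bar chart")
print(prompt)
```

This is why meaningful names matter: a model completing this prompt can refer to `sales` and its columns rather than inventing its own.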
Cogram makes use of the Codex language model developed by OpenAI. Codex was trained on large amounts of publicly available source code and natural language. An overview of the Codex model is available here. A technical description is provided by Chen et al, 2021, Evaluating Large Language Models Trained on Code, arXiv:2107.03374.
In rare cases, Cogram can recite verbatim from the corpus of publicly available code and natural language that OpenAI’s Codex model was trained on. Typically, this happens when no or very little context is provided to Cogram. The Codex training set includes publicly available code repositories subject to licensing, such as the GNU General Public License (GPL). As a result, in the rare cases that Cogram recites verbatim from the training corpus, generated code may also be subject to licensing requirements. An investigation of verbatim recitation by the Codex model, in the context of the GitHub Copilot tool, is available here: verbatim recitation was found in less than 0.1% of test cases. Code generated by Cogram should be vetted, just like code written by a human.
We use telemetry to improve Cogram. All data is transmitted and stored securely, anonymised, and never shared with other users or other companies. We collect information such as the size of the context, how many suggestions are requested, what settings are used, and whether or not a suggestion is accepted. At no point do we inspect, store, or process the textual content of the context.
The context that the user provides is processed by Cogram and transmitted anonymously to the Codex API, which is provided by OpenAI; OpenAI may use the context to improve Codex. Details of how OpenAI uses the context are available here.
Can GitHub Copilot introduce insecure code in its suggestions?
There’s a lot of public code in the world with insecure coding patterns, bugs, or references to outdated APIs or idioms. When GitHub Copilot synthesizes code suggestions based on this data, it can also synthesize code that contains these undesirable patterns. This is something we care a lot about at GitHub, and in recent years we’ve provided tools such as Actions, Dependabot, and CodeQL to open source projects to help improve code quality. Similarly, as GitHub Copilot improves, we will work to exclude insecure or low-quality code from the training set. Of course, you should always use GitHub Copilot together with testing practices and security tools, as well as your own judgment.
Does GitHub Copilot produce offensive outputs?
The technical preview includes filters to block offensive words and avoid synthesizing suggestions in sensitive contexts. Due to the pre-release nature of the underlying technology, GitHub Copilot may sometimes produce undesired outputs, including biased, discriminatory, abusive, or offensive outputs. If you see offensive outputs, please report them directly to firstname.lastname@example.org, so that we can improve our safeguards. GitHub takes this challenge very seriously and we are committed to addressing it with GitHub Copilot.
How will advanced code generation tools like GitHub Copilot affect developer jobs?
Bringing in more intelligent systems has the potential to bring enormous change to the developer experience. We expect this technology will enable existing engineers to be more productive, reducing manual tasks and helping them focus on interesting work. We also believe that GitHub Copilot has the potential to lower barriers to entry, enabling more people to explore software development and join the next generation of developers.
How is the data that GitHub Copilot collects used?
In order to generate suggestions, GitHub Copilot transmits part of the file you are editing to the service. This context is used to synthesize suggestions for you. GitHub Copilot also records whether the suggestions are accepted or rejected. This telemetry is used to improve future versions of the AI system, so that GitHub Copilot can make better suggestions for all users in the future. In the future we will give users the option to control how their telemetry is used. More information about our use of telemetry can be found here.
Is the transmitted data secure?
All data is transmitted and stored securely. Access to the telemetry is strictly limited to individuals on a need-to-know basis. Inspection of the gathered source code will be predominantly automatic, and when humans read it, they do so specifically to improve the model or to detect abuse.
Will my private code be shared with other users?
No. We use telemetry data, including information about which suggestions users accept or reject, to improve the model. We do not reference your private code when generating code for other users.
Privacy & Data Protection
Please see the GitHub Copilot telemetry terms and About GitHub Copilot Telemetry. More information on how GitHub processes and uses personal data is available in our Privacy Statement.
Why is the technical preview restricted, and not available to everyone?
GitHub Copilot requires state-of-the-art AI hardware. During the technical preview, we are offering GitHub Copilot to a limited number of testers for free. When we launch a commercial product, we will make it available as broadly as possible.
Will there be a paid version?
If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future. We want to use the preview to learn how people use GitHub Copilot and what it takes to operate it at scale.
What development environments are supported?
We currently support Visual Studio Code, Neovim, and JetBrains IDEs like PyCharm and IntelliJ IDEA.
Call to Action
Stop searching for code or looking up docs. Focus on the problems that matter.